The third chapter to contain lab assignments is Chapter 5: “IDA Pro”. As the name suggests, this chapter discusses the Interactive Disassembler. IDA’s true power comes from its interactivity, and the book gives tips and tricks for performing analysis with it. Topics include the IDA Pro interface and how to navigate it, useful windows for analysis, cross-references, named constants, and redefining code and data. There is one lab assignment in this chapter, containing a total of 21 questions. I will be using the latest free version of IDA Pro (version 8.0.220802 at the time of writing) for this lab.

Lab 5-1

Analyze the malware found in the file Lab05-01.dll using only IDA Pro. The goal of this lab is to give you hands-on experience with IDA Pro. If you’ve already worked with IDA Pro, you may choose to ignore these questions and focus on reverse-engineering the malware.

1. What is the address of DllMain?
Answer: 1000D02E.

2. Use the Imports window to browse to gethostbyname. Where is the import located?
Answer: 100163CC.

3. How many functions call gethostbyname?
Answer: If we open the xrefs window for “gethostbyname” by pressing X in IDA, we can see a total of 18 function calls to it.

4. Focusing on the call to gethostbyname located at 0x10001757, can you figure out which DNS request will be made?
Answer: It will make a DNS request to pics.practicalmalwareanalysis.com.

5. How many local variables has IDA Pro recognized for the subroutine at 0x10001656?
Answer: If we browse to the subroutine at 0x10001656 we can see that IDA has recognized 23 local variables. We can tell that these are local variables because of their negative offset to EBP.

6. How many parameters has IDA Pro recognized for the subroutine at 0x10001656?
Answer: Just one (lpThreadParameter). We know that this is a parameter because of its positive offset to EBP.
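As a rough sketch of the rule of thumb from questions 5 and 6, the sign of the offset that IDA displays is enough to tell the two kinds of stack slots apart. The function name and the sample offsets below are made up for illustration; they are not taken from the lab binary.

```python
def classify_stack_slot(offset: int) -> str:
    """Classify a stack slot by the sign of the offset IDA shows for it."""
    if offset < 0:
        return "local variable"
    if offset > 0:
        return "parameter"
    return "frame base"


# Hypothetical offsets, chosen only to exercise both branches.
for offset in (-0x10, 0x8):
    print(hex(offset), "->", classify_stack_slot(offset))
```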

7. Use the Strings window to locate the string \cmd.exe /c in the disassembly. Where is it located?
Answer: Memory address 10095B34.

8. What is happening in the area of code that references \cmd.exe /c?
Answer: There’s only one area of code that references this string. It is the function located at 1000FF58:
Within this function, we can see that an offset to the string is being pushed onto the stack:
I then looked at a graph view of the function to get a global idea of what it is doing. The entire graph is way too big to post here, but here are some interesting strings that I found:

  • closesocket
  • minstall
  • robotwork
  • cd
  • mhost
  • inject
  • svchost.exe
  • Get Install Way
  • xinstall.log
  • Detect VM
  • Inject To Process Sucessfully
  • Robot_Worktime
  • Machine IdleTime:

Oh, and then there is this:
Now we know that the code that references the string is responsible for setting up a remote shell. Knowing this, I changed the name of the function to “setupRemShell”:

9. In the same area, at 0x100101C8, it looks like dword_1008E5C4 is a global variable that helps decide which path to take. How does the malware set dword_1008E5C4? (Hint: Use dword_1008E5C4’s cross-references.)
Answer: At 100101C8 we can see that the value in the ebx register gets compared to dword_1008E5C4. If the values are equal, the offset to \cmd.exe /c gets pushed onto the stack. If the values are not equal, the offset to \command.exe /c gets pushed onto the stack:

If we look at the xrefs to dword_1008E5C4, the first entry is the one we are interested in:

This mov instruction copies whatever is stored in the eax register into dword_1008E5C4. Let’s jump to this instruction and see what else we can find.
From the snippet above we cannot yet tell what exactly is being stored in eax. Let’s look at the call sub_10003695 instruction right above the mov instruction and see if that tells us anything:
This function is using the OSVERSIONINFOA structure of the Win32 API. It looks like it is loading the OS version of the victim and comparing this to a PlatformID of 2. If we look up the values of PlatformID, we learn that it is checking if the OS version is Windows NT or later:
Based on this, the malware decides which path to take.
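To make the decision explicit, here is a minimal Python sketch of how I read this logic. The function names are my own, and I am assuming that the flag stored in dword_1008E5C4 is truthy on Windows NT and that the NT path is the one that uses cmd.exe (command.exe being the shell on Windows 95/98/Me). The platform constants are the documented values of dwPlatformId.

```python
# Documented dwPlatformId values from the OSVERSIONINFOA structure.
VER_PLATFORM_WIN32_WINDOWS = 1  # Windows 95/98/Me
VER_PLATFORM_WIN32_NT = 2       # Windows NT family


def check_os_version(platform_id: int) -> bool:
    """Hypothetical stand-in for sub_10003695: True on the NT family."""
    return platform_id == VER_PLATFORM_WIN32_NT


def pick_shell(is_nt: bool) -> str:
    """Assumed mapping from the global flag to the shell command prefix."""
    return "\\cmd.exe /c" if is_nt else "\\command.exe /c"


for platform_id in (VER_PLATFORM_WIN32_WINDOWS, VER_PLATFORM_WIN32_NT):
    print(platform_id, "->", pick_shell(check_os_version(platform_id)))
```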

With this new information, I changed the name of the sub_10003695 function to “checkOSversion”:

10. A few hundred lines into the subroutine at 0x1000FF58, a series of comparisons use memcmp to compare strings. What happens if the string comparison to robotwork is successful (when memcmp returns 0)?
Answer: If the string comparison to robotwork is successful and memcmp returns 0, we will not take the conditional jump at 100145C (Jump if not zero). This means that we follow the red arrow:

The red arrow leads us to a new function, located at memory address 100052A2:

When we view this function, it looks like it is accessing registry keys at SOFTWARE\Microsoft\Windows\CurrentVersion:
After looking at the graph of this function, we can assume that it accesses SOFTWARE\Microsoft\Windows\CurrentVersion\WorkTimes and SOFTWARE\Microsoft\Windows\CurrentVersion\WorkTime.

11. What does the export PSLIST do?
Answer: PSLIST is located at 10007025. If we jump here, we can see that it takes a path based on the result of the sub_100036C3 function:
If we look at this sub_100036C3 function, we can see that the first block of instructions looks awfully similar to a function we investigated for question 9 (which we renamed to “checkOSversion”):
So, similar to the “checkOSversion” function, it first checks if the victim is running an operating system of Windows NT or later. If this is the case, the zero flag (ZF) will be set (ZF=1). If the victim is running an older version, ZF will not be set (ZF=0). If ZF is set to 0, we jump to 100036FA and return the value 0 (xor eax, eax gives 0 as a result). If ZF is set to 1 (victim OS is running Windows NT or later), we follow the first red arrow (memory location 100036EC), and another check is performed:
This check compares the “dwMajorVersion” of the victim to the value 5. If we go back to the Microsoft documentation on the OSVERSIONINFOA structure of the Win32 API, there is a table that explains what this value stands for:
A “dwMajorVersion” of 5 means that the victim is running either Windows 2000, Windows XP, Windows Server 2003, or Windows Server 2003 R2. There is a jump below instruction. If the “dwMajorVersion” is less than 5, we jump to 100036FA and return the value 0 (xor eax, eax gives 0 as a result). If the “dwMajorVersion” is not below 5 (so 5 or greater), we follow the second red arrow, where the value 1 is pushed onto the stack and popped into the eax register. So the return value is 1 instead:
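The two checks in sub_100036C3 can be summarized in a few lines of Python. This is only my reading of the disassembly described above, with a made-up function name and parameters standing in for the fields of the OSVERSIONINFOA structure.

```python
VER_PLATFORM_WIN32_NT = 2  # documented dwPlatformId value for the NT family


def os_is_nt5_or_later(platform_id: int, major_version: int) -> int:
    """Hypothetical model of sub_100036C3: returns 1 or 0 like the original."""
    if platform_id != VER_PLATFORM_WIN32_NT:
        return 0  # first check fails: xor eax, eax
    if major_version < 5:
        return 0  # jump below taken: xor eax, eax
    return 1      # push 1 / pop eax


print(os_is_nt5_or_later(2, 5))  # NT family, major version 5
print(os_is_nt5_or_later(2, 4))  # NT family, but major version 4
print(os_is_nt5_or_later(1, 4))  # not the NT family
```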

Returning from the call to sub_100036C3, we are now back in the PSLIST function at memory address 10007034:

The “test eax, eax” instruction basically compares the value in the eax register to 0. If the value in eax is equal to 0, ZF is set (ZF=1). If the value in eax is not equal to 0, ZF is not set (ZF=0). From the previous function call right before this “test eax, eax” instruction, we learned the following:

  • If the victim OS is not below “dwMajorVersion 5” (that is, Windows 2000, Windows XP, Windows Server 2003, Windows Server 2003 R2, or later), the value 1 gets placed in the eax register.
  • If the victim OS is below “dwMajorVersion 5”, the value 0 gets placed in the eax register.

After the “test eax, eax” instruction there is a Jump if Zero instruction at 10007036. If the value in eax was 0 (victim OS below “dwMajorVersion 5”), we jump to 1000705B. There does not seem to be anything interesting happening here. But if the value in the eax register was 1 (victim OS not below “dwMajorVersion 5”), there are two different paths that we can take. These paths call either function sub_10006518 or sub_1000664C:
Looking at these functions, they seem to get a list of processes running on the victim computer. Here are some snippets of the function graphs that indicate this (the entire graph is too big to post here):


12. Use the graph mode to graph the cross-references from sub_10004E79. Which API functions could be called by entering this function? Based on the API functions alone, what could you rename this function?
Answer: The API functions in this function can be seen in the graph below. The most interesting one is GetSystemDefaultLangID. It looks like this function sends the victim’s language ID to the malware operator.
We could rename this function to “getLanguageID”.

13. How many Windows API functions does DllMain call directly? How many at a depth of 2?
Answer: To get the Windows API functions that DllMain directly calls, I created a graph with a recursion depth of 1.
We can see a total of 4 API functions being called directly by DllMain:
At a recursion depth of 2, the graph becomes a lot bigger. There is a total of 33 API functions being called. Some of the interesting ones are: Sleep, gethostbyname, closesocket, WinExec, send, recv, socket.

14. At 0x10001358, there is a call to Sleep (an API function that takes one parameter containing the number of milliseconds to sleep). Looking backward through the code, how long will the program sleep if this code executes?
Answer: Going backwards through the code, we can see that the number of milliseconds to sleep gets loaded in the eax register just before calling the API function:
Let’s see what happens in this loc_10001341 section:

  • mov eax, off_10019020
    • First, a pointer to the string [This is CTI]30 is loaded into eax
  • add eax, 0Dh
    • 0xD is 13 in decimal. It took me quite some time to figure this out, but adding 13 to eax moves the pointer past the text [This is CTI], which is 13 characters long, so it now points at 30
  • push eax
    • The pointer to the string “30” gets pushed onto the stack
  • call ds:atoi
    • The atoi function is called, meaning the string “30” will get converted to the integer 30
  • imul eax, 3E8h
    • 0x3E8 is 1000 in decimal. The value in eax (30) gets multiplied by 1000 and stored in eax
  • pop ecx
    • The value at the top of the stack (the argument that was passed to atoi) gets popped into the ecx register, which cleans up the stack after the call
  • push eax
    • The value in the eax register (which is now 30000) gets pushed onto the stack
  • call ds:Sleep
    • The Sleep API function is called

Now we know that the program will sleep for 30000 milliseconds (30 seconds)!
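The same arithmetic can be replayed in Python as a sanity check. The atoi helper below is my own simplified stand-in for the C function (it only handles an optional sign and leading decimal digits), and the variable names are made up.

```python
def atoi(text: str) -> int:
    """Simplified stand-in for C's atoi: parse leading decimal digits."""
    text = text.lstrip()
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    digits = ""
    for ch in text:
        if ch not in "0123456789":
            break
        digits += ch
    return sign * int(digits) if digits else 0


config = "[This is CTI]30"       # string referenced through off_10019020
number_text = config[0xD:]       # add eax, 0Dh: skip the 13-character prefix
sleep_ms = atoi(number_text) * 0x3E8  # imul eax, 3E8h

print(number_text, sleep_ms)
```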

15. At 0x10001701 is a call to socket. What are the three parameters?
Answer: The three parameters are: 6, 1, and 2:

16. Using the MSDN page for socket and the named symbolic constants functionality in IDA Pro, can you make the parameters more meaningful? What are the parameters after you apply changes?
Answer: After looking at the Microsoft documentation for the socket function, we learn that:

  • The TCP protocol (6) is being used

  • The type of socket is SOCK_STREAM (1)

  • The address family of the socket is IPv4 (2)
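Python’s socket module exposes the same symbolic constants, so the three numbers can be cross-checked quickly. Keep in mind that arguments are pushed in reverse order, so the values 6, 1, and 2 seen in the disassembly correspond to socket(af=2, type=1, protocol=6). The dictionary below is just my own way of presenting the result.

```python
import socket

# Values as they appear in the disassembly, mapped to socket()'s parameters.
af, sock_type, protocol = 2, 1, 6

symbolic = {
    "af": "AF_INET" if af == socket.AF_INET else hex(af),
    "type": "SOCK_STREAM" if sock_type == socket.SOCK_STREAM else hex(sock_type),
    "protocol": "IPPROTO_TCP" if protocol == socket.IPPROTO_TCP else hex(protocol),
}

print(symbolic)
```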

We can make the parameters more meaningful by renaming them to the correct named symbolic constants in IDA:

17. Search for usage of the in instruction (opcode 0xED). This instruction is used with a magic string VMXh to perform VMware detection. Is that in use in this malware? Using the cross-references to the function that executes the in instruction, is there further evidence of VMware detection?
Answer: I searched for all occurrences of 0xED by pressing ALT+B:
This leads to 157 results, but only one of these results contains the actual “in” instruction, located at 100061DB:
Jumping to the memory location, we do see mentions of the “VMXh” magic string. This indicates that VMware detection is being used by the malware.
The function that uses this “in” instruction is sub_10006196. There are 3 xrefs to this function:
Looking at the first xref, we can already tell that VM detection is being used by the malware.
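As a side note on the VMXh magic string: in the disassembly it is a 32-bit constant rather than a string in the data section, which is why it does not show up in the Strings window. The small sketch below uses the well-known VMware magic value 0x564D5868 (not quoted from the screenshots) to show how the number maps to the text, and why the bytes look reversed in a little-endian hex dump.

```python
import struct

VMWARE_MAGIC = 0x564D5868  # well-known VMware backdoor magic value

# Reading the constant most significant byte first gives the familiar text.
as_text = VMWARE_MAGIC.to_bytes(4, "big").decode("ascii")

# In memory (little-endian x86) the same bytes appear in reverse order.
in_memory = struct.pack("<I", VMWARE_MAGIC)

print(as_text, in_memory)
```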

18. Jump your cursor to 0x1001D988. What do you find?
Answer: Jumping to this address reveals random byte data:

19, 20 & 21. If you have the IDA Python plug-in installed (included with the commercial version of IDA Pro), run Lab05-01.py, an IDA Pro Python script provided with the malware for this book. (Make sure the cursor is at 0x1001D988.) What happens after you run the script?
With the cursor in the same location, how do you turn this data into a single ASCII string?
Open the script with a text editor. How does it work?

Answer: Questions 19, 20, and 21 require the commercial version of IDA Pro. Since I am using the free version of IDA, I will be doing this part manually. This is what the Python script looks like:
The script goes through each byte and XORs it with 0x55 to decode them, so I copied all of the bytes from question 18 into notepad (80 lines):

And pasted them in CyberChef to XOR by 0x55:

Whoops. That output wasn’t very useful. Looking back at the questions, it looks like this needed to be converted to an ASCII string…

So let’s first convert our bytes to ASCII values in IDA:
After getting rid of some of the junk, we can construct a message that is actually readable:
The actual message should’ve been: xdoor is this backdoor, string decoded for Practical Malware Analysis Lab :)1234. That would’ve been a lot easier if we could run the Python script from within IDA, but we got close to the expected result!
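For anyone else stuck without IDAPython, the decoding step itself is easy to reproduce in plain Python. The sketch below is not the book’s Lab05-01.py; it only demonstrates the single-byte XOR with 0x55. Since I am not reproducing the raw bytes from 0x1001D988 here, it encodes the expected message first and then decodes it again.

```python
KEY = 0x55  # XOR key used by the script


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


expected = (
    b"xdoor is this backdoor, string decoded for "
    b"Practical Malware Analysis Lab :)1234"
)

encoded = xor_bytes(expected)   # stand-in for the bytes stored in the DLL
decoded = xor_bytes(encoded)    # what the decoding loop produces

print(encoded[:8].hex())
print(decoded.decode("ascii"))
```

To decode the real data, the bytes copied from IDA would go through xor_bytes instead of the encoded stand-in.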

Comparing my answers to the Lab 5-1 solutions

Conclusions after comparison:

  • For question 3, we answered that there are a total of 18 function calls to gethostbyname. This is only partly correct, because IDA lists every call site twice in the xrefs window: once as type “p” (the function is being called) and once as type “r” (the import address is being read). So gethostbyname is actually called 9 times in this program. On top of that, we did not read the question correctly. The question was how many FUNCTIONS call “gethostbyname”, not how many times “gethostbyname” gets called. If we look closely, we see that “gethostbyname” is called by 5 different functions (see the color differences for each function below):

  • All other solutions look similar to the answers that we provided. Awesome!
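The counting mistake from question 3 is easy to reproduce with a small sketch. The xref rows below are entirely hypothetical (placeholder function names and call-site labels, and a made-up split of the 9 calls over the 5 functions); they only mirror the totals mentioned above.

```python
# Hypothetical call sites: (calling function, call-site label).
call_sites = [
    ("func_A", 1), ("func_A", 2),
    ("func_B", 1), ("func_B", 2),
    ("func_C", 1), ("func_C", 2),
    ("func_D", 1), ("func_D", 2),
    ("func_E", 1),
]

# IDA lists each call site twice: once as "p" (call) and once as "r" (read).
xrefs = [(func, site, kind) for func, site in call_sites for kind in ("p", "r")]

unique_calls = {(func, site) for func, site, _ in xrefs}
calling_functions = {func for func, _, _ in xrefs}

print(len(xrefs), len(unique_calls), len(calling_functions))
```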

Sources