

Chapter 6 is all about recognizing C code constructs in x86 assembly. A code construct defines a functional property within code but not the details of its implementation. Examples of code constructs are loops, if statements, switch statements, and more. As a malware analyst, you must be able to obtain a high-level picture of code functionality by analyzing instructions as groups, focusing on individual instructions only as needed. This skill takes time to develop, and recognizing what C code constructs look like in x86 assembly will prevent you from getting lost in details.

Lab 6-1

In this lab, you will analyze the malware found in the file Lab06-01.exe.

1. What is the major code construct found in the only subroutine called by main?
Answer: The sub_401000 function that gets called by main appears to be a simple if statement:
The call to ds:InternetGetConnectedState returns its result in EAX, the standard return-value register on x86. After the call, the value in EAX is copied into [ebp+var_4], and [ebp+var_4] is then compared to 0. If the value in [ebp+var_4] is 0, we jump to loc_40102B (Error 1.1: No Internet). If it is not 0, execution falls through to 00401017 (Success: Internet Connection).
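A C reconstruction of sub_401000 helps map each instruction back to the if statement. This is a sketch under assumptions, not decompiled output: the real code calls InternetGetConnectedState directly, whereas here its return value is passed in as the hypothetical parameter connected_state so the example builds on any platform. The strings and the 1/0 return values are the ones discussed in this lab.

```c
#include <stdio.h>

/* Hypothetical reconstruction of sub_401000. connected_state stands in for
 * the value InternetGetConnectedState() leaves in EAX, so this sketch
 * compiles without WinINet. */
int check_internet(int connected_state)
{
    int var_4 = connected_state;                  /* mov [ebp+var_4], eax */

    if (var_4 != 0) {                             /* cmp [ebp+var_4], 0; zero jumps to loc_40102B */
        printf("Success: Internet Connection\n"); /* fall-through path at 00401017 */
        return 1;                                 /* mov eax, 1 at 00401024 */
    }

    printf("Error 1.1: No Internet\n");           /* loc_40102B */
    return 0;                                     /* xor eax, eax at 00401038 */
}
```

Note that the compiler tests the "not connected" condition and jumps away, so the success branch is the one that sits directly after the comparison.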

2. What is the subroutine located at 0x40105F?
Answer: Looking at the xrefs to 0x40105F, we see only two results:
Both results call sub_40105F right after pushing a string onto the stack. Viewing sub_40105F itself does not give me any additional information:

Based on the fact that this function is only called after pushing a string onto the stack, I’m going to assume that this is a printf function.

3. What is the purpose of this program?
Answer: The program checks whether there is an active internet connection and prints a message accordingly. If there is an internet connection, it returns 1 (see the mov eax, 1 instruction at 00401024). If there is no internet connection, it returns 0 (see the xor eax, eax instruction at 00401038).

Comparing my answers to the Lab 6-1 solutions

Conclusions after comparison:

  • No differences in answers here, yay :)

Lab 6-2

Analyze the malware found in the file Lab06-02.exe.

1. What operation does the first subroutine called by main perform?
Answer: This subroutine is exactly the same as the first subroutine in Lab 6-1. It is an if statement that checks if there is an active internet connection.

2. What is the subroutine located at 0x40117F?
Answer: Same as question 2 in Lab 6-1. This is an unlabeled printf function.

3. What does the second subroutine called by main do?
Answer: It uses imports from WININET.dll, such as InternetReadFile, to download http://www.practicalmalwareanalysis.com/cc.htm and read a command from it.

4. What type of code construct is used in this subroutine?
Answer: After the call to InternetReadFile, we see a bunch of cmp instructions being performed:
These cmp instructions compare one character at a time against Buffer, var_20F, var_20E, and var_20D. Because these are consecutive bytes on the stack, Buffer looks like an array of characters. The program is checking whether the first four characters of the Buffer array are "<!--", the start of an HTML comment. If they match, the fifth character in Buffer[] is moved into AL, the register that holds the return value.
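The byte-by-byte check can be sketched in C as below. This is a hypothetical reconstruction rather than decompiled code: parse_command is an invented name, buffer is assumed to be NUL-terminated, and returning 0 when no comment is found is an assumption of the sketch.

```c
/* Hypothetical reconstruction of the check that follows InternetReadFile.
 * buffer stands in for the Buffer array and is assumed NUL-terminated.
 * Returns the command character, or 0 if the data does not start with
 * an HTML comment. */
char parse_command(const char *buffer)
{
    /* One comparison per byte: Buffer, var_20F, var_20E, var_20D.
     * && short-circuits like the chain of conditional jumps, so nothing
     * past the first mismatch (or the terminating NUL) is read. */
    if (buffer[0] == '<' && buffer[1] == '!' &&
        buffer[2] == '-' && buffer[3] == '-')
        return buffer[4];   /* fifth character, returned in AL */

    return 0;
}
```

For example, a page starting with `<!--a-->` yields the command character 'a'.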

5. Are there any network-based indicators for this program?
Answer: The URL http://www.practicalmalwareanalysis.com/cc.htm and the user agent Internet Explorer 7.5/pma can be used as network-based indicators.


6. What is the purpose of this malware?
Answer: It first checks whether there is an active internet connection. If there is, it opens the URL http://www.practicalmalwareanalysis.com/cc.htm and reads an HTML comment from it. If it successfully reads the HTML comment, it prints the command that was read and sleeps for 60 seconds.

Comparing my answers to the Lab 6-2 solutions

Conclusions after comparison:

  • The Buffer array is 512 bytes long and has not been properly labeled by IDA. To fix this, press [CTRL+K] within the function to open its stack frame view. Then, right-click on Buffer and convert it to an array.


    • Going back to the function, we see that it is now properly labeled:
  • The technique of hiding commands in HTML comments is used by attackers to send commands to malware while having the malware appear as if it were going to a normal web page.

Lab 6-3

In this lab, we’ll analyze the malware found in the file Lab06-03.exe.

1. Compare the calls in main to Lab 6-2’s main method. What is the new function called from main?
Answer: The only new function called from main is sub_401130:

2. What parameters does this new function take?
Answer: The function takes two parameters: argv[0] (the path of the running executable) and the command character parsed from the HTML comment.

3. What major code construct does this function contain?
Answer: As we can already see from the comments added by IDA, this is a switch statement that uses jump tables. We can confirm this ourselves by looking at the instructions. At memory address 00401146, var_8 gets compared to 4.
If var_8 is above 4, we jump to def_401153 and the text “Error 3.2: Not a valid command provided” is printed:
This appears to be the default case of the switch. If var_8 is not above 4, it is used as an index into the jump table, which holds the addresses of the individual case blocks.

In this example, the ECX register contains the switch variable, and 0x61 ('a' in ASCII) is subtracted from it at memory address 00401140. The jump table has five entries, so there are five switch cases. Subtracting 0x61 maps the valid commands 'a' through 'e' onto indices 0 through 4, so that the jump table can be indexed directly. The jump instruction at 00401153 reads its target from the jump table: EDX is multiplied by 4 and added to the base of the jump table to locate the entry for the selected case. It is multiplied by 4 because each entry in the jump table is a 4-byte address.
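To make the jump-table mechanics concrete, here is a C sketch that performs the same steps by hand: subtract 0x61, do a single unsigned bounds check, then dispatch through an array of function pointers. The handler names and printed labels are hypothetical placeholders rather than the sample's case code, and a compiler would emit an indirect jmp instead of a function call.

```c
#include <stdio.h>

/* Hypothetical handlers standing in for the five case blocks the jump table
 * points to. Each prints a label and returns its table index. */
static int case_a(void) { puts("case 'a'"); return 0; }
static int case_b(void) { puts("case 'b'"); return 1; }
static int case_c(void) { puts("case 'c'"); return 2; }
static int case_d(void) { puts("case 'd'"); return 3; }
static int case_e(void) { puts("case 'e'"); return 4; }

typedef int (*case_handler)(void);

/* The jump table: one code address per case. On 32-bit x86 each entry is
 * 4 bytes, which is why the index is scaled by 4 at 00401153. */
static const case_handler jump_table[5] = { case_a, case_b, case_c, case_d, case_e };

/* Returns the index of the case that ran, or -1 for the default case. */
int dispatch_command(char command)
{
    /* Subtraction of 0x61 at 00401140: map 'a'..'e' onto 0..4 */
    unsigned int index = (unsigned int)(command - 'a');

    /* cmp var_8, 4 / ja: an unsigned comparison, so characters below 'a'
     * wrap around to huge values and also take the default path */
    if (index > 4) {
        puts("Error 3.2: Not a valid command provided");
        return -1;
    }

    return jump_table[index]();   /* indirect transfer through the table */
}
```

The single `index > 4` test is what lets the compiler reject both 'A' (below 'a') and 'f' (above 'e') with one ja instruction.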

4. What can this function do?
Answer: If we look at each case in the switch statement, there are multiple things that the function can do:

  1. Create the directory C:\Temp

  2. Copy the running executable (argv[0]) to C:\Temp\cc.exe

  3. Delete the file C:\Temp\cc.exe

  4. Place a new registry value in Software\Microsoft\Windows\CurrentVersion\Run named Malware which contains the path to C:\Temp\cc.exe

  5. Sleep for 100 seconds
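At the source level, a switch like the one below is the kind of code compilers often turn into a jump table, because its case values 'a' through 'e' are consecutive. This is a hedged reconstruction: it assumes the cases appear in the order listed above ('a' creates the directory, 'e' sleeps), and it returns hypothetical action codes instead of calling the Windows APIs the sample uses.

```c
#include <stdio.h>

/* Hypothetical action codes; the real sample performs each action directly. */
enum command_action {
    ACTION_CREATE_TEMP_DIR,   /* create C:\Temp */
    ACTION_COPY_TO_TEMP,      /* copy to C:\Temp\cc.exe */
    ACTION_DELETE_CC_EXE,     /* delete C:\Temp\cc.exe */
    ACTION_SET_RUN_VALUE,     /* Run key value "Malware" pointing to cc.exe */
    ACTION_SLEEP_100S,        /* sleep for 100 seconds */
    ACTION_INVALID            /* default case */
};

enum command_action handle_command(char command)
{
    switch (command) {            /* consecutive cases: jump-table candidate */
    case 'a': return ACTION_CREATE_TEMP_DIR;
    case 'b': return ACTION_COPY_TO_TEMP;
    case 'c': return ACTION_DELETE_CC_EXE;
    case 'd': return ACTION_SET_RUN_VALUE;
    case 'e': return ACTION_SLEEP_100S;
    default:                      /* def_401153 */
        printf("Error 3.2: Not a valid command provided\n");
        return ACTION_INVALID;
    }
}
```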

5. Are there any host-based indicators for this malware?
Answer: The host-based indicators are:

  • The file location: C:\Temp\cc.exe
  • The registry value: Software\Microsoft\Windows\CurrentVersion\Run\Malware

6. What is the purpose of this malware?
Answer: It first checks whether there is an active internet connection. If there is, it opens the URL http://www.practicalmalwareanalysis.com/cc.htm and reads an HTML comment from it. If it successfully reads the HTML comment, the command character (the character right after "<!--") is used as the switch variable. Based on this character, the malware creates a directory, copies a file, deletes a file, sets a registry value, sleeps for 100 seconds, or prints an error message.

Comparing my answers to the Lab 6-3 solutions

Conclusions after comparison:

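  • The comparison to 4 at 00401146 checks whether the command character is a, b, c, d, or e. Because ja is an unsigned comparison, characters below 'a' wrap around to large values after 0x61 is subtracted, so any other character makes the ja instruction jump to the error message (default case).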

Lab 6-4

In this lab, we’ll analyze the malware found in the file Lab06-04.exe.

1. What is the difference between the calls made from the main method in Labs 6-3 and 6-4?
Answer: The function calls appear to be the same, but a loop seems to have been added to the main method. Notice the upward arrow from loc_401251 to loc_40125A in IDA's graph view.

2. What new code construct has been added to main?
Answer: A for loop has been added. We enter this for loop if we have an active internet connection. For loops always have four components:

  • Initialization, at 00401248: var_C is set to 0, which corresponds to the “for(i=0” part of a for loop.

  • Comparison, which happens after the jump to loc_40125A is taken. If the counter is greater than or equal to 0x5A0, the loop ends.

  • Execution instructions: inside the for loop there are multiple execution instructions, such as parsing the HTML comment and running the switch statement (I renamed these functions to make analysis easier).

  • Increment or decrement: var_C is copied into the EAX register, EAX is incremented, and the new value is copied back into var_C. After this, execution goes back to the comparison at loc_40125A (the upward arrow mentioned in question 1 of this lab). Because all four components are present, we know that this is a for loop.
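The four components map onto a C for loop roughly as follows. This is a sketch under assumptions: loop_body_fn and loop_sleep_seconds are hypothetical names, the body callback stands in for the parse-HTML and switch calls, and instead of actually sleeping the function adds 60 seconds per iteration (the per-iteration sleep discussed in this lab). It ignores what happens if parsing fails and the optional 100-second case.

```c
#include <stddef.h>

/* Hypothetical callback standing in for the loop body: parse the HTML
 * comment, then run the switch statement on the command character. */
typedef void (*loop_body_fn)(int counter);

/* Returns the total seconds the loop would sleep. Nothing actually sleeps;
 * each pass adds 60 to model the sleep after the switch statement. */
long loop_sleep_seconds(loop_body_fn body)
{
    long total_seconds = 0;

    for (int var_C = 0;          /* initialization at 00401248 */
         var_C < 0x5A0;          /* comparison at loc_40125A: exit when >= 0x5A0 */
         var_C++) {              /* increment: copy to EAX, add 1, store back */
        if (body != NULL)
            body(var_C);         /* execution instructions */
        total_seconds += 60;     /* models the 60-second sleep */
    }
    return total_seconds;
}
```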

3. What is the difference between this lab’s parse HTML function and those of the previous labs?
Answer: Instead of user agent Internet Explorer 7.5/pma, the user agent Internet Explorer 7.50/pma%d is being used.
Because the parse HTML function is now called inside the for loop and the user agent contains a %d format specifier, the malware will likely use a different user agent for each attempt it makes to parse the HTML comment.
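The effect of the %d specifier can be illustrated in C with snprintf. Treat this as a sketch: build_user_agent is a hypothetical helper, and it assumes that the value substituted for %d is the loop counter, which is consistent with the user agent changing on every iteration.

```c
#include <stdio.h>

/* Hypothetical helper: formats the per-attempt user agent from the format
 * string seen in the sample, using the loop counter as the %d argument.
 * Returns what snprintf reports: the length excluding the NUL. */
int build_user_agent(char *out, size_t out_size, int counter)
{
    return snprintf(out, out_size, "Internet Explorer 7.50/pma%d", counter);
}
```

With a counter of 0 this produces Internet Explorer 7.50/pma0, and on the last iteration (1439) it produces Internet Explorer 7.50/pma1439.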

4. How long will this program run? (Assume that it is connected to the Internet.)
Answer: From the comparison in the for loop, we know that the loop runs 1440 times (0x5A0 is 1440 in decimal). Since the program sleeps for 60 seconds after each call to the switch statement, it will run for at least 86400 seconds (24 hours). One switch case makes the program sleep for another 100 seconds; if that case is executed, the program could run longer.
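The arithmetic behind this minimum runtime (leaving out the optional 100-second case) works out as:

```latex
\begin{aligned}
\mathtt{0x5A0} &= 5 \cdot 16^2 + 10 \cdot 16 + 0 = 1280 + 160 = 1440 \\
1440 \times 60\ \text{s} &= 86400\ \text{s} = \frac{86400}{3600}\ \text{h} = 24\ \text{h}
\end{aligned}
```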

5. Are there any new network-based indicators for this malware?
Answer: Yes, the user agents that keep changing (Internet Explorer 7.50/pma%d).

6. What is the purpose of this malware?
Answer: Same as before, but now using a unique user agent for each attempt to parse an HTML comment. The program will run for at least 24 hours.

Comparing my answers to the Lab 6-4 solutions

Conclusions after comparison:

  • Changing the user agent each time the for loop counter increments lets an attacker monitor how long the malware has been running.

Sources