Learning of malware analysis. Solving labs from the "Recognizing C Code Constructs in Assembly" chapter from the "Practical Malware Analysis" book

Hi there!

The topic of these labs is fascinating. Recognizing C code constructs in Assembly is, without any doubt, useful in malware analysis. That's why I'm not going to use the Ghidra decompiler: I would like to improve my skills in reading Assembly code. In a real scenario, though, I would probably use Ghidra together with IDA to analyze a sample more quickly. Now I invite you to dive into the different malware examples and maybe learn something new with me. As we all know, the best way to learn something is to have fun doing it, so before we start, I wish you a lot of fun during the malware analysis process. :)



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 6-1
In this lab, you will analyze the malware found in the file Lab06-01.exe.

Questions:

1. What is the major code construct found in the only subroutine called by main?

2. What is the subroutine located at 0x40105F?

3. What is the purpose of this program?

My answers:

1) What is the major code construct found in the only subroutine called by main?

I first looked at this malware on Windows 10 using IDA Free version 7+, but this tool didn't find the main function. For this lab, I decided to install IDA 5.0 on a Windows XP virtual machine to check whether the older version of IDA can find the main entry inside an executable written for Windows XP. As you can see, the older IDA version does find main in our executable:



The only subroutine called by main is sub_401000 obviously. Let's jump inside it and see what happens:



It looks like the major code construct in this subroutine is the if statement that follows the InternetGetConnectedState call. From MSDN we learn that this function "retrieves the connected state of the local system": it returns TRUE if at least one Internet connection is possible and FALSE if the local system cannot establish an Internet connection. So in this case the if construct looks like this:



The above if statement is the major code construct inside the subroutine.
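Here is a minimal C sketch of that if construct. It is not the disassembled code itself. To keep it buildable outside Windows, InternetGetConnectedState is replaced by a hypothetical stub (fake_InternetGetConnectedState) driven by a global flag. The printed messages are placeholders, and returning 1 or 0 to main is my assumption about how the result is used.

```c
#include <stdio.h>

/* Hypothetical stand-in for WinINet's InternetGetConnectedState(lpdwFlags,
 * dwReserved) so the sketch builds without windows.h. The global flag
 * simulates whether a connection is possible. */
int simulated_connection = 0;

static int fake_InternetGetConnectedState(unsigned long *flags, unsigned long reserved)
{
    (void)reserved;
    *flags = 0;
    return simulated_connection;   /* nonzero plays the role of TRUE */
}

/* Sketch of sub_401000: a single if/else on the API's return value. */
int check_internet(void)
{
    unsigned long flags;

    if (fake_InternetGetConnectedState(&flags, 0)) {
        printf("Connected to the Internet\n");   /* placeholder message */
        return 1;
    }
    printf("No Internet connection\n");          /* placeholder message */
    return 0;
}
```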

2) What is the subroutine located at 0x40105F?

The subroutine at address 0x40105F is called inside the if statement on both branches: when the code takes the TRUE branch as well as the FALSE one.


The key to understanding the goal of this function is to investigate what happens inside the sub_401282 subroutine. But after jumping into that function I was shocked by the size of its code. Instead of spending time on an exact analysis, I'm going to check what happens when the executable is run from the console.


I ran it on my main machine, which indeed has no Internet connection. As you can see, sub_401282 is primarily responsible for printing its format string arguments to the console. At this point I would guess that this subroutine is simply the printf function, but IDA doesn't recognize it. This can happen when an executable is statically linked and IDA's library signatures don't match the embedded code. Therefore, I've renamed this subroutine to printf.

3) What is the purpose of this program?

If the subroutine is indeed printf, then this program simply reports whether an Internet connection is available on the local system.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 6-2
Analyze the malware found in the file Lab06-02.exe.

Questions:

1. What operation does the first subroutine called by main perform?

2. What is the subroutine located at 0x40117F?

3. What does the second subroutine called by main do?

4. What type of code construct is used in this subroutine?

5. Are there any network-based indicators for this program?

6. What is the purpose of this malware?

My answers:

1) What operation does the first subroutine called by main perform?



As you can see, the first subroutine called by main is sub_401000. It is called without any arguments. Let's see what this subroutine really does:



The subroutine is the same as in Lab 6-1: it checks whether an Internet connection is available on the victim's machine. As before, it uses an if statement on the return value of the InternetGetConnectedState function.

2) What is the subroutine located at 0x40117F?

This subroutine is the same as the one at 0x40105F in Lab 6-1, so it is printf.

3) What does the second subroutine called by main do?

The second subroutine called by main is sub_401040. This function is quite large, so I've decided not to paste a screenshot of it here. You can first check the xrefs-from graph of this subroutine to see which functions it calls: printf, InternetOpenUrlA, InternetOpenA, InternetReadFile and InternetCloseHandle. Just by looking at these xrefs, we can suppose that this subroutine is responsible for downloading a file from some domain.



The above picture shows the first significant lines of code executed by the subroutine. The subroutine calls InternetOpenA with the User-Agent string "Internet Explorer 7.5/pma" and then opens http://www.practicalmalwareanalysis.com/cc.htm with InternetOpenUrlA. The HTTP request to that URL therefore identifies its client as the "Internet Explorer 7.5/pma" browser.



After the URL is opened successfully, the subroutine reads the cc.htm file from the practicalmalwareanalysis.com domain with InternetReadFile and writes the content to the variable that IDA named Buffer. I've renamed it to cc_file_data.


Here we have a series of checks on the first, second, third and fourth elements of the cc_file_data buffer, respectively. If the first four characters of the buffer are exactly "<!--", the subroutine jumps to this part of the code:



[ebp+var_20C] is the first character of the cc.htm file after the "<!--" sequence, which marks the beginning of a comment in HTML. It follows that the subroutine returns the first character of the comment in the practicalmalwareanalysis.com/cc.htm file. I think this comment technique is used to hide the malicious command, since comments written in HTML are not rendered on the page. To see them, a visitor would have to look at the page's source code. This is the C code of the analyzed subroutine:
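Here is a minimal C sketch of that check, written for illustration rather than recovered verbatim. parse_command is a hypothetical name, the length check is my own addition for safety, and returning 0 on failure is an assumption.

```c
#include <stddef.h>

/* Sketch of the "<!--" check in sub_401040. `data` plays the role of the
 * buffer filled by InternetReadFile (renamed cc_file_data above) and `len`
 * is the number of bytes read. Returns the command character, or 0 when the
 * page does not start with an HTML comment (the 0 is an assumption). */
char parse_command(const char *data, size_t len)
{
    if (len < 5)                /* "<!--" plus one command character */
        return 0;

    /* && short-circuits, so this compiles to one compare-and-branch per byte,
     * like the four separate checks seen in the disassembly. */
    if (data[0] == '<' && data[1] == '!' && data[2] == '-' && data[3] == '-')
        return data[4];         /* [ebp+var_20C]: first character of the comment */

    return 0;
}
```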



4) What type of code construct is used in this subroutine?

The subroutine stores the downloaded HTML file in a character array and then uses a series of if statements on the first four elements of that array to look for the "<!--" sequence. Before this, it downloads the whole HTML file over HTTP. If the sequence is present, it grabs the first character after it and treats this character as a command.

5) Are there any network-based indicators for this program?

The obvious network-based indicator is the practicalmalwareanalysis.com/cc.htm URL. We can look for HTTP connections to this host to see whether a given machine is infected by the malware. Another network-based indicator is the User-Agent string Internet Explorer 7.5/pma. Both indicators can be gathered from the strings inside the binary or by reading its code in IDA.

Another way to find these indicators is basic dynamic analysis. It can be done either on a real network (we can afford this because we reversed the binary and didn't find anything dangerous, such as network exploits) or on a fake network with a properly configured HTTP server. I tried a fake network without an HTTP server configured to answer for the malware's domain, and Wireshark didn't show anything useful, so I think a real network would work better here. In the end, I prefer using reverse engineering to find such information.
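As a detection rule, the two indicators could look like the hypothetical C helper below. It assumes the Host header and the User-Agent have already been extracted from a captured HTTP request (for example, from a Wireshark capture) and that the host name is lowercase. The suffix match is deliberately crude.

```c
#include <stdbool.h>
#include <string.h>

/* True if `s` ends with `suffix`. */
static bool ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Hypothetical helper: flags a request that carries either indicator. */
bool is_lab06_02_request(const char *host, const char *user_agent)
{
    bool host_hit = ends_with(host, "practicalmalwareanalysis.com");
    bool ua_hit = strcmp(user_agent, "Internet Explorer 7.5/pma") == 0;

    return host_hit || ua_hit;
}
```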

6) What is the purpose of this malware?

At the beginning of its execution, this malware checks whether the system has an Internet connection available. If it cannot establish an Internet connection, it simply terminates. Otherwise, Lab06-02.exe connects to practicalmalwareanalysis.com/cc.htm over HTTP and downloads the cc.htm file, using the distinctive User-Agent Internet Explorer 7.5/pma.
If the download is successful, the malware checks whether the first four bytes of the downloaded HTML file are exactly the <!-- sequence. In other words, it looks for an HTML comment at the beginning of cc.htm. If it finds one, it parses the first character after the <!-- sequence as the command.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 6-3
In this lab, we’ll analyze the malware found in the file Lab06-03.exe.

Questions:

1. Compare the calls in main to Lab 6-2’s main method. What is the new function called from main?

2. What parameters does this new function take?

3. What major code construct does this function contain?

4. What can this function do?

5. Are there any host-based indicators for this malware?

6. What is the purpose of this malware?

My answers:

1) Compare the calls in main to Lab 6-2’s main method. What is the new function called from main?

The new function called from main in Lab06-03.exe is sub_401130.

2) What parameters does this new function take?



This new function takes two arguments. The first is the command parsed from the practicalmalwareanalysis.com/cc.htm file and the second is argv[0]. argv is an array of character pointers holding the program's command-line arguments. argv[0] normally holds the name (or path) under which the program was launched, argv[1] is the first command-line argument, argv[2] is the second one, and so on. Therefore, the second argument of the new function is the name of the executed malware, in our case Lab06-03.exe. This is what the call to the new function looks like: sub_401130(command, argv[0]);

3) What major code construct does this function contain?

Let's investigate the content of this function. The very first code construct we encounter is a simple if statement.


The above screenshot shows this if statement. It takes the one-byte character parsed from the HTML file and subtracts the ASCII value of the 'a' character. This trick is useful when someone wants to convert a character into an integer representing its position in the alphabet. Note that positions start at 0: 'a' is 0, and 'z', the last letter of the lowercase English alphabet, is 25. The if statement checks whether the command is in the range 'a'-'e'. If it is not, the processor jumps to the code that reports an error. Otherwise, the CPU moves on to another code construct: a switch statement represented by a jump table.




I have to admit that HTML_parsed_command isn't an appropriate name for this variable. After the first instruction, the edx register holds the alphabet position of the command, not a character. The position of the parsed command in the alphabet determines which path the code takes, so it is the value the switch statement branches on. This is how the two major code constructs look in C:
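Here is a minimal C sketch of these two constructs, based on the case bodies described in the rest of this answer. To keep it harmless and runnable anywhere, each case returns a description of the action instead of calling the Windows API. The comments name the APIs each case relies on, and the error text is a placeholder.

```c
/* Sketch of sub_401130: an unsigned range check followed by a switch that
 * the compiler lowered into a jump table. Nothing here touches the host. */
const char *dispatch_command(char command)
{
    unsigned int position = (unsigned int)command - 'a';   /* 'a' -> 0 ... 'e' -> 4 */

    if (position > 4)                        /* compiled to a compare with 4 and ja */
        return "error: invalid command";

    switch (position) {
    case 0:  return "create the C:\\Temp directory";            /* CreateDirectoryA */
    case 1:  return "copy itself to C:\\Temp\\cc.exe";           /* CopyFileA */
    case 2:  return "delete C:\\Temp\\cc.exe";                   /* DeleteFileA */
    case 3:  return "add a Run key value for C:\\Temp\\cc.exe";  /* RegOpenKeyExA, RegSetValueExA */
    case 4:  return "sleep for 100 minutes";                     /* Sleep */
    default: return "error: invalid command";                    /* unreachable after the check */
    }
}
```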



I know that command_alphabet_position is an unsigned int because the if statement uses the ja (jump if above) instruction, which is used for unsigned comparisons. With signed ints, the compiler would use the jg (jump if greater) instruction instead.

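Note what happens with characters outside the range. Suppose that the command parsed from the HTML file is 'A'. Then the subtraction is 0x41 - 0x61 ('A' - 'a'). After the mov [ebp+HTML_parsed_command], ecx operation, the variable wraps around to a very large value (0xFFFFFFE0) because HTML_parsed_command is an unsigned int. This wraparound is harmless: the very large value is above 4, so ja takes the error branch. A single unsigned comparison therefore rejects characters both below 'a' and above 'e'.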
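The difference between the unsigned check the binary uses and a signed one can be shown in isolation. This is an illustrative sketch (both function names are hypothetical). It uses fixed-width 32-bit types to match the registers in the disassembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* The check as the binary does it: one unsigned comparison (ja). */
bool rejected_unsigned(char command)
{
    uint32_t position = (uint32_t)command - (uint32_t)'a';   /* 'A' -> 0xFFFFFFE0 */
    return position > 4u;
}

/* The same upper-bound test on a signed value (what jg would compare).
 * Negative positions slip through, so a second "position < 0" test
 * would be needed to reject characters below 'a'. */
bool rejected_by_signed_upper_bound(char command)
{
    int32_t position = (int32_t)command - (int32_t)'a';      /* 'A' -> -32 */
    return position > 4;
}
```

With the signed version, 'A' gives -32. That value is not greater than 4, so the upper-bound test alone lets it through. The unsigned version rejects it with the single ja.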

4) What can this function do?

I've reversed the first part of this function, but there is a second part: what happens inside each case of the switch statement? Let's check it.


This is the jump table used by the switch statement inside the malware. Let's rename each loc_ label after its case to make further analysis more convenient.


Now we can check what code is executed in each of the cases.

Case 0 (command 'a'):


When the command's alphabet position is 0, that is, when the command is the letter 'a', the malware creates the "C:\\Temp" directory. The directory gets a default security descriptor, whose access control lists are inherited from its parent directory, the root of the C: drive.

Case 1 (command 'b'):



When the command is 'b', the malware copies itself to C:\\Temp\\cc.exe for persistence. TRUE is passed as the last argument (bFailIfExists) of the CopyFileA WinAPI function. This means that CopyFileA fails if the C:\\Temp\\cc.exe file already exists.

Case 2 (command 'c'):



If the command parsed from HTML is the letter 'c' then the malware simply deletes C:\\Temp\\cc.exe file.

Case 3 (command 'd'):



If the command is the letter 'd', the malware modifies the registry. First of all, I've converted the magic numbers passed to RegOpenKeyExA into standard constants.



As you can see, the malware opens the HKLM\Software\Microsoft\Windows\CurrentVersion\Run key with KEY_ALL_ACCESS access rights. Next, it creates a value named "Malware" under this key, whose data is C:\\Temp\\cc.exe. If the malware fails to set the registry value, it prints "Error 3.1: Could not set Registry value" to the console.
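IDA's "use standard symbolic constant" feature is essentially a lookup from a number to a name that is valid for a given parameter. The sketch below uses hypothetical names, with the values defined in the Windows SDK headers. It shows the two substitutions made before RegOpenKeyExA. The lookup is keyed by parameter because the same number can stand for different constants: 1 is KEY_QUERY_VALUE as an access mask but REG_SZ as a value type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the Windows SDK headers (winreg.h, winnt.h), repeated here
 * so the sketch builds without windows.h. */
struct symbolic_constant {
    const char *parameter;   /* RegOpenKeyExA parameter name */
    uint32_t value;          /* magic number as pushed in the disassembly */
    const char *name;        /* standard symbolic constant */
};

static const struct symbolic_constant reg_open_constants[] = {
    { "hKey",       0x80000002u, "HKEY_LOCAL_MACHINE" },
    { "samDesired", 0x000F003Fu, "KEY_ALL_ACCESS" },
};

/* Returns the symbolic name for `value` in `parameter`, or NULL if unknown. */
const char *symbolic_name(const char *parameter, uint32_t value)
{
    size_t count = sizeof reg_open_constants / sizeof reg_open_constants[0];

    for (size_t i = 0; i < count; i++) {
        if (strcmp(reg_open_constants[i].parameter, parameter) == 0 &&
            reg_open_constants[i].value == value)
            return reg_open_constants[i].name;
    }
    return NULL;
}
```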

Case 4 (command 'e'):




When the command is the letter "e" the malware sleeps for 100 minutes.
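Sleep takes its argument in milliseconds, so a 100-minute sleep corresponds to the following value. In IDA's default hex display, it would typically appear as 5B8D80h in front of the Sleep call:

$$100\ \text{min} \times 60\ \frac{\text{s}}{\text{min}} \times 1000\ \frac{\text{ms}}{\text{s}} = 6\,000\,000\ \text{ms} = \mathtt{0x5B8D80}\ \text{ms}$$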



After each case, the malware returns from the function. This is sub_401130 written in C:


Depending on the command this function can: 

- For the command 'a': create C:\\Temp directory

- For the command 'b': copy the malware to the C:\\Temp\\cc.exe file for persistence

- For the command 'c': delete the copy of the malware

- For the command 'd': write a malware entry into the HKLM\Software\Microsoft\Windows\CurrentVersion\Run registry key, so that the malware is launched every time a user logs on

- For the command 'e': force the malware to sleep for 100 minutes

5) Are there any host-based indicators for this malware?

The first host-based indicator for this malware is obviously the C:\\Temp\\cc.exe file. Another one is the "Malware" value under HKLM\Software\Microsoft\Windows\CurrentVersion\Run, whose data is C:\\Temp\\cc.exe.

6) What is the purpose of this malware?

The action taken by this malware depends on the command downloaded from the practicalmalwareanalysis.com/cc.htm HTML file. The file is requested over HTTP, so before the GET request the malware checks whether an Internet connection is available on the victim's machine. If there is one, the malware tries to download cc.htm, an HTML file with a comment embedded at the beginning of its content. If this comment exists, the malware parses the first character after the "<!--" sequence as the command. Depending on this command, the malicious program does one of the following:

- creates the "C:\\Temp" directory
- copies itself to "C:\\Temp\\cc.exe"
- deletes that copy
- adds itself to the Run registry key
- sleeps for 100 minutes

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 6-4
In this lab, we’ll analyze the malware found in the file Lab06-04.exe.

Questions:

1. What is the difference between the calls made from the main method in Labs 6-3 and 6-4?

2. What new code construct has been added to main?

3. What is the difference between this lab’s parse HTML function and those of the previous labs?

4. How long will this program run? (Assume that it is connected to the Internet.)

5. Are there any new network-based indicators for this malware?

6. What is the purpose of this malware?

My answers:

1) What is the difference between the calls made from the main method in Labs 6-3 and 6-4?

The difference between the calls made from main in Labs 6-3 and 6-4 lies in the new structure of the same calls. The functions called from main in Lab06-04.exe are the same as in the Lab06-03.exe binary; only the way they are arranged has changed.

2) What new code construct has been added to main?

If we take a look at the Graph View of the main function, we can see an interesting branch that forms a loop. This is the new code construct added to main. The loop index is passed as the only argument to the get_command function, which downloads the HTML file from the attacker's site and searches its content for a command. Look:



3) What is the difference between this lab’s parse HTML function and those of the previous labs?

As I said before, the most meaningful change in this lab compared to the previous ones is the loop. Inside this loop there are almost the same calls as before, but the main difference is inside the parse HTML function. I renamed the loop index to i and the parse HTML function, which receives this index, to get_command. Here is the cross-references table for the index variable:


You can see that the index is used once inside get_command function in this piece of code:


This time IDA recognized a C runtime library function: _sprintf. The loop index is the argument for the %d format specifier, and sprintf writes the resulting "Internet Explorer 7.50/pma%d" string into the local buffer [ebp+szAgent]. Let's get back to the main function and investigate the range of the loop:
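The same string can be rebuilt with a short C sketch. build_user_agent is a hypothetical helper name. It uses snprintf rather than sprintf only to bound the write, and the format string is the one shown in the disassembly.

```c
#include <stdio.h>

/* Rebuilds the per-iteration User-Agent from the loop index `minute`.
 * Returns the length snprintf reports for the full string. */
int build_user_agent(char *out, size_t out_size, int minute)
{
    return snprintf(out, out_size, "Internet Explorer 7.50/pma%d", minute);
}
```

For i = 0 this yields "Internet Explorer 7.50/pma0", and for the last iteration it yields "Internet Explorer 7.50/pma1439".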



loc_4012AF is the label at which main returns with exit code 0. So the initial value of i is 0, and when i becomes greater than or equal to 1440, the loop ends. In C, the loop looks like this:
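Here is a minimal C sketch of that loop, reduced to the parts that matter for its bounds. get_command_stub is a hypothetical stand-in for get_command(i). Instead of calling Sleep(60000), the sketch only adds the delay to a counter, so it finishes immediately.

```c
#define LOOP_LIMIT 1440                  /* 0x5A0: the loop ends when i >= 1440 */
#define SLEEP_PER_ITERATION_MS 60000UL   /* Sleep argument inside the loop */

/* Hypothetical stand-in for get_command(i). */
static char get_command_stub(int i)
{
    (void)i;      /* the real function builds the pma<i> User-Agent from i */
    return 0;     /* no command, so nothing to dispatch in this sketch */
}

/* Runs the loop and returns the total simulated sleep in milliseconds;
 * the number of completed iterations is stored in *iterations. */
unsigned long simulate_main_loop(int *iterations)
{
    unsigned long slept_ms = 0;
    int i;

    for (i = 0; i < LOOP_LIMIT; i++) {
        char command = get_command_stub(i);
        (void)command;                   /* the real code dispatches it */
        slept_ms += SLEEP_PER_ITERATION_MS;
    }
    *iterations = i;
    return slept_ms;
}
```

The returned 86,400,000 ms equals 86,400 seconds, that is, 24 hours.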



In this lab's HTML parse function, the User-Agent is generated dynamically from the loop index, so multiple "versions" of Internet Explorer 7.50/pma are used (from pma0 to pma1439). In the previous labs, a single User-Agent was enough for the malware.

4) How long will this program run? (Assume that it is connected to the Internet.)

To estimate how long the program runs, we should look for calls to functions such as Sleep.


IDA shows that the call to the Sleep function is executed inside the for loop. As we know from the earlier analysis, this for loop runs exactly 1440 times. In each iteration, the program sleeps for 60000 milliseconds, which is 60 seconds or 1 minute. Therefore, the program will run for at least 1440 minutes, which is 24 hours (one day).

5) Are there any new network-based indicators for this malware?

Yes, there is a new network-based indicator for this malware, specifically - "Internet Explorer 7.50/pma%d" where %d is the number of minutes the program has been running.

6) What is the purpose of this malware?

First, the program checks for an Internet connection. If none is found, the program terminates. Next, the malware downloads an HTML file from practicalmalwareanalysis.com/cc.htm. It uses a unique User-Agent that contains tracking information: the number of minutes the program has been running. The downloaded HTML file should have a comment embedded at the beginning. If the malware finds the "<!--" character sequence, it parses the first character after it and treats this value as the command. This character is used in the switch statement to determine the action to take on the victim's machine. Depending on the parsed command, the malware can do one of the following:

- create the "C:\\Temp" directory
- copy itself to the "C:\\Temp\\cc.exe" file for persistence
- delete the "C:\\Temp\\cc.exe" copy
- set the Run registry key to the malware's path
- sleep for 100 minutes

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the end of the labs from the "Recognizing C Code Constructs in Assembly" chapter. In my opinion, these tasks were exciting, and I'm looking forward to the next challenge to get better and better at malware analysis. I hope you enjoyed reading my solutions and learned something new. Thanks for reading.
Cheers!

