Learning of malware analysis. Solving 9-2 lab from the "OllyDbg" chapter. ("Practical Malware Analysis" book)

 





Hi again

Obviously, today's topic is about advanced dynamic analysis again since I'm still in the same chapter as earlier. But each subsequent task should be harder and I hope that I will learn something new during today's analysis process. So now, let's "jump into catacombs". 


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 9-2

Analyze the malware found in the file Lab09-02.exe using OllyDbg to answer the following questions.

Questions:

1. What strings do you see statically in the binary?

2. What happens when you run this binary?

3. How can you get this sample to run its malicious payload?

4. What is happening at 0x00401133?

5. What arguments are being passed to subroutine 0x00401089?

6. What domain name does this malware use?

7. What encoding routine is being used to obfuscate the domain name?

8. What is the significance of the CreateProcessA call at 0x0040106E?


My answers:

1. What strings do you see statically in the binary?

To see the strings in a binary, it helps to load it into a static analysis tool such as PE Studio. Let's look at the strings, which most likely reside in the read-only sections of the PE file. The Strings view in PE Studio shows few interesting strings apart from some imports from the socket library. To answer the question properly: the imports suggest that the malware communicates with hosts over the network. Moreover, WriteFile and CreateProcess suggest that the executable writes data and spawns other processes. GetCommandLine made me suspect that the malware uses the console subsystem, so I checked the subsystem field:


As you can see, the malware is actually GUI-based, not console-based, which is quite interesting.
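If you don't have PE Studio at hand, the idea behind a strings listing is easy to reproduce. Below is a minimal Python sketch of it, assuming that only runs of printable ASCII matter (real tools also look for UTF-16 strings). The function name ascii_strings and the sample bytes are made up for illustration and are not taken from Lab09-02.exe.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Synthetic bytes that only imitate what an import name table looks like.
sample = b"\x00\x01WriteFile\x00\xff\x10CreateProcessA\x00ab\x00cmd\x00"
found = ascii_strings(sample)
print(found)
```

To try it on a real file, read the file in binary mode and pass its bytes to ascii_strings.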

2. What happens when you run this binary?

This question is perfect for basic dynamic analysis. Thus, I'm going to launch the Process Explorer and Process Monitor. Process Monitor is for checking the internal calls done by the binary at runtime and Process Explorer is for investigating the "life" of the binary in the system after launching it.

Process Monitor:



Process Monitor recorded exactly 64 events generated by the malware. This number is very small compared with an average executable. The malware loads the DLLs it needs and then exits; the last recorded operations are CloseFile events. Process Explorer shows the same picture - the malware's process terminates almost immediately after it starts. We have to investigate why the binary exits without doing anything else, using IDA Free and Immunity Debugger.

3. How can you get this sample to run its malicious payload?

This is the question that has to be answered with advanced analysis techniques. We have to check the binary code with IDA and, in more complex cases, use Immunity Debugger to understand how the malware works. Here is the start of the main function:

Some buffer is initialized with two strings - "1qaz2wsx3edc" and "ocl.exe".


Then some rep operations are performed. Let's see what the malware does inside a debugger. The first rep operation copies some seemingly random bytes from the offset unk_405034 into [ebp+var_1F0]. The second rep operation fills a buffer of length 0x43 with zero bytes. This buffer begins at [ebp-0x2FF]. Next, the GetModuleFileName function returns the path to the malware's file, as you can see in the screenshot below.


The _strrchr function is executed right after getting the full path to the malware's executable.


This call returns the pointer to the last occurrence of the "\" character in the malware's path. The address of the found character is placed in the edx register, and after the add edx, 1 instruction the pointer is moved to the beginning of the malware's filename. The malware's filename is then passed to the strcmp function as the second parameter and is compared with the content of the [ebp+var_1A0] buffer.


If the malware's filename is exactly the same as the content of [ebp+var_1A0] the malware jumps into its malicious payload and runs it. Otherwise, the malware exits. Thus, to answer the question we have to check what lies inside [ebp+var_1A0] at the moment of _strcmp execution.


It's now clear that the first argument of that strcmp call is ocl.exe. Therefore, this sample will run its malicious payload only if its filename is ocl.exe. Another way to force the malware to run its malicious payload is to patch it and change the jz instruction to jmp. In that case, the sample would execute its malicious payload regardless of its filename.
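The filename check can be sketched in a few lines of Python. This is only my approximation of the logic and not decompiled code; the function name payload_would_run and the sample paths are made up for illustration.

```python
def payload_would_run(module_path, expected="ocl.exe"):
    """Mimic the check: take the text after the last backslash and compare it."""
    # Same idea as taking the pointer to the last "\" and adding 1 to it.
    filename = module_path[module_path.rfind("\\") + 1:]
    # strcmp is case-sensitive, so a plain equality test is the closest match.
    return filename == expected


print(payload_would_run(r"C:\samples\Lab09-02.exe"))
print(payload_would_run(r"C:\samples\ocl.exe"))
```

The first call reports that the payload would not run and the second that it would, which matches the renaming trick described above.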

4. What is happening at 0x00401133?


As I've mentioned earlier, this set of mov instructions with ASCII values as the second operands is the buffer initialization. After executing these operations the buffer has two strings inside -> "1qaz2wsx3edc" and "ocl.exe". We can convert it to an array to make the whole code cleaner:


5. What arguments are being passed to subroutine 0x00401089?

The call to this subroutine is here:


The first argument passed to this function is the pointer to the beginning of the [ebp+str] buffer, so this is simply the "1qaz2wsx3edc" string. I have no idea what data resides inside [ebp+var_1F0], which is the second argument passed to the subroutine, so the best way to find out is to use a debugger and set a breakpoint at the function call. Then we can "Follow in dump" the second element on the stack to see the real content and we're done. As we know from the analysis done in the third question, to be able to run the malicious payload we have to change the malware's filename to "ocl.exe".


To see the content of the memory at the address 0x12FD90 right-click at the pointer and click "Follow in dump". 


Some random data lies in the memory with the address of 0x12FD90. Thus, the answer to the question is - the arguments being passed to subroutine 0x00401089 are the string "1qaz2wsx3edc" and pointer to some buffer with random data. The address of the buffer is exactly 0x12FD90.

6. What domain name does this malware use?

From the strings section in PE Studio we can't see any domain indicator. Thus, we have to look for the function from some networking library that takes a domain name as the parameter. One of those methods is gethostbyname and the call to this function is right after the call to the sub_401089.



We can get the domain name really quickly by using a debugger and investigating the stack.


The name of the domain used in the call to gethostbyname is "practicalmalwareanalysis.com". The IP address that it resolves to is then passed to the connect function later in the execution. To hit the breakpoint at the call to connect, we have to set up a fake network; otherwise, the malware will sleep for 30 seconds after each gethostbyname failure.

This is the IP address of the host at the moment of the call to the connect function:


The IP address is exactly 10.0.0.1 and this is the address of my Ubuntu machine with the HTTP server launched. I used the ApateDNS tool to redirect the DNS responses.

 

The above screenshot from the ApateDNS tool is evidence that the malware uses the practicalmalwareanalysis.com domain. The DNS server had been asked about the practicalmalwareanalysis.com domain and has responded with the IP address of 10.0.0.1. This IP address is then used by the connect method.

7. What encoding routine is being used to obfuscate the domain name?

If we want to find the subroutine that encodes the domain name we can do this by checking the arguments passed to each of the interesting functions inside the malware's module. Obviously, this subroutine used for the obfuscation has to be user-defined, so we need to find the function marked as "blue" in IDA.

Look at this piece of code for example:


The sub_401089 returns a string that is used as the domain name in the gethostbyname call. From the earlier analysis, we know that the eax register contains the fully-qualified domain name. The subroutine takes two arguments - the first is "1qaz2wsx3edc", which is probably the key used for decoding (not encoding), and the second is the buffer with the random data, which probably contains the obfuscated domain name. So we are dealing with a call of the form sub(key, buffer_with_encrypted_domain), where the key is "1qaz2wsx3edc". Now we can jump into the subroutine and check its internals.


This routine is pretty straightforward and easy to follow. As you can see, the malware XORs the buffer from the second argument with the key. The operation is performed in a loop: each byte of the buffer is XORed with the corresponding byte of the key. The result is the correct domain name.
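Here is a minimal Python sketch of such an XOR routine. I assume that the key simply repeats when the buffer is longer than the key; the name xor_with_key is made up. I also don't reproduce the real encoded bytes from the binary here. Instead, the sketch builds a stand-in buffer by encoding the known domain with the key and then decodes it again, which works because XOR with the same key is its own inverse.

```python
from itertools import cycle


def xor_with_key(data, key):
    """XOR every byte of data with the matching byte of a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


KEY = b"1qaz2wsx3edc"
plain = b"practicalmalwareanalysis.com"

# Stand-in for the obfuscated buffer (assumption: not copied from the sample).
encoded = xor_with_key(plain, KEY)
decoded = xor_with_key(encoded, KEY)

print(encoded.hex())
print(decoded.decode("ascii"))
```

The same function could be pointed at the bytes dumped from the debugger to check the result against what gethostbyname receives.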

8. What is the significance of the CreateProcessA call at 0x0040106E?


This raw call to the CreateProcessA function creates a cmd.exe process, but this information alone is not enough to understand what the malware really does. To learn more about the behavior of our malicious program, let's have a look at the modifications made to the StartupInfo structure right before CreateProcessA executes. This structure configures the created process.


The malware modifies the I/O descriptors of the created process, in our case, this process is the system's console. The wShowWindow variable is set to 0 which means that the newly created console window would not be visible to the victim. The most important part for us is to find out what lies inside the [ebp+arg_10] in memory. arg_10 is the argument passed to the subroutine so we can check its value by examining the calls.


This is the Xrefs to graph generated by the IDA. We now know that the subroutine is executed inside the main function.


Here it is. The argument passed to the subroutine is the socket already connected to the host named practicalmalwareanalysis.com - in a real-world scenario this host would be most likely the C&C server.

Therefore, the malware redirects the I/O handles to the socket. Now it's clear that the malicious program creates a reverse shell. All data sent from the attacker's host arrives through the socket. Because stdin is redirected to the socket, the console reads each byte from the socket and executes it as a command. Stdout and stderr are also redirected, so everything the console prints is written to the socket and sent to the attacker.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 9-3

Analyze the malware found in the file Lab09-03.exe using OllyDbg and IDA Pro. This malware loads three included DLLs (DLL1.dll, DLL2.dll, and DLL3.dll) that are all built to request the same memory load location. Therefore, when viewing these DLLs in OllyDbg versus IDA Pro, code may appear at different memory locations. The purpose of this lab is to make you comfortable with finding the correct location of code within IDA Pro when you are looking at code in OllyDbg.

Questions:

1. What DLLs are imported by Lab09-03.exe?

2. What is the base address requested by DLL1.dll, DLL2.dll, and DLL3.dll?

3. When you use OllyDbg to debug Lab09-03.exe, what is the assigned based address for: DLL1.dll, DLL2.dll, and DLL3.dll?

4. When Lab09-03.exe calls an import function from DLL1.dll, what does this import function do?

5. When Lab09-03.exe calls WriteFile, what is the filename it writes to?

6. When Lab09-03.exe creates a job using NetScheduleJobAdd, where does it get the data for the second parameter?

7. While running or debugging the program, you will see that it prints out three pieces of mystery data. What are the following: DLL 1 mystery data 1, DLL 2 mystery data 2, and DLL 3 mystery data 3?

8. How can you load DLL2.dll into IDA Pro so that it matches the load address used by OllyDbg?

My answers:

1. What DLLs are imported by Lab09-03.exe?


Lab09-03.exe imports kernel32.dll, netapi32.dll, DLL1.dll, and DLL2.dll. This information can be easily gathered with the help of the Dependency Walker tool.

2. What is the base address requested by DLL1.dll, DLL2.dll, and DLL3.dll?

Let's examine these libraries with the PEStudio program used for the static analysis process.

The base address requested by DLL1.dll:


While answering this question I got stuck, because I didn't actually know the difference between the entry point and the image base, so let me share what I've learned. The entry point of an image is the address of the first instruction executed when the image starts. It usually lies inside the .text section, although not necessarily at its very beginning. The image base is the preferred address at which the whole image is loaded into memory, so it is where the DOS header of the PE file ends up. The entry point is stored as an RVA - it is relative to the image base.
To sum up: the image base is the value we need to answer this question.
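To make the difference between the two fields concrete, here is a small Python sketch that reads both of them straight from the PE headers using only the standard library. The function names read_pe_addresses and fake_pe32 are made up, and so is the entry point RVA of 0x1000. The sketch parses a synthetic header built in memory, so it is not the real DLL1.dll, but the field offsets follow the PE format.

```python
import struct


def read_pe_addresses(data):
    """Return the image base and entry point stored in a PE optional header."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # skip the signature and the COFF file header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:  # PE32
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return {
        "image_base": image_base,
        "entry_point_rva": entry_rva,
        "entry_point_va": image_base + entry_rva,
    }


def fake_pe32(image_base, entry_rva):
    """Build just enough of a PE32 header for the parser above (test data only)."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    opt = 0x80 + 24
    struct.pack_into("<H", buf, opt, 0x10B)
    struct.pack_into("<I", buf, opt + 16, entry_rva)
    struct.pack_into("<I", buf, opt + 28, image_base)
    return bytes(buf)


info = read_pe_addresses(fake_pe32(0x10000000, 0x1000))
print({name: hex(value) for name, value in info.items()})
```

The entry point only becomes an absolute address once the image base is added to it, which is why the image base is the field that answers this question.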

The base address requested by DLL2.dll:


The base address requested by DLL3.dll:



Each of these dynamic-link libraries requests the base address 0x10000000, so we can be sure that we will be dealing with relocation.

3. When you use OllyDbg to debug Lab09-03.exe, what is the assigned based address for: DLL1.dll, DLL2.dll, and DLL3.dll?

It appears that when you open the file within the Olly/Immunity Dbg only two of three DLLs are loaded - DLL1.dll and DLL2.dll. DLL3.dll has to be loaded dynamically by the process using the LoadLibrary function. These are the base addresses of the first two DLLs:


0x00330000 and 0x10000000 - the address of the DLL2.dll and DLL1.dll respectively. Now let's have a look at the code of the program. We need to find the address of the call to the LoadLibrary("DLL3.dll") function.


As you can see the address is 0x401041. Let's launch the debugger and put a breakpoint after the call to see the base address of DLL3.dll. Hit F9 to run the code and check the Memory Map section of your debugger.


DLL3.dll is loaded at 0x00390000. These addresses are virtual and the MMU maps them to physical addresses. This mechanism (virtual memory with a separate address space for each process) isolates processes, so they cannot affect each other's memory. Only the OS and drivers running in kernel mode have direct access to physical memory.

4. When Lab09-03.exe calls an import function from DLL1.dll, what does this import function do?

There is only one import from DLL1.dll in the malware's executable Import Table. This function is named DLL1Print. This name is pretty straightforward so let's execute this function inside the debugger - maybe we will save some time. 


This is the output from the DLL1Print function but it doesn't tell us much. We have to analyze this method inside the IDA.


From the "Names window" in IDA we know that the address of the mentioned function is 0x10001020. Let's jump to this address by clicking DLL1Print in the "Names window".

The code of the DLL1Print:


dword_10008030 is equal to 1 and this is a global variable in general. As the picture shows, there is a call to the sub_10001038 with two arguments, therefore the execution of this method is exactly this -> sub_10001038("DLL 1 mystery data %d\n", 1) - the first argument is the main part of the string printed to the console earlier. We have to understand how the "mystery" value is created inside the subroutine.



I've renamed the subroutine to printf since the code inside it is quite recognizable. The calls to __stbuf and __ftbuf are strong indicators that we are actually dealing with the real printf function - I know this from the malware analysis experience I've gathered with the help of this book. :) Anyway, the whole code of the DLL1Print function looks much cleaner:


But if the subroutine called by the function is just printf, why isn't the printed value 1? As we already know, dword_10008030 is a global variable, so it can be modified from anywhere in DLL1. At least one function is called before DLL1Print - that's a fact. This function is DllMain.

DllMain is called when a dynamic-link library is successfully loaded into memory. Obviously, dword_10008030 is fully accessible from inside the DllMain. Let's investigate the xrefs to our global variable and see if we can find something useful.


And here we go! The global variable is modified during the loading process. Let's see what we can find inside the DllMain.


The mystery is solved! The value printed to the console is simply the process ID of the malware's executable (since DLL1.dll is loaded into the malware's process memory). So the answer to the question is: the imported function from DLL1 prints "DLL 1 mystery data <the ID of the process the library is loaded in>".
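As a small analogy for this load-time initialization, here is a Python sketch. The class FakeDll1 and its members are hypothetical stand-ins of my own: the constructor plays the role of DllMain and stores the current process ID, and dll1_print plays the role of DLL1Print. It is not a translation of the disassembly.

```python
import os


class FakeDll1:
    """Hypothetical stand-in for DLL1.dll, used only to illustrate the idea."""

    def __init__(self):
        # Plays the role of DllMain: runs once when the "library" is loaded
        # and overwrites the global that initially holds 1.
        self.mystery = os.getpid()

    def dll1_print(self):
        # Plays the role of DLL1Print: a plain formatted print of the global.
        return "DLL 1 mystery data %d\n" % self.mystery


dll = FakeDll1()
line = dll.dll1_print()
print(line, end="")
```

The printed number changes from run to run, just like the value seen in the debugger, because every new process gets its own ID.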

5. When Lab09-03.exe calls WriteFile, what is the filename it writes to?

The interesting thing in the piece of code significant for this question is the handle passed to the WriteFile.



This handle is returned from the DLL2ReturnJ method that resides within DLL2.dll. We have to analyze the mentioned function to answer the question correctly. Thus, let's see what we can find inside DLL2. But before that, I would like to remind you that the base address of DLL2 differs from its default because of the relocation. That being said, it's good practice to set the base address of DLL2 (in IDA) to its address after the relocation. Click Edit -> Segments -> Rebase program and change the Target value to 0x00330000 since this is DLL2's base address in Olly/Immunity Dbg after the relocation.


The DLL2ReturnJ function only returns the handle created somewhere else in the DLL2. The best option to find where this handle is created is to use xrefs to. Here we have a beautiful chart generated by the IDA:


As you can see, besides DLL2ReturnJ, the handle we are interested in is referenced inside DllMain and DLL2Print. Pay attention to the first screenshot provided in this question. From the code in that picture it is clear that DLL2Print was not called before DLL2ReturnJ executed. Therefore, all the magic has to happen inside the DllMain function, which is executed while DLL2 is being loaded. So let's jump into that function.


And that's what we are looking for. Now we know that the DllMain creates the temp.txt file inside the malware's directory and the function DLL2ReturnJ returns the handle to this file. To sum up - the WriteFile function writes to the file named temp.txt.

6. When Lab09-03.exe creates a job using NetScheduleJobAdd, where does it get the data for the second parameter?


The red rectangle marks the code we need to analyze to learn where the malware gets the data for NetScheduleJobAdd. The GetProcAddress function is the key to understanding this. As MSDN states, the GetProcAddress function retrieves the address of an exported function or variable from the specified DLL. In our case, the specified DLL is DLL3. That being said, the GetProcAddress function returns the address of the DLL3GetStructure function from DLL3.dll. Then the malware calls this function with the address of the Buffer at the call [ebp+var_10] line. So the malicious program gets the data for the second parameter of NetScheduleJobAdd by executing the DLL3GetStructure function. The question isn't about investigating what this function really does, but let's find out how it works just out of curiosity.

In addition to question 6. DLL3GetStructure function analysis.

The whole logic of this function is like this:


[ebp+arg_0] is the address of the Buffer variable that we are interested in. Thus, we have to see what lies inside dword_1000B0A0 since the value from this global variable is written into the buffer. The global variable is uninitialized when you look at it in IDA just by clicking dword_1000B0A0. Therefore it has to be initialized before the call to DLL3GetStructure is made. This is what the xrefs to flow chart looks like:



This picture clearly indicates that the dword_1000B0A0 variable has to be initialized inside the DllMain function, so obviously it is done before calling DLL3GetStructure.


The variable is initialized after the LoadLibraryA execution. Now that we are sure how the malware actually sets the variable we are interested in, let's take a look inside the DllMain of DLL3.dll.



Inside the DllMain, MultiByteToWideChar converts a multibyte character string into a UTF-16 (wide character) string. As you can see, the ping www.malwareanalysisbook.com command is stored in memory after our dword_1000B0A0 variable. It appears that the best option to examine the Buffer that is passed to the NetScheduleJobAdd function is to launch a debugger and look at the bytes within that buffer.


In the red rectangle, we have the pointer 0x0039B0C0 stored in little-endian byte order; it points into DLL3.dll, which is loaded at 0x00390000. From MSDN we know that the NetScheduleJobAdd function takes a pointer to an AT_INFO structure as the second parameter. So our Buffer variable is actually the AT_INFO structure that describes the job the OS should execute in the future. 0x0039B0C0 is the Command field, the pointer to the command to execute. 0x0036EE80 is the JobTime value: the time of day at which the job runs, in milliseconds counted from midnight, so 3,600,000 ms means 1:00 AM. 0x7F is the DaysOfWeek bitmask; all seven bits are set, so the job runs on every day of the week. 0x11 is the Flags field, JOB_RUN_PERIODICALLY (0x01) combined with JOB_NONINTERACTIVE (0x10). To sum up, the ping www.malwareanalysisbook.com command is scheduled to run every day of the week at 1:00 AM.
As you can see, this piece of code that we analyzed in addition to the 6th question is actually the core of the Lab09-03.exe malware.
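The AT_INFO values are easier to read once they are decoded according to the documented field meanings: JobTime is milliseconds after midnight, DaysOfMonth and DaysOfWeek are bitmasks starting at the 1st of the month and at Monday, and Flags is a set of JOB_* bits. Here is a minimal Python sketch of that decoding. The function name decode_at_info is made up, and the DaysOfMonth value of 0 is my assumption since it is not shown in the text above.

```python
from datetime import timedelta

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Flag values as defined for the AT_INFO Flags field.
FLAGS = {
    0x01: "JOB_RUN_PERIODICALLY",
    0x02: "JOB_EXEC_ERROR",
    0x04: "JOB_RUNS_TODAY",
    0x08: "JOB_ADD_CURRENT_DATE",
    0x10: "JOB_NONINTERACTIVE",
}


def decode_at_info(job_time_ms, days_of_month, days_of_week, flags):
    """Turn the raw AT_INFO numbers into something human-readable."""
    return {
        "time_of_day": str(timedelta(milliseconds=job_time_ms)),
        "days_of_month": [d + 1 for d in range(31) if (days_of_month >> d) & 1],
        "days_of_week": [DAYS[d] for d in range(7) if (days_of_week >> d) & 1],
        "flags": [name for bit, name in FLAGS.items() if flags & bit],
    }


# days_of_month=0 is an assumption; the other values come from the dump.
job = decode_at_info(0x0036EE80, 0, 0x7F, 0x11)
for field, value in job.items():
    print(field, "=", value)
```

Read this way, the job is a periodic, non-interactive task that runs at 1:00 AM on every day of the week.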

7. While running or debugging the program, you will see that it prints out three pieces of mystery data. What are the following: DLL 1 mystery data 1, DLL 2 mystery data 2, and DLL 3 mystery data 3?

The mystery data from the DLL1 is the malware's process ID as I've mentioned in the answer to the fourth question. More precisely, this mystery data is the ID of the process the library is loaded in. Due to this, I'm going to take a look straight into DLL2 and DLL3 respectively.

The function responsible for printing the mystery data in the DLL2 is DLL2Print. Here is what it looks like:



The key to solving the mystery is the sub_1000105A method, which takes the DLL2 mystery data %d\n string as the first argument and the dword_1000B078 global variable as the second argument. This variable is probably an integer. If you look at the code of sub_1000105A you will see that this subroutine is just a printf function. Therefore, our mystery data lies inside the dword_1000B078 variable. Using IDA's very helpful xrefs to feature, we can quickly deduce that this important number is modified inside the DllMain function. The whole operation happens while the DLL is being loaded, long before the call to the DLL2Print function is made.



Based on the code of the DllMain below, the answer to the question is fairly simple.


Answer 1: DLL2 mystery data is the open handle to the "temp.txt" file or INVALID_HANDLE_VALUE in case of an error.

Now we have to examine the DLL3 mystery data.


As the code shows we are dealing with a similar situation to the DLL2. Let's take the same path and check the xrefs to the WideCharStr variable.


They are just too predictable. Again, the mystery data is set inside the DllMain function.


MultiByteToWideChar maps a character string to a UTF-16 (wide character) string. Thus, the mystery data should be a "ping www.malwareanalysisbook.com" UTF-16 string. We can check it by running the DLL3Print call in a debugger. One more thing - the address of the call to DLL3Print is 0x40105C, relative to the 0x401000 address of the entry point. The address of this exported function is obtained with the help of GetProcAddress.



I have to admit that I've made a serious mistake writing about the DLL3's mystery data as the "UTF-16 string". And I'm in a hurry to explain why.


The marked word is taken from the MASM syntax and it is self-explanatory.  The whole instruction pushes the OFFSET of the WideCharStr WCHAR buffer onto the stack and not the VALUE/CONTENT of this buffer! Therefore, the mystery data printed by the DLL3 is the ADDRESS of the "ping www.malwareanalysisbook.com" in the process memory. Obviously, this address is printed as an integer due to the "%d" format.
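The difference between the address of a buffer and its content is easy to demonstrate with Python's ctypes module. This is only an illustration of the idea and not the DLL3 code; the variable names are my own.

```python
import ctypes

# A wide-character buffer holding the command, similar in spirit to WideCharStr.
buf = ctypes.create_unicode_buffer("ping www.malwareanalysisbook.com")

# Passing the OFFSET corresponds to passing this number ...
address = ctypes.addressof(buf)
line = "DLL 3 mystery data %d" % address
print(line)

# ... while the CONTENT of the buffer is the string itself.
print(buf.value)
```

The first line printed is some integer that changes between runs, and only the second one is the actual command text.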

8. How can you load DLL2.dll into IDA Pro so that it matches the load address used by OllyDbg?

From the third question, we know that the Immunity Dbg (you can check it by using OllyDbg but I simply prefer Immunity) loads DLL2 at the 0x00330000 address. If we want to load DLL2.dll into IDA so that it matches the 0x00330000 load address we have to rebase the image.

Click Edit -> Segments -> Rebase program...



Then the Target has to be set to the address that will be our new image base in IDA.


Now, as you can see, the origin is set to exactly 0x00330000, which is DLL2's new image base.
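If you prefer not to rebase, the same translation can be done by hand: subtract the base the address was taken under and add the other base. Here is a minimal Python sketch of that arithmetic. The function name rebase is made up, and I use sub_1000105A from DLL2 as the example address.

```python
def rebase(address, old_base, new_base):
    """Translate an address from one image base to another."""
    return address - old_base + new_base


PREFERRED_BASE = 0x10000000  # base requested by DLL2.dll (IDA default)
LOADED_BASE = 0x00330000     # base assigned in Olly/Immunity Dbg

in_debugger = rebase(0x1000105A, PREFERRED_BASE, LOADED_BASE)
back_in_ida = rebase(in_debugger, LOADED_BASE, PREFERRED_BASE)

print(hex(in_debugger))
print(hex(back_in_ida))
```

The offset from the image base (0x105A here) stays the same in both views, and only the base changes.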

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The end. Next time I will share with you my solutions for the tasks from Chapter 10. The labs are based mostly on the kernel-mode. I'm excited since we will be jumping right into the OS internals. I hope that you are also learning with me and that my solutions for the tasks are helpful for you.
Have a nice day and keep wh1t3 h4ck1ng! :)

Picture Pete Linforth from Pixabay
