Learning of malware analysis. Advanced static analysis labs from "Practical Malware Analysis" book

Hello everyone!

Finally, the time has come to improve our malware analysis process with advanced static analysis techniques. This type of investigation involves reverse engineering the suspected binaries, and from now on we will be able to dig deeper into the analyzed malware. I must admit that I missed this aspect before, and now I feel free. I'm sure that from this moment the malware analysis process will be more interesting as well as more challenging. Without further ado, I bring you my solutions to the advanced static analysis lab, done with the reverse engineering tool IDA.

So, as always, I invite you to read my solutions to the tasks, and I hope you have fun while learning new things!




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lab 5-1
Analyze the malware found in the file Lab05-01.dll using only IDA Pro. The goal of this lab is to give you hands-on experience with IDA Pro. If you’ve already worked with IDA Pro, you may choose to ignore these questions and focus on reverse-engineering the malware.

Questions:

1. What is the address of DllMain?

2. Use the Imports window to browse to gethostbyname. Where is the import located?

3. How many functions call gethostbyname?

4. Focusing on the call to gethostbyname located at 0x10001757, can you figure out which DNS request will be made?

5. How many local variables has IDA Pro recognized for the subroutine at 0x10001656?

6. How many parameters has IDA Pro recognized for the subroutine at 0x10001656?

7. Use the Strings window to locate the string \cmd.exe /c in the disassembly. Where is it located?

8. What is happening in the area of code that references \cmd.exe /c?

9. In the same area, at 0x100101C8, it looks like dword_1008E5C4 is a global variable that helps decide which path to take. How does the malware set dword_1008E5C4? (Hint: Use dword_1008E5C4’s cross-references.)

10. A few hundred lines into the subroutine at 0x1000FF58, a series of comparisons use memcmp to compare strings. What happens if the string comparison to robotwork is successful (when memcmp returns 0)?

11. What does the export PSLIST do?

12. Use the graph mode to graph the cross-references from sub_10004E79. Which API functions could be called by entering this function? Based on the API functions alone, what could you rename this function?

13. How many Windows API functions does DllMain call directly? How many at a depth of 2?

14. At 0x10001358, there is a call to Sleep (an API function that takes one parameter containing the number of milliseconds to sleep). Looking backward through the code, how long will the program sleep if this code executes?

15. At 0x10001701 is a call to socket. What are the three parameters?

16. Using the MSDN page for socket and the named symbolic constants functionality in IDA Pro, can you make the parameters more meaningful? What are the parameters after you apply changes?

17. Search for usage of the in instruction (opcode 0xED). This instruction is used with a magic string VMXh to perform VMware detection. Is that in use in this malware? Using the cross-references to the function that executes the in instruction, is there further evidence of VMware detection?

18. Jump your cursor to 0x1001D988. What do you find?

19. If you have the IDA Python plug-in installed (included with the commercial version of IDA Pro), run Lab05-01.py, an IDA Pro Python script provided with the malware for this book. (Make sure the cursor is at 0x1001D988.) What happens after you run the script?

20. With the cursor in the same location, how do you turn this data into a single ASCII string?


21. Open the script with a text editor. How does it work?

My answers:

I do have some experience with IDA, but I would like to refresh my memory of the interface of this amazing tool. So first I'm going to answer the above questions, and then, as a bonus, reverse engineer the whole binary with help from IDA and Ghidra. One more thing: I've created a new virtual machine with Windows 10 as the OS for reverse engineering. In this lab the reverse engineering is done statically, so the binary will never be launched and I can analyze it on a system other than Windows XP. Now let's start the fun with reverse engineering.

1. What is the address of DllMain?



The DllMain function is called from within the DllEntryPoint function, which begins at the entry point of the malware. The code in DllEntryPoint is not important for us because it consists of compiler-generated instructions. The function at virtual address 0x1000D02E in the .text section is indeed DllMain. IDA didn't name it that way, though: it was displayed as sub_1000D02E. Whenever you identify a function with a well-known role (DllMain, a WinAPI wrapper and so on), it's worth replacing the name generated by IDA with the standard one, because it makes further reverse engineering simpler.

2. Use the Imports window to browse to gethostbyname. Where is the import located?



The gethostbyname import resides at 0x100163CC in the .idata section of the binary.


3. How many functions call gethostbyname?

To know how many functions call gethostbyname, we have to look at the cross-references to it. Cross-references show which functions call (or otherwise refer to) the function we are investigating. Let's have a look at the x-refs of gethostbyname.


There are exactly 18 cross-references. For this question we need to focus on the Type of each cross-reference. According to this site, p stands for a call of the investigated function (gethostbyname in our case) and r stands for a "read" reference. I wondered why the r type appears on that list, and I read in the book that it is there because the CPU first has to read the import table and then jump to the address of the function within the library.
Back to the question: there are nine cross-references of type p, which means that gethostbyname is called nine times by five different functions.
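
To make the counting rule concrete, here is a minimal Python sketch that tallies rows like the ones in the Cross-References window. The rows are hypothetical (I made up the function names and offsets); the only assumptions are that each row has a type letter (p or r) and an address written as function+offset, the way IDA displays it.

```python
from collections import Counter

# Hypothetical rows from an Xrefs window: (type, "function+offset").
# 'p' = procedure call, 'r' = read reference (reading the import entry).
XREFS = [
    ("p", "sub_10001074+1D3"),
    ("r", "sub_10001074+1D3"),
    ("p", "sub_10001074+26B"),
    ("r", "sub_10001074+26B"),
    ("p", "sub_10001656+101"),
    ("r", "sub_10001656+101"),
]

def summarize_calls(xrefs):
    """Return (number of call xrefs, number of distinct calling functions)."""
    callers = Counter(
        where.split("+", 1)[0]   # keep the function name, drop the offset
        for kind, where in xrefs
        if kind == "p"           # only real calls, not read references
    )
    return sum(callers.values()), len(callers)

calls, functions = summarize_calls(XREFS)
print(f"{calls} calls from {functions} functions")
```

Fed the 18 real rows from the window, the same rule (keep type p, group by function name) should reproduce the nine calls from five functions counted above.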

4. Focusing on the call to gethostbyname located at 0x10001757, can you figure out which DNS request will be made?

To analyze this exact location in the binary, we have to jump directly to the address 0x10001757. This can be done by pressing G and typing 0x10001757.



We are exactly at the location the authors of the book ask about. In the calling convention used in this binary, arguments are pushed onto the stack. From MSDN we know that gethostbyname accepts one argument called "name", which is a string, and a string is passed as the address of a NULL-terminated sequence of characters in memory. In this case, that address ends up in the eax register, which is then pushed onto the stack as the argument of gethostbyname. Let's double-click on off_10019040 to see the characters behind that address.


Notice that the address passed to gethostbyname is actually off_10019040 + 13, so after shifting the pointer by exactly 13 bytes the string is pics.practicalmalwareanalysis.com (1 character = 1 byte here, so we simply skip the first 13 characters). Therefore, gethostbyname will make a DNS request for pics.practicalmalwareanalysis.com.
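
The pointer arithmetic can be mimicked in Python by treating memory as a byte buffer. This is only a sketch: I'm assuming the data behind off_10019040 begins with a 13-character prefix such as [This is RDO] (the same shape as the [This is CTI] prefix in question 14). The exact prefix doesn't matter, only its length.

```python
def read_c_string(memory: bytes, address: int) -> str:
    """Read bytes from `address` up to the first NUL, like dereferencing a C char*."""
    end = memory.index(b"\x00", address)
    return memory[address:end].decode("ascii")

# Assumed contents of the buffer that off_10019040 points to.
buffer = b"[This is RDO]pics.practicalmalwareanalysis.com\x00"

base = 0                                   # off_10019040, relative to our buffer
print(read_c_string(buffer, base))         # whole string, prefix included
print(read_c_string(buffer, base + 13))    # what gethostbyname actually receives
```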

5. How many local variables has IDA Pro recognized for the subroutine at 0x10001656?

Let's press G and jump to 0x10001656 to analyze the subroutine.


Local variables of the subroutine are the ones with a negative offset (relative to esp+0x688 in this function), because the stack grows towards lower addresses and local variables live inside the subroutine's stack frame. Variables with positive offsets are the parameters of the subroutine. So IDA Pro recognized exactly 23 local variables for the subroutine at that address.

6. How many parameters has IDA Pro recognized for the subroutine at 0x10001656?

Only one variable has a positive offset, therefore, IDA Pro recognized only one parameter for the subroutine at 0x10001656.
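
IDA lists the stack frame at the top of a function as lines like var_675= byte ptr -675h and arg_0= dword ptr 4. A small Python sketch can apply the sign rule from the two answers above to such lines. The sample lines are hypothetical, not copied from sub_10001656.

```python
import re

# Frame member lines look like "name= <type> ptr <offset>", offset in hex, optional "h".
FRAME_LINE = re.compile(r"^(\w+)=\s+\w+\s+ptr\s+(-?)([0-9A-Fa-f]+)h?$")

def split_frame(lines):
    """Split frame members into locals (negative offset) and args (positive offset)."""
    local_vars, args = [], []
    for line in lines:
        match = FRAME_LINE.match(line.strip())
        if not match:
            continue  # skip anything that isn't a frame member definition
        name, sign, hex_offset = match.groups()
        offset = int(hex_offset, 16) * (-1 if sign else 1)
        (local_vars if offset < 0 else args).append((name, offset))
    return local_vars, args

sample = [
    "var_675= byte ptr -675h",
    "var_674= dword ptr -674h",
    "hostshort= word ptr -4",
    "arg_0= dword ptr 4",
]
locals_, args = split_frame(sample)
print(len(locals_), "locals,", len(args), "argument")
```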

7. Use the Strings window to locate the string \cmd.exe /c in the disassembly. Where is it located?

To see the strings in the binary, open View -> Open Subviews -> Strings in IDA. Then, to find a specific string, Ctrl + F will do the job.


As you can see, the string is located at 0x10095B34 in the xdoors_d section.

8. What is happening in the area of code that references \cmd.exe /c?

To jump to an exact location in the executable, press G and type the address. Let's do this to find "\cmd.exe /c" within the binary.


Now, with the cursor on that string, press Ctrl+X to open the Cross-References window.


The area of code that references "\cmd.exe /c" is within the sub_1000FF58 subroutine. After double-clicking the Address entry in the Cross-References window, we jump to the code that uses the string.


Now we can start reverse-engineering the code of the subroutine that uses our string. Let's start from the beginning.


The best way not to get lost in tons of code is to look at the function calls. Here you can see three different WinAPI function calls. I've decided to focus on them and not on __alloca_probe, for example, because I don't think that function is as interesting as the others.

GetTickCount, as MSDN says, "retrieves the number of milliseconds that have elapsed since the system was started, up to 49.7 days." It looks like the malware uses it to measure the time between its actions, but I'm not sure about that and I don't think it's something to worry about.

GetCurrentDirectoryA is much more interesting. It tells the malware which directory it operates in. Next, there is a call to GetLocalTime, which, as MSDN states, stores its result in a SYSTEMTIME structure. IDA also gives us this information by renaming the local variable to SystemTime. This function simply returns the local time of the system in that structured format. Then there is the get_idle_time function, which IDA originally named sub_<address of the subroutine>; I renamed it to make the code more readable. Let's have a look at its internals:


This function is quite interesting. GetLastInputInfo and GetTickCount are used together to measure how long the user has been inactive. GetLastInputInfo returns the tick count at the moment of the user's last input, so the difference between the current tick count (milliseconds since the system started) and that value is exactly the user's idle time. At the end of the function, the milliseconds are converted to seconds.
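
The idle-time calculation boils down to one subtraction and one division. Here is a Python sketch of that arithmetic; the function and parameter names are mine, and the inputs stand in for what GetTickCount and the dwTime member of LASTINPUTINFO would return. I'm assuming integer division, as a div instruction performs.

```python
def idle_seconds(tick_count_ms: int, last_input_tick_ms: int) -> int:
    """Seconds since the last user input.

    tick_count_ms: value GetTickCount() would return now.
    last_input_tick_ms: LASTINPUTINFO.dwTime filled in by GetLastInputInfo().
    Both are 32-bit tick counts; taking the difference modulo 2**32 matches
    how a 32-bit sub wraps around after ~49.7 days of uptime.
    """
    idle_ms = (tick_count_ms - last_input_tick_ms) % 2**32
    return idle_ms // 1000  # milliseconds -> whole seconds

print(idle_seconds(125_000, 5_000))  # 120
```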

After the call to get_idle_time, there is a lot of code with div instructions, and it's not a good idea to dig into it since we can't use a debugger to do dynamic analysis yet.


In this code, the most important part is the string "Hi, Master...". Double-clicking on it lets us jump to the memory location of that string and see the whole text.


This whole block of text could be the welcome message sent from the victim's computer to the attacker. From this string we can suppose that this DLL might be responsible for creating a remote shell session. The series of div instructions before the call to sprintf probably calculates the "Encrypt Magic Number" that makes the remote shell session unique.


After the strlen call, there is a call to a subroutine. Its code is responsible for sending the passed string, probably to the attacker's C&C server, so I've renamed the subroutine to send_command.

Now we are even more convinced that the area of code that references "\cmd.exe /c" is responsible for a remote shell session. After cursorily checking the rest of the analyzed function, a lot of memcmp calls appeared, along with custom shell commands such as "quit", "cd", "idle", "exit" and so on. If one of these memcmp calls returns 0, the malware handles the matching command and sends the appropriate information to the attacker. For example, let's have a look at what happens if the received command is "language":


When memcmp returns 0 for the "language" command, the malware calls this function:


As you can see, information about the language of the victim's machine is sent to the attacker's server. Now we can be fairly sure that the area of code that references "\cmd.exe /c" is responsible for creating a remote shell session between the victim's host and the attacker's C&C server.
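
The chain of memcmp checks behaves like a prefix-matching dispatch table. The Python sketch below models that idea; the handler functions and their return values are hypothetical placeholders, and only the command strings come from the disassembly.

```python
def handle_language(args: bytes) -> str:
    return "language info"   # hypothetical stand-in for the real handler

def handle_quit(args: bytes) -> str:
    return "bye"             # hypothetical stand-in

# Checked in order, like the consecutive memcmp calls.
HANDLERS = [
    (b"language", handle_language),
    (b"quit", handle_quit),
]

def dispatch(received: bytes):
    for command, handler in HANDLERS:
        # memcmp(received, command, len(command)) == 0  <=>  the prefix matches
        if received[:len(command)] == command:
            return handler(received[len(command):].strip())
    return None  # no command matched

print(dispatch(b"language"))
```

Because each comparison only looks at len(command) bytes, the order of the checks matters whenever one command is a prefix of another, just as it does in a chain of memcmp calls.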

9. In the same area, at 0x100101C8, it looks like dword_1008E5C4 is a global variable that helps decide which path to take. How does the malware set dword_1008E5C4? (Hint: Use dword_1008E5C4’s cross-references.)



After double-clicking on dword_1008E5C4, we jump to the exact memory location of this global variable. Then press Ctrl+X to see the variable's x-refs and find the code that sets dword_1008E5C4 to some value.


We are lucky, since the only code that writes to this variable is mov dword_1008E5C4, eax. Let's jump to the address of this instruction and see if we find something useful there. Maybe we can work out the exact value of the global variable?


The malware sets this global variable to the result of sub_10003695, which is called without any arguments. This is the code of the subroutine:



The analyzed subroutine calls GetVersionExA, which fills in an OSVERSIONINFO structure with information about the system version. The three instructions after GetVersionExA are the most important ones for working out the value of dword_1008E5C4. They mean that if the dwPlatformId member of that structure equals 2 (VER_PLATFORM_WIN32_NT), eax becomes 1; otherwise it stays 0, because the whole register was zeroed before the comparison. In other words (quoting MSDN), if the operating system is Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP, or Windows 2000, eax is set to 1, and as a result dword_1008E5C4 is also set to 1. So the malware sets dword_1008E5C4 by checking which operating system family the victim is running.
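
A Python sketch of that check, assuming the three instructions are the usual xor eax, eax / cmp / setz al idiom (I'm reconstructing them from the description above, not from the screenshot). The constant names and values are the ones documented for OSVERSIONINFO.dwPlatformId.

```python
# dwPlatformId values documented for OSVERSIONINFO
VER_PLATFORM_WIN32s = 0
VER_PLATFORM_WIN32_WINDOWS = 1   # Windows 95/98/Me
VER_PLATFORM_WIN32_NT = 2        # NT family: 2000, XP, Vista, 7, ...

def is_nt_platform(dw_platform_id: int) -> int:
    """Model of: xor eax, eax / cmp dwPlatformId, 2 / setz al. Returns 0 or 1."""
    eax = 0                       # xor eax, eax
    if dw_platform_id == VER_PLATFORM_WIN32_NT:
        eax = 1                   # setz al sets the low byte when the values match
    return eax                    # later stored with mov dword_1008E5C4, eax

print(is_nt_platform(VER_PLATFORM_WIN32_NT), is_nt_platform(VER_PLATFORM_WIN32_WINDOWS))
```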

10. A few hundred lines into the subroutine at 0x1000FF58, a series of comparisons use memcmp to compare strings. What happens if the string comparison to robotwork is successful (when memcmp returns 0)?

If the string comparison to "robotwork" is successful, this function is called with the socket passed as an argument:


The first instructions of the function set some buffers to 0. In my opinion, the investigation should start here:


The very first argument passed to RegOpenKeyExA is hKey. IDA shows its value in hexadecimal, but the number doesn't tell us much about the key. The best thing we can do here is right-click this hex value and choose Use standard symbolic constant.


From the table above, it's clear that the hKey argument is HKEY_LOCAL_MACHINE. Another significant argument passed to RegOpenKeyExA is samDesired, which specifies the desired access rights to the key being opened. I'm going to use the Use standard symbolic constant feature once more to make the code cleaner.
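
What Use standard symbolic constant does is essentially a reverse lookup from a number to the names defined in the Windows headers. Here is a tiny Python model of that lookup with only a handful of constants (values as defined in the Windows SDK headers). The raw numbers the malware pushes aren't visible in this post, so the inputs below are illustrative.

```python
# A few registry constants as defined in the Windows SDK headers.
HKEY_CONSTANTS = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
}
ACCESS_CONSTANTS = {
    0x20019: "KEY_READ",
    0x20006: "KEY_WRITE",
    0xF003F: "KEY_ALL_ACCESS",
}

def symbolize(value: int, table: dict) -> str:
    """Return the symbolic name for `value`, or the hex number if it's unknown."""
    return table.get(value, hex(value))

print(symbolize(0x80000002, HKEY_CONSTANTS), symbolize(0xF003F, ACCESS_CONSTANTS))
```

Restricting the lookup to one table per parameter mirrors how IDA asks you to pick among candidate constants: the same number can mean different things for different parameters.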


Now we know that the malware opens the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion key with KEY_ALL_ACCESS rights. The handle to the key is stored in the phkResult variable, so I've renamed it to something more convenient: CurrentVersion_key_handle. Let's see what happens next if RegOpenKeyExA returns without an error:


This piece of code reads the WorkTime value under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion key and sends it to the attacker over the network connection.


Here we have the same situation, but this time for the WorkTimes value under the same key.

So the "robotwork" command is used to retrieve the WorkTime and WorkTimes registry values and send them to the attacker.

11. What does the export PSLIST do?

To analyze this function, we have to open the Exports window and double-click PSLIST. Here is the code to investigate:



It seems that we have a couple of calls to unknown functions, so we have to analyze them statically, since we can't use a debugger in this reverse engineering task.



The very first subroutine called by our function checks whether the victim's OS is newer than Windows 98. If not, it returns 0; otherwise it returns 1.

From the PSLIST code we now know that if the victim's OS is Windows 98 or older, this exported function returns without doing anything further. Otherwise, it checks the length of the string passed as the third argument. If the string is non-empty, the function is called like this: sub_1000664C(0, Str);. This time I will try to convert the assembly code of the function into C. We can skip the __alloca_probe function called at the beginning, since it simply allocates as many bytes on the stack as eax holds before the call. Functionally it is equivalent to sub esp, 0x1634 in our case; the probe exists because Windows commits stack memory one guard page at a time, so a large frame has to touch each new page in order.
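
A simplified Python model of what the probe does, assuming 4 KB pages and the usual MSVC _chkstk loop (the exact code differs between compiler runtimes): it lowers the address one page at a time, touches each new page so the guard page can move down, and finally lands on the full allocation. The starting esp value is made up.

```python
PAGE_SIZE = 0x1000

def alloca_probe(esp: int, size: int):
    """Return (addresses touched, new esp) for a stack allocation of `size` bytes."""
    touched = []
    address, remaining = esp, size
    while remaining >= PAGE_SIZE:
        address -= PAGE_SIZE       # step one full page down
        remaining -= PAGE_SIZE
        touched.append(address)    # touch it so the guard page advances
    address -= remaining           # leftover part of the last page
    touched.append(address)
    return touched, address        # final esp == esp - size, same as sub esp, size

touched, new_esp = alloca_probe(0x0012FF00, 0x1634)
print([hex(a) for a in touched], hex(new_esp))
```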

If the attacker specifies Str in the PSLIST export, this code runs:



Now we know that Str names the process the attacker wants information about. Let's see what happens if the attacker doesn't specify any process to list. In that situation, sub_10006518 is called. Converted to C, the assembly code of this subroutine looks something like this:


As you can see, the two functions differ in how many processes they handle and in whether they send anything. If the attacker passes a string to the PSLIST export, the malware sends information about the specified process to the attacker; otherwise it lists all processes running on the system but doesn't send that information to the attacker, which is a bit strange.

12. Use the graph mode to graph the cross-references from sub_10004E79. Which API functions could be called by entering this function? Based on the API functions alone, what could you rename this function?


I have the IDA 7.0 freeware version installed, which can't draw the cross-reference graph because the installation pack doesn't include the files needed for it. There is another great tool for this kind of job, Ghidra. Unfortunately, Ghidra has a known issue that prevents me from disassembling the Lab05-01.dll file: https://github.com/NationalSecurityAgency/ghidra/issues/1371. So, in this case, I have to check the calls to WinAPI functions without any graph view.



In the main body of sub_10004E79 we have the GetSystemDefaultLangID WinAPI function, and inside the user-defined send_command function there is a call to send, so with these indicators in mind I would rename this function to, for example, get_system_language.

13. How many Windows API functions does DllMain call directly? How many at a depth of 2?

Again, I can't get the graph overview of the x-refs from DllMain, so I have to count the WinAPI calls by reading the DllMain code without any helpers.

So at depth 1, DllMain directly calls these functions: CreateThread, strnicmp, strlen and strncpy (strictly speaking, only CreateThread is a Windows API function; the other three come from the C runtime).

At depth 2:
X-refs from sub_10001074 function: strchr, memset, WinExec, inet_addr, gethostbyname, memcpy, inet_ntoa, strcpy, strncpy, atoi and Sleep. 
X-refs from sub_10001365 function: strchr, memset, strlen, WinExec, inet_addr, gethostbyname, memcpy, strcpy, inet_ntoa, strncpy, atoi and Sleep.

14. At 0x10001358, there is a call to Sleep (an API function that takes one parameter containing the number of milliseconds to sleep). Looking backward through the code, how long will the program sleep if this code executes?

Let's jump to 0x10001358 by pressing G to see what happens. We are inside the DllMain function, which is called after the DLL is loaded into the process memory and initialized.


To answer this question, we have to know the value stored in eax just before Sleep is called. The first five instructions settle everything. First, we have to check the value of the off_10019020 variable.


By double-clicking on this variable we can see that it points to a string located in the .data section of the DLL, whose value is "[This is CTI]30". Before the second instruction from the above screenshot executes, eax stores the address of this string. A string is simply an array of characters where each element is one byte in size, so after adding 0x0D (13) to the beginning of that array, eax holds the address of "30". Next, atoi is called with "30" as its argument and returns 30 as an integer rather than a string, so the malware can do arithmetic with it: imul eax, 0x3E8 is equivalent to eax *= 0x3E8 (1000). So eax holds 30000 just before the call, and the malware executes Sleep(30000). That means the program will sleep for exactly 30000 milliseconds, i.e. 30 seconds.
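
The whole chain (pointer + 0x0D, atoi, imul) fits in a few lines of Python. The atoi below is a simplified stand-in that only parses leading decimal digits; the real C function also skips leading whitespace and accepts a sign.

```python
def atoi(text: str) -> int:
    """Simplified atoi: parse leading decimal digits, stop at the first non-digit."""
    digits = ""
    for ch in text:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0

config = "[This is CTI]30"        # string behind off_10019020
seconds = atoi(config[0x0D:])     # pointer + 0x0D, then call atoi
sleep_ms = seconds * 0x3E8        # imul eax, 0x3E8
print(sleep_ms)                   # 30000
```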

15. At 0x10001701 is a call to socket. What are the three parameters?

Obviously, before analyzing the code we should first jump to it. :) Press G and type 0x10001701.


In 32-bit x86 Windows binaries, parameters are most often passed to functions on the stack. socket is a Winsock API function, so it uses the stdcall convention: the caller pushes the arguments onto the stack from right to left and the callee cleans up the stack afterwards (with cdecl, the caller would do the cleanup). To make the assembly code cleaner, we should replace the magic numbers with the symbolic constants used by the WinAPI. Right-click each magic number pushed before the call to socket and choose Use standard symbolic constant to find a useful symbol.

And this is the clean answer, showing the three arguments passed to the socket call:


16. Using the MSDN page for socket and the named symbolic constants functionality in IDA Pro, can you make the parameters more meaningful? What are the parameters after you apply changes?

OK, so it looks like I've already answered this question in the previous one. :)
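
The numbers pushed before socket map onto the Winsock constants. Python's socket module exposes the same names, and for these three the values match on Windows and common Linux platforms, so a quick check works anywhere. I'm assuming the raw values are 2, 1 and 6, which is what a TCP socket would need; the screenshot with the actual values isn't reproduced here.

```python
import socket

# Assumed raw arguments in call order: socket(af, type, protocol).
# Under stdcall they are pushed right to left: 6 first, then 1, then 2.
raw_args = (2, 1, 6)

names = (
    {socket.AF_INET: "AF_INET"}.get(raw_args[0], "?"),
    {socket.SOCK_STREAM: "SOCK_STREAM"}.get(raw_args[1], "?"),
    {socket.IPPROTO_TCP: "IPPROTO_TCP"}.get(raw_args[2], "?"),
)
print(f"socket({', '.join(names)})")  # socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
```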

17. Search for usage of the in instruction (opcode 0xED). This instruction is used with a magic string VMXh to perform VMware detection. Is that in use in this malware? Using the cross-references to the function that executes the in instruction, is there further evidence of VMware detection? 

If we know the opcode of the instruction, we can use Search -> Sequence of Bytes and type ED as the byte to find. Before clicking "OK", we have to check the Find all occurrences option; after that, a new window with the results will appear.


As you can see, the in instruction exists at address 0x100061DB in the .text section. Let's jump to it and use the Cross-References window to see whether there are more calls to the function responsible for VMware detection.
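
Searching for a single byte is crude: 0xED can also appear inside other instructions' operands or in data, so each hit has to be checked in the disassembly. Here is a Python sketch of what Find all occurrences does over a raw buffer, using a made-up byte string.

```python
def find_all(data: bytes, pattern: bytes):
    """Return every offset where `pattern` occurs in `data` (overlaps included)."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets

# Made-up bytes: "in eax, dx" (ED) followed by "mov eax, 0EDh" (B8 ED 00 00 00).
code = bytes.fromhex("ED B8 ED 00 00 00")
print(find_all(code, b"\xED"))  # [0, 2], but only offset 0 is a real in instruction
```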


From the Cross-References table for this function, it follows that there is further evidence of VMware detection, since the detection function is referenced from three places across the malware.
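
For context, the VMware backdoor check loads the magic value 0x564D5868 into eax and the I/O port 0x5658 into dx before executing in eax, dx; under VMware, ebx comes back holding the magic. Those numbers are just ASCII packed into integers, which Python shows quickly. Reading the bytes big-endian gives the string the way IDA shows it after converting the operand to a character constant; in memory (little-endian) the bytes are stored in reverse order.

```python
VMWARE_MAGIC = 0x564D5868   # value placed in eax before "in eax, dx"
VMWARE_PORT = 0x5658        # value placed in dx (the backdoor I/O port)

def as_ascii(value: int, size: int) -> str:
    """Decode an immediate as characters, most significant byte first."""
    return value.to_bytes(size, "big").decode("ascii")

print(as_ascii(VMWARE_MAGIC, 4), as_ascii(VMWARE_PORT, 2))  # VMXh VX
```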

18. Jump your cursor to 0x1001D988. What do you find?

At address 0x1001D988 there seems to be the start of a character buffer filled with what looks like random (most likely encoded) data.

19. If you have the IDA Python plug-in installed (included with the commercial version of IDA Pro), run Lab05-01.py, an IDA Pro Python script provided with the malware for this book. (Make sure the cursor is at 0x1001D988.) What happens after you run the script?

Unfortunately, I don't have IDA Pro, and I'm unable to install the IDAPython plug-in on the freeware version.

20. With the cursor in the same location, how do you turn this data into a single ASCII string?

It's possible to turn this data into a single ASCII string by pressing A on the keyboard. The cursor has to be at the beginning of the character buffer.

21. Open the script with a text editor. How does it work?



Let's analyze the script starting from the very first instruction. ScreenEA() is called to retrieve the current position of the cursor. Then there is a for loop with exactly 0x50 iterations. In the body of the loop, the script reads a byte from the buffer with the Byte function, XORs it with 0x55 and writes the result back to the same address with the PatchByte function. So the script is most likely used to turn the encoded data into readable text.
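
The same logic, rewritten as plain Python over a bytearray so it can run outside IDA. ScreenEA, Byte and PatchByte are the IDAPython calls the script uses; here they are replaced by a start offset and indexing into a buffer. The 0x50 length and 0x55 key come from the script; the sample data is made up.

```python
def xor_decode(buffer: bytearray, start: int, length: int = 0x50, key: int = 0x55) -> None:
    """Patch `length` bytes in place: buffer[i] ^= key, like Byte()/PatchByte() in the script."""
    for i in range(length):
        buffer[start + i] ^= key

plain = b"example text".ljust(0x50, b"\x00")
encoded = bytearray(b ^ 0x55 for b in plain)   # simulate the obfuscated data
xor_decode(encoded, 0)
print(bytes(encoded).rstrip(b"\x00"))          # b'example text'
```

XOR with a fixed key is its own inverse, so running the function a second time would re-encode the buffer; that is also why running the real script twice on the same location in IDA undoes its effect.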

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think the tasks were good, and I learned some new IDA features that will help me in further reverse engineering. I hope you enjoyed my solutions. Thanks for reading.
Cheers!
