Reverse Engineering task from mini CTF
me a lot of fun. Let's try to figure it out one more time.
Rex (re 100)
I received a program, which checks the correctness of an input text with the flag. Unfortunately I can't get a flag. Could you help me? You can download file from here
So when we already get a file on which we will work first of all we need to use
is a file program, which will tell us about the specify of the analysed file.
Now we know that the analysed file is 32-bit ELF executable. I have 64-bit Linux and I had a problem with open this downloaded app. How I solved that problem? Here is a solution:
ia32-libs are just a libraries, which allow to open 32-bits applications on 64-bit operation system. After the successful installation we can open our mystery file. So let's do it and see what we can do.
As we can see this file is very simple in use - we have to write a password and that's it. But the problem is - what is the correct password? Most of the people would say "I don't know. That's impossible to guess.", but we are a hackers and we will say "I don't know too. That's impossible to guess, right, but wait... I can crack a file like that!" Get focused, because we will do some magic.
The last thing which we are going to do is to complicate our life. So at the start I recommend to find a words in the memory of that program, maybe one of them will be look like a flag. Let's use a strings program in the command line.
Despite appearances that command gives us some very useful informations. For example we already know which functions our analysed file use. But unfortunately we didn't find a string which looks like a flag. And now it started really funny, because the next thing what we are going to do is to decompile our mystery file in IDA Free and understand the assembly code. I assume that everyone can decompile a file itself. When we click a main in the left side of IDA, more precisely - in the functions table, we will see the assembly code of the main function. Why we have to look at the main first? Because everything in the program starts from main.
First lines tell us about the local variables. Words with var are always local variables of analysed function. Amazing! We already know that our main function has six local variables. Each function must have a stack frame - her own area in the memory. If you don't know what is the stack you should read this article (stack) The assembly instructions which create a stack frame are usually the same for the different functions. Stack frame instructions for main are:
But we don't care about the stack frame. When you are going to understand how the application works you should looking for the call, jmp and cmp instructions. Call instruction means - "jump into the function and execute her code and then jump to the instruction after our call". Cmp is just compare and it means - "compare the value of the source registry with the value of the destination registry", jmp means - "jump into the specified address in the memory". OK, fine. What really stands out? Of course the names of the C functions!
What will this bit of code do? What do you think? This is as simple as you think it is. This code will print a "Password: " to stdout and function fflush will clear the stream buffer.
After the fflush call we have some familiar C functions one more time. And first of them is scanf.
This block of code reserves 8 bytes on the stack for the argument of scanf. You have to remember that arguments of functions are placed on the stack before call the function. That is always the same way. And this code do exactly what I wrote moment ago:
lea (Load effective address) is a special assembly instruction (not processor instruction) which put a memory address of operand into register. So in the eax register we will have the address of the var_70. push eax pushes this address on the stack and next instruction pushes the offset of "%s" string. So now our stack looks like this:
Because scanf takes two arguments from the stack our scanf will looks like scanf("%s", var_70). So now we already know that var_70 is a password_buffer in fact. Fortunately IDA has a great function which changes the name of the marked variable. So let's do it. We have to click on the var_70 and press N on a keyboard. Now we can change the name of the variable and I will change her name for password_buffer.
After scanf we can see this block of the code. If we keep mind in important instructions we will see that local variable var_76 is now equal to 1. But in this moment it tell us nothing so we are not going to change name of this variable. Next two instructions are the same as before so I'm not going to explain this again. And there is another C function called strlen. Strlen takes string as an argument and returns the length of that string. So this strlen returns the length of our input and place her into eax register. Before the last instruction we can see compare between edx and eax register. Edx stores returned length of an input and eax stores dword_804A038... but wait.... what is this?! This is just a space in the memory which can store 32-bit value. (because of dword extension) When you click on this "address" twice you can check what value is in that memory space.
1Ah is the same as 0xA in C but we are people and we prefer a decimal numbers. IDA knows that and we can simply change hex values to dec. Just click on the hex 1Ah value and press H on a keyboard. Now we know that dword_804A038 is equal to 26. Ok, so now we can click on the blue left arrow at the top of the IDA and return at the previous position. If I want to see the graph view, but you don't see this you have to press SPACE on a keyboard. We can return to the cmp edx, eax. So again - in the edx we have returned length of an input and in eax we have 26 decimal value and we compare input length with 26. cmp is just an if expression. The next instruction is jz and it means - "jump if zero" so "jump if (length_of_an_input - 26) == 0" and finally "jump if (length_of_an_input == 26)". If this expression will be true a ZERO_FLAG will be set. You can read more about ZERO_FLAG in this article. (https://en.wikipedia.org/wiki/Zero_flag) If our input had 26 characters we will go through the green branch, if hadn't we will go through the red branch. And this is the time to rewrite our assembly code to C code because it is an easier way to understand what our application exactly do.
Ok we turn our analysed assembly code into more understandable C. But what is the is_valid_password variable? Look into the assembly code again. mov [ebp+var_76], 1 set var_76 to 1. If we go through the red branch we will see this instruction - mov [ebp+var_76], 0 so I assume that var_76 is a boolean variable (we will write bool var_76; in C++). It's logical. If the length of our password is not 26 is_valid_password variable will set to 0 and we can't get a flag. It sounds good, right? So we can add one more instruction to our C code and change name for var_76 to is_valid_password.
Imagine that our input has exactly 26 characters. We are going to check what will happen if this expression will be true. So let's go through green branch in IDA.
var_74 is set to 0 - that's a first instruction in our true expression. But we have not enough informations about that variable to be sure about her destiny. Let's see what we have next.
Look at the blue branch. A branch starts at the top and at the bottom returns to the top. So that's a loop and var_74 is an iterator, because of instruction add [ebp+var_74], 1 in the end of the loop. Now we can change a name of that variable from var_74 to i. Ok, I assume that you did this. After that we have to check a loop expression and more precisely - until when iterator is added.
This block of instructions has informations about the end of the loop. I will not explain all of this instructions, because you have to know that in RE tasks sometimes you will have to assume what instructions could do. For example when I see set of code like this I know that the loop will jump on each character from an input. That is visible, because of variable password_buffer, i and this line movzx eax, byte ptr [eax]. Size of one char in memory is one byte. So this processor instruction means "move one byte value from 32-bit eax to 32-bit eax and write 0's to the rest of the bits". We should note this important thing in our C code.
It's natural that now we should check what loop does with each character in our password. So let's go through red branch in IDA and let's analyse the assembly code.
Look at the top of this block. Isn't this code familiar? Of course it is! We have address of password_buffer in edx and address of iterator of the loop is in the eax register. And another instruction is movzx eax, byte ptr [eax] so like I wrote, in eax we have character from the password like before. After that an application reserves 12 bytes (0xC) on the stack and push one ascii value of character from the password on the stack (because eax is 4 bytes register). The next instruction is call sub_80486D9 so our ascii value of character is an argument for this function. At the end of this block of instruction we have compare between al register and var_75 and after this we have jump if zero - jz instruction so that will be another if expression. But first we need to take a look into call of the function.
Let's click on the name of this function at the top and press Y on a keyboard. In this window you can change the function name and name of the parameters, but the most important thing what you could do with that IDA functionality is that you could see that our function returns an int variable and accepts an int variable as a parameter. So now we are sure that our function takes ascii value of analysed character from our input. What is the most significant instruction in this function? Of course it is call sub_80485D7. This called function doesn't accept any argument, because we have nothing on the stack. When we are focused on our task we should come up that we don't have to analyse sub_80485D7 to know what our function returns. So let's think about it. From now on I will avoid describing stack frame instructions, because I assume that you know what set of instructions creates a stack frame for each function. So in eax we have an address of ascii_value_of_char. To the var_C, application moves one byte of this value. After call, eax have one byte value from var_C. Ok... and look at that. Returns instruction is mov eax, ds:dword_804A080[eax*4]. When you will see square brackets you will know that it is about an array like in C/C++. So dword_804A080 is an address in memory of a global array. In eax we have a byte value of our character. Application multiplies our byte to 4 bytes value and that is an ascii_value_of_char. So our variable ascii_value_of_char is an index in global array. Our function returns this - global_array[ascii_value_of_char] to eax register (as always). Now we should upgrade our C code.
We should think about if expression.
Above code is simply password[i], as we agreed. We can focus on the right side of if expression - this block of code:
sub_8048D9 returns a character from global array to al register. (because al is one byte register) We put this char in var_75. So now we can change the name of var_75 to char_from_global_array. add eax, 804A03Ch - this instruction looks really interesting. Random value is added to eax where is an iterator of our loop. And now you have to focus for a while. This looks like we have allocated memory which starts from 804A03Ch (0x804a03c) address. And this is memory allocated for chars, because this instruction movzx eax, byte ptr [eax] moves byte from value in eax to eax and the rest of 8 bytes will be 0's. This might be a global array of chars. So in al register we have char from this memory and as we agreed var_75 is char_from_global_array. At the end, we compare char from new allocated memory with char from global array. Awesome! We already have our if condition. We have to update C code. But first we should look what will execute if this expression will be true and what will execute if this expression will be false.
Fine. If only one character from new allocated memory is not equal to character from global array our password is bad and we don't get a flag. So if we want to get a flag each character from memory must equal to character from global_array[password[i]]. And now, if we really understand it we can write this if statement in our C code.
Look into the sub_80486D9, you can look into this function in your C code, because it's simpler than assembly code and that's why we rewrote assembly code to C code. In this function we have sub_80485D7();, but we don't know what this called function exactly do. So we have only one way to check it. IDA is our friend! Let's get into sub_80485D7 and try to analyse it. First of all we should click on the name of the function and press Y on a keyboard. IDA will give you this - int __cdecl sub_80485D7(); So this function returns int and doesn't accept any parameters. Amazing. Let's go deeper.
This function have three local variables and of course his own stack frame. And there is a srand function from C. As we can see srand accepts one argument and this argument is seed which is probably time( NULL ). After this var_14 is set to 0. We have to look down and see why is this variable really created.
This is just a loop and var_14 is an iterator. Let's change the name of our variable from var_14 to i. If we take a look at the top of the loop blocks, we can see that our i is compare with 0xFF and it's 255 in dec (you can already change the hex value to dec). Now we should change C code.
We can easily go to analyse what the app do in this loop.
Ok, that's a lot of code here. In cases like this one we don't have to analyse the whole block. We should focus only on the most interesting instructions. So let's look for calls first. We have 4 rand function calls and one called function is created by our analysed file. Sometimes we are lazy and we don't want to do anything, for example in this case we don't want to analyse many rand functions. So we should look for compares. But there is no compares. Look at the end of our block. There are some addresses and pushes. We will try to understand what arguments are placed on the stack before call the function sub_8048802. We have to look only on this part of the code right now:
dword_804A080[] is the address of the global array with chars and app uses this address twice. In first instruction and in add eax, 804A080h. Let's try to solve what values or what type of variables are pushed on the stack. Ask yourself a question - "What is on the ecx and eax register in the moment of pushed?" In the ecx we have the address of global_array[eax], so there is an ascii value of char in global array. We don't care what exactly char it is right now. Ecx register is not changed before push on the stack so we can think about eax. Program moves var_10 to the eax, extends it to 8 bytes, shifts bits to the right for 24 in dec position blah blah blah... This is a waste of time. We have only one interesting instruction which tell us about what is in eax. This is - add eax, 804A080h. 804A080h is the address of first byte of global array of chars and when we add X value to this address we will get X byte from global array. It seems that in eax we have another char from global array. So the function sub_8048802 accepts two chars from global array as her parameters. Let's look into this function and press Y on a keyboard to check what type this function returns and what type are her parameters.
Let's analyse this block of code. We move ascii of first char to eax and after this we put eax into memory space of var_4. var_4 is just first_char so let's change the name of this local variable. We can assume that our two chars are: "A" and "B" so 65 and 66 in ascii. I will do instruction after instruction and I will write comments with values which are in the register. Take a look at the picture below.
We have something new in this function. In the comments you can see something like (DWORD*)(EAX) and instruction mov [eax], edx. What does it mean? It means "move the address from edx to the address of VALUE which is placed in the eax". This is pretty nice, because it explain us what the function really does.
This is separated code from our function where we can see the whole action of the function. First, we have address of ascii of second char in eax. Our char in global array is 'B' so the value of variable ascii_char2 is 66. We move this value to edx. And the next instruction moves address of ascii_char1 to eax register. After that our program moves address of edx to memory space of ascii_char1... That's it! Now we can see that the function changes character from global array with another character from the same array! This is cool. But let's finish our analyse. In eax we will have the address of ascii_char2 and in the edx we will have the address of first_char. And the last instruction means that the function puts address of first_char into memory space of ascii_char2 so this is another and last change between this two characters. Fine. Now we can change the name of this function from sub_8048802 to change_two_chars_from_global_array you can do it by click on the name of the function and press N on a keyboard. After this we should update our C code.
First instruction is important, because it is compare. So our app compares the local variable is_valid_password with 0. If it is 0 we will see "Nope!" on standard output and if is_valid_password is not 0 we will see "Yep!" on standard output and when we see this string we will know that we get a flag. We have to change C code right now.
Now we can just debugg our program or... crack. :) But first we should learn some basics commands. At the beginning of our adventure with gdb, we have to know how to run a program in this debugger. It's very simple, because all we have to do is to write r or run. When we type this the program will execute from the beginning to the end. To exit gdb we need to type q or quit. Let's write this two commands.
But who uses debugger without breakpoints? This is no sense. We love RE and we love low-level programming so I will exaplain what is breakpoint a little bit more technical. Have a look at the picture below.
Let's imagine that there are compiled instructions in this picture. So AB CD EF GH are instructions and the numbers below them are hexadecimals values which represent the instructions. Assume that programer sets the software breakpoint at the address of CD instruction. What will debugger does? Debugger will save 49 into the memory and changes this value for CC. (in hex) CC is the opcode. When processor executes our program instruction after instruction and meets CC, it aborts this execute and waits on the address of CC.
What we can do when we set breakpoint in the instruction? For example we can look into the register and we can look into memory. It sounds great, right? So now you know how useful is gdb. We want to use breakpoint, but where is the best place for this? We need to look into C code of our file and more precisly - into main function.
Which instruction is the best instruction to set the breakpoint? Of course the best instruction is the if statement which is placed on the loop. Everything depends on this. When it is true we get a flag. So we have to look for this C instruction in assembly code. Let's use IDA one more time.
Ok, we found this compare. So now if we want to set a breakpoint at this instruction we will need to get its address. Let's press SPACE on a keyboard and search for our instruction (you can click on the cmp instruction to find it easier). There is our if statement.
On the left side of instruction there is an address in memory. We can use this to set our breakpoint exactly at this instruction. To set the breakpoint just type break *<address of instruction>. Let's try to do it.
Now when we type run our program will stop execution at the instruction where the breakpoint was hit. But there are only words, let's prove it!
Like I said - processor aborts the execute of program at the address of 0x80487AD. I think one of the best option in gdb is looking into process memory. We can look into register in the place where the breakpoint was hit. Let's write info reg $al and we will get information what value is on the al register right now. When you would like to check all of registers you need to type info registers.
al stores 0x7e in hex and 126 in dec. As we agreed, this value represents some char. You have to know that in gdb we can use some C or Python functions, for example print. We are curious and we want to get the mystery char from al register. We can look into ascii table - ok I agree with that, but I think a gdb command will be faster than this way. Let's type print (char) $al.
This char is "~", amazing! gdb is a really useful tool and has got a lot of functions that can be helpful for us in RE in many problems. print was pretty nice, but one of the best command in gdb is find. This awesome function searches a memory for values from user so we can just type this values. And what we can do with that? We can solve our problem! As you remember our analysed file put char from mystery memory which starts in 0x804A03C address into al register and compares this char with char from global_array. This if statement is on the loop and it will be checked as many times as many letters we have in our inputed password. First of all we need to find a char in global_array which is equal to char from al in each itteration of the loop. So we can type something like this - find <start_address>, +<length>, <value>. Length is the last byte in the searched memory. Take a look at this operand in assembly - dword_804A080[eax*4]. This our global array with chars of course, but what is in eax register? In eax register is char as we agreed before. Char is multiplied by 4, because size of int (size of int is 4 bytes, size of char is 1 byte). Our index must be int, right? Ok, so let's assume that our char has 255 value in ascii - this is maximum value of unsigned char (0-255). So global_array has 1020 elements, because of [255*4]. And that's why our length will be exactly 1020. When we understand this, we can finally type our find command. find 0x804A080, +1020, $al.
Great! We found one value in global_array which is equal to value from al register. If the value from al register is equal to value from global_array then the index of value from global_array is an ascii code of char from correct password. The last thing what we are going to do is to get this index and print it as a char and we will repeat this until we have exactly 26th char, because this is the last char of the correct password. This instruction will gives us what we want - print (char) ((<address_of_char_from_global_array> - <address_of_global_array>) / 4). So let's type it and look what we get.
pwn{rc4_j3st_dl4_b13dnych}.
We have solved this task by dynamic analyse, but we could also write a script in Python for example, which could crack the password. But to do this we would have to analyse the sub_80485D7 function which is pretty long and complicated. Maybe I will show how to do it in the future, but now we can enjoy that we did good job!
This article was a quite long, but I hope that some people could learn something from this.
Reverse Engineering Task From Mini Ctf >>>>> Download Now
ReplyDelete>>>>> Download Full
Reverse Engineering Task From Mini Ctf >>>>> Download LINK
>>>>> Download Now
Reverse Engineering Task From Mini Ctf >>>>> Download Full
>>>>> Download LINK 8o