JIT compiler for Brainfuck language - optimization of an interpreter PART 1

Hello guys!

About two weeks ago I started the project which is called "JIT compiler". The JIT compiler is the Just-In-Time compiler which means that it can compile our code faster than the standard compiler. This statement can sound very hard, but you will understand it better when I show you my way to write it in the C language. Now I can tell you that this project is curious for me and I've learned a lot while writing this so I recommend you to create such projects as JIT compiler for example. :) Assuming that you already know what compiler is and what it does I will show you the code of my JIT compiler for Brainfuck language.



The template of the main function looks like this:

This code handles errors - for example, File Not Specified or Brainfuck code does not exist so this is the standard main function from Brainfuck interpreter program. But there are some new methods with new optimizations.

Let's take a look at get_number_of_instructions method first:

This method gives us some protection from the index error. The size of the memory in Brainfuck is exactly 30000 bytes so the above function returns the size of the brainfuck instructions in bytes.

And the if expression from the main function checks if the returned size of the instructions is less or equal to the limit of the allocated memory. Let's see the code responsible for that allocation. the bf_init function is what we are looking for.

When I was writing an interpreter I used an array of integers as a memory and that was an engineering mistake. Why? Because as you know arrays are allocated on the stack and the stack has the constant size. This size is different between operating systems. When we care about the portability of the code we have to know that one OS may have a 1MB size of the stack but another may have 30000 bytes of the stack size. So the solution is very simple because we can allocate the memory for the Brainfuck on the heap! And that is what the above method does. But don't kid yourself - the best part of the whole code is the jit method responsible for all optimization of interpreter code. Let's dive into it.

In the first two parameters, this method accepts an object with the memory of Brainfuck values and the second parameter is the pointer to the Brainfuck code from the text file.

It's logical that when we want to write any compiler we are going to use some machine code. So we need some memory to store those instructions. Our memory will be the vector - same data structure like in C++, but implemented by me in C. All the code of the vector is on the vector.h and vector.c file on my GitHub. So the first of the above instructions is a declaration of that memory to store future machine code. The second line is the initialization of the simple stack data structure which is here to store the addresses of each [ Brainfuck instruction.

When you want to see the code of vector and stack data structures you can take a look at my GitHub.

JIT compiler is all about machine code and executable memory. The simplest example to write JIT compiler is:

1) Take the assembled code from objdump and throw it to the dynamic array.

2) Copy the content of this array to the executable memory

3) Execute machine code from this memory

And we have a great optimization of the code! So as you can see this is not a rocket science. (but you have to admit that the "JIT compiler" sounds pretty scary ;) )

"Take the assembled code from objdump"... ok, but we need to write instructions first. My last article was about the stack frame - the basic element of each function on our programs so I assume that you know what the stack frame is and why it is useful. I used this "structure" in the prologue because our memory with the machine code will be basically a function which processor will execute. So I constructed machine code like the function with stack frame and in addition, we have to save the state of general purpose registers. memory_segment variable in the memory structure is the pointer to all Brainfuck values so we will use it in the machine code to execute instruction correctly. We can simply pass the address of the values structure to our function and by mov , [values+8] get the memory_segment.

So this is the code of prologue:

Since we have the pointer to the memory with the Brainfuck char values we can create all Brainfuck instructions in the machine code. Let's see what we can do:

Above instruction is as simple as possible. inc rcx will basically increment pointer which will point to the next value in the memory. That's all.

< means decrement pointer and this is exactly what this instruction does by dec rcx.

Square brackets in the assembly are the same as *() sequence. So with the above code, it increments the value (size of the value is equal to 1 byte of course) from the pointer which is stored in the rcx register. Simply it is equal to *(memory_segment)++;

This is the same like an incrementing value but now the instruction decrements it. Same as *(memory_segment)--;

As you can see I decided to use Linux syscalls to print the char on the stdout. It can be done by printf function but if you would like to do this you have to pass the address to this function and "tell" the processor to execute it.

If you are not familiar with the Linux syscalls you can read more about it here

The same situation as earlier, but in this case, we want to get char from the user. Ideal solution for that is to use read syscall from stdin. Pay attention to the size of general purpose registers. When we want to move some integers into registers we can use 32-bit sizes of them because the size of the one integer is exactly 32-bits.

Now I recommend you to focus because I will show you how the loops interpretations are implemented. This is the code of the start of the loop which is [ Brainfuck instruction:

As you know Brainfuck loops are really simple but first I will explain you my implementation of the [. So this JIT compiler is all about that direct jumps. This functionality can give us a big advantage over the interpreter. The jump should be executed when the checked memory cell is 0. So we have to type two main instructions - cmp byte [rcx], 0 -> compare the byte from memory cell with 0. je -> and if it's true jump to the offset after ] Brainfuck instruction. And this is exactly the content of our char array. Then this array of two machine code instructions is added to the op_instructions. (which is our vector data structure)
But wait... How to count the offset of ] instruction?! And this is the hardest part of the whole problem. First of all, you need to know that the whole machine code is placed in the executable memory before the execution of the program - and that's why we can update the offset "later". Usually, for counting the offset, we have to know where is the "start" address and the "final" address. Look at this picture:



To count relative offset we have to subtract THE HIGHER address from THE LOWER address and the result will be the offset between them. So that's why we need to push the index (index == address) of the last byte of the je (which is 0x00) machine code onto the stack. We have to save this address in the special place and the stack is the perfect structure for such operations.

These two instructions are clear - cmp byte [rcx], 0, jne (relative_offset) means: When the value from the memory cell is not 0 then jump to the instruction after [. As I wrote above, we need to have the "start" address and "final" address to count the relative offset. (the number of bytes to jump) This is what we have:

start address -> index of the last byte from [ machine code instruction which is on the stack, at the end of the loop
final address -> index of the last byte from ] machine code instruction

Final address is the relocation_site and relocation_table is our stack structure with the addresses of the last byte from the [ machine code instructions. (relocation table) The number of bytes to jump is:

relative_offset = final_address - start_address;

So in our program it is:

relative_offset = op_instructions.size - relocation_site; 

When we already know how many bytes are enough to do the correct jump we can simply update our relative_offsets in the machine code of jmp instructions and after this, the processor will execute our conditional jumps correctly. 

The above methods from the vector.c file are responsible for update the relative offsets. Each relative offset has to be placed in the memory as little endian. I will describe the rest of the files from my GitHub repository in the next article.

The epilogue basically updates the register to the state before the execution of the program, removes the stack frame and jumps to the return address. It is the exit of the JIT compiler.

We have all machine codes in the op_instructions data structure but this is the raw data without any behavior. So we must "tell" the processor to execute these instructions. The processor executes instructions from the EXECUTABLE memory. The above lines of code are responsible for:

1) Creating the executable memory
2) Move the machine code from raw memory to executable memory
3) Execute
When we call the execute_func (casted pointer to the executable memory) the whole machine code will be executed... And this is JIT compiler!

The last instructions clear the heap from the object which we allocated there earlier. This is the protection from the memory leak . (Memory leak)

And this is the big different between linked-list based Brainfuck interpreter and directly jumps based JIT compiler:



Link to the whole code of the program: GitHub

Comments

Popular posts from this blog

Learning of malware analysis. Solving labs from the "Analyzing malicious Windows programs" chapter from the "Practical Malware Anlysis" book

PicoCTF 2018 - Reverse Engineering writeups

Learning of malware analysis. Solving 9-1 lab from the "OllyDbg" chapter. ("Practical Malware Analysis" book)