Zaid Khaishagi

GHIDRA TUTORIAL: USAGE

This is a continuation of the Ghidra Tutorial series. In the previous article, we discussed what Ghidra is and what it is used for. We went over installing Ghidra, doing the initial setup and then using Ghidra to analyse a simple Hello World program that we wrote and compiled ourselves.

In this part, we will discuss using Ghidra for more complex programs that have more logic implemented than simply outputting a line of text to the user.

Analysis

We will be analysing multiple example programs which have different functionalities and logic implemented in them. We will go over things like loops, using format strings, function calls, making use of user input and then, in the final example, we will use Ghidra to identify a logic vulnerability in an example program.

Loop

The first example we will look at is a program that uses loops to count numbers and print them out using format strings (the printf function).

To begin, write a program which uses a while-loop and a for-loop to count numbers up to 10. You can copy the provided source code into a file and save it using an appropriate name such as loop.c.

#include <stdio.h>
// Program for loop and fstring
int main(int argc, char** argv) {
    int counter = 0;

    // while loop
    while (counter < 10) {
        printf("Counting... %d \n", counter);
        counter++;
    }

    printf("Counted to %d using while-loop...\n", counter);

    // for loop
    for (counter = 0; counter < 10; counter++) {
        printf("Counting... %d \n", counter);
    }

    printf("Counted to %d using for-loop...\n", counter);

    return 0;
}

Then, to produce an executable binary, you will have to compile it. Use the following command from the terminal to compile it. You can also directly download the executable binary from here: loop.out

$ gcc -g loop.c -o loop.out -no-pie

Now, let’s execute the program and see what output we get…

$ ./loop.out 
Counting... 0 
Counting... 1 
Counting... 2 
Counting... 3 
Counting... 4 
Counting... 5 
Counting... 6 
Counting... 7 
Counting... 8 
Counting... 9 
Counted to 10 using while-loop...
Counting... 0 
Counting... 1 
Counting... 2 
Counting... 3 
Counting... 4 
Counting... 5 
Counting... 6 
Counting... 7 
Counting... 8 
Counting... 9 
Counted to 10 using for-loop...

The next thing to do is to open this binary file in Ghidra’s Code Browser, using the instructions described in the previous article of this series, and then analyse the file.

Have a look at the right pane showing the decompiled code. Here, the function used to print the text (string) to the user is printf, whereas in the Hello World example we saw previously the call had been automatically replaced with puts. You can also confirm this in the disassembly view in the centre pane. The replacement does not happen here because puts does not deal with format strings, that is, strings with placeholders such as %d that are filled in with the value of a variable (here, counter) when printing.

Notice the two loops that we used. The first loop that we coded was a while-loop and the second loop we coded was a for-loop. In the decompiled view, it shows both loops as for-loops. When analysing the binary, Ghidra determined that the assembly code for the loop was similar to how for-loops look and so it decompiled the while-loop as a for-loop. This is an example of how Ghidra makes a best guess of what the original source code for the binary might have looked like; it does not extract the actual source code out of the binary. The functionality in this for-loop, however, is identical to the while-loop we used in the actual source code.

You may have also noticed that the variable name counter we used is not present in the decompiled view. Instead, it shows a variable named local_c being used; the exact name may be different for you. If we read through the decompiled code, we can understand that this variable is being used as a counter in both loops. One thing that Ghidra allows us to do is to rename variables in the decompiled code. This greatly assists in analysing binaries because it helps keep track of where and how each variable is being used. When reading through the decompiled code for any binary, once you have an idea of what a variable’s role is in the program, you can rename it to better reflect this role. You can do this by right-clicking on the variable name and choosing ‘Rename Variable’. Try setting the variable name to count.

So, we see that we were able to fully understand what the program is doing by analysing the decompiled code that Ghidra produced. We were also able to change the variable name to better help us understand the code.

Function Call

Our second example deals with defining a function that we call from within another function. In this case, it’s the main function which calls a secondary function. We will write the source code, compile it and then analyse it in Ghidra.

The source code we are using for this example looks like the code given below. You can copy it into a file and save it using an appropriate name such as function.c.

#include <stdio.h>
// Program for function call

void quit() {
    puts("\nWrapping up...\n");
    puts("\tGoodbye!\n");
}

int main(int argc, char** argv) {
    int counter = 0;

    while (counter < 10) {
        printf("Counting... %d \n", counter);
        counter++;
    }

    printf("Counted to %d \n", counter);

    quit();
    return 0;
}

Next, compile it and run the executable binary using the below commands.

$ gcc -g function.c -o function.out -no-pie
$ ./function.out
Counting... 0 
Counting... 1 
Counting... 2 
Counting... 3 
Counting... 4 
Counting... 5 
Counting... 6 
Counting... 7 
Counting... 8 
Counting... 9 
Counted to 10 
Wrapping up...
        Goodbye!

Now, open and analyse the binary in Ghidra.

You will notice that the decompiled code is almost the same as the source code. Let’s replace the local_c variable name with counter. The decompiled code shows that the main function runs a for-loop to count numbers to 10 and then calls the function quit. However, we don’t see the decompiled code for this function at the moment.

In order to see the decompiled code for the quit function, we can double-click on the function call in the decompiled code of main. This will take us to the location of the quit function in the disassembly and also show us its decompilation. You could also go to the quit function using the symbol tree, but it’s actually more useful to follow the function by double-clicking its function call because, in stripped binaries without symbols, the function names are not present, so it’s not always clear which function is which. Using this approach allows you to analyse the function while knowing where it is used and with what arguments, which makes it easier to understand what the purpose of the function is.

Following the quit function by double clicking its function call in main, we see its decompilation.

We see that it uses puts to output some text to the user and then returns. Since this is at the end of main, it will also return after this function call and then end the program.

Another important feature to make use of while analysing binaries is that you can rename functions in Ghidra. This is useful because, after analysing a function and understanding what it does and what it is used for, you can give it a more appropriate name. When you then encounter the same function being called in different locations and with different arguments, you can easily understand what is happening. This is, again, very useful while analysing stripped binaries.

Let’s try changing the name of this function to end_message. To do this, right-click on the function name and select ‘Rename Function’. In the prompt, enter the new name end_message and leave the other options on their defaults. You should see the name of the function change in Ghidra, in the decompiled view, the disassembly and the symbol tree.

If we want to check all the places where the function is used, we can do that by looking for the references to the function. This helps with a better analysis of the function: by looking at the different places it is referenced, such as places where it is called and the arguments that are passed to it, we can get a better understanding of how it is being used and what its role and purpose is. This becomes more useful when the function itself is complicated, when Ghidra’s decompilation is confusing, or when the function seems to serve multiple purposes.

Let’s try to look for references and use that to go back to the main function. To do this, right-click on the function name and select ‘References’ and then ‘Find References to end_message’. You can also do this from a function call elsewhere; it does not have to be from the decompiled function definition. You will be presented with a new window that shows all the references to the function. You should see ‘CALL end_message’ as one of the code units where it is referenced. This ‘CALL end_message’ code unit is the disassembly instruction where the function is called. If you click on it, you will be taken to the location where it is called. In this example, this will take you back to where it is called in the main function. You should also notice that the function that was named quit is now end_message even inside the main function.

One more utility that Ghidra provides is the ability to view graphs for code flow, function calls and block flows. These can be viewed from the top menu bar by going to Graph and selecting Code Flow, Block Flow or Calls. I highly recommend exploring these on your own.

User Input

The next program we will analyse requests the user to input their name. It then receives the input up to a certain length and stores this input into a buffer. After that, it removes the trailing newline character from the input. It then uses this to output a message addressing the user by the provided name.

The source code for the program is as follows. You may copy this into a file and save it using a name such as name.c.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// User inputted name used in output fstring
int main(int argc, char** argv) {

    char name[30];
    size_t buflen = sizeof(name);
    int inplen;
    printf("Please enter your name:\n");
    fgets(name, buflen, stdin);

    inplen = strlen(name);

    // remove newline
    if (name[inplen-1] == '\n')
        name[inplen-1] = '\0';

    printf("Hello, %s! Have a good day!\n", name);
    return 0;
}

Compile it and then run it using the below commands.

$ gcc -g -no-pie name.c -o name.out
$ ./name.out 
Please enter your name:
John Doe 
Hello, John Doe! Have a good day!

Let’s open the binary in Ghidra and analyse it.

Let’s try to analyse the program only in Ghidra first and then afterwards compare it to the source code. We see that the main function has a few variables declared at the beginning which are not named. There seems to be something being done with the variable local_10 which isn’t immediately clear, so let’s skip that line. Next, we see the program printing a string asking the user to give an input. After that, it calls the function fgets to get an input string from the user. If we look at the documentation for fgets, we see that the first argument local_38 is where to store the string, the second argument 0x1e is the size of that buffer (fgets reads at most one character fewer than this and then adds the terminating null character) and the third argument stdin is where the input is coming from. So, we can say that local_38 is the input string buffer, which is also supported by its variable declaration as an array of characters. Let’s rename this to input_buffer. The second argument 0x1e is the hexadecimal representation of 30. The third argument stdin refers to the standard input, which is the standard stream for input from the terminal.

In the next line, we see that the function strlen is being used to get the length of the string stored in the buffer (see the documentation for strlen). The result is stored into the variable sVar1; let’s rename this variable to input_length. After that, we see that there is a condition check to see if the last character is a newline character and, if it is, to change it to a null character to terminate the string. Notice that the way the code indexes to the last character is input_length + -1. This is a quirk in how Ghidra displays the expression; it means the same thing as input_length - 1. We then see printf being used to print out a line with the user’s input.

At the end of the main function we again see local_10 being used in a condition check. Analysing the condition tells us that if the value of local_10 assigned at the beginning is no longer the same at this point, the program calls the function __stack_chk_fail. It is not clear what this function is doing, so we can skip it. We have already understood the primary functionality in the main function that we can observe in the terminal when we run the program. Skipping over complicated decompiled code is something that can be useful when analysing binaries, because every single line of code does not need to be understood thoroughly as long as the bigger picture is well understood. Sometimes some code may not even be accurate to the program because, remember, the decompiled code is an approximation of the source code as analysed by Ghidra. In this case, the code is accurate because we can see the function __stack_chk_fail being called in the disassembly as well, but it is not relevant because we can understand the primary functionality without it.

Even though we do not need to understand the __stack_chk_fail function and the local_10 variable, they are actually part of the stack canary check. The variable local_10 is assigned at the beginning and then checked again at the end to see if it was corrupted, for example by a write that runs past the end of a buffer on the stack. If it was, then the program detects it and immediately aborts using the __stack_chk_fail function, so that the corrupted stack cannot be exploited. This was placed there by the compiler even though we did not code it in the source code.

Now, let’s compare it with the source code. For the most part, the decompiled code is the same. One notable difference is the variable declarations. The decompiled code omits the buflen variable we used to store the size of the buffer. Instead of storing the buffer length in a variable, it directly uses a literal value of 0x1e in the fgets function call. Another notable thing is that the buffer is of length 40 instead of the 30 which we defined in the source code. Ghidra does not know the declared size; it most likely infers it from the stack layout, where the buffer at local_38 and the canary at local_10 are 0x28 (40) bytes apart, a gap that includes the padding the compiler added to align the buffer on the stack.

So, we see that we were able to analyse the functionality of an interactive program using Ghidra without needing access to the source code. When we compare it with the source code, we see that it is fairly accurate, with some compiler-introduced differences.

Fibonacci Game

In our final example, we will look at a simple game where the user has to answer according to the Fibonacci series. The user’s score starts at 5, increases by 1 with every correct answer and decreases by 1 with every wrong answer. The goal of the game is to reach the winning score of 4294967290. The idea is that the user is allowed to be wrong 5 times, which is why we give the user a starting score of 5 points.

We will analyse the program using Ghidra and then find a vulnerability in it which will allow us to win the game without properly playing it.

Analysis

Let’s start by writing the source code, compiling it and doing a test run of the program.

// fibonacci game
#include <stdio.h>

int main(int argc, char** argv) {
    unsigned int score = 5, winning = 4294967290;
    unsigned int num1 = 0, num2 = 1, inp = 0;

    puts("Welcome to the Fibonacci Game!");
    printf("You have to keep answering correctly to reach the winning score %u\n", winning);
    printf("Your score starts at %d. Good Luck\n\n", score);

    while (score != winning) {
        printf("### Score: %u\n", score);
        printf("Enter the answer, %d + %d = ?\n", num1, num2);
        if (scanf("%d", &inp) == 1 && num1 + num2 == inp) {
            num1 = num2;
            num2 = inp;
            inp = 0;

            score++;
            continue;
        }
        else {
            puts("Incorrect answer, your score is reduced\n");
            score--;
        }
    }

    printf("Congratulations, you win! Your final score is %u\n", score);

    return 0;
}

Save this as fibonacci.c, then compile and run it using the commands below.

$ gcc -g -no-pie fibonacci.c -o fibonacci.out
$ ./fibonacci.out 
Welcome to the Fibonacci Game!
You have to keep answering correctly to reach the winning score 4294967290
Your score starts at 5. Good Luck

### Score: 5
Enter the answer, 0 + 1 = ?
1
### Score: 6
Enter the answer, 1 + 1 = ?
2
### Score: 7
Enter the answer, 1 + 2 = ?
3
### Score: 8
Enter the answer, 2 + 3 = ?
0
Incorrect answer, your score is reduced

### Score: 7
Enter the answer, 2 + 3 = ?
0
Incorrect answer, your score is reduced

### Score: 6
Enter the answer, 2 + 3 = ?

We see that the program asks the user to enter the correct answer. Every correct answer increases the score by 1 and every incorrect answer decreases the score by 1.

Let’s now open it and analyse it in Ghidra.

Looking at the decompiled code, we see that it starts by declaring and initialising a few variables. We will leave these for now and address them as they get used in the program. The next thing we see is that it prints out a few lines to the user which describe what the game is and what the winning score is.

After that, it starts a while loop where the condition is local_20 != local_14. If we look at local_14, it is initialised to 0xfffffffa which, when we convert it to decimal, is 4294967290. We know from the printed output that this is the winning score the user needs to reach. So, let’s rename this variable to winning_score in Ghidra.

Now, reading the code inside the while-loop, we see that the first line prints out the user’s current score using printf. The variable it uses is local_20, so we can assume that this variable keeps track of the user’s score. Let’s rename this variable to current_score in Ghidra. This variable is of type uint. After this, the program presents the user with the question that they need to answer correctly. The program then gets the input from the user using a function called __isoc99_scanf. Now, we don’t know exactly what this function is, but we can see that it has scanf as part of its name. Based on that, we can assume that it is related to the regular scanf function or one of its variants. In fact, this is the symbol that glibc uses for its C99-conforming version of scanf.

To understand what this line is doing, let’s look at its arguments. The first argument is labelled as &DAT_004020c7. If we double-click on this, we can follow it to its address. We see, in the disassembly pane, that it takes us to a location which has “%d” in it. If we look in the adjacent region, we see that there are other strings such as “Enter the answer, %d + %d = ?\n” and “Incorrect answer, your score is reduced\n”. This supports our analysis that it is a string.

So, we can safely assume that the first argument is “%d”. Now for the second argument, we see that it is &local_24, which was declared at the start as a uint type. We can say that this function is getting an integer using scanf and storing it in local_24. The function’s return value is assigned to iVar1. If we look at the scanf documentation, we see that the return value is the number of items matched. In the next line, this is checked with iVar1 == 1. So, we can say that it is checking whether the number of matched items is exactly one. Let’s rename these to something more appropriate; local_24 can be renamed to user_input and iVar1 can be renamed to inp_result.

The other condition being checked is local_1c + local_18 == user_input, which is similar to the question that the user is presented with. Inside the condition block, we see that if the condition matches, it adjusts the values in a way that matches the Fibonacci sequence and resets user_input to 0; after this, it increments the user’s score. Based on this, we can say that these two variables are storing the 2 most recent numbers in the Fibonacci sequence. So, let’s rename them to number1 and number2 respectively.

If the conditions do not match, it goes into the ‘else’ block. Here, it prints a message that the answer was incorrect and decrements the score.

Now, coming out of the while-loop, we see that when the loop ends, which happens when the score reaches the required amount, it prints a message to the user for the winning condition. After this, the program ends.

Vulnerability

In our analysis, we saw that current_score is of a uint type. This means that it is an unsigned integer and cannot be negative. We also saw that the program decrements the current_score when an incorrect answer is given. However, the program does not actually contain any losing state nor any check for whether the score has reached 0. What do you think will happen if the user gives more than 5 incorrect answers at the beginning? Go ahead and try this using the binary you compiled.

$ ./fibonacci.out 
Welcome to the Fibonacci Game!
You have to keep answering correctly to reach the winning score 4294967290
Your score starts at 5. Good Luck

### Score: 5
Enter the answer, 0 + 1 = ?
0
Incorrect answer, your score is reduced

### Score: 4
Enter the answer, 0 + 1 = ?
0
Incorrect answer, your score is reduced

### Score: 3
Enter the answer, 0 + 1 = ?
0
Incorrect answer, your score is reduced

### Score: 2
Enter the answer, 0 + 1 = ?
0
Incorrect answer, your score is reduced

### Score: 1
Enter the answer, 0 + 1 = ?
0
Incorrect answer, your score is reduced

### Score: 0
Enter the answer, 0 + 1 = ?
0
Incorrect answer, your score is reduced

### Score: 4294967295
Enter the answer, 0 + 1 = ?

You’ll see that if the score decreases below zero, it changes into a very big number. This is commonly called an integer underflow; more precisely, it is unsigned wraparound. When an unsigned integer is decremented below 0, it wraps around to the highest value that can be represented with that type. In this case, this is 4294967295. The winning score of 4294967290 is only 5 below that, so 5 more incorrect answers (11 in total from the starting score of 5) take you to the winning score, and you win the game without playing it properly.

So, we were able to find and exploit a logic vulnerability in the program by reverse engineering the program in Ghidra.

Conclusion

In conclusion, we were able to reverse engineer a variety of example programs. We analysed programs that used different types of loops, printed output with format strings, performed function calls, and received and processed user input. Finally, we analysed a simple game, found a logic bug in it and exploited the bug to subvert the rules of the game.

Reverse engineering relies on being able to identify and recognise functionality using partial information. One of the things that is important is to make educated guesses about the program. For example, when analysing a program, it is good to use your previous coding and software development experience to form a rough picture of how a program may have been developed, what the source code might look like, and so on. We can then analyse the program using Ghidra and other tools to get more insights and to check whether our assumptions were correct.
