I have been working with C all of my professional and student life. There have been times when I had to look a little deeper to understand what is going on with my buggy program.
Along the way I have learned that there are many tools and techniques for examining an executable, and this post is about them. I will also cover a bit of basic reverse engineering.
As an example, I wrote a pretty basic piece of code with some intentional inclusions.
There are two global variables, msga and msgb.
Two user-defined routines, allow and deny, are defined, but only deny is called from the main function.
One conditional call to an external program is made using execvp.
The idea here is to examine the executable this program produces: find out where the code I wrote lands in the executable and what the compiler adds on top of it.
Later I'll showcase some basic reverse engineering that can be done by pretending we haven't seen the code.
#include <stdio.h>
#include <unistd.h>

char *msga = "Allow";
char *msgb = "Deny";

void allow() {
    printf("%s\n", msga);
}

void deny() {
    printf("%s\n", msgb);
}

int main(int argc, char **argv) {
    deny();
    int runExternal = 0;
    if (runExternal) {
        char* lsargs[] = {"ls", "-l", NULL};
        execvp("ls", lsargs);
    }
}
While dealing with executable code we'll encounter a lot of hexadecimal values; I prefer using Python for quick arithmetic whenever the need arises.
Also, some of the output is going to be too big to paste here in the post so I'll link them in the end.
Let's compile the program and get our a.out.
gcc examinebin.c
As we see in the code, the program simply calls deny and exits.
The goal I am setting for myself is to identify the relevant instructions inside the executable and change them so that allow is called and then ls is executed with the -a argument.
If we look at the objdump output, it is neatly divided into sections and clearly labeled with symbol names. The code we are interested in is the one we wrote, but it's nice to know what everything else is.
The short version is that every C program needs a main routine that marks the start and end of user-written code.
The C runtime executes main within its framework and takes care of all static and runtime dependencies.
The order of execution can be determined very easily by running the executable under gdb and adding a breakpoint on every symbol defined in the .text section, plus _init and _fini.
Let's see what happens.
breakpoint : _init, _start, deregister_tm_clones, register_tm_clones, __do_global_dtors_aux, frame_dummy, allow, deny, main, __libc_csu_init, __libc_csu_fini, _fini
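Typing a dozen breakpoints by hand gets tedious, so here is a rough sketch of how the gdb commands could be generated instead. The helper name gdb_script and the file name trace.gdb are my own inventions for illustration; the gdb commands it emits (break, commands, silent, backtrace, continue, run) are standard ones.

```python
# Symbols taken from the breakpoint list above.
SYMBOLS = [
    "_init", "_start", "deregister_tm_clones", "register_tm_clones",
    "__do_global_dtors_aux", "frame_dummy", "allow", "deny", "main",
    "__libc_csu_init", "__libc_csu_fini", "_fini",
]

def gdb_script(symbols=SYMBOLS):
    """Build gdb commands that print each breakpoint hit and keep going."""
    lines = ["set pagination off", "set breakpoint pending on"]
    for name in symbols:
        # "commands" with no argument attaches to the breakpoint just created.
        lines += [f"break {name}", "commands", "silent",
                  "backtrace 1", "continue", "end"]
    lines.append("run")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(gdb_script(["main", "deny"]))
```

Saving the full output to a file (say trace.gdb) and running gdb -batch -x trace.gdb ./a.out should print one frame per breakpoint hit, in call order.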
Below is the call sequence labelled by me based on my understanding of the usual meaning of these symbols:
// Initialisation
_init (argc=1, argv=0x7fffffffdfd8, envp=0x7fffffffdfe8)
_start ()
__libc_csu_init ()
_init ()
frame_dummy ()
register_tm_clones ()
// User Code
main ()
deny () // we want to call allow and execvp here instead
// Deconstruction and finalisation
__do_global_dtors_aux ()
deregister_tm_clones ()
deregister_tm_clones ()
_fini ()
Now that we know what we don't have to explore, we can focus on the task at hand: calling allow and running ls with -a.
To state the goal precisely, we want to:
Call allow instead of deny.
Change the runExternal flag value to non-zero.
Change "-l" to "-a" in lsargs.
To do that we have to know where these values are in the binary and then change them manually without disturbing anything else.
deny
The machine code that calls deny, from the objdump output:
0000000000001189 <allow>:
00000000000011a3 <deny>:
00000000000011bd <main>:
11e4: e8 ba ff ff ff callq 11a3 <deny>
11e9: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
From the callq reference, we know that opcode e8 takes the operand ba ff ff ff (0xffffffba, stored little-endian), which is a signed offset from the address of the next instruction, 0x11e9. So it should point to 0x11a3.
offset = 0xffffffba - 0x100000000  # reinterpret as a signed 32-bit value: -0x46
deny_addr = hex(0x11e9 + offset)
print(deny_addr)  # 0x11a3
To call allow instead, we have to change 0xffffffba to a value that resolves to 0x1189.
allow_addr = 0x1189
offset = hex((allow_addr - 0x11e9) + 0x100000000)  # negative offset as unsigned 32-bit
print(offset)
# 0xffffffa0 -> a0 ff ff ff
So all we need to do is change ba to a0 in the binary.
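The same arithmetic can be wrapped in a small helper that also takes care of the byte order. This is only a sketch: the name rel32_operand is mine, and it assumes the 5-byte e8 form of the call seen in the objdump output above, where the operand is a signed 32-bit little-endian offset measured from the next instruction. It needs Python 3.8 or later for the separator argument of hex().

```python
import struct

def rel32_operand(call_addr, target_addr):
    """Return the 4 operand bytes for a 5-byte e8 call located at call_addr."""
    next_insn = call_addr + 5          # 1 opcode byte + 4 operand bytes
    offset = target_addr - next_insn   # signed displacement
    return struct.pack("<i", offset)   # little-endian, signed 32-bit

print(rel32_operand(0x11e4, 0x11a3).hex(" "))  # ba ff ff ff (call deny)
print(rel32_operand(0x11e4, 0x1189).hex(" "))  # a0 ff ff ff (call allow)
```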
runExternal
This is quite simple: all we need to do is locate the mov instruction that puts the value in the flag.
11e9: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
Then change the value to any non-zero one.
00 00 00 00 -> 01 00 00 00
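To double-check which of the seven bytes belong to the constant, the instruction can be split by hand. The sketch below assumes the c7 45 encoding seen here (a 32-bit immediate stored at an 8-bit displacement from %rbp); it is not a general decoder, and decode_movl_rbp is a made-up name.

```python
import struct

def decode_movl_rbp(insn):
    """Split c7 45 <disp8> <imm32> into (displacement, immediate)."""
    if len(insn) != 7 or insn[0] != 0xC7 or insn[1] != 0x45:
        raise ValueError("not the movl $imm32, disp8(%rbp) form")
    # One signed byte of displacement, then a little-endian signed 32-bit value.
    disp, imm = struct.unpack("<bi", insn[2:7])
    return disp, imm

print(decode_movl_rbp(bytes.fromhex("c745dc00000000")))  # (-36, 0), and -36 == -0x24
print(decode_movl_rbp(bytes.fromhex("c745dc01000000")))  # (-36, 1)
```

The last four bytes are the value of the flag, so flipping the first of them (the least significant byte) to 01 is enough.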
"-l"
We basically want to change the arguments going into the execvp function call.
In the assembly, we can see where the callq to execvp is made, and there should be lea or mov instructions before it that set up the arguments. (On x86-64 Linux the arguments themselves travel in registers such as %rdi and %rsi; the lsargs array they point to is built on the stack.)
Since these values are hardcoded in the binary, all we need to do is find the location of -l and change it to -a.
11f6: 48 8d 05 12 0e 00 00 lea 0xe12(%rip),%rax # 200f <_IO_stdin_used+0xf>
11fd: 48 89 45 e0 mov %rax,-0x20(%rbp)
1201: 48 8d 05 0a 0e 00 00 lea 0xe0a(%rip),%rax # 2012 <_IO_stdin_used+0x12>
1208: 48 89 45 e8 mov %rax,-0x18(%rbp)
121b: 48 8d 3d ed 0d 00 00 lea 0xded(%rip),%rdi # 200f <_IO_stdin_used+0xf>
1222: e8 69 fe ff ff callq 1090 <execvp@plt>
The lea instruction calculates an effective address, which in every case here is an offset relative to the address of the next instruction (%rip).
So we have three addresses, which can be calculated by hand or read from the comments in the objdump output.
print(hex(0xe12 + 0x11fd)) # 0x200f
print(hex(0xe0a + 0x1208)) # 0x2012
print(hex(0xded + 0x1222)) # 0x200f
From the hexdump output we can clearly see that our strings are really there.
00002000: 0100 0200 416c 6c6f 7700 4465 6e79 006c ....Allow.Deny.l
00002010: 7300 2d6c 0000 0000 011b 033b 5400 0000 s.-l.......;T...
Changing the fourth byte of the second row (offset 0x2013) from 6c to 61 turns l into a.
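Counting bytes in a dump by eye is error-prone, so here is a quick sketch that finds the offset from the two dump rows above. The variable names are mine, and the bytes are copied from the dump rather than read from the file.

```python
# The two dump rows from above, as raw bytes starting at offset 0x2000.
base = 0x2000
rodata = bytes.fromhex(
    "0100 0200 416c 6c6f 7700 4465 6e79 006c"
    "7300 2d6c 0000 0000 011b 033b 5400 0000"
)

arg_offset = base + rodata.index(b"-l\x00")  # start of the "-l" string
l_offset = arg_offset + 1                     # the 'l' itself
print(hex(arg_offset), hex(l_offset))         # 0x2012 0x2013

patched = bytearray(rodata)
patched[l_offset - base] = ord("a")           # 0x6c -> 0x61
print(bytes(patched[18:21]))                  # b'-a\x00'
```

The 0x2012 it finds matches the address in the second lea comment of the objdump output.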
Let's summarize and make all the necessary changes to the text output produced by the xxd utility.
Changes for allow
000011e0: 0000 0000 e8(ba) ffff ffc7 45dc 0000 0000
000011e0: 0000 0000 e8(a0) ffff ffc7 45dc 0000 0000
Changes for the runExternal flag
000011e0: 0000 0000 e8ba ffff ffc7 45dc (00)00 0000
000011e0: 0000 0000 e8ba ffff ffc7 45dc (01)00 0000
Changes for l -> a
00002010: 7300 2d(6c) 0000 0000 011b 033b 5400 0000
00002010: 7300 2d(61) 0000 0000 011b 033b 5400 0000
Using the xxd utility
xxd -r modified-xxd.txt > a2.out
Change Permission
chmod +x a2.out
I can tell you that it actually works, but it's better to try it yourself. The output is:
Deny
Allow
. .. a.out a2.out
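The same three byte changes could also be scripted instead of editing the xxd text by hand. The sketch below is one way to do it, with names of my own choosing. It assumes that the file offsets are exactly the ones used in this walkthrough (a different compiler or version will almost certainly produce different ones), so it checks the old byte before overwriting anything.

```python
# (file offset, expected old byte, new byte), all taken from the walkthrough above.
PATCHES = [
    (0x11E5, 0xBA, 0xA0),  # call deny -> call allow
    (0x11EC, 0x00, 0x01),  # runExternal = 0 -> 1
    (0x2013, 0x6C, 0x61),  # "-l" -> "-a"
]

def apply_patches(image, patches=PATCHES):
    """Return a patched copy of image, refusing to touch unexpected bytes."""
    out = bytearray(image)
    for offset, old, new in patches:
        if out[offset] != old:
            raise ValueError(
                f"byte at {offset:#x} is {out[offset]:#04x}, expected {old:#04x}"
            )
        out[offset] = new
    return bytes(out)

def patch_file(src="a.out", dst="a2.out"):
    """Read src, apply the patches, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(apply_patches(data))
```

Calling patch_file() next to a.out would write a2.out, which still needs the chmod +x step.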
Tools
Let’s review some tools that tell us about the file from the outside.
file
Utility that gives the file name, file type, and other format-related information.
sum
Gets the checksum and the number of blocks in the file. Once we modify the binary, this output will differ from the original's, which tells us that the new executable is not genuine.
ldd
Gives the list of shared objects required by the executable.
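To see why sum gives a patched binary away, here is a sketch of the checksum itself. It assumes the 16-bit BSD algorithm, which is what GNU sum uses by default, and leaves out the block count; bsd_sum is my own name for it.

```python
def bsd_sum(data):
    """16-bit BSD checksum: rotate right by one bit, then add each byte."""
    checksum = 0
    for byte in data:
        checksum = (checksum >> 1) | ((checksum & 1) << 15)  # rotate right
        checksum = (checksum + byte) & 0xFFFF                # keep 16 bits
    return checksum

# Eight bytes around the "-l" string, before and after the one-byte edit.
original = bytes.fromhex("7300 2d6c 0000 0000")
patched = bytes.fromhex("7300 2d61 0000 0000")
print(bsd_sum(original) != bsd_sum(patched))  # True
```

A single changed byte is enough to change the result, which is all this kind of check can tell us; it says nothing about what was changed.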
There are some utilities that give a quick peek into the executable if an in-depth examination is not something you need.
strings
Displays all printable characters and strings in the file. Works on any file, not just executable.
nm
Lists the symbols in the executable's symbol table along with their addresses.
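What strings does can be approximated in a few lines. The sketch below only looks for runs of printable ASCII and uses a minimum length of 4, which is the default of the real tool; printable_strings is a hypothetical helper, not how strings is actually implemented.

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

# The two dump rows holding our strings, copied from the xxd output above.
rodata = bytes.fromhex(
    "0100 0200 416c 6c6f 7700 4465 6e79 006c"
    "7300 2d6c 0000 0000 011b 033b 5400 0000"
)
print(printable_strings(rodata))             # ['Allow', 'Deny']
print(printable_strings(rodata, min_len=2))  # ['Allow', 'Deny', 'ls', '-l', ';T']
```

Note that ls and -l are too short to show up with the default length, so a lower minimum is needed to spot them.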
Now comes the in-depth analysis of the executable: this includes interpreting the machine code in human-readable form and also figuring out a way to edit the file.
objdump
Using the -d option you get a disassembly of the executable sections, with the raw bytes of each instruction shown next to the interpreted assembly.
xxd or hexdump
These are plain tools for dealing with any binary file, not just executables.
The reading part creates a text dump showing the hexadecimal value of each byte, with a printable version side by side where possible.
With xxd, changes to this text can be fed back to the tool (the -r option), which then recreates a binary file.
I am using xxd for both reading and writing the executable here.
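To make the dump format less mysterious, here is a sketch of how one row could be produced and read back. It mimics the default xxd layout (16 bytes per row, grouped in pairs) as I understand it; xxd_line and parse_xxd_line are illustrative names, not part of any tool.

```python
def xxd_line(offset, chunk):
    """Format up to 16 bytes the way a default xxd row looks."""
    hex_bytes = chunk.hex()
    groups = " ".join(hex_bytes[i:i + 4] for i in range(0, len(hex_bytes), 4))
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
    # 8 groups of 4 hex digits plus 7 spaces make the hex column 39 wide.
    return f"{offset:08x}: {groups:<39}  {text}"

def parse_xxd_line(line):
    """Recover (offset, bytes) from a row; the ASCII column is ignored."""
    offset, rest = line.split(": ", 1)
    hex_part = rest.split("  ", 1)[0]
    return int(offset, 16), bytes.fromhex(hex_part)

row = bytes.fromhex("0100 0200 416c 6c6f 7700 4465 6e79 006c")
line = xxd_line(0x2000, row)
print(line)  # 00002000: 0100 0200 416c 6c6f 7700 4465 6e79 006c  ....Allow.Deny.l
print(parse_xxd_line(line) == (0x2000, row))  # True
```

Only the offset and the hex column matter when reading a row back, which is why editing just the hex digits in the text file was enough.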
Also published on: https://makeall.dev/notepad/examine-executable/