16/32/64 Bit Assembly Programs

Written by mvuksano | Published 2020/08/01

TLDR We will use nasm to assemble a simple program for each of the 16, 32 and 64 bit modes, and radare2 (https://rada.re/n/) to explore the disassembly. The program that we will compile and analyse is extremely simple. This is so we can focus on important aspects of 16, 32 and 64 bit programs rather than complexities in the programs themselves. The code for all examples and compiling them is available in the GitLab repo: https://gitlab.com/mvuksano/kvm-playground/-/tree/master/05-assemly-compiling.

OK, so before we jump into writing and running 64 bit programs (because all programs are 64 bit these days, no? j.k.), let's have a look at how to compile a simple program for 16, 32 and 64 bit modes. This knowledge should also help you better understand how to start a nano VM and "debug" problems that might arise.
To produce binaries we will use the nasm assembler and to explore them we will use radare2 (https://rada.re/n/).
The program that we will compile and analyse is extremely simple. This is so we can focus on the important aspects of 16, 32 and 64 bit programs rather than on complexities in the programs themselves.
Source code for all examples, along with the commands for compiling them, is available in the GitLab repo: https://gitlab.com/mvuksano/kvm-playground/-/tree/master/05-assemly-compiling.
Let's jump into it.

16 bit assembly

Let's compile the following program:
BITS 16
mov ax, 'A'
add ax, '0'
mov dx, 0x3f8
out dx, al
mov eax, 0xc000
jmp eax
As you can see, it starts with the BITS 16 directive. This tells the assembler that we want to produce a 16 bit binary.
To produce a binary we only need to invoke nasm:
nasm -o a16.bin a16.asm
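Before looking at the bytes, it can help to work out what the program actually computes. The snippet below is a small Python sketch (Python is used here only as a calculator, it is not part of the original toolchain) that mirrors the first four instructions. The function name is made up for illustration, and treating port 0x3f8 as the COM1 serial port is an assumption based on PC convention, not something the listing states.

```python
# Mirror the arithmetic of the first instructions of the 16 bit program.
def value_sent_to_port():
    ax = ord('A')                    # mov ax, 'A'  -> 0x41
    ax = (ax + ord('0')) & 0xFFFF    # add ax, '0'  -> 0x71, kept within 16 bits
    al = ax & 0xFF                   # out dx, al sends only the low byte of ax
    return al

PORT = 0x3F8  # mov dx, 0x3f8 (assumed to be COM1 by PC convention)

byte = value_sent_to_port()
print(f"out 0x{PORT:x}, 0x{byte:02x} ({chr(byte)!r})")
```

So the byte written to the port is 0x71, which is the ASCII code for the letter q.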
There are a lot of tools that we could use at this point to look into the executable. I'll use radare2 as it comes with some bells and whistles (e.g. nicely formatted and colored disassembly).
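For readers following along without radare2 at hand, here is a rough Python sketch that lays out the machine code of the 16 bit program one instruction per row. The byte values are the ones quoted in the walkthrough that follows; the names PROGRAM_16 and listing are hypothetical and only serve this illustration. It is not radare2 output.

```python
# Machine code for the 16 bit program, one entry per source line.
# The byte values are taken from the walkthrough in the article text.
PROGRAM_16 = [
    ("mov ax, 0x41",    bytes.fromhex("b84100")),
    ("add ax, 0x30",    bytes.fromhex("83c030")),
    ("mov dx, 0x3f8",   bytes.fromhex("baf803")),
    ("out dx, al",      bytes.fromhex("ee")),
    ("mov eax, 0xc000", bytes.fromhex("66b800c00000")),
    ("jmp eax",         bytes.fromhex("66ffe0")),
]

def listing(program):
    """Return (offset, hex bytes, mnemonic) rows, similar to a disassembler view."""
    rows, offset = [], 0
    for text, code in program:
        rows.append((offset, code.hex(), text))
        offset += len(code)
    return rows

for offset, raw, text in listing(PROGRAM_16):
    print(f"0x{offset:04x}  {raw:<12}  {text}")
```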
The first line (mov ax, 0x41) is represented using 3 bytes. 0xb8 is the opcode for the mov instruction, while the following two bytes are the number 0x41 written as a 16 bit value in little endian format (0x41 followed by 0x00).
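The little endian layout is easy to check in a few lines of Python. This is only a sketch for the 3 byte form described above; the helper name split_mov_imm16 is made up for illustration.

```python
def split_mov_imm16(code: bytes):
    """Split a 3 byte 'mov r16, imm16' encoding into opcode and immediate."""
    opcode = code[0]
    # The least significant byte of the immediate comes first in memory.
    immediate = int.from_bytes(code[1:3], byteorder="little")
    return opcode, immediate

opcode, immediate = split_mov_imm16(bytes.fromhex("b84100"))
print(hex(opcode), hex(immediate))
```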
The next line (0x83c030) encodes an instruction that adds 0x30 to whatever value is in the ax register and stores the result back into ax. Notice how ax is a 16 bit register while the immediate operand (0x30) is only 8 bits in size; the CPU sign-extends it to 16 bits before adding. The opcode 0x83 is shared by a group of arithmetic instructions, and the byte that follows it (the ModRM byte) customizes the operation. In this case 0xc0 selects ADD and tells the CPU to use the ax register as both source and destination.
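To make the role of the 0xc0 byte more concrete, here is a minimal Python sketch that splits it into its three bit fields. It only handles the register form of opcode 0x83 in 16 bit mode; the function and table names are hypothetical, and the operation table reflects my understanding of how the x86 manuals number this opcode group.

```python
# Operations selected by the middle bit field when the opcode is 0x83.
GROUP1_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_83(code: bytes):
    """Decode '83 modrm imm8' with a register operand (16 bit mode)."""
    if code[0] != 0x83:
        raise ValueError("only opcode 0x83 is handled in this sketch")
    modrm = code[1]
    mod = modrm >> 6             # top two bits: 0b11 means a register operand
    reg = (modrm >> 3) & 0b111   # middle three bits: which operation
    rm = modrm & 0b111           # low three bits: which register
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    imm8 = int.from_bytes(code[2:3], "little", signed=True)  # sign-extended
    return GROUP1_OPS[reg], REG16[rm], imm8

print(decode_83(bytes.fromhex("83c030")))
```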
Following that is 0xbaf803. This instruction does a similar thing to 0xb84100 in the first line. Notice how the second hex digit of the opcode changed from 8 to a. This is because the opcode of this instruction also encodes which register is used: its lowest three bits hold the register number. In this case the instruction tells the CPU to use register dx.
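The register selection can be sketched in Python as a lookup on the low three bits of the opcode. The table order below is the standard x86 register numbering; the function name is made up for illustration.

```python
# 16 bit registers in x86 register-number order (0 through 7).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def mov_imm_register(opcode: int) -> str:
    """Return the destination register encoded in a 0xb8..0xbf opcode."""
    if not 0xB8 <= opcode <= 0xBF:
        raise ValueError("not a 'mov register, immediate' opcode")
    return REG16[opcode & 0b111]

print(mov_imm_register(0xB8), mov_imm_register(0xBA))
```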
0xee is a simple one. It's a one byte instruction telling our CPU to write the value in al to the I/O port whose number is in dx (out dx, al).
Following is 0x66b800c00000. We've seen 0xb8 before. We also know that 0x00c0 is the target of our jump instruction (0xc000) written in little endian format. Two questions remain: what is that 0x66 in front of 0xb8, and what is that 0x0000 at the end of the instruction?
0x66 is a prefix which tells the CPU to use the non-default operand size. In our 16 bit program the default operand size for the MOV instruction is 16 bits (e.g. the ax register). Here we are putting the value 0xc000 into the 32 bit register eax. In our case 0xc000 can be written as 0x0000c000; we just added leading zeros to get a 32 bit number. Now if we write 0x0000c000 in little endian format we get 0x00c00000. This is exactly the sequence of digits that we see following 0x66b8.
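The effect of the 0x66 prefix can be sketched as a toggle on the operand size. The Python below is a simplified illustration that only understands opcode 0xb8 in 16 or 32 bit mode; decode_mov_acc and its return shape are hypothetical.

```python
def decode_mov_acc(code: bytes, default_bits: int = 16):
    """Decode 'mov ax/eax, imm' with an optional 0x66 prefix.

    Returns (register name, immediate value, instruction length in bytes).
    """
    bits = default_bits
    pos = 0
    if code[pos] == 0x66:                        # operand-size override prefix
        bits = 32 if default_bits == 16 else 16  # flips to the non-default size
        pos += 1
    if code[pos] != 0xB8:
        raise ValueError("only opcode 0xb8 is handled in this sketch")
    pos += 1
    size = bits // 8                             # immediate is 2 or 4 bytes
    imm = int.from_bytes(code[pos:pos + size], "little")
    return ("eax" if bits == 32 else "ax"), imm, pos + size

print(decode_mov_acc(bytes.fromhex("66b800c00000")))
```

Calling the same function with default_bits=32 and no prefix gives the 32 bit form directly, which is why the prefix is only needed when the wanted size differs from the mode's default.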
0x66ffe0 is the last instruction in our program. 0x66 serves the same purpose as in the previous instruction: it tells the CPU to "switch to" using 32 bit operands. 0xffe0 is the encoding of the jmp eax instruction.
Before we move on to looking at the 32 bit version of the same program, keep two things in mind:
  1. In 16 bit mode the CPU expects operands to be 2 bytes in size by default.
  2. If we want to use a 32 bit operand size we need to use the prefix 0x66 to tell the x86 CPU to do so.

32 bit assembly

Let's have a look at what the same program looks like when assembled as a 32 bit binary:
BITS 32
mov eax, 'A'
add eax, '0'
mov edx, 0x3f8
out dx, al
mov eax, 0xc000
jmp eax
You can notice a few changes. The first line is a directive that tells nasm that we want to output a 32 bit binary. Also notice that instead of using ax as an operand we use eax. eax is the name of a 32 bit register while ax names a 16 bit register. As a matter of fact, ax represents the lower 16 bits of the eax register.
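The relationship between ax and eax is just bit masking. Here is a small Python sketch of it, with made-up helper names, modelling eax as a plain integer.

```python
def ax_of(eax: int) -> int:
    """Read ax: the lower 16 bits of eax."""
    return eax & 0xFFFF

def write_ax(eax: int, value: int) -> int:
    """Write ax: replace the lower 16 bits and keep the upper 16 bits of eax."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)

eax = 0x12345678
print(hex(ax_of(eax)), hex(write_ax(eax, 0x0041)))
```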
Looking at the disassembly we can see that it's very similar.
Another thing worth pointing out is that the 0x66 prefix in front of the last two instructions is not there any more. In 32 bit mode the default operand size is 32 bits so there's no need for the prefix in this case.
The size of the binary has also slightly increased. Instead of being 0x13 (19) bytes in size it's now 0x15 (21) bytes.
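The two sizes can be checked by counting bytes. In the Python sketch below, the 16 bit byte string comes from the walkthrough earlier in the article, while the 32 bit byte string is my assumption of what nasm emits for the listing above (it is not quoted in the article), so treat it as something to verify against your own disassembly.

```python
# 16 bit program: bytes as quoted in the 16 bit walkthrough.
PROGRAM_16 = bytes.fromhex(
    "b84100" "83c030" "baf803" "ee" "66b800c00000" "66ffe0"
)
# 32 bit program: assumed encoding, with 4 byte immediates and no 0x66 prefixes.
PROGRAM_32 = bytes.fromhex(
    "b841000000" "83c030" "baf8030000" "ee" "b800c00000" "ffe0"
)

for name, code in (("16 bit", PROGRAM_16), ("32 bit", PROGRAM_32)):
    print(f"{name}: {len(code)} bytes ({hex(len(code))})")
```

Note that the sizes come out as 19 and 21 in decimal, which are 0x13 and 0x15 in hexadecimal.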

64 bit assembly

Lastly, let's look at the equivalent 64 bit program:
BITS 64
mov rax, QWORD 'A'
add rax, QWORD '0'
mov rdx, QWORD 0x3f8
out dx, al
mov rax, QWORD 0xc000
jmp rax
Besides the BITS directive telling the assembler that it should output a 64 bit program, we use QWORD to explicitly specify that immediate operands should be 8 bytes in size.
Looking at the disassembly we can see that it grew in size again. This is expected as the size of the immediate operands has doubled.
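To see how the immediate size drives the instruction size, here is a rough Python sketch that builds the first mov instruction for each mode. The function name is hypothetical, and the 64 bit form (0x48, 0xb8, then an 8 byte immediate) is my assumption of what nasm emits when QWORD is given explicitly.

```python
def encode_mov_acc_imm(bits: int, value: int) -> bytes:
    """Encode 'mov ax/eax/rax, imm' in the mode whose default size matches."""
    if bits == 16:
        return bytes([0xB8]) + value.to_bytes(2, "little")
    if bits == 32:
        return bytes([0xB8]) + value.to_bytes(4, "little")
    if bits == 64:
        # Assumed form: 0x48 prefix, opcode 0xb8, 8 byte immediate.
        return bytes([0x48, 0xB8]) + value.to_bytes(8, "little")
    raise ValueError("bits must be 16, 32 or 64")

for bits in (16, 32, 64):
    code = encode_mov_acc_imm(bits, ord('A'))
    print(bits, len(code), code.hex())
```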
Besides the operand size this disassembly looks very similar to the 32 bit one. One difference we can immediately notice is that a lot of instructions have a 0x48 prefix. This is a so-called REX prefix. It's available in 64 bit mode only and in this case tells the CPU that the instruction should use a 64 bit operand size. Keep in mind that some instructions, in 64 bit mode, do not use 64 bit operands by default but instead use 32 bit ones. MOV is an example of such an instruction.
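The 0x48 value is not arbitrary: a REX prefix is a byte whose upper four bits are 0100 and whose lower four bits are individual flags. A minimal Python sketch of that layout follows; decode_rex is a made-up name, and the flag names W, R, X and B follow the usual documentation of the prefix.

```python
def decode_rex(prefix: int):
    """Return the four REX flag bits, or None if the byte is not a REX prefix."""
    if prefix >> 4 != 0b0100:
        return None
    return {
        "W": (prefix >> 3) & 1,  # 1 selects a 64 bit operand size
        "R": (prefix >> 2) & 1,  # extends the ModRM reg field
        "X": (prefix >> 1) & 1,  # extends the SIB index field
        "B": prefix & 1,         # extends the ModRM rm / opcode register field
    }

print(decode_rex(0x48))
```

With 0x48 only the W flag is set, which matches the 64 bit operand size described above.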

Conclusion

In this article we have taken a look at a very simple program and its representation as 16, 32 and 64 bit binaries. I don't expect you will need to deal with assembly code day to day, but being familiar with it is very useful when working with hypervisors and VMs. In the early stages of a VM's life there are very few tools for debugging, and the best one you have is knowing what the code does.
In the following article we will modify our VM to switch into long mode and we will execute the 64 bit binary.

References

  1. http://ref.x86asm.net/coder32.html
  2. https://c9x.me/x86/html/file_module_x86_id_222.html
  3. https://wiki.osdev.org/X86-64_Instruction_Encoding#togglelink:~:text=4.1.2%20REX%20prefix

Written by mvuksano | PSS - Pragmatic problem solver @ Facebook
Published by HackerNoon on 2020/08/01