In the previous set of articles we worked our way through configuring a vCPU and getting it to run in 32bit mode with paging enabled. In this article we will take it a step further and enable 64bit mode. Before we can run a CPU in 64bit mode we need to "reconfigure" our page tables. Don't worry - this exercise should be a walk in the park compared to the previous one (enabling paging and switching the CPU to 32bit mode).

64bit vs 32bit page tables

Before we dive into the implementation, let's have a look at how paging in 64bit mode differs from paging in 32bit mode[1]. In 32bit mode we used a three level paging scheme. The tables were called

1. Page Directory Table (PDT)
2. Page Directory (PD)
3. Page Table (PT)

We used one part of the virtual address (VA) in combination with CR3 to locate an entry in PDT called PDTE. Then we used PDTE in combination with another part of VA to locate an entry in PD (PDE). From there we went on and used yet another part of VA in combination with PDE to locate an entry in PT (PTE). And finally we combined PTE with VA to come up with the physical address (PA). In summary, we traversed the table structures in this order:

PDT -> PD -> PT -> PA (Physical Address)

In 64bit mode x86 uses a 4 level paging scheme in which the tables have the following names:

1. PML4 (Page Map Level 4 Table)
2. PDP (Page Directory Pointer)
3. PD (Page Directory)
4. PT (Page Table)

In this scheme the tables are traversed in this order:

PML4 -> PDP -> PD -> PT

Also keep in mind that in long mode a page table contains 8 byte entries compared to 4 byte entries in 32 bit mode. Each table holds up to 2^9 = 512 entries, which means each of the tables can be up to 2^9 * 2^3 = 4096 (or 0x1000) bytes large.

The following image from Intel System Programming Guide best illustrates the process. Here you can find an illustration that works through an example where 0xc000 is identity mapped.

Create page tables

Firstly, let's change our page tables. Instead of uint32_t entries, which are 4 bytes in size, we will use uint64_t entries.
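To make the traversal order described above more concrete, here is a minimal sketch of how a virtual address is cut into the four table indices and the page offset. It assumes 48 bit virtual addresses and 4096 byte pages. The names `va_parts`, `split_va` and `entry_offset` are made up for this illustration and are not part of the VM code in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Every long mode table has 2^9 = 512 entries, so each level
 * consumes 9 bits of the virtual address. */
#define INDEX_MASK 0x1ffULL

/* Hypothetical helper type: the pieces of a virtual address. */
struct va_parts {
    unsigned pml4;   /* bits 47..39: index into PML4 */
    unsigned pdp;    /* bits 38..30: index into PDP  */
    unsigned pd;     /* bits 29..21: index into PD   */
    unsigned pt;     /* bits 20..12: index into PT   */
    unsigned offset; /* bits 11..0:  offset inside the 4096 byte page */
};

/* Split a virtual address into table indices and page offset. */
static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & INDEX_MASK);
    p.pdp    = (unsigned)((va >> 30) & INDEX_MASK);
    p.pd     = (unsigned)((va >> 21) & INDEX_MASK);
    p.pt     = (unsigned)((va >> 12) & INDEX_MASK);
    p.offset = (unsigned)(va & 0xfffULL);
    return p;
}

/* Byte offset of an entry inside its table (8 bytes per entry). */
static uint64_t entry_offset(unsigned index)
{
    assert(index < 512);
    return (uint64_t)index * 8;
}
```

For the address 0xc000 mentioned above, `split_va` yields index 0 for PML4, PDP and PD, index 12 for PT and a page offset of 0. The matching PTE is therefore the 13th entry of the page table, `entry_offset(12)` = 0x60 bytes from the start of that table.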
In this example we have 0x10000 (65536) bytes allocated for the VM. We will identity map this area. The addresses that will be used for the tables are 0x1000, 0x2000, 0x3000 and 0x4000 for PML4, PDP, PD and PT respectively. Each of the first three tables will have only one entry, which points to the following table. That means that PML4 will have one entry pointing to PDP. PDP will have one entry pointing to PD. PD will have one entry pointing to PT. The reason is that each of these tables can hold up to 2^9 = 512 entries, so a single PT can map 512 pages of 4096 bytes (2 MB). This means we need only one PT, which will host our 16 entries (0x10000 / 0x1000 = 0x10 = 16).

    #include <stdint.h>
    #include <string.h>

    void createPageTable(void *mem)
    {
        /* The 0x3 in every entry sets the present (bit 0) and
           writable (bit 1) flags.
           Note: the offsets below are relative to mem, while the entries
           reference tables at 0x2000, 0x3000 and 0x4000. The two only line
           up with the addresses given in the text if mem points at guest
           physical address 0x1000. */

        /* PML4: one entry pointing to PDP */
        uint64_t pml4e = 0x2000 | 0x3;
        memcpy(mem, &pml4e, 8);

        /* PDP: one entry pointing to PD */
        uint64_t pdpte = 0x3000 | 0x3;
        memcpy(mem + 0x1000, &pdpte, 8);

        /* PD: one entry pointing to PT */
        uint64_t pde = 0x4000 | 0x3;
        memcpy(mem + 0x2000, &pde, 8);

        /* PT: 16 entries that identity map 0x0000 - 0xffff */
        uint64_t pte_1 = 0x0000 | 0x3;
        memcpy(mem + 0x3000, &pte_1, 8);
        uint64_t pte_2 = 0x1000 | 0x3;
        memcpy(mem + 0x3008, &pte_2, 8);
        uint64_t pte_3 = 0x2000 | 0x3;
        memcpy(mem + 0x3010, &pte_3, 8);
        uint64_t pte_4 = 0x3000 | 0x3;
        memcpy(mem + 0x3018, &pte_4, 8);
        uint64_t pte_5 = 0x4000 | 0x3;
        memcpy(mem + 0x3020, &pte_5, 8);
        uint64_t pte_6 = 0x5000 | 0x3;
        memcpy(mem + 0x3028, &pte_6, 8);
        uint64_t pte_7 = 0x6000 | 0x3;
        memcpy(mem + 0x3030, &pte_7, 8);
        uint64_t pte_8 = 0x7000 | 0x3;
        memcpy(mem + 0x3038, &pte_8, 8);
        uint64_t pte_9 = 0x8000 | 0x3;
        memcpy(mem + 0x3040, &pte_9, 8);
        uint64_t pte_10 = 0x9000 | 0x3;
        memcpy(mem + 0x3048, &pte_10, 8);
        uint64_t pte_11 = 0xa000 | 0x3;
        memcpy(mem + 0x3050, &pte_11, 8);
        uint64_t pte_12 = 0xb000 | 0x3;
        memcpy(mem + 0x3058, &pte_12, 8);
        uint64_t pte_13 = 0xc000 | 0x3;
        memcpy(mem + 0x3060, &pte_13, 8);
        uint64_t pte_14 = 0xd000 | 0x3;
        memcpy(mem + 0x3068, &pte_14, 8);
        uint64_t pte_15 = 0xe000 | 0x3;
        memcpy(mem + 0x3070, &pte_15, 8);
        uint64_t pte_16 = 0xf000 | 0x3;
        memcpy(mem + 0x3078, &pte_16, 8);
    }

The above code is not the most efficient way to do memory
mapping. Normally you would populate the tables using loops, but because the goal of these examples is teaching, the code is left verbose on purpose.

Enable long mode

To enable long mode we need to do three more things.

1. Set the PAE bit in CR4
2. Set the LME and LMA bits in the EFER register
3. Set the L bit in CS

To set those registers we will read the special registers using the KVM_GET_SREGS ioctl, modify them and write them back using KVM_SET_SREGS.

Setting the PAE bit is straightforward. We read the current value and OR it with 0x20 (bit 5 set to 1).

    sregs.cr4 = sregs.cr4 | 0x20;

Setting the long mode bits in EFER comes with a caveat. Most documentation will tell you that you need to set bit 8 (LME) in the EFER register to enable long mode. That is true when working with bare metal (no VM). For a VM to enter long mode you need to set both bit 8 (LME) and bit 10 (LMA). To understand why, take a look at the description of the EFER register in "Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 4: Model-Specific Registers". You can see that bit 10, when set, indicates that long mode (IA-32e) is active. When working with a physical CPU you can enable long mode, but then you should check whether the CPU really activated it. When working with a VM, setting bit 8 without setting bit 10 will not even work.

Now we can use the same technique that we used with the CR4 register. This time we will be OR-ing with the value 0x500 (bits 8 and 10 set).

    sregs.efer = sregs.efer | 0x500;

Lastly we need to set the L bit in the CS (code segment) register. It is sometimes misunderstood how code segments are used in IA-32e mode. Specifically, it's often thought that they don't exist or are not used. The truth is that some fields are ignored (e.g. base address and limit fields), some are treated as 0 in some calculations and the remaining bits are used normally. Code segment descriptors and selectors are needed in IA-32e mode to establish the processor's operating mode and execution privilege level.

What is relevant in our case is that setting bit L of the CS register and having IA-32e mode active means that the processor starts using a default address size of 64 bits and a default operand size of 32 bits[R2]. In case we do not set this bit, our processor will operate in compatibility mode, the sub-mode of IA-32e mode that runs 16 and 32 bit code.

    sregs.cs.l = 0x1;

And that's it. Now our VM is able to run 64 bit programs.

Run program

Finally, we need to compile a program that we can run. In my previous article I spoke about the differences between 16/32/64 bit programs and how to compile for each architecture. Let's use two toy programs in this example too. The first program will take two numbers in registers rax and rbx, compare them and output Y if the numbers are equal and N otherwise. From there it will jump to a different program (b.asm) that is located at memory address 0xc000. That program will output E (for end) and halt the machine.

    ;a.asm
    BITS 64
    mov rax, 0x100000000
    mov rbx, 0x200000000
    add rax, rbx
    mov rbx, 0x200000000
    cmp rax, rbx
    jz .equal
    mov rax, 'N'
    mov edx, 0x3f8
    out dx, al
    mov rax, 0xc000
    jmp rax
    .equal:
    mov rax, 'Y'
    mov edx, 0x3f8
    out dx, al
    mov rax, 0xc000
    jmp rax

    ;b.asm
    BITS 64
    mov rax, 'E'
    mov edx, 0x3f8
    out dx, al
    hlt

Each of the programs can be compiled using nasm:

    nasm -O0 -o a.bin a.asm
    nasm -O0 -o b.bin b.asm

The -O0 flag tells nasm to not use any optimizations. This is to ensure that we run 64 bit instructions rather than potentially optimizing out some instructions and using the same ones as in 32 bit mode. During normal operation you would let the compiler optimize the code, but in this case we want to be sure that the code we run is 64 bit and that the vCPU is running in true long mode (IA-32e / 64 bit mode).

Finally, if we run our program we will be presented with the following output:

Conclusion

This brings us to the end of yet another challenge. Our VM can now run 64 bit programs.
You can already use it to run a lot of useful programs. To make it even more useful, in the next article we will look into exception handling. Stay tuned!

Notes:

1. Technically there is one more variation of paging in 32bit mode in addition to the one we used in the previous article. You can check out the chapter on paging in the Intel x86 manual [R1].

References:

[R1] Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide
[R2] Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide - 5.2.1 Code-Segment Descriptor in 64-bit Mode