So far we have seen how to start up a VM, emulate the CPUID instruction and run arbitrary code, and we have learned about segmentation and paging in an x86 CPU. This already allows us to do a lot of useful things with our VM. However, our CPU is still running in so-called real mode. In this article we will go through setting up the paging structures and enabling protected mode with paging.

When protected mode is switched on, the CPU still uses segmentation. Segmentation in protected mode is not the same as segmentation in real mode (check out my previous article to learn more about this: https://hackernoon.com/segmentation-and-paging-in-x86-xts3yp1). Segmentation is rarely used today, and long mode relies on paging almost exclusively.

In this article we will do the following:

1. Configure the CPU for paging.
2. Load a bootstrap program (program A) at memory location 0x4000.
3. Load another program (program B) at memory location 0x6000, which will be mapped to virtual address 0xc000.
4. Switch the CPU to protected mode with paging enabled, start execution in program A and jump to 0xc000.

Even though program B resides at physical address 0x6000, we expect it to execute correctly when the first program jumps to 0xc000, thanks to the paging we will set up.

1. Configure CPU for paging

Before the CPU can start using paging, we need to set up what are called page tables. In this exercise we will set up a two-level page table. In my opinion this is the simplest to set up when starting out. After going through this process (and reading a bit of the Intel manual [1]) I hope that you will be able to set up 3-, 4- or even 5-level page tables.

The first-level page table is known as the page directory (PD), and each entry in that table is known as a page directory entry (PDE). The PD is 4 KB aligned. The address of a PDE is derived from the value stored in register CR3: bits [31:12] come from CR3[31:12], bits [11:2] are filled with bits LA[31:22] of the linear address, and the last two bits are set to zero. Each PDE contains a pointer to a page table (PT). Elements of a page table are called page table entries (PTEs).
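The PDE lookup rule described above fits in a couple of lines of C. This is a minimal sketch for 32-bit two-level paging only; the helper names pd_index and pde_address are hypothetical and not part of any library or of the article's source code.

```c
#include <stdint.h>

/* Hypothetical helper: LA[31:22], the index of the PDE used for `la`. */
static uint32_t pd_index(uint32_t la)
{
    return la >> 22;
}

/* Hypothetical helper: physical address of the PDE for `la`.
   Bits [31:12] come from CR3, bits [11:2] from LA[31:22], bits [1:0] are 0. */
static uint32_t pde_address(uint32_t cr3, uint32_t la)
{
    return (cr3 & 0xFFFFF000u) | (pd_index(la) << 2);
}
```

With CR3 = 0x1000, every address below 4 MB (for example 0xc000) uses the PDE at 0x1000, while 0x400000 would use the next entry at 0x1004.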
The 4 KB-aligned page table is located at the memory location specified by bits PDE[31:12]. To locate a PTE, bits [11:2] of its address are filled with bits LA[21:12] and the last two bits are set to zero (same as in the previous step). Each PTE points to a page frame. The physical address of that page frame is formed by taking bits PTE[31:12] and setting bits [11:0] to zero; the lowest 12 bits of the linear address, LA[11:0], then select a byte within that frame. The following figure depicts the process.

The lower 12 bits of CR3, PDEs and PTEs are status bits, which we can use to configure the paging mechanism.

Game plan

The following image depicts the page mapping that we will implement. On the left-hand side are linear (virtual) addresses, while on the right-hand side you can see the page frames that represent physical memory. As you read the following sections, feel free to refer to this map.

1.1. Identity map a region of memory

First, we will identity map a small region of memory, [0x0000, 0x5fff]. This region will be used to store our PD, PT, stack and some other CPU-relevant data. The page tables will be stored in the memory region [0x1000, 0x2fff]. Identity mapping this region ensures that we can still access our page tables once paging is enabled.

We know that the PD and the PT each use 10 bits of the linear address (LA) to select an entry. With 10 bits we can select 2^10 = 1024 entries. Since each entry is 4 bytes in size, our PD and each PT that we create will be 4 KB in size. If we place them sequentially in memory, the tables will be 0x1000 (4 KB) bytes apart. We also know that the last 12 bits of the LA select a specific byte within a page frame, i.e. they become the low 12 bits of the physical address (PA). That tells us that a single page frame holds 2^12 = 4096 bytes = 4 KB = 0x1000 bytes.

Each PTE therefore covers a 4096-byte region of memory. That means three PTEs cover the region from 0x0000 to 0x2fff, and six PTEs are needed for the whole identity-mapped region [0x0000, 0x5fff]. Going one level up, LA[31:22] selects one of 2^10 = 1024 entries in the PD.
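The sizes used above follow directly from the 10 + 10 + 12 bit split of a linear address. A small sketch that makes the arithmetic explicit might look like this; the constant names and the ptes_needed helper are illustrative assumptions, not taken from the article's source code.

```c
#include <stdint.h>

/* Sizes implied by the 10 + 10 + 12 bit split of a 32-bit linear address.
   Names are illustrative only. */
enum {
    ENTRY_BYTES     = 4,                             /* one PDE or PTE          */
    ENTRIES_PER_TBL = 1 << 10,                       /* 10 index bits -> 1024   */
    TABLE_BYTES     = ENTRIES_PER_TBL * ENTRY_BYTES, /* 4096 = 0x1000           */
    FRAME_BYTES     = 1 << 12,                       /* 12 offset bits -> 4 KB  */
    PDE_SPAN_BYTES  = ENTRIES_PER_TBL * FRAME_BYTES  /* one PT covers 4 MB      */
};

/* Hypothetical helper: number of PTEs (4 KB pages) needed to cover the
   inclusive address range [first, last]. */
static uint32_t ptes_needed(uint32_t first, uint32_t last)
{
    return last / FRAME_BYTES - first / FRAME_BYTES + 1;
}
```

For example, ptes_needed(0x0000, 0x5fff) evaluates to 6, and PDE_SPAN_BYTES (0x400000) shows why a single PDE is enough for everything this article maps.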
Each of those entries is 4 bytes in size (the 10 LA bits are shifted left by two to form the entry's address, and the last two bits are set to 0). All addresses we map in this article lie below 0x400000 (4 MB), so LA[31:22] is always 0 and every lookup goes through the first PDE into the same PT. We therefore need only one PDE.

We will place our PD at memory location 0x1000 and our PT at 0x2000. Our mapping so far looks like this:

    CR3      = 0x1000
    [0x1000] = 0x2003
    [0x2000] = 0x0003
    [0x2004] = 0x1003
    [0x2008] = 0x2003
    [0x200c] = 0x3003
    [0x2010] = 0x4003
    [0x2014] = 0x5003

At location 0x1000, which is our PD, we store the value 0x2000 (plus status bits), which is the location of our PT. This is our only PDE. The entries at 0x2000 through 0x2014 are PTEs 0 to 5, identity mapping the pages 0x0000 to 0x5000. Check the following section to understand why the last digit is 3. Hint: it comes from the status bits.

1.2. Map "program region"

The next area that we will map is [0xc000, 0xffff], that is, four 4 KB blocks starting at 0xc000, 0xd000, 0xe000 and 0xf000. The mapping will be as follows:

1. [0xc000, 0xcfff] -> [0x6000, 0x6fff]
2. [0xd000, 0xdfff] -> [0x7000, 0x7fff]
3. [0xe000, 0xefff] -> [0x8000, 0x8fff]
4. [0xf000, 0xffff] -> [0x9000, 0x9fff]

This mapping is a bit trickier, so I made the following diagram to demonstrate how the mapping from LA 0xc000 to PA 0x6000 works. LA 0xc000 has LA[31:22] = 0 (the first PDE) and LA[21:12] = 12 = 0xc, so its PTE sits at 0x2000 + 12 * 4 = 0x2030. From the image we can see that our PTEs should be as follows:

    [0x2030] = 0x6003
    [0x2034] = 0x7003
    [0x2038] = 0x8003
    [0x203c] = 0x9003

As you can see in the figure, the last digit being 3 in the PDE and the PTEs comes from setting status bits 0 and 1. Bit 0 (Present) tells the CPU that the page table (for a PDE) or the page frame (for a PTE) is present, and bit 1 (Read/Write) allows writes to it, so the memory is both readable and writable.
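The PTE addresses and values listed above can also be computed rather than worked out by hand. The sketch below assumes, as in the article, that the PT lives at 0x2000 and that only the Present and Read/Write bits are set; pte_slot, pte_value and print_program_region are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT  (1u << 0) /* bit 0: table or frame is present */
#define PTE_WRITABLE (1u << 1) /* bit 1: writes are allowed        */

/* Hypothetical helper: address of the PTE for `va` in a PT at `pt_base`.
   LA[21:12] selects the entry and each entry is 4 bytes. */
static uint32_t pte_slot(uint32_t pt_base, uint32_t va)
{
    return pt_base + (((va >> 12) & 0x3FFu) << 2);
}

/* Hypothetical helper: PTE value pointing at the 4 KB frame `pa`. */
static uint32_t pte_value(uint32_t pa, uint32_t flags)
{
    assert((pa & 0xFFFu) == 0); /* frames must be 4 KB aligned */
    return pa | flags;
}

/* Print the "program region" entries: 0xc000-0xffff -> 0x6000-0x9fff. */
static void print_program_region(void)
{
    for (uint32_t i = 0; i < 4; i++) {
        uint32_t va = 0xC000u + i * 0x1000u;
        uint32_t pa = 0x6000u + i * 0x1000u;
        printf("[0x%04x] = 0x%04x\n", (unsigned)pte_slot(0x2000u, va),
               (unsigned)pte_value(pa, PTE_PRESENT | PTE_WRITABLE));
    }
}
```

Calling print_program_region() prints the same four entries as the list above, starting with [0x2030] = 0x6003. The same two helpers reproduce the identity-map entries, e.g. pte_slot(0x2000, 0x5000) is 0x2014.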
2. Implementation

Now that we know what the page tables should look like, implementing them is easy:

    #include <stdint.h>
    #include <string.h>

    /* Writes the PD at mem + 0x0000 and the PT at mem + 0x1000.
       0x3 = Present (bit 0) | Read/Write (bit 1). */
    void createPageTable(void *mem)
    {
        uint8_t *m = mem; /* byte pointer: arithmetic on void * is non-standard C */

        /* PDE 0: points to the PT at guest-physical 0x2000. */
        uint32_t pde = 0x2000 | 0x3;
        memcpy(m, &pde, 4);

        /* PTEs 0-5: identity map [0x0000, 0x5fff]. */
        uint32_t pte_1 = 0x0000 | 0x3;
        memcpy(m + 0x1000, &pte_1, 4);
        uint32_t pte_2 = 0x1000 | 0x3;
        memcpy(m + 0x1004, &pte_2, 4);
        uint32_t pte_3 = 0x2000 | 0x3;
        memcpy(m + 0x1008, &pte_3, 4);
        uint32_t pte_4 = 0x3000 | 0x3;
        memcpy(m + 0x100c, &pte_4, 4);
        uint32_t pte_5 = 0x4000 | 0x3;
        memcpy(m + 0x1010, &pte_5, 4);
        uint32_t pte_6 = 0x5000 | 0x3;
        memcpy(m + 0x1014, &pte_6, 4);

        /* PTEs 12-15: map [0xc000, 0xffff] -> [0x6000, 0x9fff]. */
        uint32_t pte_7 = 0x6000 | 0x3;
        memcpy(m + 0x1030, &pte_7, 4);
        uint32_t pte_8 = 0x7000 | 0x3;
        memcpy(m + 0x1034, &pte_8, 4);
        uint32_t pte_9 = 0x8000 | 0x3;
        memcpy(m + 0x1038, &pte_9, 4);
        uint32_t pte_10 = 0x9000 | 0x3;
        memcpy(m + 0x103c, &pte_10, 4);
    }

Here we use the type uint32_t because its size is 4 bytes and it is unsigned. The PDE is placed at the beginning of the memory block passed into the function. The PTEs are placed in the same block, offset by 0x1000 bytes. Because this block will be attached to the VM at guest physical address 0x1000 (see section 4), the PD ends up at 0x1000 and the PT at 0x2000, exactly as planned.

In previous articles we have seen that the host OS needs to allocate some memory for a VM:

    void *mem = mmap(NULL, 0x8000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

We will use our createPageTable function to populate that block of memory with the correct data structures:

    createPageTable(mem);

3. Executable programs

We will write two programs that we can use to test our mapping. The first program (a.asm) will output the character 'A' and will be loaded at memory address 0x4000. This area of memory is identity mapped, so its physical and virtual addresses are the same.

The other program (b.asm) is loaded at physical address 0x6000, which is mapped to virtual address 0xc000. If we see the output of the second program, we can say that we set up paging correctly.
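Before assembling and running anything, the expectation that a jump to 0xc000 lands in program B at 0x6000 can be checked on the host by walking the tables in software. This is a minimal sketch, not how the CPU or KVM implement translation: it assumes the host buffer is attached at guest physical address 0x1000 and that CR3 = 0x1000 (both as set up later in the article), ignores large pages and permission bits, and uses the hypothetical helpers build_tables, read_entry and walk. build_tables writes the same entries as createPageTable, just with loops.

```c
#include <stdint.h>
#include <string.h>

#define GUEST_BASE 0x1000u /* assumed guest-physical address of mem[0] */
#define MEM_SIZE   0x8000u /* size of the host buffer backing guest memory */
#define CR3_VALUE  0x1000u /* guest-physical address of the PD */
#define FLAGS_P_RW 0x3u    /* Present | Read/Write */

/* Fill `mem` with the same entries createPageTable writes, using loops. */
static void build_tables(uint8_t *mem)
{
    uint32_t e = 0x2000u | FLAGS_P_RW;  /* PDE 0 -> PT at 0x2000 */
    memcpy(mem, &e, 4);
    for (uint32_t i = 0; i < 6; i++) {  /* identity map 0x0000-0x5fff */
        e = (i << 12) | FLAGS_P_RW;
        memcpy(mem + 0x1000 + i * 4, &e, 4);
    }
    for (uint32_t i = 0; i < 4; i++) {  /* 0xc000-0xffff -> 0x6000-0x9fff */
        e = ((6 + i) << 12) | FLAGS_P_RW;
        memcpy(mem + 0x1000 + (12 + i) * 4, &e, 4);
    }
}

/* Read the 4-byte entry at guest-physical `gpa`; fail if `mem` does not back it. */
static int read_entry(const uint8_t *mem, uint32_t gpa, uint32_t *out)
{
    if (gpa < GUEST_BASE || gpa - GUEST_BASE + 4 > MEM_SIZE)
        return -1;
    memcpy(out, mem + (gpa - GUEST_BASE), 4);
    return 0;
}

/* Translate `la` with 32-bit two-level paging (no large pages, no
   permission checks). Returns 0 and sets *pa on success, -1 otherwise. */
static int walk(const uint8_t *mem, uint32_t cr3, uint32_t la, uint32_t *pa)
{
    uint32_t pde, pte;
    uint32_t pde_addr = (cr3 & 0xFFFFF000u) | ((la >> 22) << 2);
    if (read_entry(mem, pde_addr, &pde) != 0 || !(pde & 1u))
        return -1;                      /* PDE missing or not present */
    uint32_t pte_addr = (pde & 0xFFFFF000u) | (((la >> 12) & 0x3FFu) << 2);
    if (read_entry(mem, pte_addr, &pte) != 0 || !(pte & 1u))
        return -1;                      /* PTE missing or not present */
    *pa = (pte & 0xFFFFF000u) | (la & 0xFFFu);
    return 0;
}
```

Run against a zero-initialized buffer of MEM_SIZE bytes, walk() translates 0xc000 to 0x6000 and 0x4000 to itself, and fails for an unmapped address such as 0xa000.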
Program A (a.asm) uses a jump instruction to transfer control to 0xc000. Here is our a.asm program:

    [BITS 32]
        mov ax, 'A'
        mov dx, 0x3f8      ; COM1 serial port; the host sees this as a KVM I/O exit
        out dx, al         ; write the character in al
        mov eax, 0xc000    ; virtual address of program B
        jmp eax

And here is our b.asm program:

    [BITS 32]
        mov ax, 'B'
        mov dx, 0x3f8
        out dx, al
        hlt

We will use nasm to assemble both programs:

    nasm -o a.bin a.asm
    nasm -o b.bin b.asm

If you are interested in looking at the binary code that was generated, you can use tools like objdump (for flat binaries such as these, for example objdump -D -b binary -m i386 a.bin). I hope to write a blog post at some point where I disassemble one of these (or similar) programs and go through the generated binary file.

4. Loading binary files into memory

Now that we have the binaries generated, we need to load them into memory. Here is one way to do this. First, to know how much memory to allocate, we need to find out the size of our binaries. We can use the host OS's facilities to do so:

    #include <sys/stat.h>
    #include <sys/types.h>

    off_t get_file_size(int fd)
    {
        struct stat s;
        if (fstat(fd, &s) < 0) {
            return -1;
        }
        return s.st_size;
    }

We will also need a function to copy those files into the memory used by the VM:

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    int read_into_buffer(int fd, uint8_t *buf)
    {
        uint8_t temp[32];
        int total_bytes_copied = 0;
        ssize_t r;

        /* read() returns 0 at end of file and -1 on error. */
        while ((r = read(fd, temp, sizeof temp)) > 0) {
            memcpy(buf + total_bytes_copied, temp, r);
            total_bytes_copied += r;
        }
        return r < 0 ? -1 : total_bytes_copied;
    }

Next, let's open the binary files:

    int a_fd = open("a.bin", O_RDONLY);
    if (a_fd == -1) {
        printf("Could not open a.bin.\n");
        return -1;
    }
    int b_fd = open("b.bin", O_RDONLY);
    if (b_fd == -1) {
        printf("Could not open b.bin.\n");
        return -1;
    }

And copy them into VM memory:

    off_t a_fs = get_file_size(a_fd); /* expected sizes for the checks below */
    off_t b_fs = get_file_size(b_fd);

    int bytes_copied = read_into_buffer(a_fd, (uint8_t *)mem + 0x3000);
    if (bytes_copied != a_fs) {
        printf("Expected to copy as many bytes as there are in a.bin.\n");
        return -1;
    }
    bytes_copied = read_into_buffer(b_fd, (uint8_t *)mem + 0x5000);
    if (bytes_copied != b_fs) {
        printf("Expected to copy as many bytes as there are in b.bin.\n");
        return -1;
    }

Notice how we have offset the files a.bin
and b.bin by 0x3000 and 0x5000 bytes respectively. Since this block of memory is attached to the VM starting at guest physical address 0x1000, programs a.bin and b.bin will be found at addresses 0x4000 and 0x6000 from the VM's perspective:

    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0x1000,
        .memory_size = 0x8000,
        .userspace_addr = (uint64_t)mem,
    };
    ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

5. Running CPU in protected mode with paging enabled

There are two more pieces of the puzzle that we need to address:

1. tell the vCPU where to find the page tables;
2. tell the vCPU to start in protected mode with paging enabled.

The CPU looks in register CR3 for the address of the page directory. This is a special register, and its value can be retrieved and set using the KVM_GET_SREGS and KVM_SET_SREGS pair of ioctl commands. Protected mode is enabled by setting bit 0 (PE) in the CR0 register, while paging is enabled by setting bit 31 (PG) in the same register:

    struct kvm_sregs sregs;
    ioctl(vcpufd, KVM_GET_SREGS, &sregs);
    sregs.cr3 = 0x1000;                 /* guest-physical address of the PD */
    sregs.cr0 = sregs.cr0 | 0x80000001; /* PG (bit 31) | PE (bit 0) */
    ioctl(vcpufd, KVM_SET_SREGS, &sregs);

6. Run VM

Finally, set the instruction pointer (IP/rip) to 0x4000, the location of the a.bin program:

    struct kvm_regs regs = {
        .rip = 0x4000,
    };
    ret = ioctl(vcpufd, KVM_SET_REGS, &regs);

And we are ready to kick off our VM:

    ...
    ioctl(vcpufd, KVM_RUN, NULL);
    ...

And hopefully you'll be greeted with the following output:

Complete source code is available at GitLab: https://gitlab.com/mvuksano/kvm-playground/-/tree/master/04-protected-mode-with-paging. Check out the README on how to run the code if you get stuck.

Conclusion

And that's it! You have reached the end of this article. I hope that by now you have a better understanding of what it takes to configure a CPU for something most engineers take for granted. In upcoming articles I will talk more about compiling programs for different CPU modes (16-, 32- and 64-bit) and configuring the CPU to work in 64-bit mode. Stay tuned.
References:

[1] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1, Chapter 4: Paging

Glossary:

PD - Page Directory
PDE - Page Directory Entry
PT - Page Table
PTE - Page Table Entry
VA - Virtual Address
LA - Linear Address
PA - Physical Address