Emulators are cool pieces of technology that allow users to run a totally different system on top of another one. Here, we'll explore how a CPU works and how its software can emulate one by implementing a simple machine that will run programs written for a fantasy CPU.
Emulators have a wide range of applications, like running x86 programs on ARM devices or running ARM Android applications on x86 Windows Desktops and even running your favorite retro game on an emulated computer/console on a Raspberry Pi.
When a full system is emulated, the emulator software needs to handle all the hardware devices of those systems. This may include not only the CPU but also the video system, input devices, etc. However, the core concept of an emulator is nevertheless emulation of the CPU.
It's well known that CPUs are the core of a machine. Their main job is to execute programs. And programs are nothing more than a series of instructions in the computer's memory.
At this point, you may be tempted to believe that your CPU knows JavaScript. Although a tempting concept, this is not the case. Imagine changing a CPU if a new version of JavaScript appeared!
In reality, a CPU understands only a finite set of instructions; the instructions the CPU engineering team designed the CPU to understand.
Users can create programs by using these instructions directly or by coding in a higher-level language and then using a compiler/system to translate the program into the set of instructions understood by the CPU.
Regardless of how the user created the program, the instructions end up in memory as a series of bytes like these:
11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50
The CPU role is to fetch these instructions from memory and to execute them.
Using mnemonics makes the task of writing programs much easier for humans. If one writes programs using the instructions mnemonics it is said that he codes in assembly language. It only takes a simple step to transform these programs written in assembly language into binary instructions (e.g. machine language).
Instructions and mnemonics for our fantasy CPU, as described below, are quite intuitive. But we need to know the CPU instructions if we want to emulate that CPU into software.
Note: Under each instruction is specified the number used to encode that particular instruction. Also registers R0, R1, R2, R3 are encoded as numbers 0, 1, 2, 3):
Loads the value from regsrc into regdst. E.g. regdst = regsrc
MOVR reg_dst, reg_src
MOVR = 10
Loads the numeric value into register regdst. E.g. regdst = value
MOVV reg_dst, value
MOVV = 11
Adds the value from regsrc to the value of regdst and store the result in reg_dst
ADD reg_dst, reg_src
ADD = 20
Subtracts the value of regsrc from the value of regdst and store the result in reg_dst
SUB reg_dst, reg_src
SUB = 21
Pushes the value of reg_src on the stack
PUSH reg_src
PUSH = 30
Pops the last value from stack and loads it into register reg_dst
POP reg_dst
POP = 31
Jumps the execution to address addr. Similar to a GOTO!
JP addr
JP = 40
Jump to the address addr only if the value from reg1 < reg2 (IF reg1 < reg2 THEN JP addr)
JL reg_1, reg_2, addr
JL = 41
Pushes onto the stack the address of instruction that follows CALL and then jumps to address addr
CALL addr
CALL = 42
Pops from the stack the last number, assumes is an address and jump to that address
RET
RET = 50
Print on the screen the value contained in the register reg
PRINT reg
PRINT = 60
Stops our VM. The virtual CPU doesn't execute instructions once HALT is encountered.
HALT
HALT = 255
Armed with the fantasy CPU specs, we can now move ahead to emulate the CPU in software – in JavaScript.
As presented above, on real machines programs are stored in memory. In our emulator, we will emulate the memory using a simple array structure.
As a matter of fact, we’ll only place a single program in the memory.
let program = [11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50];
Our virtual CPU will need to fetch instructions one by one from this array and execute them. The CPU will keep track of the instruction that needs to fetch by using a special-purpose register named “PC” (Program Counter).
Fact: PC register is a real register on many physical CPU architectures.
The core of the CPU emulator is just a big “switch” statement that will process each instruction according to the specs.
let pc = 0;
let halted = false;
function run()
{
while(!halted)
{
runone();
}
}
function runone()
{
if (halted)
return;
let instr = program[pc];
switch(instr)
{
// handle each instruction according to specs
// also advance pc to prepare for the next fetch
// ...
}
}
That’s all! This is the structure of our glorious CPU emulator. Processing instructions is also a very simple task. It requires only careful reading and implementation of the instruction’s specs.
switch(instr)
{
// movr rdst, rsrc
case 10:
pc++;
var rdst = program[pc++];
var rsrc = program[pc++];
regs[rdst] = regs[rsrc];
break;
// movv rdst, val
case 11:
pc++;
var rdst = program[pc++];
var val = program[pc++];
regs[rdst] = val;
break;
...
}
Take your time, read the specifications, and try to see if you can complete this virtual CPU implementation. You’ll know that you did a great job when you’ll see the results of the execution program.
If you coded carefully, the program should display the first 10 Fibonacci numbers.
You can also consult our version from this playground: https://codeguppy.com/code.html?simple_vm/vm0_fibonacci
As you saw in the above playground, our emulator implementation exists in a simple virtual machine that loads initially the program from an array of bytes and then asks the fantasy CPU to execute it.
vm.load([11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50]);
Loading like this is fine if your only intent is to load the program we’ve provided. But if you want to design your own programs in machine language and execute them, this may not be a very intuitive or efficient way.
Here's a small technique that can be used to load the programs using a human-friendly CPU instruction mnemonics. It only takes defining a few constants:
const MOVR = 10;
const MOVV = 11;
const ADD = 20;
const SUB = 21;
const PUSH = 30;
const POP = 31;
const JP = 40;
const JL = 41;
const CALL = 42;
const RET = 50;
const PRINT= 60;
const HALT = 255;
const R0 = 0;
const R1 = 1;
const R2 = 2;
const R3 = 3;
vm.load([
MOVV, R0, 10,
CALL, 6,
HALT,
// PrintFibo: (addr = 6)
PUSH, R0,
MOVV, R0, 0,
MOVV, R1, 1,
MOVV, R3, 1,
PRINT, R1,
// continue: (addr = 19)
MOVR, R2, R0,
ADD, R2, R1,
PRINT, R2,
MOVR, R0, R1,
MOVR, R1, R2,
MOVV, R2, 1,
ADD, R3, R2,
POP, R2,
PUSH, R2,
JL, R3, R2, 19,
POP, R0,
RET
]);
With this trick, it's like you’re programming in assembly language inside JavaScript.
You can find the new program with mnemonics in this playground: https://codeguppy.com/code.html?simple_vm/vm1_fibonacci
Now, we'll take the opportunity to quickly cover a few tools commonly used when working with machine language code.
Perhaps the most important tool when working with machine language programs is an assembler program.
This program lets users write programs as text files, in easier to use assembly language. Then, the assembler does the heavy lifting by converting the assembly source code into binary data understood by the CPU.
We will attempt to build a simplified assembler program that does the basic job. It will work like this:
let code = `
// Loads value 10 in R0
// and calls Fibonacci routine
MOVV R0, 10
CALL 6
HALT
...
`;
let bytes = asm.assemble(code);
If you want to see the code, please check this playground: https://codeguppy.com/code.html?simple_vm/vm3_assembler
Another useful tool that for working with machine language is a disassembler.
A disassembler takes binary format and outputs its listing in a human-readable way, i.e. in assembly language.
When run on the following program...
let src = asm.disassemble(
[11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50]
);
… our disassembler should output this:
0 11 0 10 MOVV R0, 10
3 42 6 CALL 6
5 255 HALT
6 30 0 PUSH R0
8 11 0 0 MOVV R0, 0
11 11 1 1 MOVV R1, 1
14 11 3 1 MOVV R3, 1
17 60 6 PRINT R1
19 10 2 0 MOVR R2, R0
22 20 2 1 ADD R2, R1
25 60 6 PRINT R2
27 10 0 1 MOVR R0, R1
30 10 1 2 MOVR R1, R2
33 11 2 1 MOVV R2, 1
36 20 3 2 ADD R3, R2
39 31 2 2 POP R2
41 30 2 PUSH R2
43 41 3 2 19 JL R3, R2, 19
47 31 0 2 POP R0
49 50 RET
The listing contains the memory addresses and binary instructions and associated mnemonic for each instruction in our program.
To see a very simplistic implementation of a disassembler, please check this playground: https://codeguppy.com/code.html?simple_vm/vm2_disassembler
You're probably wondering how we came up with the assembly program to print the Fibonacci numbers? We first wrote first the algorithm in JavaScript and then convert it step by step to assembler:
Once experienced with writing in assembly, this task can be done in one step. It just requires practice!
Even if there is still a lot of work before emulating a full system, hopefully this provided a basic overview of what it takes to emulate the core of a system – the CPU.
All the programs in this article and explanations are included in the following playground: https://codeguppy.com/code.html?simple_vm/vm
As a next step, I invite you to create additional programs for this fantasy CPU and if needed extend the CPU instruction set with new instructions to support your programs.
Looking forward to your feedback and comments. Please let me know if you want to see more about this subject, perhaps even seeing the implementation of a full emulator. For more recreational coding programs please feel free to browse codeguppy.com