By Pauli Rautakorpi (Own work) [CC BY 3.0], via Wikimedia Commons
This is Part 3 in a series of articles about building Win3mu — a 16-bit Windows 3 emulator. If you haven’t done so already I recommend starting at the beginning where I explain my (ir)rationale for starting this project.
This post covers the design of the CPU.
At the center of any emulator is an accurate emulation of the CPU and it’s usually the first thing to tackle. Win3mu’s CPU is implemented in a separate project called “Sharp86”.
Initially I was planning on implementing just the 8086 instruction set and running Windows in real mode. It didn’t take long to realize, however, that protected mode solves a whole class of memory management issues (which I’ll explain in a later article), so Sharp86 implements the 80186 instruction set and a mechanism to implement protected mode features externally to the processor.
Since all the new instructions introduced between the 186 and 286 are related to protected mode, from the 16-bit program’s point of view the CPU might as well be a 286 – because it shouldn’t be using any protected mode instructions anyway.
The Arithmetic Logic Unit, aka the ALU, is the part of the processor that performs math and logic operations. It accepts one or two input values, performs an operation, provides the result and sets the flags register according to the results.
Sharp86's ALU is implemented as a base class to the main processor and includes methods for all the required operations — most in 8 and 16-bit forms:
The API to the ALU is essentially a set of flags and a set of operations.
It also implements the flags register and, as a slight performance tweak, delays resolving some flags until they’re required. For example, the parity flag is very rarely used, so it isn’t calculated unless it’s read.
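To illustrate the deferred flag idea, here is a minimal Python sketch. It is not Sharp86's code: the class name `Alu`, the method `add8` and the field `_last_result` are made up for this example, and only an 8-bit add is shown. The operation stores its result, and parity is only worked out when something reads it.

```python
class Alu:
    """Toy ALU sketch (hypothetical names): parity is resolved lazily."""

    def __init__(self):
        self.carry = False
        self.zero = False
        self.sign = False
        self._last_result = 0  # remembered so parity can be resolved later

    def add8(self, a, b):
        """8-bit add that updates the cheap flags immediately."""
        total = (a & 0xFF) + (b & 0xFF)
        result = total & 0xFF
        self.carry = total > 0xFF
        self.zero = result == 0
        self.sign = (result & 0x80) != 0
        self._last_result = result  # parity is NOT calculated here
        return result

    @property
    def parity(self):
        # The x86 parity flag is set when the low byte of the result
        # contains an even number of 1 bits.
        return bin(self._last_result & 0xFF).count("1") % 2 == 0
```

The trade-off is a little extra state per operation in exchange for skipping a bit count that most instructions never look at.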
The CPU class derives from the ALU class and implements most of the processor logic. It’s responsible for maintaining the rest of the registers, instruction decoding, memory and I/O port access and raising interrupts.
It communicates with the outside world via an IBus interface which is analogous to the system bus on a physical processor. For simplicity, it’s an 8-bit data bus (unlike a real 286, whose data bus is 16 bits wide).
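One consequence of an 8-bit bus is that every 16-bit access becomes two byte accesses. The sketch below shows that idea in Python. The names `RamBus`, `read_byte`, `write_byte`, `read_word` and `write_word` are assumptions for illustration and are not taken from the real IBus interface; the flat address space is a simplification too.

```python
class RamBus:
    """Hypothetical byte-wide bus backed by a flat block of RAM."""

    def __init__(self, size=0x10000):
        self.mem = bytearray(size)

    def read_byte(self, address):
        return self.mem[address]

    def write_byte(self, address, value):
        self.mem[address] = value & 0xFF


def read_word(bus, address):
    """Build a 16-bit value from two byte reads (x86 is little-endian)."""
    lo = bus.read_byte(address)
    hi = bus.read_byte(address + 1)
    return lo | (hi << 8)


def write_word(bus, address, value):
    """Split a 16-bit value into two byte writes, low byte first."""
    bus.write_byte(address, value & 0xFF)
    bus.write_byte(address + 1, (value >> 8) & 0xFF)
```

Keeping the bus byte-wide means an implementation only has to supply the byte operations, at the cost of twice as many calls for word accesses.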
The CPU class consists of a few key parts:
Besides IBus and a set of properties to access the registers, the API to the processor really only consists of a single method “Step()” which executes one instruction and returns.
By calling Step repeatedly in a loop you have a running processor:
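As a rough Python sketch of that loop (not Sharp86's actual code): `TinyCpu` is a hypothetical stand-in that understands only three real one-byte opcodes, NOP (0x90), INC AX (0x40) and HLT (0xF4), and the `halted` flag is an assumption added so the loop has a way to stop.

```python
class TinyCpu:
    """Hypothetical three-instruction CPU, just enough to show the loop."""

    def __init__(self, program):
        self.mem = bytearray(program)
        self.ip = 0
        self.ax = 0
        self.halted = False

    def step(self):
        """Fetch, decode and execute exactly one instruction."""
        opcode = self.mem[self.ip]
        self.ip += 1
        if opcode == 0x90:      # NOP
            pass
        elif opcode == 0x40:    # INC AX
            self.ax = (self.ax + 1) & 0xFFFF
        elif opcode == 0xF4:    # HLT
            self.halted = True
        else:
            raise ValueError(f"unhandled opcode {opcode:#04x}")


# inc ax / nop / inc ax / hlt
cpu = TinyCpu(bytes([0x40, 0x90, 0x40, 0xF4]))
while not cpu.halted:
    cpu.step()
```

Because the host owns the loop, it is also the natural place to hang a debugger, a breakpoint check or a cycle counter between instructions.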
x86 processors have three ways of raising interrupts. In Sharp86 these all end up at the CPU’s RaiseInterrupt method.
The default RaiseInterrupt method does the same as a real processor: it looks up the address of the handler function in the interrupt descriptor table, pushes the flags register and performs a far call to the handler.
Note that RaiseInterrupt doesn’t actually run the interrupt handler — it just sets up the processor so that next time Step() is called it will be.
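The sequence can be sketched in Python as follows. This is an illustration under stated assumptions, not Sharp86's implementation: it assumes the real-mode table layout (four bytes per vector at linear address `number * 4`, offset word first, then segment word), and the `Cpu` class with its `push`, `read_word` and `write_word` helpers is hypothetical.

```python
IF_MASK = 0x0200  # interrupt enable flag
TF_MASK = 0x0100  # trap flag


class Cpu:
    """Hypothetical CPU state, just enough to show interrupt entry."""

    def __init__(self):
        self.mem = bytearray(0x100000)  # 1 MB of real-mode address space
        self.cs, self.ip = 0x1000, 0x0100
        self.ss, self.sp = 0x2000, 0xFFFE
        self.flags = IF_MASK

    def read_word(self, address):
        return self.mem[address] | (self.mem[address + 1] << 8)

    def write_word(self, address, value):
        self.mem[address] = value & 0xFF
        self.mem[address + 1] = (value >> 8) & 0xFF

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        self.write_word((self.ss << 4) + self.sp, value)

    def raise_interrupt(self, number):
        # Assumed real-mode layout: handler offset, then handler segment
        handler_ip = self.read_word(number * 4)
        handler_cs = self.read_word(number * 4 + 2)
        # Save enough state for IRET to resume the interrupted code
        self.push(self.flags)
        self.push(self.cs)
        self.push(self.ip)
        # Interrupt entry clears the interrupt enable and trap flags
        self.flags &= ~(IF_MASK | TF_MASK) & 0xFFFF
        # Only redirect CS:IP; the handler's first instruction runs
        # the next time the host calls step()
        self.cs, self.ip = handler_cs, handler_ip
```

Nothing in `raise_interrupt` executes handler code, which matches the point above: it only rearranges registers and the stack so the following step fetches from the handler.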
I’m a little paranoid about bugs in a processor. Here’s why…
When I was working on FPGABee I hit a problem where the operating system was hanging part way through boot. Since the FPGA board I was using had very limited debugging support I didn’t have much visibility into what was happening — I couldn’t even step through the code.
In the end I configured the circuit to show the current instruction pointer on the LED readout, slowed the clock speed right down, videoed it with my iPhone and then played it in slow motion while stepping through a disassembly listing. I spent a lot of time tracking down what turned out to be a bug in the processor core: it wasn’t incrementing a register during a string operation.
Never had to video a bug to catch it before!
If there’s one thing I learned from this, it’s that you need to be able to trust the processor. CPU bugs can be super nasty to chase down.
I don’t expect to be doing video camera debugging with this project but a subtle processor bug popping up in the middle of a program would probably be very difficult to track down.
Time for a whole pile of unit tests…
Testing a processor is tricky. It’s nearly impossible to get full coverage so I broke it down into a few key areas:
For the instruction decoding tests I wanted to make sure I had the right opcode mapped to the right instruction so I used a real assembler (YASM) to generate one instruction which then gets tested. Each of these unit tests follows this pattern:
A typical instruction unit test case — emit calls YASM, run calls step()
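The shape of such a test can be sketched in Python. This is not the project's test code: here `emit` looks up hand-assembled bytes in a small table instead of spawning YASM, and `TinyCpu`, `PREASSEMBLED` and the test function names are all hypothetical. The two encodings used are the standard 16-bit ones for `mov ax, imm16` (B8 lo hi) and `inc ax` (40).

```python
# Stand-in for the assembler: hand-assembled 16-bit encodings
PREASSEMBLED = {
    "mov ax, 0x1234": bytes([0xB8, 0x34, 0x12]),
    "inc ax": bytes([0x40]),
}


class TinyCpu:
    """Hypothetical CPU that decodes just the two opcodes under test."""

    def __init__(self):
        self.mem = bytearray(0x100)
        self.ip = 0
        self.ax = 0

    def step(self):
        opcode = self.mem[self.ip]
        if opcode == 0xB8:    # MOV AX, imm16
            self.ax = self.mem[self.ip + 1] | (self.mem[self.ip + 2] << 8)
            self.ip += 3
        elif opcode == 0x40:  # INC AX
            self.ax = (self.ax + 1) & 0xFFFF
            self.ip += 1
        else:
            raise ValueError(f"unhandled opcode {opcode:#04x}")


def emit(cpu, asm):
    """Place the machine code for one instruction at the current IP."""
    code = PREASSEMBLED[asm]
    cpu.mem[cpu.ip:cpu.ip + len(code)] = code


def run(cpu):
    """Execute the single emitted instruction."""
    cpu.step()


def test_mov_ax_imm16():
    cpu = TinyCpu()
    emit(cpu, "mov ax, 0x1234")
    run(cpu)
    assert cpu.ax == 0x1234
    assert cpu.ip == 3


def test_inc_ax_wraps():
    cpu = TinyCpu()
    cpu.ax = 0xFFFF
    emit(cpu, "inc ax")
    run(cpu)
    assert cpu.ax == 0
    assert cpu.ip == 1
```

Using a real assembler in place of the lookup table is what makes the real tests valuable: the expected bytes then come from an independent source rather than from the same person who wrote the decoder.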
All up, there are over 470 test cases and so far I’ve only noticed one bug that’s slipped through — a far call instruction that was incorrectly doing a near call.
Ignore the times — most of that’s from spawning YASM.
For the processor’s first test run I needed a really simple executable. Since you can’t get much simpler than a DOS “.com” file I decided to:
First Signs of Life from the CPU
I ran a few more ad hoc tests, after which I was fairly confident I had a working processor.
Often you’ll hear the term “Cycle Accurate” when talking about CPU emulations. Sharp86 is not cycle accurate since it doesn’t need to be, but I thought I’d quickly describe what this means.
For a hardware-based emulation such as an FPGA-based processor, cycle accurate really means cycle accurate: the timing and order of the CPU’s control lines, address and data bus, and the speed of instructions all match the original processor.
For a software based CPU emulation it’s a little different and more about trying to accurately match the execution speed of the original processor.
This is normally done by calculating the number of clock cycles each instruction took on the original processor and maintaining a running count of which clock cycle number the emulated processor is up to.
The number of clock cycles per instruction is usually variable and can even depend on the results of an operation. For example, conditional jump instructions often take a different number of clock cycles depending on whether the branch was taken or not.
Once you have that running cycle count it’s pretty easy to throttle the processor to match the speed of the original:
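A minimal Python sketch of that throttling, assuming the emulator reports the cycle cost of each instruction after executing it. The `Throttle` class and its `add_cycles` method are hypothetical names; the injectable `clock` and `sleep` parameters are only there so the logic can be exercised without real delays.

```python
import time


class Throttle:
    """Sleeps whenever emulated time gets ahead of wall-clock time."""

    def __init__(self, clock_hz, clock=time.monotonic, sleep=time.sleep):
        self.clock_hz = clock_hz   # speed of the original processor
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.cycles = 0            # running count of emulated cycles

    def add_cycles(self, count):
        """Record the cycles just executed and wait if running too fast."""
        self.cycles += count
        emulated = self.cycles / self.clock_hz   # seconds of emulated time
        elapsed = self.clock() - self.start      # seconds of real time
        if emulated > elapsed:
            self.sleep(emulated - elapsed)
```

In a run loop this would be called after each step with that instruction's cycle count. In practice it is usually cheaper to check the clock every few thousand cycles rather than after every instruction, since the sleeps involved are far shorter than a typical timer resolution.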
As mentioned, this project doesn’t need this and simply executes as fast as possible. Programs running under the emulator will be scheduled by Windows just like any other program.
I’ve mentioned pseudo-protected mode a couple of times now but this article is long enough so I’ll leave it and cover it in the next post.
As you may have realized, I’m writing these articles retrospectively and the implementation is a fair way ahead of this post. Win3mu can now run large parts of “Clink II” — a Tetris style game I wrote in the 90s:
Hi, I’m Brad Robinson, an independent software developer living in Sydney, Australia. I write software for musicians and as an indie developer I rely on word of mouth.
Continue reading — Part 4 — Protected Mode!