This post covers module loading — the process of loading .exe and .dll files. You’ve no doubt heard the term “Dynamic Link Library” — that’s what this post is mostly about — dynamic linking.
This post is a little longer and a little more tedious than previous articles because the topic is fairly involved. Bear with me — by the end it’ll all make sense and we’ll have covered everything required to get a simple program running under Win3mu.
The Module Manager
In Win3mu the ModuleManager class is responsible for loading and unloading modules. It supports two kinds of modules, both of which derive from a common base class called ModuleBase.
- Module16 — encapsulates a module loaded from 16-bit .dll or .exe file whose code will run on the emulated CPU.
- Module32 — a module written in C# that emulates a 16-bit module (often by calling the real Windows API).
Modules based on Module32s are created and registered with the module manager during start up:
Note: I use the term “32” throughout this project to refer to the host operating system platform. When running on Windows x64, “32” actually refers to 64-bit Windows. This is similar to how the Windows API is often referred to as Win32, even though it covers both the x86 and x64 editions.
The ModuleBase class provides an abstract interface to a module and many of these methods are used in the process of loading and linking modules:
Most of these methods will be described in the following sections.
Locating and Opening the File
The first step in loading a module is to check if it’s already loaded. This is done by module name (rather than filename) and if found its reference count is incremented and nothing else needs to happen.
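The already-loaded check can be sketched as a small registry keyed by module name. This is a minimal illustration in Python rather than Win3mu's C#; `LoadedModule`, `ModuleRegistry` and the `loader` callback are hypothetical names, and treating module names as case-insensitive is an assumption.

```python
class LoadedModule:
    """Hypothetical stand-in for a loaded module (not Win3mu's ModuleBase)."""

    def __init__(self, name):
        self.name = name
        self.ref_count = 1  # the first load holds one reference


class ModuleRegistry:
    """Tracks loaded modules by module name rather than by filename."""

    def __init__(self):
        self._by_name = {}

    def load(self, name, loader):
        key = name.upper()  # assumption: module names compare case-insensitively
        module = self._by_name.get(key)
        if module is not None:
            # Already loaded: bump the reference count and do nothing else.
            module.ref_count += 1
            return module
        # Not loaded yet: run the full load path (locate, open, link, ...).
        module = loader(name)
        self._by_name[key] = module
        return module
```

Loading the same name twice returns the same object with a reference count of 2, and the loader callback only runs once.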
If it’s not already loaded the next step is to locate the file. For the main .exe file this is already a fully qualified filename supplied on Win3mu’s command line.
Sometimes the running program will use LoadLibrary to load a module with a fully qualified filename, but more typically a plain module name such as “Win87em” needs to be mapped to a file name.
There’s a search strategy for this but really it’s no more than appending “.dll” and looking in a couple of well defined locations for the file — mainly the same folder as the .exe.
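A search like this might look as follows. It is only a sketch under stated assumptions: `locate_module_file` is a hypothetical helper, the list of folders is supplied by the caller (the .exe's folder first), and the `exists` parameter is there purely so the lookup can be exercised without touching the disk.

```python
import os


def locate_module_file(module_name, search_dirs, exists=os.path.isfile):
    """Map a plain module name to a file by trying each folder in order.

    Returns the first matching path, or None if nothing is found.
    """
    # Append ".dll" only when the name has no extension of its own.
    has_extension = os.path.splitext(module_name)[1] != ""
    file_name = module_name if has_extension else module_name + ".dll"

    for folder in search_dirs:
        candidate = os.path.join(folder, file_name)
        if exists(candidate):
            return candidate
    return None
```

Fully qualified filenames would bypass this search entirely, since there is nothing to look up.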
Once the file is located, a Module16 is created and passed the filename. It then opens the file using the NeFileReader class (as described in a previous post).
Resolve Referenced Modules
Most modules reference other modules so the next thing to do is make sure they’re available and loaded. The ModuleBase class has a method to return a list of referenced modules:
Note that this will often include modules like Kernel, User etc… these names are automatically mapped to previously registered Module32 implementations.
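The ordering this implies, with referenced modules handled before the module that needs them, can be sketched as a depth-first walk. The `references` dictionary is a hypothetical stand-in for whatever each module reports as its referenced modules, and the handling of circular references is an assumption for the sketch.

```python
def load_order(name, references, visited=None, order=None):
    """Return module names in the order they finish loading.

    references maps a module name to the names it references.
    Modules missing from the map (e.g. built-in ones) reference nothing.
    """
    visited = set() if visited is None else visited
    order = [] if order is None else order

    if name in visited:
        return order  # already loaded or in progress (guards against cycles)
    visited.add(name)

    # Make sure everything this module references is loaded first.
    for ref in references.get(name, []):
        load_order(ref, references, visited, order)

    order.append(name)
    return order
```

For a program that references Kernel, User and one private library, the private library and the built-in modules all appear in the result before the program itself.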
Load Code and Data Segments
Now that the referenced modules are loaded the module’s code and data segments can be loaded into the global heap. The ModuleManager calls the Module16.Load() method which does the heavy lifting:
For each segment in the NE File:
- Allocate memory from the global heap
- Configure the memory’s selector according to the segment’s flags
- Read the segment from the NE file
For the automatic data segment the amount of memory allocated is increased to make room for the local heap and for the stack.
Unlike the real Windows, Win3mu immediately loads all segments. There’s no segment discarding because it assumes we’ve got plenty of memory available.
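The per-segment steps above can be sketched roughly like this. It is an illustration only: `SegmentEntry` is a simplified, hypothetical view of a segment table entry, the "global heap" is just a fresh `bytearray`, and the access string stands in for real selector configuration.

```python
from dataclasses import dataclass


@dataclass
class SegmentEntry:
    """Simplified, hypothetical view of one segment described by the file."""
    file_offset: int  # where the segment's bytes start in the file
    file_length: int  # how many bytes are stored in the file
    min_alloc: int    # minimum size the segment needs in memory
    is_data: bool     # data segment (True) or code segment (False)


def load_segment(file_bytes, entry, extra=0):
    """Allocate memory for a segment, pick its access type, and read it in.

    extra is additional room, e.g. local heap plus stack for the
    automatic data segment.
    """
    size = max(entry.min_alloc, entry.file_length) + extra
    memory = bytearray(size)  # zero-filled allocation
    end = entry.file_offset + entry.file_length
    memory[:entry.file_length] = file_bytes[entry.file_offset:end]
    access = "read/write data" if entry.is_data else "executable code"
    return memory, access
```

Because every segment is loaded up front, there is no bookkeeping for segments that are discarded and reloaded later.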
Apply Code Relocations
Once all the segments and all referenced modules are loaded the newly loaded modules need to be linked. We’re now getting to the heart of dynamic linking.
The link process starts with the ModuleManager calling the ModuleBase.Link method. Module16 then goes through each segment and applies the relocations which describe both internal code fixups and references to external modules.
There are several different kinds of relocations but the main two are
- Internal References — which link to another segment in the same module
- Imported Ordinals — which link to an exported function in another module
Each relocation entry points to the head of a chain of addresses that need to be relocated and the chain is ended with 0xFFFF. Also, relocations can be “additive” in which case the resolved value is added to the value already in the segment (rather than replacing it). Additive relocations aren’t in a chain.
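To make the chain idea concrete, here is a minimal sketch of both styles of fixup. It assumes the common layout where each location in the chain initially holds the offset of the next location, and it only patches 16-bit values; `apply_chained` and `apply_additive` are hypothetical names, and real relocations also cover selectors and 32-bit far pointers.

```python
import struct

CHAIN_END = 0xFFFF  # marks the last location in a relocation chain


def apply_chained(segment, first_offset, value):
    """Write value at every location in the chain starting at first_offset.

    Each location holds the offset of the next one until it is overwritten.
    Returns the list of offsets that were patched.
    """
    patched = []
    offset = first_offset
    while offset != CHAIN_END:
        # Read the link to the next location before overwriting it.
        (next_offset,) = struct.unpack_from("<H", segment, offset)
        struct.pack_into("<H", segment, offset, value & 0xFFFF)
        patched.append(offset)
        offset = next_offset
    return patched


def apply_additive(segment, offset, value):
    """Add value to what is already stored at offset (no chain to follow)."""
    (current,) = struct.unpack_from("<H", segment, offset)
    struct.pack_into("<H", segment, offset, (current + value) & 0xFFFF)
```

The chained form lets one relocation record fix up every call site for the same target, which is why the existing contents of the segment can double as links.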
There’s one other class of relocation: “OSFixUp”. These are related to floating point math operations, which I’ll cover in another post.
Patching Exported Functions
Under Windows 3, any functions exported from a module must have a special prolog/epilog that serves two purposes.
- Ensuring that the correct data segment selector is loaded into the DS register
- Tweaking the BP register so that Windows can reliably walk the stack to apply address fix ups when code segments are moved around.
I’m not going to cover stack walking because it’s not relevant here. Setting up the DS register, however, is crucial.
The DS register points to the module’s data segment and since each module has its own data segment when execution moves between modules the DS register needs to be updated to reference that module’s data segment.
On disk (ie: as generated by the compiler) a function’s prolog looks like this:
; Load AX from DS
; Setup for stack walking
; Save DS
; Load DS from AX
The default behaviour is to load AX from DS and then load DS from AX — ie: DS is left unaffected. For non-exported functions the prolog is left like this so that calls within the module maintain the same value for DS.
When a function is exported from a .exe, the caller must set the AX register to the correct data segment selector before calling. In this case the prolog is patched to disable the first few instructions:
; AX already points to DS (zap the old instructions to NOPs)
; the rest of the prolog as before ending with MOV DS, AX
And when a function is exported from a DLL the prolog is patched to explicitly set the AX register:
; Patched to load correct AX
; the rest of the prolog as before ending with MOV DS, AX
(You can now see why that extra no-operation (NOP) instruction was required in the original prolog: it reserves room for the longer MOV AX,xxxx instruction.)
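The two patches can be sketched at the byte level. This assumes the commonly documented 3-byte openings of the Win16 prolog (either PUSH DS / POP AX / NOP or MOV AX,DS / NOP); which patterns Win3mu actually checks for is not stated here, and `patch_prolog` is a hypothetical helper.

```python
NOP = 0x90
MOV_AX_IMM16 = 0xB8  # opcode for MOV AX, imm16

# Assumed on-disk openings of an exported function's prolog (3 bytes each).
STANDARD_PROLOGS = (
    bytes([0x1E, 0x58, 0x90]),  # PUSH DS / POP AX / NOP
    bytes([0x8C, 0xD8, 0x90]),  # MOV AX, DS / NOP
)


def patch_prolog(code, offset, is_dll, data_selector=0):
    """Patch the first three bytes of an exported function's prolog.

    For a DLL the bytes become MOV AX, data_selector.
    For an .exe they become three NOPs (the caller supplies AX).
    Returns False and changes nothing if the prolog is not recognised.
    """
    if bytes(code[offset:offset + 3]) not in STANDARD_PROLOGS:
        return False
    if is_dll:
        low, high = data_selector & 0xFF, (data_selector >> 8) & 0xFF
        code[offset:offset + 3] = bytes([MOV_AX_IMM16, low, high])
    else:
        code[offset:offset + 3] = bytes([NOP, NOP, NOP])
    return True
```

Checking for a known pattern before patching is a defensive choice in this sketch, so that a function with a non-standard prolog is left untouched rather than corrupted.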
These patches are applied in the Module16.Link() method after code relocations are applied:
The final step in loading a DLL is to call its LibMain. After the module manager has loaded and linked the module it calls ModuleBase.Init() and Module16’s implementation runs LibMain:
Finally the module’s all loaded and ready to go!
What About GetProcAddress?
One important thing I skimmed over is the implementation of GetProcAddress — the function that finds the address of an exported function during linking.
Module16’s implementation uses NeFileReader to work out the address of the function:
Module32’s implementation returns the address of a thunk (as described in the previous article) and I’ll show a more concrete example below.
Implementing One Windows API Method
Let’s now have a look at what Module32 does by creating a fake module that 16-bit code can call.
FakeUserDll implements one Windows API method — MessageBox which is exported from the module user.dll as ordinal ID #1.
Things to note:
- Its Init() method calls machine.CreateSystemThunk to create a thunk that when called from 16-bit code will invoke the FakeUserDll.MessageBox() function
- GetProcAddress returns the address of the thunk when asked for ordinal #1
- The MessageBox method reads parameters from the VM stack/memory, calls the real Windows MessageBox function and sets AX to the return value.
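- The parameters start at SS:SP+4 because the caller’s 4-byte far return address is pushed after them. Win16 API functions use the Pascal calling convention (arguments pushed left to right), so the value at SS:SP+4 is the last argument in the declaration and the first one read.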
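The stack layout can be sketched as follows for MessageBox(hWnd, lpText, lpCaption, uType). This is an illustration, not Win3mu's code: the stack segment is modelled as a plain `bytearray`, far pointers are returned as (segment, offset) pairs instead of being dereferenced, and the helper names are hypothetical.

```python
import struct


def read_word(stack, offset):
    """Read a 16-bit little-endian value from the stack segment."""
    return struct.unpack_from("<H", stack, offset)[0]


def read_far_ptr(stack, offset):
    """Read a far pointer: offset word first, then segment word."""
    off, seg = struct.unpack_from("<HH", stack, offset)
    return seg, off


def read_messagebox_args(stack, sp):
    """Pull MessageBox's arguments off the stack as seen on entry.

    SP+0..3 holds the far return address; arguments follow, last one first.
    """
    return {
        "uType": read_word(stack, sp + 4),        # pushed last
        "lpCaption": read_far_ptr(stack, sp + 6),
        "lpText": read_far_ptr(stack, sp + 10),
        "hWnd": read_word(stack, sp + 14),        # pushed first
    }
```

A real implementation would go on to read the two strings from emulated memory, call the host MessageBox, and store the result in AX.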
First Win3mu Run!
We’ve now covered just about everything required for Win3mu’s first run. In fact its first run was exactly what I’ve described — a simple .exe that calls MessageBox:
- I built it in Windows 98 with Visual C++ 1.5
- Copied it to my main dev machine
- Ran it under Win3mu
- Debugged it, debugged it, debugged it some more
- Got a message box!
Phew! It’s been a long process to get here but that’s really the bare minimum for getting a 16-bit Windows program running under emulation.
The main things that I haven’t covered are just some miscellaneous startup trivia — setting the correct registers for calling the .exe and one other function “InitTask” which doesn’t do anything too interesting.
The Windows API consists of about 1,100 API methods. If I have to write code like the MessageBox example above for every one I’m going to go insane! The next step is to reduce the amount of code required for each method to an absolute minimum — and in many cases no code at all.
For the next article I’m going to cover off some miscellaneous topics like path mapping, config files and how I tracked down some subtle bugs in the processor that slipped through unit testing.
I mentioned OSFixup relocations above and how they’re related to floating point math. When I originally wrote the module loader I ignored them but once I encountered the first program using floating point math — Microsoft Tetris — I had to address them and they’re now working:
Hi, I’m Brad Robinson — an independent software developer living in Sydney Australia. I write software for musicians and as an indie developer I rely on word of mouth.
Continue reading… Part 9 — Path Mapping.