The Apple IIgs came out on September 15, 1986. It featured a 2.8 MHz WDC 65816 CPU (a 16-bit CPU with 24-bit addressing, the same one that powered the SNES and other computers of that era), 256k or 1 MB of RAM (upgradable to 8 MB), and an Ensoniq 8-bit stereo synth (a welcome upgrade from the one-bit speaker of the Apple II family). For reference, the original Apple II family was built around the 6502 CPU (8-bit, with 16-bit addressing), and had at most 1 MB of RAM in the IIe and II+.

However, it was not until 1988 that Apple released an operating system for the new computer that was able to meaningfully leverage the newer hardware. GS/OS was written in native 16-bit code and, more importantly, was intended to be used via its shiny new GUI.

This article is about how I built a tiny ‘text editor’ for the IIgs, from start to finish.

PS: This opcode reference might be useful while reading!

What do we have to work with?

At the beginning of this project I imposed a few requirements. My text editor should:

- Be launchable from GS/OS, regardless of whether or not it actually runs windowed in the operating system.
- Not occupy more than 256k of RAM, so that it may run on any IIgs.
- Run in native 16-bit mode.

If you are unfamiliar with this processor, the last bullet point may be confusing. By the time the IIgs launched, there was already a plethora of software in the wild that was compatible with the original Apple II (which used the 6502). The 65816 has an “emulation mode” that can be toggled, which effectively turns the ‘816 into a 6502 by “halving” the width of its accumulator and index registers (amongst other things, which we will cover later).

GS/OS is a 16-bit “single process at a time” operating system. For me, this means that my text editor is going to be the only running program on the system after it finishes loading.
The OS will remain in RAM, but I will have to abstain from writing to certain memory locations so as not to corrupt the OS code. GS/OS will bootstrap my program and then pass complete control of the computer to it. To quit the application, I have to jump (`jsl` in native mode) to the entry point of the OS, where it will then take over.

As far as programming environments go, our options look kind of bleak:

- ORCA/C, an ANSI C compiler, but it must run on bare metal or an emulator.
- cc65, an ANSI C compiler, but only for the 6502.
- Merlin16, a 65816 assembler, but it must run on bare metal or an emulator.
- Merlin32, a Merlin16 macro-compatible assembler that runs on any modern computer that can build it from source.

I own a physical IIgs, but wanted to be able to develop easily from a modern computer. Furthermore, building inside of an emulator is also clunky and not something that can easily be automated with a build script (i.e. it is not a good use of time to hack the emulator such that I can script UI actions), so the only viable option was to use Merlin32. Additionally, the only way for me to really deploy things to the physical computer is by air-gapping a CF card to my desktop, which is the only computer here with a multi-card reader.

After deciding on Merlin32, the first order of business was to figure out how to write disk images with my program such that an emulator could load it. I had to write POSIX file API bindings for the BrutalDeluxe floppy imaging tool, since it only ran on Windows (which is some kind of cardinal sin) and it was the only tool that could easily be scripted. Judicious use of `#pragma once` and some `stat()` had me making Apple II disk images within a few hours.

The journey of COUT

The only way to not lose your mind with difficult projects is to write your software in as goal-oriented a manner as possible. The bare minimum feature that any editor must be able to implement is the display of strings on screen.
In most programming languages, you can invoke some variant of `print()` to accomplish this task. In Apple II world, we do not have this affordance, although this is somewhat of a lie, since the Apple II ROM contains a series of toolbox functions that implement `COUT`, `RDKEY`, and numerous other utilities. The workflow for toolbox utilities is simple: you prepare the registers with some data (in the case of `COUT`, a character is loaded into the `A` register, called the accumulator), `jsr` to the routine from your program, it performs some arbitrary task, and then invokes `rts` to return control to your code. This seems to be exactly what we would want, so why can't we use it?

It is time to learn a little bit about Apple IIgs memory architecture.

[Figure: memory map, from the Apple IIgs Hardware Reference (APDA Draft)]

GS/OS executable files are stored in OMF, a relocatable executable format. If you’ve written software for Linux and/or Windows, then this is something analogous to your `ELF` or `PE32` executable formats, but for the GS operating system. Observe that each chunk of memory flows `$FFFF->$0000`, but more explicitly, that the memory is divided into 64k chunks. Why is this noteworthy? Remember that the 65816 can fully emulate a 6502, which can only address up to 64k (2¹⁶) of RAM. The Apple IIe and IIc used the 6502 but supported memory configurations in excess of 1M due to hardware that implemented bank switching (such that a 64k “window” out of the total was visible at any given time). Since the 65816 CPU supports 24-bit addresses, it can just use an extra byte to represent the bank number, as we see in the diagram below.

Bank `$00` is special (also for other hardware traits best explained here), as it contains the ROM and a range of addresses used for I/O.

[Figure: memory map, from the Apple IIgs Hardware Reference (APDA Draft)]

Given that our GS/OS code is relocatable, there is no guarantee that the program will consistently load at the same location (even more so given the variety of aux memory configurations your end users may have).
In other words, the operating system can arbitrarily load your code anywhere in free built-in or auxiliary RAM. Recall that earlier we spoke about using `jsr` (with a 16-bit immediate address argument of the location of the routine) to invoke the toolbox’s `COUT` after loading a character. Behind the scenes, the `jsr` instruction places the memory location of the next instruction in line (i.e. `PC+1`) onto the stack before jumping to the address you specified as an immediate argument. The `rts` instruction (what you use to exit from the routine) knows where to go back by pulling the value that `jsr` pushed onto the stack. That value is a 16-bit address, which means that this whole exercise will not work unless your program is in bank 0.

We just finished talking about how the 65816 supports 24-bit addressing, so specifying the bank shouldn’t be a problem, right? This is true; however, there is another catch. To retain compatibility with existing Apple II software, the original Apple II ROM must ship with the IIgs such that software written for the Apple II can still run. That is, if original Apple II programs use the `COUT` routine, they will also need to use the same routine on the IIgs regardless of the 6502 emulation mode. Therefore, these ROM functions are written in 6502 assembly as opposed to native 65816 assembly, which poses two problems.

Suppose we invoke a `jsl $00FDED`, with the 24-bit “long” address of the `COUT` routine. The computer will arrive at the correct instructions and, given the instruction compatibility and overlap with the 65816, it will run them. However, in native mode our registers are different sizes (and the addressing mode may be different!), so the code will perform in an unexpected manner and cause the GS to crash to the monitor (the ROM contains a small assembler/monitor that is invoked during a crash).
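
To make the return-address problem concrete, here is a minimal C model (not 65816 code, and not part of the editor). The helper names `resume_after_rts` and `resume_after_rtl` are made up for illustration, and the model ignores the off-by-one adjustment the CPU applies to the saved address: it only tracks how many address bits survive the round trip.

```c
#include <assert.h>
#include <stdint.h>

/* Where execution resumes when a routine returns with rts.
   Only the low 16 bits of the return address were saved, so the
   bank is whatever bank the routine itself is running in. */
static uint32_t resume_after_rts(uint32_t return_addr, uint8_t routine_bank)
{
    assert(return_addr <= 0xFFFFFFu);   /* must be a 24-bit address */
    return ((uint32_t)routine_bank << 16) | (return_addr & 0xFFFFu);
}

/* Where execution resumes when a routine returns with rtl.
   jsl saved all 24 bits, so the caller's bank is preserved. */
static uint32_t resume_after_rtl(uint32_t return_addr)
{
    assert(return_addr <= 0xFFFFFFu);
    return return_addr & 0xFFFFFFu;
}
```

With a caller in bank `$05` and a ROM routine in bank `$00`, `resume_after_rts(0x051234, 0x00)` gives `0x001234`, which is in the wrong bank, while `resume_after_rtl(0x051234)` comes back to `0x051234`.
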
Then, even if the computer were to execute these instructions successfully (and display the char), the `rts` used to exit the routine will only pull two bytes off of the stack (i.e. a 16-bit, not 24-bit, address), so if bank boundaries were crossed, you will never get back to your code. We would need a native IIgs ROM `COUT` (which I do not believe exists for the 40-char video page (?)).

Displaying a character

So, we’re still left with the task of displaying a character. It looks like we’re going to have to get our hands dirty and write the routine ourselves. The bank 0 map above has addresses marked as I/O. On the IIgs, this specifically means locations `$CFFF->$C000`, allowing you to:

- Read the keyboard
- Use softswitches to set the video mode to either text, high-res, or SHR (“super-high-res”) video pages
- Invoke `lda` on the speaker to cause a tick
- Read the game controller

The first two points are of value to us, as we need to read user input and ensure that we are in the correct video mode. We’ll first figure out how to toggle the softswitches, since we can test if our video works by loading a character code into a register (i.e. as an immediate value argument to an instruction like `lda`) before doing what we need to do to get the character to display. What do we need to do? Let's move away from the I/O section and figure out where the video pages are.

[Table: video page addresses, from the Apple IIgs Hardware Reference (APDA Draft)]

The 16-bit addresses listed in the table above are only available on bank 0. If we write values (in this case, char codes) to the text pages, they will appear on the display. The 65816’s 24-bit addressing allows us to write to any 24-bit address, which means that we can just prepend the bank number to the addresses above to obtain `$000400` and perform the write. By default (or so it seems from my testing), GS/OS loads a program in 80-character video mode, so we need to toggle the video softswitches before printing. We'll get to that in a later section.
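
The "prepend the bank number" step is just a shift and an OR. A small C sketch of that arithmetic follows; `long_address`, `bank_of`, and `offset_of` are hypothetical helper names used only for illustration, not anything the assembler provides.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 24-bit address from an 8-bit bank and a 16-bit offset. */
static uint32_t long_address(uint8_t bank, uint16_t offset)
{
    return ((uint32_t)bank << 16) | offset;
}

/* Split a 24-bit address back into its two parts. */
static uint8_t bank_of(uint32_t addr)
{
    assert(addr <= 0xFFFFFFu);
    return (uint8_t)(addr >> 16);
}

static uint16_t offset_of(uint32_t addr)
{
    assert(addr <= 0xFFFFFFu);
    return (uint16_t)(addr & 0xFFFFu);
}
```

For the first cell of the 40-column text page, `long_address(0x00, 0x0400)` evaluates to `0x000400`.
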
For now, just assume that we've already toggled the appropriate switches. Then, the code required to display a character would look as such:

However, there is a catch with doing this. We used the `A` register to store our character, then the `stal` command with a long 24-bit address to the first position of the 40 character text page. The `A` register of the 65816 in native mode is 16 bits, or two bytes, large. A character is only one byte, so our write in this case writes two characters to the page (depending on what is left in the other half of the `A` register when the write is performed). It is possible to operate on two characters at a time with `ldal #"AB"`, but reading one key at a time while having to write two characters is going to be an irritating exercise.

Recall that we discussed a 6502 emulation mode that involved telling the 65816 to use smaller registers. It is also possible to change the size of the registers without exiting native mode. That means we can change the size of `A` to be 8 bits, such that `stal` performs an 8-bit write (or one char). This piece of the puzzle is complete, but in order to figure out how to get this code to do what we want it to, we need to understand the processor status register.

The processor status register and softswitches

For the assembly programmer, this is a magnificent place to be. Look at all the information you can represent with just 8 bits! Instructions like `bcc` ("branch if carry clear") leverage this register for branching logic. Other instructions like `cmp` set these bits to give you some information about the operation you performed. In `cmp`'s case, the carry bit is set if `A` is greater than or equal to the other operand. The `E` bit is the emulation bit, but we won't be touching it for this tutorial since this editor will remain 16-bit native.

In our case, we need to set the “Memory/Accumulator Select” bit to `1` such that the accumulator is then 8 bits wide. To do this, we can write to this register in the following manner.
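
As a rough mental model of what setting and clearing that bit does, here is a C sketch (an illustration only, not the editor's assembly). It assumes the Memory/Accumulator Select flag is bit 5 (mask `$20`) of the status register, and it models the 65816's `sep` and `rep` instructions, which set and reset status bits, as plain bitwise operations. `store_a` is a hypothetical helper showing how the flag changes the width of a store.

```c
#include <assert.h>
#include <stdint.h>

enum { FLAG_M = 0x20 };  /* assumed: bit 5, Memory/Accumulator Select */

/* sep: set every status bit that is 1 in the mask. */
static uint8_t sep(uint8_t status, uint8_t mask)
{
    return (uint8_t)(status | mask);
}

/* rep: clear every status bit that is 1 in the mask. */
static uint8_t rep(uint8_t status, uint8_t mask)
{
    return (uint8_t)(status & ~mask);
}

/* Store the accumulator at mem[addr]; returns the bytes written. */
static int store_a(uint8_t status, uint16_t a, uint8_t *mem, uint32_t addr)
{
    assert(mem != 0);
    mem[addr] = (uint8_t)(a & 0xFF);      /* low byte always goes first */
    if (status & FLAG_M) {
        return 1;                         /* 8-bit accumulator: one char */
    }
    mem[addr + 1] = (uint8_t)(a >> 8);    /* 16-bit: the high byte follows */
    return 2;
}
```

With the flag set, a store touches exactly one byte; with it clear, the neighbouring byte is overwritten as well, which is the two-character write described earlier.
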
If not apparent, the hexadecimal number used in the instruction below is 8 bits large and represents the entire width of the register. Additionally, the 65816 is little-endian, so keep that in mind when comparing the number to the register!

To return back to full width:

Recall that earlier we mentioned that softswitches in the I/O block were used to change video modes. We still need to place the GS into the 40 character video mode, so let’s do that. Softswitches can be toggled by performing either an `lda` or a `sta` to the address of the switch. Consult an Apple IIgs hardware reference for all switches in the I/O block. In our case, you can imagine that some kind of video controller listens at these addresses for a signal to do something. We just very literally "toggle" the switch by performing a memory access operation against it.

You MUST be in 8-bit accumulator mode for this to work. For example, `$C00C` disables the 80 char hardware but `$C00D` enables it. If you write a 16-bit value to `$C00C` it will also overwrite `$C00D`, therefore turning it off and then immediately back on again. Ask me how I know!

Let’s combine everything we have so far, and add the softswitch toggling code to make a simple and imperative hello world program.

We now have assembled almost all of the pieces necessary to start building our text editor.

Reading the keyboard

Our remaining challenge is to figure out how to capture user input. After consulting the IIgs hardware reference, we obtain some useful I/O locations:

- `$C000` contains the character code of the key that was pressed and the strobe bit
- `$C010` contains the any-key down flag and the strobe reset softswitch

Recall that earlier we mentioned that a character code can fit within one byte (or 8 bits). Imagine now that I have just pressed the F key and immediately performed a `ldal $00C000`. The `A` register would look like this:

| Strobe (7) | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |

Bits 0–6 represent the character code, and bit 7 is the strobe; read as a whole byte, this is `$E6`.
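
Splitting that byte into its two parts takes one mask each. The following C sketch is only an illustration; the names `KBD_STROBE`, `KBD_CODE_MASK`, `kbd_byte`, `key_is_waiting`, and `key_code` are assumptions of mine, not symbols from any Apple header.

```c
#include <assert.h>
#include <stdint.h>

enum { KBD_STROBE = 0x80, KBD_CODE_MASK = 0x7F };

/* What the keyboard location would hold right after a keypress:
   the 7-bit character code with the strobe bit set on top. */
static uint8_t kbd_byte(uint8_t code)
{
    assert(code <= KBD_CODE_MASK);
    return (uint8_t)(code | KBD_STROBE);
}

/* Bit 7: has a key been pressed since the strobe was last cleared? */
static int key_is_waiting(uint8_t kbd)
{
    return (kbd & KBD_STROBE) != 0;
}

/* Bits 0-6: the character code itself. */
static uint8_t key_code(uint8_t kbd)
{
    return (uint8_t)(kbd & KBD_CODE_MASK);
}
```

For the byte in the example above (binary 11100110), `key_is_waiting` reports true and `key_code` returns `0x66`.
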
What is this strobe bit anyway, and why is it there? The keyboard controller sets the strobe to give the programmer a mechanism for knowing when a keydown event has happened. Some may say "I don't need a strobe bit to do this, can't I just check to see if the value changes?" What if the user intentionally inputs the same character twice? Additionally, you would have to waste an extra byte to store the previously inputted value if you wish to employ this method. The strobe bit provides a much more elegant solution to this problem, and furthermore, we control the entire feedback loop by being able to clear it, which then tells the keyboard controller that it is OK to poll for another keydown event and modify the data available at `$C000` once more. When the strobe bit is set, the keyboard controller will not overwrite that location!

Logic is best written on paper, so let’s come up with a small event loop:

- Read character data from $C000
- Determine if the strobe bit is set
  - If yes, branch and handle input
  - If not, proceed
- Read $C010, thereby toggling a softswitch causing the strobe bit to clear

Now, to translate this logic to 65816:

The `bit` instruction sets the `n` bit of the processor status register to the high bit of the operand it tests. Conveniently for us, this means that we can use the `bmi` instruction ("branch if minus") immediately after, which branches only if the `n` bit is high (i.e. set to `1`).

Assembly programming at the beginning can be very daunting: you do not have anything to work with past mnemonic instructions. Sure, your assembler may have support for macros (labels that can substitute for blocks of assembly code), but otherwise you are not really afforded any conveniences for organizing your code. In C, we can group our logic into functions. In assembly, the nearest similar conveniences are the `jsr`/`rts` and `jsl`/`rtl` subroutine instructions we discussed earlier.
Similarly, there are no loop constructs; loops are made by guarding jumps to other addresses with some kind of comparison, as we have demonstrated above. The C code below is implemented in the same spirit as the assembly above:

The `if` clause above may appear to be a bit tricky to read, but all we are doing is performing a bitwise `AND` between the dereferenced value of `gs_char` and the number `1` bit-shifted 7 times to the left, such that binary `1` becomes `10000000`.

Success! We’ve assembled all the knowledge we need to make a very basic text editor for the Apple IIgs.

What’s in a text editor?

At its core, I only want the following out of my text editor:

- Only support the 40-character text page
- No scrolling (i.e. no text buffer with offset to sync to the text page; the text page is our storage)
- The arrow keys can be used to scrub the buffer and change the location of the current character
- The bottom right hand corner of the display will show a column and row output in the format (XX, YY)
- “Hitting” the 0th column or the 39th column (max) will ping the speaker

To illustrate the second point, the modern text editor that you use is capable of having a file loaded with more text than can fit on your display. It follows, then, that all of the text of the currently loaded file must reside somewhere even if you are only looking at a certain portion of it. On the IIgs, this would mean keeping some separate space for all of the text, but this is a lot more work. For the purpose of this tutorial, all this means is that the maximum amount of text we’ll allow the user to store in memory will be the maximum amount of text that can be displayed on screen by the 40 character page.

Let’s decorate our earlier pen and paper event loop with a complete feature set and some variables:

Variables in registers

- `X`, the column
- `Y`, the row
- `A`, the character from the current key down

Those familiar with the 65816 will observe that we are out of “general purpose” registers.
There are two more registers left: the direct page register and the stack pointer register. The former allows you to use instructions with smaller addresses as immediate values, so if you don’t use the mode you can technically use the register for another purpose. We cannot change the stack pointer register because we want to use the stack, and furthermore, the stack is a far more idiomatic tool for saving and restoring register values so that you can free them up for destructive operations. Instructions such as `pha` push `A` to the stack, and `pla` pulls `A` from the stack. You'll see an example of this when we build the row and character counting mechanisms.

Core event loop

- Read character data from $C000
- Determine if the strobe bit is set
  - If yes, branch and handle input (via a subroutine)
    - Is the key up, down, left, right, return, or backspace?
      - If yes, handle those cases
      - Otherwise, for any arbitrary key
        - Write the character to the text page
        - Increment column and row appropriately
    - rts
  - If not, proceed
- Invoke subroutine to display current character
- Read $C010, thereby toggling a softswitch causing the strobe bit to clear
- Jump to character reading address to repeat

Starting implementation

The first order of business is to tell our assembler, Merlin32, that we want to make a relocatable OMF executable that GS/OS can recognize. `rel` and `typ` are not 65816 instructions but rather mnemonics that only Merlin32 can parse. We use them to set the file name and type here (`$B3` for `OMF16`).

The last two instructions are responsible for setting the data bank register to have the same value as the program bank register (by pushing the latter to the stack and pulling that value into the former). Remember when we spoke about `OMF` being relocatable? Suppose that we store a string somewhere in our assembly source. We can do so by using `db [byte]` instructions in sequence, with bytes that represent the characters which compose our string.
Conveniently, Merlin32 lets us add labels, so we can reference this array with a label for use in our program (i.e. `lda`s to read the string) later. If we don't point the data bank register to the same bank as the one in which the program is located, we would have to use larger 24-bit addresses to reference our string versus a 16-bit address. The 8-bit data bank register is placed at the high end of the 16-bit address you provide as an immediate value, making a 24-bit address. If you make a lot of references to that string, you save a few bytes of file size!

Getting back to it, let’s add our switch toggle code and the core event loop:

You can verify that you’re reading keys correctly by implementing the `keydown` routine with a `brk` and commenting out `jsr drawpos`, as it is not yet implemented. We want to place the breakpoint in the routine since we only branch to the routine if a key was actually pressed. Otherwise, you'd have no reliable way of testing (as it doesn't matter what `$00C000` is until we know that we caused the input). Regardless, the computer will process this as a breakpoint and display the monitor you see below. Look at the rightmost byte of A to see the character code that you have just pressed.

Magic! This is very literally the core event loop for `mrbuffer`. You may notice that there is no `drawchar` routine yet. Remember, we handle it as part of the keydown routine (by virtue of not needing to redraw if certain keys are pressed). We'll get to it eventually, but let's get `keydown` out of the way first.

Implementing the keydown routine

Let’s restate our mission objectives:

- Is the key up, down, left, right, return, or backspace?
  - If yes, handle those cases
- Otherwise, for any arbitrary key
  - Write the character to the text page
  - Increment column and row appropriately

And of course, here’s an ASCII table to make life easier.
Note that there are only 3 bits under the most significant (“MSD”) header; recall that the highest bit is the strobe bit (and therefore omitted here). I embarrassingly relied more on the `brk` method above to identify characters because I was lazy:

[Figure: ASCII table, from Scanlon, IIgs Assembly Language Programming]

It looks like we’re going to get out of this one pretty easily. We can just chain together some `cmp`s against the immediate values of the char codes for up, down, left, right, return, and backspace, each followed by a `beq`.

I don’t think anything else here needs more explanation, so on we go.

Displaying a character (again)

Citing from Table 2-8 above, we can see that the 40 column text page begins at `$0400` and ends at `$07FF`. Our imperative hello world program earlier did print "OHAI" to the screen by writing a character to each successive address past `$0400`. Ostensibly, it would not be unreasonable to assume that if you keep going until `$07FF`, you would wrap around to the remaining rows until the display is full. However, this is incorrect. Let's take a look at the map of the 40 character text page:

[Figure: map of the 40-character text page, from the Apple IIgs Hardware Reference (APDA Draft)]

Observe the values of the rows. The next successive address that represents a complete row is located at `$0428`, but that row is not row 1, it is row 8! Try this out by writing a program to write 41 characters to the page starting from the first row. You'll see that the last character appears lower on the display! Furthermore, row 8 is contiguously followed by row 16. The next row that is closest to the end of row 16 is row 1, but row 1 does not contiguously follow. Row 16 ends at `$0477`, but row 1 begins at `$0480`, leaving eight bytes of space between the rows. Unfortunately, being irritated at this design decision resolves neither the fact that we have to implement a working solution, nor the actual reason that things were implemented this way.
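
The interleaving looks arbitrary, but the row addresses on the 40-column page fit a regular pattern: rows are grouped in threes (row n, n+8, n+16) inside 128-byte blocks, with 40 bytes per row. Here is a C sketch that computes a row's base address under that assumption. `text_row_base` and `text_cell_address` are hypothetical names, and this formula is an alternative way of looking at the layout rather than the approach the editor itself takes.

```c
#include <assert.h>
#include <stdint.h>

enum { TEXT_PAGE1 = 0x0400, TEXT_ROWS = 24, TEXT_COLS = 40 };

/* Base address (within bank 0) of a row on the 40-column text page. */
static uint16_t text_row_base(unsigned row)
{
    assert(row < TEXT_ROWS);
    /* row % 8 picks the 128-byte block, row / 8 picks the 40-byte
       slot (first, second, or third) inside that block. */
    return (uint16_t)(TEXT_PAGE1 + (row % 8) * 0x80 + (row / 8) * 0x28);
}

/* Address of a single character cell. */
static uint16_t text_cell_address(unsigned col, unsigned row)
{
    assert(col < TEXT_COLS);
    return (uint16_t)(text_row_base(row) + col);
}
```

Rows 0, 8, 16, and 1 come out at `0x0400`, `0x0428`, `0x0450`, and `0x0480`, matching the ordering described above.
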
We can elect to use one of two solutions, both involving the column and row data we store in the `X` and `Y` registers, respectively:

- Define a pointer (and a routine to update it) to the base memory address that changes as the row changes.
- Make a big `cmp` table for each row.

While the former is likely more terse, the edge cases will be annoying (e.g. the eight-byte hole) and will probably involve making a fair amount of `cmp` spaghetti anyway. When documentation doesn't OCR, the best bet is to go with something imperative and reliable.

Let’s assume that we’ve already handled the row and column increment logic, and that the registers contain the correct values (don’t worry, we’ll get to it in the next section). Our solution looks like this:

Not the most elegant solution, I admit, but it does work reliably. Cool! We can now arbitrarily write to anywhere in the buffer.

Managing the row and column markers

It’s easier to understand the task at hand if we can understand what valid values look like:

- `X`: 0-39 (a 40 character row)
- `Y`: 0-22 (23 total rows)

No, it’s not a mistake: there are indeed 24 total rows available in the page, but I want to reserve the last line to display a small `(col,row)` output so that the user can know how many characters they've written on a line, or alternately, so I can use the space to add hotkey definitions (like in `nano`) to the bottom at a later point if I desire to do so.

The routines to do this that the above code shows are `up`, `down`, `left`, `colinc`, `return`, and `backspace`. We can implement all of these by just using `cpx` (compare to X) and `cpy` (compare to Y) with immediate values for the bounds to guard against setting an invalid value. Since we're checking the `X` and `Y` registers, this is also the perfect place to ping the speaker. We won't be exploring how to implement the `ping` subroutine in this tutorial, since it is deserving of its own write-up. If you're following along, just remove it (and, if you remove it, you can simplify the below logic a fair amount!)
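
The bounds checks themselves are simple. Here is a C sketch of the guarded cursor moves under the limits above (columns 0-39, rows 0-22). The `Cursor` struct, its `pings` counter (standing in for the speaker), and the function names are all illustrative assumptions; the wrap at column 39 follows the behaviour the article describes for regular keypresses.

```c
#include <assert.h>

enum { MAX_COL = 39, MAX_ROW = 22 };  /* row 23 is kept for the status line */

typedef struct {
    int col;    /* what the editor keeps in X */
    int row;    /* what the editor keeps in Y */
    int pings;  /* counts speaker pings instead of making noise */
} Cursor;

static int cursor_in_bounds(const Cursor *c)
{
    return c->col >= 0 && c->col <= MAX_COL &&
           c->row >= 0 && c->row <= MAX_ROW;
}

static void cursor_up(Cursor *c)   { if (c->row > 0) c->row--; }
static void cursor_down(Cursor *c) { if (c->row < MAX_ROW) c->row++; }

static void cursor_left(Cursor *c)
{
    if (c->col > 0) c->col--;
    else c->pings++;                 /* already at column 0 */
}

/* Shared by regular keypresses and the right arrow. */
static void cursor_advance(Cursor *c)
{
    if (c->col < MAX_COL) {
        c->col++;
    } else {
        c->col = 0;                  /* wrap, then reuse the "down" logic */
        cursor_down(c);
    }
    assert(cursor_in_bounds(c));
}

static void cursor_right(Cursor *c)
{
    if (c->col == MAX_COL) c->pings++;   /* only the arrow key pings */
    cursor_advance(c);
}
```

Keeping the guard in one place per direction means a regular keypress can share `cursor_advance` with the right arrow without ever producing an out-of-range column or row.
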
Since a regular keypress will always increment the column, and since the right arrow key will do the same, it makes sense to use the same branch such that our increment logic is uniform. If the column is at position `39`, we then reset the column to `0` and use the logic from `down` to advance the row (if possible). The `right` branch is located next to `colinc` on purpose so that I can just `jmp` over the next compare instruction (since one was just performed). Furthermore, this is to facilitate the fact that I don't want the speaker to chirp unless keydown explicitly happens with the right arrow key. Similarly, the backspace command replaces the last value of `A` with the space character and then invokes the `drawchar` routine after itself checking if it's OK to decrement and performing a decrement (i.e. because you want to delete the previous character if possible before clearing the current).

You will also notice that we’re doing a `jmp` to the `finkey` label; these branches are part of the original `keydown` subroutine, so we need to jump back to the end of that routine so that it can clear the strobe bit and then invoke its `rts` and pass control back to the core event loop. I am sure more elegant solutions exist here and I plan to explore some in later posts.

Displaying the row and column markers

By the sound of it, this seems to be a fairly innocent problem to tackle. Let’s obtain the character value of a number in C, to demonstrate first-impression simplicity:

Job done. Actually, why even do the exercise when you have `printf()` and all you want to do is display?

We unfortunately do not have `printf()` here, nor the expressive power. We can totally load `A` with the value of `X` with `txa`, for example; however, this is not a valid character code! It is just a number!
Furthermore, only numbers `0-9` are character codes (as you can compose characters to display base 10 numbers in a string), so the first example may read a little misleadingly in appearing that you can just add an arbitrary offset like `57` and get one two-byte character string. You cannot do this; you must compose them.

Let’s assume for a second that `X` has the value 15, representing the 15th column. We need a way to map that number into two one-byte character codes. Consulting the ASCII table above shows us that we can do something similar to the first C example, where we add the number of the numeral that we want to the base address representing the character '0'. In this case, that base address is `$B0`, so `$B0+0` is "0", `$B0+1` is "1", and so on. Since our row and column numbers never exceed 99, we only need to account for the "ones" and "tens" place of a base-10 integer.

Simply put, we can do something like this:

- Make a counter loop from the number stored in either `X` or `Y` that runs while at least 10 remains
- Subtract 10, and each time you do so, increment a counter (“tens place”)
- The remainder from the previous operation is the “ones place”
- Add each to `$B0` and display in order from least significant place to most significant place

Since we only have 3 registers available, and we’re using all 3 of them (`A`, `X`, and `Y`) to store meaningful data, we have a small problem. How are we going to count if we can't use the registers? This is where my earlier mention of the stack's usefulness comes into play: if we need to modify the registers, we can "save" their previous values by pushing their contents to the stack before modifying them (with the instructions `pha`, `phx`, and `phy`).

To implement the above, we only need two registers. This algorithm may appear a bit strange, but it makes sense because we can directly transfer either `X` (column number) or `Y` (row number) to `A` and then subtract from `A` while counting with `X`.
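
The same repeated-subtraction idea, written in C for clarity (the editor does this in assembly; `two_digits` and `CHAR_ZERO` are illustrative names). It assumes the value is at most 99 and that `$B0` is the character code for '0', as stated above.

```c
#include <assert.h>
#include <stdint.h>

enum { CHAR_ZERO = 0xB0 };  /* character code for '0' */

/* Convert 0..99 into two character codes: out[0] is the tens
   digit, out[1] is the ones digit. No division is used, only
   the subtract-and-count loop described in the list above. */
static void two_digits(uint8_t value, uint8_t out[2])
{
    uint8_t tens = 0;                 /* plays the role of X */

    assert(value <= 99);
    while (value >= 10) {             /* value plays the role of A */
        value = (uint8_t)(value - 10);
        tens++;
    }
    out[0] = (uint8_t)(CHAR_ZERO + tens);
    out[1] = (uint8_t)(CHAR_ZERO + value);
}
```

For column 15 this produces `0xB1` followed by `0xB5`, and a single-digit value such as 2 gets a leading `0xB0`, so the display always has two characters per number.
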
Then, we can add the base offset for the "0" character to `A` and then write the value of `A` to the text page. Then, just transfer `X` to `A`, add the base offset again, and display.

To DRY things up a little bit, since we’ll be using this for both the row and column values, it would be best to place that logic in its own subroutine.

Before using this subroutine, ensure that you invoke `phx` and `pha` to save the values of the current column and keydown char, since we modify `A` by decrementing it and overwrite `X` with 0 at the beginning (which is also great if we have a number like 2, since it represents 0 in the tens place). Why does the subroutine not do this? Let's take a look at the entire implementation that draws the character position to understand:

Since we overwrite `A` with a new number for display as we go from column to row, it would be wasteful to restore it only to replace it. Similarly, we don't need to actually place `Y` onto the stack since we only need `X` to count. We can restore state at the end once we're finished with the job to be done.

The `stal` locations represent the last 7 characters located towards the end of the 24th row (i.e. the very bottom right hand corner of the screen). We draw the parens, a comma, calculate the number of 10s, and shuffle the characters into their positions on the screen. At the end, we restore state and `rts` to our core event loop, which continues running the program.

Congratulations! You now know how to implement a very very basic text editor on the Apple IIgs by combining all of these basic elements!

The end

There is a lot more work to be done here. For example, I have not implemented quitting yet, so expect further technical articles picking up from where we’ve left off today. Assembly programming and working around the quirks of an old computer’s architecture are very patience-testing but make for rewarding learning experiences.
I started from knowing nothing about the GS or Apple II family, and oddly enough became more motivated by the fact that this was a task that I couldn’t really “Google & StackOverflow” my way through, which felt refreshing since I pushed myself harder than I usually would. The total time that it took me to do all this research and implement the program was about a month’s worth of oddly spaced out nights and weekends. I hope that someone out there trying to accomplish a similar thing finds this post useful.

I would also like to thank the #A2Central IRC channel and the “Apple IIgs Enthusiasts” Facebook group for their willingness to answer my questions. There is a sizable community out there that really loves this computer: people are homebrewing Ethernet adapters, making TCP/IP stacks, floppy emulators, and all sorts of new custom hardware and software to keep these machines alive. I was very surprised by this, and at the end of my task I understand why. It doesn’t run Crysis or mine cryptos, but there’s something so “old school cool” about these that you find yourself inadvertently doing something with them every now and again. This has been one of the most rewarding experiences I’ve had with a computer in my entire life. Thank you for reading.

[Figure: Hello world! On my ROM03!]

Originally published on my weblog.