I was lucky enough to get my hands on a BBC micro:bit a while back, courtesy of Nicholas Tollervey, and wanted to see how far I could push Python on it (I started this post in May last year!). This is my story of building a game on the micro:bit using a $2 external display.
The BBC micro:bit is a tiny computer, a million of which have been given to English (and Welsh!) school children and schools. It’s got lights, buttons, USB, Bluetooth, and runs Python!
The hardware is fairly limited. It’s built around the nRF51822 chip, which clocks in at 16 MHz and has 16 KB of RAM. The flash is slightly more generous, with 256 KB available. This makes doing more complex things quite challenging, and fun.
The board has two buttons, and a mini screen made up of 25 LEDs which is great for getting people started, but very limiting when drawing asteroids and spaceships! To make things more exciting, I added this tiny display:
Which you can buy for about $2 online. I’ve talked about this display before.
To connect the two, I bought an edge connector breakout from Kitronik, which turns the clever edge connector on the micro:bit into normal pins for electronics prototyping.
Given this, and a pinout from the micro:bit site showing the I²C pins (I²C being the wire protocol the display uses) and which pins can be used to power the display, I could connect them up. Unfortunately, the pins on the micro:bit and the display didn’t match up exactly, so I had to patch together an adapter that crossed two of the wires, allowing the display to plug directly into the breakout board. I then covered the connector in Sugru to make it look a bit nicer:
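Once everything was wired up, a quick smoke test confirms the display is alive. This is a minimal sketch using micro:bit MicroPython’s i2c module; 0x3C is the usual SSD1306 address (some modules use 0x3D), and the command bytes come from the SSD1306 datasheet:

```python
from microbit import i2c

ADDR = 0x3C  # most SSD1306 modules answer here; some use 0x3D

# Each transfer starts with a control byte: 0x00 means "commands follow"
i2c.write(ADDR, b'\x00\x8d\x14')  # 0x8D/0x14: enable the internal charge pump
i2c.write(ADDR, b'\x00\xaf')      # 0xAF: display on
i2c.write(ADDR, b'\x00\xa5')      # 0xA5: force all pixels on (handy test mode)
```

If the screen lights up solid, the wiring and address are right, and the real driver can take over from there.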
Micropython is kinda awesome. An independent implementation of the Python 3 language by Damien George, it runs on chips with very little power and memory, and is designed for embedded use.
Ultimately, it’s still Python, which, for all its benefits, isn’t designed to be super fast or memory efficient, both of which are traditionally critical for games. This makes a few things quite tricky, and I’ll explore them below.
Here’s a video of the game playing:
My idea was to create a simple cross between Asteroids and Space Invaders, with boulders falling from the top of the screen and the player having to fly a spaceship to dodge them.
The source code is on github: https://github.com/stestagg/bitflyer
While making this, I ran up against a few of the limitations of the platform, which I’ll discuss below:
Speed actually wasn’t a massive issue. The chip is clocked at 16 MHz, and MicroPython is pretty good at making use of those cycles.
In a few cases I moved bit-twiddling code into a C module (see below), which helped with some bottlenecks.
The main speed issue was updating the display. I²C isn’t really designed for high-speed communications. The nRF51822/SSD1306 pair can communicate with a clock speed of 1 MHz, but with I²C wire and handshake protocols, this really translates to between 25–40 KB/s.
Updating the entire display takes 1 KB of data, so doing this many times a second (the CPU is busy the whole time communication is happening) doesn’t leave much room for doing anything else.
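Back-of-the-envelope, that bandwidth puts a hard cap on the frame rate (rough figures, assuming the 25–40 KB/s estimate above):

```python
FRAME_BYTES = 128 * 64 // 8  # 1 KB per full-screen update

for bandwidth in (25000, 40000):  # effective I2C throughput, bytes/sec
    fps = bandwidth / FRAME_BYTES
    print("%d B/s -> %.0f full frames/sec max" % (bandwidth, fps))

# ~24-39 full frames per second, with the CPU blocked for the whole
# transfer - hence the partial-update strategy described later.
```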
Memory was a massive issue. The micro:bit has 16 KB of RAM, which isn’t very much, especially when using Python, which relies heavily on dynamic allocation.
The following map shows how the RAM is divided up for micropython (with a lot of the larger built-in modules disabled):
Map of the RAM usage of the bitflyer code. Labels indicate which system uses that section of memory. Each square represents 32 bytes of RAM
The important part is the green “micropython heap” section. All normal Python objects live within this range, which covers only about half of the total RAM.
Traditionally with embedded systems, all the code is compiled ahead of time and stored in flash memory, which on the micro:bit is 256 KB (loads of room!). With Python, while the source code can live in flash, MicroPython has to compile it at runtime into objects in RAM. Adding lots of complex code to reduce memory usage isn’t usually an option here, as that complex code competes for the same pool of memory as the objects you’re trying to shrink.
I didn’t dig into the internals of the MicroPython parser/compiler, but things as simple as stripping comments and running the source through pyminifier significantly helped with fitting the code into memory.
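If you don’t want the full pyminifier dependency, a comment-stripping pass is easy to sketch on the desktop with the standard tokenize module. This is illustrative only (pyminifier does this and much more), and the bitflyer.py filename is just an example:

```python
import io
import tokenize

def strip_comments(source):
    """Remove comment tokens from Python source before flashing it."""
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    return tokenize.untokenize(tokens)

with open("bitflyer.py") as f:
    print(strip_comments(f.read()))
```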
In the end, there was a lot of trial and error, changing which variables were used and in what order, to find a combination that ran reliably. I also had to tweak the Python heap and stack sizes (making them smaller) to squeeze everything in.
I decided early on to use a display buffer. The display chip I used (SSD1306) has an internal frame buffer, but there’s no easy/obvious way of reading current values back from it, only of updating them.
I could have cleared/repainted sections of the screen by keeping track of which objects had moved since the last frame and calculating the delta, but this approach has drawbacks: the code/memory cost of tracking what has changed each frame is quite high, and communication with the display is fairly slow, so lots of small update commands would introduce noticeable lag.
Instead, I decided to keep an in-memory ‘view’ of the screen, and use that to do efficient updates each frame. This came at a big cost: the screen is 128x64 pixels, with each pixel either on or off, so the screen buffer is 1 KB, or 1/16 of the machine’s total memory. The advantage of this approach is that updating the screen is far simpler: you just keep track of which rectangle of the screen contains changes, and do a single bulk transfer of data to update the display.
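The idea looks roughly like this. It’s a minimal sketch rather than the actual bitflyer code, and send_window is a hypothetical bulk-transfer helper:

```python
# 128x64 mono display: 1 bit per pixel = 1 KB buffer
buf = bytearray(128 * 64 // 8)

# Dirty rectangle: the smallest region touched since the last flush
dirty = None  # (x0, y0, x1, y1), or None if nothing has changed

def mark_dirty(x0, y0, x1, y1):
    """Grow the dirty rectangle to cover a newly-drawn region."""
    global dirty
    if dirty is None:
        dirty = (x0, y0, x1, y1)
    else:
        a, b, c, d = dirty
        dirty = (min(a, x0), min(b, y0), max(c, x1), max(d, y1))

def flush():
    """Send only the changed rectangle to the display, in one transfer."""
    global dirty
    if dirty is not None:
        send_window(buf, *dirty)  # hypothetical bulk I2C transfer helper
        dirty = None
```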
Originally, I used a Python array to store the screen buffer, and immediately ran into crippling memory issues. The problem turned out to be fragmentation: the MicroPython heap was only 9 KB, and having a big, unmovable 1 KB slab of data somewhere random in the middle of it put a lot of pressure on the allocator when it tried to fit small objects around it.
I also ran into difficulties copying bitmap data into the array efficiently; using the ‘|’ operator in a loop was a bit slow.
To fix both of these issues, I added a C module to MicroPython that provides a display buffer interface over a statically allocated 1 KB buffer, along with some blit functions written in C. The static buffer lives outside the Python heap, so it no longer fragments it.
The display has an odd memory layout. There are 64 pixels vertically, but these are divided into rows (“pages”) of 8 pixels. Within each page, one byte maps its bits to the 8 vertical pixels of a column, as shown here:
The SSD1306 internal memory layout
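In code, addressing a single pixel in this format looks something like the following. This is an illustrative sketch; the real blit routines in bitflyer do the equivalent in C:

```python
WIDTH = 128

def set_pixel(buf, x, y):
    # Each 8-pixel-tall "page" of the display is one row of bytes;
    # bit n of a byte controls the pixel n rows down within the page.
    page = y // 8
    buf[page * WIDTH + x] |= 1 << (y % 8)
```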
This non-standard format meant I couldn’t really use MicroPython’s inbuilt image module; instead, I wrote a script to convert the images into C arrays packed in the above format, and included them as static variables.
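A converter along these lines packs each 8-row band into bytes and emits a C array. This is a simplified sketch of the idea, not the actual script from the bitflyer repo:

```python
def to_c_array(name, pixels):
    """pixels: list of rows, each a list of 0/1; height a multiple of 8."""
    width, height = len(pixels[0]), len(pixels)
    data = []
    for page in range(height // 8):
        for x in range(width):
            byte = 0
            for bit in range(8):
                # bit n of the byte is the pixel n rows into this page
                byte |= pixels[page * 8 + bit][x] << bit
            data.append(byte)
    body = ", ".join("0x%02x" % b for b in data)
    return "static const uint8_t %s[] = { %s };" % (name, body)

# e.g. an 8x8 checkerboard sprite
sprite = [[(x + y) % 2 for x in range(8)] for y in range(8)]
print(to_c_array("sprite", sprite))
```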
The abstraction/specialisation tradeoff arguments are complex, and can get heated. I’m personally very much against having more than one or two abstraction points in any stack, and the rant below is based on that opinion. I’m throwing sticks at a working implementation without proving that an alternative approach is better (or sending a pull request), so this isn’t really fair; I do think a friendly debate around it could be useful, however. Please be aware that the following sentiments may not be universally shared.
What Damien managed to achieve by porting MicroPython to the micro:bit in such a short period was amazing. I don’t know the exact timeline, but I understand it was an impressively quick turnaround.
One way this was done was to leverage existing frameworks, many of them promoted/managed by the mbed ecosystem. The cost of using the mbed tools is layers: the mbed tools all build on tools that build on underlying tools, and several of these layers add minimal value, but all come with a cost.
For example, building the bitflyer/micropython binary is basically a case of calling the GCC ARM cross-compiler on the source with pre-defined arguments, and linking the result together with an nRF51822 ld script to give the right memory layout.
The mbed approach is to use a tool called yotta. Yotta is a package manager for ‘IoT’ platforms, and to use it you need an ARM account, and you must give ARM access to your GitHub account. (I get the feeling ARM is trying to build a lock-in platform here.)
Micropython has some extra build stages, so it has a Makefile wrapper around yotta to do some things and then call ‘yt build’.
‘yt build’ ends up creating a CMake config for the project (> 1.7k lines) and running it to generate a ninja build configuration (> 2.8k lines).
Ninja then runs, invoking gcc/g++ for each file in turn, then linking them together.
And it all works fine; a binary is produced. But if you’re trying to work out where all your RAM is going, you need to see the output of ld, and to do that I had to discover and trace through each layer of the stack to find where the relevant information was being emitted. Luckily I didn’t need to actually change the ld command-line arguments, as that would have been much harder.
Now, I understand ARM is trying to build a magic platform that houses thousands of off-the-shelf components, all cross-compilable for lots of platforms, providing one-click, ARM-delivered value to all IoT developers, but in this case the magic wore off quite quickly.
The other place this issue came up was in the code. MicroPython is just an execution environment, so to integrate with different hardware it has a layer called mphal (the MicroPython hardware abstraction layer), which turns common module functionality into device-specific calls.
The micro:bit port of MicroPython uses a project called microbit-dal (device abstraction layer) which, for a long time, required personal approval from someone at Lancaster University to access the code (this is still mentioned in the readme).
The microbit-dal is built on the mbed platform, so calls down to the mbed-classic code to do much of the heavy lifting.
mbed-classic is a hardware abstraction library over a set of IoT-related functionality, and it in turn calls down to a yotta_target for the nRF chip for actual platform-specific needs.
One of the side effects of all these layers can be seen in the memory map above: there are two separate heaps in play on the micro:bit, the MicroPython one and the mbed-classic one. For 16 KB of memory, that’s a lot of heaps!
The lines of code involved in this stack soon start to add up, and tracing code paths through all the layers gets old quickly:
Writing bitflyer was a really fun and rewarding challenge!
The experience is not too far from being doable by more senior/technical students: if some more memory can be freed up, and the helper modules tidied up a bit, a school activity of soldering up the display connector and writing a simple game should be easily achievable.
Most of the hard work in writing the C modules etc. should be reusable with some tidy-up, allowing cool projects to be built on top without too much technical knowledge.
My next step for the game would be to add sound, using a simple piezoelectric speaker and the MicroPython music module.
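Something as simple as this would do it. A sketch using the micro:bit MicroPython music module, assuming a piezo wired to pin 0:

```python
from microbit import pin0
import music

# Play a built-in melody through a piezo on pin 0 (the default pin)
music.play(music.POWER_UP, pin=pin0)

# Or bleep out a quick explosion-ish pitch sweep
for freq in range(800, 200, -50):
    music.pitch(freq, duration=20, pin=pin0)
```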