Often we hear, learn, and even use terms or phrases that we don’t fully understand. I find this to be quite common within the software development community, whether it be RESTful Web APIs, Agile methodology, Machine Learning, or some other term. This isn’t necessarily a bad thing, but it’s important to recognize when you truly know something and when you just know the name for it.
For me, Systems Programming is one such term. I’d like to try to explain, using simple language, what it means.
Before we can understand what Systems Programming entails, we first need to understand what a System is. Software tends to fall into one of two camps, system software and application software.
System software is computer software designed to provide a platform to other software. Examples of system software include operating systems, computational science software, game engines, industrial automation, and software as a service applications.
… Such software is not considered system software when it can be uninstalled usually without affecting the functioning of other software.
System software is a platform comprised of Operating System (OS) programs and services, including settings and preferences, file libraries and functions used for system applications. System software also includes device drivers that run basic computer hardware and peripherals.
System software refers to the files and programs that make up your computer’s operating system. System files include libraries of functions, system services, drivers for printers and other hardware, system preferences, and other configuration files. The programs that are part of the system software include assemblers, compilers, file management tools, system utilities, and debuggers.
The Wikipedia definition is very vague: almost anything counts as system software as long as it provides services to other applications. However, the other two definitions focus purely on the operating system: drivers, kernels, libraries, and functions (think kernel/libc header files and shared objects). This implies a close relationship to hardware. If we look at another Wikipedia article, on Systems Programming, we see:
System programming requires a great degree of hardware awareness.
The article goes on to imply that a core part of system programming is the need for things to be very fast. This explains why we would need to know a lot about the hardware. It also makes sense that speed (performance) would be a core part of systems programming if the result is a platform for other software.
If the most central part of your application (the system software “platform”) is slow, then the whole application is slow. For many applications, especially at scale, this would be a deal-breaker.
The quotes above and other resources have led me to the following criteria to define system software:
Examples of what is system software:
Examples of what isn’t system software:
You’ll notice that while Web Service APIs provide a service to other software, they don’t (typically) interact with hardware in order to expose abstractions over it. However, there are applications that fall within a middle grey area. The ones that come to mind are high performance computing applications and embedded software.
High performance computing (HPC) applications, such as real-time trading on stock exchanges, don’t typically expose a platform, but it is common for them to write code that interfaces directly with hardware. An example would be bypassing the networking stack offered by the kernel and implementing their own networking stack talking directly to the NIC(s). In this way we can see how HPC software shares many similarities with systems software, by interacting directly with hardware in order to provide the needed performance gains.
Embedded software development also shares many similarities with systems software in that code is written to directly interface with hardware. However, any abstractions provided are typically consumed by the same software and could not be considered a platform.
It’s important to take note of applications that share similarities with our definition of system software, since you’ll likely see those applications and jobs described in these terms (systems software, systems engineers, etc.).
Having defined Systems, we can now define Systems Programming as the act of building Systems Software using System Programming Languages. Simple enough, right?
Well, there is one thing we skipped over: languages. People often talk about Systems Programming Languages in ways such as “X is great, it’s fast, compiled, and a systems programming language.” But is everyone on the same page as to what a systems programming language is?
Given our definitions of Systems I would define the criteria for a Systems Programming Language to be:
Disclaimer: This is my definition. Since there is no set criteria, I am deriving a definition from what makes sense to me given the context in which I’ve defined system software.
If a language cannot compile to an executable that is directly interpretable by the CPU then it is, by definition, running on a platform (e.g. the JVM, Ruby VM, or Python VM). There may be some arguments to be made here, but for simplicity I think this is a suitable criterion.
The argument is similar to that for compiling to a native binary. If the language always requires some other software to be present in order to execute, then it is running on a platform. An example of this is Go and its included standard library. It requires support from the OS to perform basic actions such as allocating memory, spawning threads (for goroutines to run on), running its built-in network poller, and so on. While it is possible to re-implement these core functions, doing so creates a barrier to use in this context, and it is easy to imagine why not all languages, even those that compile to static binaries, are intended as system programming languages.
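To make the point about Go's reliance on the OS a little more concrete, here is a minimal sketch that asks the Go runtime about itself. The helper name runtimeFootprint is hypothetical and exists only for this illustration; the calls it makes (runtime.ReadMemStats, runtime.NumGoroutine, runtime.GOOS) come from Go's standard runtime package. The exact numbers printed will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeFootprint (a hypothetical helper for this sketch) reports a few
// facts that exist only because the Go runtime is linked into the binary
// and is itself leaning on the operating system.
func runtimeFootprint() (goroutines int, sysBytes uint64, goos string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// m.Sys is the total number of bytes the runtime has obtained from the OS.
	return runtime.NumGoroutine(), m.Sys, runtime.GOOS
}

func main() {
	g, sys, goos := runtimeFootprint()
	fmt.Printf("built for OS: %s\n", goos)
	fmt.Printf("goroutines currently alive: %d\n", g)
	fmt.Printf("bytes the runtime has obtained from the OS: %d\n", sys)
}
```

Even though main does almost nothing, the runtime has already requested memory from the OS and is scheduling at least one goroutine. That is the support a bare-metal environment would have to provide some other way.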
This one is a bit of a cop-out. However, it is to say that among the languages typically classified as systems programming languages, there should not be large (orders of magnitude) differences in performance characteristics. By characteristics I am explicitly referring to execution speed and memory efficiency.
The gold standard for comparison is C and/or C++, as is often seen in comparative benchmarks, which measure execution speed by how many orders of magnitude slower a language is than C/C++.
The languages that come to mind immediately, given the above definition, are C and C++. But there are also newer languages such as Rust and Nim which fill this niche. In fact, there is already an OS written entirely in Rust (RedoxOS) and a kernel in Nim (nimkernel).
Earlier I hinted at the fact that Go may not fall within the family of “systems programming languages.” However, just like not all applications fit nicely into application software and system software, neither do languages.
Often people will call Go a systems programming language, and even golang.org says:
Go is a general-purpose language designed with systems programming in mind.
However, even this isn’t an outright claim that Go is a systems programming language, simply that it is designed with it in mind. I find that it rather sits in the middle.
While Go compiles to native binaries, contains useful low-level concepts (raw/unsafe pointers, native types such as bytes and int32, and assembly support), and is relatively performant, it still has some challenges to overcome. Go ships with a runtime and a garbage collector.
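As a small illustration of the low-level features mentioned above, the sketch below uses fixed-width native types and an unsafe pointer. The header struct and the helpers headerSize and firstByte are hypothetical names made up for this example; the unsafe.Sizeof and unsafe.Pointer operations are from Go's standard unsafe package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// header is a hypothetical fixed-layout record built from fixed-width types,
// the kind of thing you might map onto a wire format or a device register.
type header struct {
	Kind   uint8
	Flags  uint8
	Length uint16
	ID     int32
}

// headerSize returns the in-memory size of header in bytes.
func headerSize() uintptr {
	return unsafe.Sizeof(header{})
}

// firstByte reinterprets the memory behind v as a byte and returns the byte
// stored at the lowest address. Which byte that is depends on the CPU's
// endianness.
func firstByte(v *uint32) byte {
	return *(*byte)(unsafe.Pointer(v))
}

func main() {
	fmt.Printf("header occupies %d bytes\n", headerSize())

	v := uint32(0x01020304)
	fmt.Printf("lowest-addressed byte of 0x01020304: 0x%02x\n", firstByte(&v))
}
```

The second line of output differs between little-endian and big-endian machines, which is exactly the kind of hardware detail that systems code has to be aware of.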
A runtime means that bootstrapping/overriding of the runtime will be required to run in environments without kernels. This gets more into the internal implementation of the language, which could change in future releases. Changes require additional bootstrapping work as the language evolves.
A garbage collector (GC) means either that Go is restricted in which application domains it can be used, or that the GC must be disabled and replaced with manual memory management. Where the GC cannot be replaced, the real-time domain (defined by operations that must complete within given time bounds and/or performance measured in nanoseconds) cannot risk the non-deterministic pause times of a GC.
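To see the first half of that trade-off in practice, here is a rough sketch that counts garbage collection cycles while allocating with the collector enabled and again with it switched off. The names gcCyclesDuring, churn, sink, and compareGC are hypothetical helpers for this example; debug.SetGCPercent and runtime.ReadMemStats are from Go's standard library. It assumes default GC settings (no GOGC or GOMEMLIMIT overrides in the environment), and it only shows disabling the collector, not a replacement memory management scheme.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps each allocation reachable from a package-level variable so the
// compiler cannot optimize the allocations away.
var sink []byte

// churn allocates roughly 64 MiB of short-lived 1 KiB slices.
func churn() {
	for i := 0; i < 64*1024; i++ {
		sink = make([]byte, 1024)
	}
}

// gcCyclesDuring reports how many GC cycles completed while work ran.
func gcCyclesDuring(work func()) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

// compareGC runs churn once with the collector enabled and once with it
// disabled, restoring the previous setting afterwards.
func compareGC() (withGC, withoutGC uint32) {
	withGC = gcCyclesDuring(churn)

	old := debug.SetGCPercent(-1) // a negative percentage disables the GC
	withoutGC = gcCyclesDuring(churn)
	debug.SetGCPercent(old)

	return withGC, withoutGC
}

func main() {
	withGC, withoutGC := compareGC()
	fmt.Printf("GC cycles with collector enabled:  %d\n", withGC)
	fmt.Printf("GC cycles with collector disabled: %d\n", withoutGC)
}
```

With the collector disabled the program never pauses to collect, but nothing is freed either: the heap simply grows. Code running this way has to manage memory some other way, for example by reusing buffers.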
With growing talk of distributed systems, and applications like Kubernetes becoming very popular, we get to hear a slew of new vocabulary that (if we’re being honest) most of us don’t fully understand.
To this point, I’ve seen the terms systems programming and systems engineers used in contexts where what they really meant was distributed systems programming and distributed systems engineers.
We’ve defined system software, systems languages, and systems programming in this post. However, when we talk about distributed systems, the meaning of system changes. And while I’m not going to dive into the specific differences here (mainly because I still need to better grasp them myself), it is important that we make those mental distinctions and use more exact language when we can, to avoid confusing those still learning the space.
I hope you’ve enjoyed this article and please leave any comments if you’d like to continue discussing. You can keep up to date on the latest posts by following me, John Murray, on Medium and please 👏 if you enjoyed the post.