These days my preferred programming environment is a Smalltalk-80 descendent called Pharo. The system allows me to quickly explore ideas, change code and I am always surprised how little code I need to get things done. I use the word environment as the programming language is just one part of it. It is the combination of compiler, editor, debugger, inspector that determine how quickly one can solve problems and I sometimes I am painfully reminded of how much ahead Pharo is, especially when writing code that resolves around edit, compile, run, wait, crash, add printf, repeat, wait, crash at same place but with more info, cycles.
But I am getting ahead of myself. I started my programming career in junior high by typing my first lines of imperative code into a terminal connected to a Unix system of my school. We used a language called ELAN and solved problems like finding primes, building guessing games, etc.. Other imperative languages followed and for a brief period I wrote things in Java but then started to mostly read and write C++ and a bit of C. In University I was exposed to a fair amount of Haskell and since then I wrote scripts in Python and for the last couple of years I learn Smalltalk.
Smalltalk-80 was literally ahead of its time. It started as a bet that the semantics of a language could be described on a single piece of paper and it ran on computers that utilized Moore’s Law. That means the very expensive computers built at Xerox Parc represented what would, thanks to Moore’s Law, become affordable in just a couple of years. The issue is that people wanting to use Smalltalk-80 couldn’t wait until the 90s and it was trapped in the ivory tower as a language for the elites with expensive computers and deep pockets.
From time to time I try to leave my personal ivory tower and explore other programming languages. I look at Erlang, Rust, Go and for the last couple of weeks I am debugging, fixing and writing Go code to learn the ups and downs of the language. Today I want to write about being able to reason about runtime failures and what a language/runtime can provide to make one more productive.
I was parsing a network protocol where inside the message it tells me where to find the next header. In Go the message could be represented by a []uint8 (an array of unsigned int taking 8-bit each, in Go read types from left to right) and I wrote something basic like that:
func NextHeader(message []uint8) []uint8 {offset := message[2]len := message[2 + offset]return message[2 + offset + 1 : len]}
The experienced Python and Go programmer will spot the issue immediately but when running the resulting application it sometimes would work but for some messages it would abort with:
panic: runtime error: slice bounds out of range
goroutine 1 [running]:
main.NextHeader(...) slicetest.go:8
The panic lacks information about which slice and what bounds. As this only happened on some of the messages I was parsing I resorted to debugging it like:
func printSliceBounds(message []uint8, begin, end uint8) {if recover() != nil {fmt.Printf("%#v[%v:%v]\n", message, begin, end)}}
func NextHeader(message []uint8) []uint8 {offset := message[2]len := message[2 + offset]defer printSliceBounds(message, offset, len)return message[2 + offset + 1 : len]}
This will make sure that after NextHeader and before returning to the caller the printSliceBounds function will be called. In case the code is recovering from a panic the formatted slice and bounds will be printed. I wanted to build a testcase with this slice but as soon as I tried to compile the code the static analysis of the Go compiler pointed out the error.
Until here this might be all pretty standard for you but I had the pleasure to work in the ivory tower. So let me tell you about of where I come from. In my world everything is an object. This includes the stackframe/activation record of the running execution and is referred to by the reserved keyword thisContext. It is an object like any other object in the system and it can be queried and manipulated. I can query the Program Counter, I can read objects from the stack, I can walk the call hierarchy, I can access the Abstract Syntax Tree of the method (a method is just another object) currently executing but I can also modify the Program Counter, context chain, etc.
Smalltalk a slice error would be signaled as an Exception. In contrast to many other languages the exception handling code is not part of the Virtual Machine but a part of the runtime package and built on top of thisContext. When developing I would get a debugger in my environment that let’s me inspect the slice but when running in production I might consider to serialize the exception and the entire running process to a file. There is a nice demo of running Pharo on Lambda AWS and taking a “crash” and debugging it locally.
As a thought experiment you can ask yourself how you would implement a debugger in C that runs a halted method until its end and how an implementation in Smalltalk would like? The beauty of a system is that with the right set of abstractions one can build things that are normally very time consuming to build. The easiest way to do it in Smalltalk is to create a new context, put a trap/halt/breakpoint/continuation into it and place it in between the current method and the original caller and then resume the execution of the process. Here one can see a real implementation using this for GNU Smalltalk.
Exception handling has its problems and it is fair for Go to have picked a more explicit path for error propagation and panics. I started to check what it would take to add a high level out of bounds object that carries the slice and the bounds and explored the Go sourcecode. The nice thing is that the entire system builds quite quickly (e.g. compared to gcc/libstdc++). So here are my findings after working on it. The first one is the place where the compiler is generating the bounds checking:
In src/cmd/compile/internal/gc/ssa.go// slice computes the slice v[i:j:k] and returns ptr, len, and cap of result.// i,j,k may be nil, in which case they are set to their default value.// t is a slice, ptr to array, or string type.
func (s *state) slice(t *types.Type, v, i, j, k *ssa.Value) (p, l, c *ssa.Value) {
...
s.sliceBoundsCheck(i, j)if j != k {s.sliceBoundsCheck(j, k)}if k != cap {s.sliceBoundsCheck(k, cap)}...
The compiler will generate up to three checks for your invocation of the slice code. i, j, k are user supplied and can be null and then the default values zero (i), len (j) and cap (k) will be used. The first check will always be generated and is to check that 0 ≤ i ≤ j, the second if generated will check 0 ≤ j ≤ k (length ≤ capacity) and the last will check capacity ≤ existing capacity.
The ssa of a simple check might look like this:
func Slice(slice []uint8, start, end int) []uint8 {return slice[start:end] // to defeat the static analysis}
Slice <nil>
b1:v1 = InitMem <mem>v9 = Arg <int> {start} : start[int]v10 = Arg <int> {end} : end[int]v28 = Arg <int> {slice} [16] : slice+16[int]v32 = Arg <*byte> {slice} : slice[*byte]--- Load registers and check 0 <= i <= j here.v7 = LoadReg <int> v9 : AXv15 = LoadReg <int> v10 : CXv31 = CMPQ <flags> v7 v15ULE v31 -> b2 b3 (likely)b2: <- b1--- Load k into register and check 0 <= j <= k here.v34 = LoadReg <int> v28 : DXv8 = CMPQ <flags> v15 v34ULE v8 -> b4 b3 (likely)b4: <- b2... create the new slice here and return it
--- Error case jumped to from b1 (0 <= i <= j) or b2 (0 <= j <= k)b3: <- b1 b2v18 = CALLstatic <mem> {runtime.panicslice} v1Exit v18
At the time panicslice is called the information about the slice and which of the three checks failed is gone. After some exploring I can pass the error kind to panicslice, the slice and bounds as well. The first issue was to replace the manual Common Subexpression Elimination (CSE) that assumed that each line of code can only have one kind of panic but then I faced a fundamental issue. The src/runtime/panic.go is at the bottom of the food chain and I can’t just use fmt.Errorf to format a new error in there. First of all the system might be out of memory and second fmt.Errorf might use broken slice code itself and then the panic code would recurse into more panics.
At that point I explored if I could use defer on the outside as a generic error handler. The idea is to install something in the main() that will print the faulting slice and the bounds in the error code. It would use recover() and somehow check if it went through panicslice and then somehow find the variables and then maybe even create the boilerplate for a testcase.
While Go has built-in reflection and Program Counter to function code it doesn’t have seem to have the same for data (the stack pointer). To compare this to Smalltalk thisContext, Go gives us static information (function name, line number) but lacks dynamic information (variables in the stack). In theory it could be added (IIRC the debug format DWARFv2 has support for tracking the locations of variables) but it doesn’t seem to be used in runtime/symtab.go.
I ended up making a proof of concept and patching the compiler like this:
// Panic if slice indices are not in bounds.
s.sliceBoundsCheck(i, j, v, i, j, s.constInt(types.Types[TINT], 1))if j != k {s.sliceBoundsCheck(j, k, v, j, k, s.constInt(types.Types[TINT], 2))}if k != cap {s.sliceBoundsCheck(k, cap, v, k, cap, s.constInt(types.Types[TINT], 3))}
And the runtime code like that. Make the SliceError an errorString but provide high level information about the error:
type SliceError struct {errorStringSlice interface{}Start intEnd intErrorKind int}
func panicslice(slice interface{}, len int, start int, end int, kind int) {panicCheckMalloc(sliceError)panic(SliceError{errorString: sliceError.(errorString),Slice: slice,Start: start,End: end,ErrorKind: kind})}
Using a defer/recover approach I can get an output like the following:
runtime.SliceError{errorString:"slice bounds out of range", Slice:<nil>, Start:10, End:6, ErrorKind:2}
Here I was trying to slice an array of length six to the 10th position. It resulted in the second check failing and there is an error putting the slice ptr onto the stack (it is formatted as nil). From my point of view this is a lot more helpful that printing “slice bounds out of range” (writing foo and printing the line number would already point to the slice).
What is next? So the proof of concept works but the issue is that a lot more code is generated. This includes setting up the stack for the panicslice code and one error path per check emitted. I don’t think it is practical for upstream Go code. It might be useful behind a debug flag. What I think would be useful is to have a thisContext like concept that allows to access data as well. This way an error handler could extract the slice and bounds from the stack or registers. What do you think and how do you debug your go programs? How do you capture bad data in production?