Screamin’ Speed with WebAssembly

Written by tomlagier | Published 2017/06/30
Tech Story Tags: javascript | web-assem | performance | d3js | debugging

TLDRvia the TL;DR App

A Tale of Javascript Performance, Part 5

This article continues my series chronicling my investigation into JavaScript performance by creating HeapViz, a visualization tool for Chrome memory profiles. On the ticket today is a look at how I used WebAssembly to get a large performance increase on my most expensive operation. If you missed it, check out part 1, part 2, part 3, and part 4.

WebAssembly is a hot topic in the Javascript ecosystem right now. The ability to compile a handful of mature languages with granular memory management (currently C, C++, and Rust) to an extremely performant assembly-like language has a lot of appeal. While it might seem like it’s a new development, the toolchain for building it has been around for over 5 years so there are plenty of resources out there to help developers of any skill level get started giving their apps a huge performance boost.

Understanding the Layout

If you’ve been following along, you know that I’ve come to a point where I’ve optimized my loading and parsing, and my rendering. The remaining bottleneck, as a couple of you astutely noted, is my layout algorithm. I mentioned that one of the major tradeoff of doing a circle-packing layout is that computing the layout is quite expensive.

Why is that? Well, if you’re the algorithmically inclined sort, take a few minutes and think about how you would write a process for doing it. Go ahead, I’ll wait.

Not so easy, is it? Circle packing is actually a really challenging problem. There are entire internet communities devoted to finding new and better ways to pack circles with a variety of restrictions. For our particular variant, known in the field as the “unequal circles in a circle” problem, I am leaning on d3-hierarchy’s pack module. This is an implementation of a combination of Wang, et al’s sibling packing algorithm and the Matoušek-Sharir-Welzl algorithm (check out this beautiful explanation by Mike Bostock, the creator of d3).

Both of these algorithms are recursive, with a high degree of complexity and a lot of integer math. They also are completely self-contained, and together make up the majority of the actual work my app needs to perform for any given layout. This makes them ideal candidates for compilation into WebAssembly! I decided to take the plunge and rewrite my hot path in a language that I can compile to the fastest execution context the web offers.

WebAssembly 102

This cartoon post does a great job of explaining what WebAssembly is and why it matters. If you’re unfamiliar with WebAssembly, give it a read — it’s the best 5 minute introduction I have found. I’m going to skip past the overview and jump right in to implementation details.

To compile any language to WebAssembly, you will need the Emscripten toolchain. I started by following the instructions and tutorial here to get that installed, the emsdk compiled, and my first hello.c running in the browser.

Output Format

One thing that might strike you as curious is the output format of emcc. While it can be used to output straight WebAssembly, there is a decent amount of setup that needs to be done on the browser to actually load and run the .wasm file. When you run emcc with the -o {filename}.html flag, Emscripten helpfully generates a loader index.js file and a sandbox index.html file that you can use to play around with your WebAssembly code.

The Emscripten sandbox

This sandbox is extremely valuable for debugging, it gives you a quick way to REPL your module and captures all of the output of your module on screen. I found myself, while debugging my build, running python -m SimpleHTTPServer 3005 in the output directory of emcc for a super light little development environment.

Once I had finished my development, I incorporated that same loading boilerplate into my app bundle, which exposes the Module object Emscripten uses to mount your WebAssembly interface to.

While it is easy to do development like this, it is not particularly efficient. You need to do a lot of manual reloading and copy your compiled files over to their target project after you are done. I think there is an opportunity for a Webpack plugin that will take a source c, cpp, or rust file, perform the emcc compilation, and automatically include the results in your project.

Bindings, Three Ways

After you have played around a bit with hello world and the sandbox, you will quickly see the need to do Javascript <-> WebAssembly bindings. A binding is a way to reference Javascript from WebAssembly, or vice versa. It describes the interface between WebAssembly and the rest of your code.

One problem I had getting started with WebAssembly was the sheer number of different ways you could do these bindings. By my count there are three unique ways to do WebAssembly -> Javascript:

  • ccall, cwrap and direct function calls
  • WebIDL
  • Embind

The first two ways of doing bindings are for C code only. They involve passing a command line flag to describe your exported functions:

emcc test.cpp -s WASM=1 -s EXPORTED_FUNCTIONS=”['_int_sqrt']" -o index.html

With this EXPORTED_FUNCTIONS flag, you tell the compiler to make a binding to that particular function. You can then reference it from Javascript in a number of ways:

That last method of invocation seems like it would be the simplest, and it certainly is for integers and strings. For more complicated data structures, however, it gets quite a bit trickier as you need to tell it exactly how it can convert your Javascript data into a structure that C can understand.

I would recommend this style if the code you are optimizing is constrained to a single function or small set of functions that take relatively simple input — numbers and strings are perfect. If you are going to be passing around larger structures, I would instead suggest writing in C++ so you can leverage Embind and WebIDL.

What is WebIDL?

WebIDL is a W3C specification that describes a standard language for interfaces between external code and the browser. Similarly, it can be used to describe the interface between WebAssembly and Javascript. It uses a very familiar C++-style notation:

For WebAssembly, Emscripten provides a python script that consumes an IDL file and turns it into glue code that you then include when compiling your wasm. Unfortunately, I was not able to get this pipeline working well for my classes and so I decided to try Embind instead.

I do think that as the WebAssembly ecosystem matures further, WebIDL will be the preferred method for describing the interfaces between WebAssembly and Javascript. It is a standard component of the browser contract already and provides a very logical way of expressing these interfaces.

Embind to the rescue

Embind allows us to write WebAssembly to Javascript bindings from directly within our C++ code. I found its syntax a little opaque at times, but it is by far the most well-supported binding method and worked well out of the box.

When using Embind, you provide a block of binding declarations somewhere in your cpp file. It looks like:

This declares 3 classes that I can now access through Javascript off of Emscripten’s Module object, and also exposes a meta-class of VectorHNode that describes an array of HNodes. I can use these (mostly) just like any old Javascript class, e.g.:

One thing to note is that I have declared the allow_raw_pointers flag on the Hierarchy.pack method. This is to tell Embind that I am going to use pointers of any kind as either an input or a return value from this method. Embind doesn’t have any way of type-checking these pointers so this is basically an opt-in to undefined behavior and Bad Things™ if I screw up either the method call or the handling of the pointer inside the method.

How do I debug this stuff?

Let’s assume that I goof and pass the wrong thing to my allow_raw_pointers()'d method — what then? Luckily, we have several lines of defense against this sort of issue. The first is through the compiler — it will catch syntax errors and type mismatches, within your code and between your bindings and your code.

The wrapping code that Emscripten provides will also do its best to provide helpful error messages:

These combined will get you a good deal of the way, but when you’re instrumenting something more complex you will eventually get an error within WebAssembly that you don’t have a good way to debug. Thankfully, Emscripten provides a killer feature for c and cpp files — sourcemaps!

If you run your compilation with the -g4 flag, you will get bona-fide browser sourcemaps for your compiled code. While you don’t get the actual context, you do get breakpoints. Combined with robust logging from within your app, you can diagnose and fix pretty much any issue.

I will say though that while in-browser debugging via is pretty great, it doesn’t take the place of a test suite and C++ debugging environment. If I was going to do it over again, the first thing I would do would be to set up NetBeans with a CppTest suite. It might have taken a couple of hours to get going properly, but it would have easily paid itself off over the course of the project.

The final compilation command

After a lot of trial and error, I ended up with the following command:

# for devemcc hierarchy.cpp -g4 --bind -o index.html -s WASM=1 -s ALLOW_MEMORY_GROWTH=1

# for prodemcc hierarchy.cpp -O3 --bind -o index.html -s WASM=1 -s ALLOW_MEMORY_GROWTH=1

-g4 sets the debugging verbosity level to maximum. This includes sourcemaps in the output

-O3 sets the optimization level. It is incompatible with the -g flag

--bind tells Emscripten to use Embind bindings.

-s WASM=1 tells Emscripten to emit WASM instead of asm.js. It is worth testing both outputs as I have read that for some applications asm.js can actually be faster than WebAssembly.

-s ALLOW_MEMORY_GROWTH=1 was necessary for my application to avoid depleting the pre-allocated WASM memory pool. If you can get away with not setting this, you should as it incurs a performance penalty.

Is it faster?

With my classes bound, my bugs squashed, and everything importing into my project correctly, the last step was to actually run some benchmarks. Was the hassle of translating my hot path into WebAssembly worth it?

Here is the code I used to run the benchmarks:

time in ms to render a given number of nodes

The results confirm that, yes, WebAssembly really is fast. It was over 30% faster for my large datasets, and maintained that separation between 220k and 1 million nodes. Interestingly, it was not significantly faster for my smaller dataset, implying that the startup incurred by Emscripten’s wrapper code might outweigh the WebAssembly performance increase for small amounts of data.

One million circles

After all of that, the goal was to render my million circle data set. Without further ado, here is one million circles laid out and drawn by the browser:

Isn’t it beautiful?

We did it! A million circles in the browser! A fully interactive accounting of every single byte of memory in a large web app.

Join me next time as I unveil the tool I made so you too can explore your heap profile in a cool interactive visualization.

Up next —Introducing HeapViz: A new way to debug memory issues


Published by HackerNoon on 2017/06/30