C++ to WebAssembly using Bazel and Emscripten

By Nick Angelou (@s0l0ist)

Startup Advisor, Entrepreneur, Full-Stack Developer, Cybersecurity Professional, Privacy Advocate

How to get Bazel and Emscripten to compile C++ to WebAssembly or JavaScript

In my quest to generate a re-usable WebAssembly template, I discovered many ways that appear easy to implement, but don’t really work in applications beyond a simple ‘hello world’. If you’re interested in building slightly more complex WebAssembly modules, this article is for you.

Ultimately, I’m looking to compile a nice C/C++ library to the JavaScript domain. I’m not looking to build specific one-off functions; I want support for the entire library (or at least the majority of it). Additionally, I want this to run in the browser as well as NodeJS, and I don’t want to deal with instantiating the WASM or managing its memory across these environments. These requirements mean I can rule out several alternatives…

Spoiler: Use Emscripten with Bazel. Shortcut to the GitHub repo: https://github.com/s0l0ist/bazel-emscripten

What’s out there?

There are many starting points and many tutorials on how to get started using C/C++ with JavaScript. Here are a few tools you would find in the wild, some of which are not exactly what I want:

WAT — requires a lot of low-level work and only makes sense for simple one-off functions. 👎👎👎

N-API — is not much better. I’m still writing a lot of bindings, need to worry about node-gyp, and it may not work in the browser. 👎👎

LLVM — allows us to directly compile C/C++ to WASM! Unfortunately, this is still a low-level job that requires me to perform a lot of extra steps just to get it working. 👎

Wasmer actually looks great! They’re a relatively new player and support lots of integrations. Even their builds are relatively lean! Unfortunately, they still require a lot of glue to the native C/C++ code, which is not really pleasant for larger projects. However, they are working on better integration support and are moving rather fast. 👍

Cheerp — another great tool. They’re similar to emscripten, but have a different memory model that allows for automated garbage collection. The performance is quite similar, often beating emscripten in special cases. However, the community support is not quite as large and I found myself getting stuck. I’ll keep these guys on the radar. 👍👍

Emscripten is just right. Integration with C++ is made extremely easy by using embind. I can pass non-primitive types between both domains (C++ only). They have a larger community presence. Their output format is relatively straightforward to use in the browser or NodeJS. 👍👍👍

Getting started

I’ll showcase a simple “hello world” C++ application that we will convert to WebAssembly.

How do I convert an existing C++ library?

This is the crux of it all. Every toolchain has some initial difficulties setting up and I’m often left scratching my head on where to even start. No one wants to manually invoke gcc, so we built tools such as configure, make, or cmake to automate the build process. Great!

…except, not 🙁

Sometimes I’ve needed to hack the existing make/cmake rules to avoid dependencies on shared libraries, ignore some intrinsics checks, etc. This obviously doesn’t play nice with a centralized C++ code base that attempts to build bindings for many languages. So what are our options?

Bazel 💚

— a fast, scalable, multi-language, and extensible build system.

While this build system can be quite daunting, it is actually very powerful. Unfortunately, there’s just not that much documentation to learn to use it with emscripten. In fact — their docs are broken, more broken, and maybe not even supported.

I argue that it can be done decently well — even the reputable TensorFlow.js team has managed to get it working! So what was so difficult? What makes it so special?

After converting several libraries to WebAssembly, I can tell you that the isolation Bazel offers is quite nice — no horrible breaking changes when a cmake script has been modified. No more complex logic determining the target to build, etc. Once defined it will almost always just work.

First steps

Install Bazel. You will also need yarn to install the dev dependencies.

Fast forward a bit, here is the github repo so you can follow along.

Note: I’ve taken a lot of inspiration from the TensorFlow.js project on how they managed to get it working. My changes revolve around compiler/linker flags, showing how to output both JS and WASM, and most important — using the latest emscripten release 🎉!

git clone --recurse-submodules https://github.com/s0l0ist/bazel-emscripten.git
cd bazel-emscripten

yarn install

I’ve taken the liberty of including the emsdk as a git submodule so you don’t have to manage it yourself. The first step is to get the emsdk cloned. If you’ve cloned my repo recursively, you can skip this step:

yarn submodule:update

Next, we need to update the release tags and then install the latest version of emscripten:

yarn em:update
yarn em:init

Done 🎉!

The layout

Some important files and directories:

  1. .bazelrc: describes default commands for building a target
  2. WORKSPACE: defines our external dependencies

Some files inside hello-world/:

  1. BUILD: empty file so bazel doesn’t complain
  2. deps.bzl: bazel toolchain dependencies (emsdk)

A few directories in hello-world/:

  1. cpp/: holds the simple C++ sources
  2. javascript/: holds all JS related material
  3. javascript/bindings/: holds all emscripten bindings
  4. javascript/src/: holds all JS wrappers
  5. javascript/scripts: the handy build scripts to shorten our cli statements
  6. javascript/toolchain: the heart of the Bazel + Emscripten configuration

The rest is self-explanatory.

The code 💻

I’ve outlined a very simple library containing Greet and LocalTime classes that have static methods for this example:

LocalTime class:

//////// cpp/localtime.hpp ////////
#ifndef LIB_LOCAL_TIME_H_
#define LIB_LOCAL_TIME_H_
namespace HelloWorld {
class LocalTime {
public:
    /*
     * Prints the current local time to stdout
     */
    static void Now();
};
} // namespace HelloWorld
#endif // LIB_LOCAL_TIME_H_

//////// cpp/localtime.cpp ////////
#include <cstdio>
#include <ctime>
#include "localtime.hpp"
namespace HelloWorld {
void LocalTime::Now() {
    // asctime() already ends its output with a newline
    std::time_t result = std::time(nullptr);
    std::printf("%s", std::asctime(std::localtime(&result)));
}
} // namespace HelloWorld

Greet class:

//////// cpp/greet.hpp ////////
#ifndef LIB_GREET_H_
#define LIB_GREET_H_
#include <string>
namespace HelloWorld {
class Greet {
public:
    /*
     * Returns a greeting for the given name
     */
    static std::string SayHello(const std::string &name);
};
} // namespace HelloWorld
#endif // LIB_GREET_H_

//////// cpp/greet.cpp ////////
#include <string>
#include "greet.hpp"
namespace HelloWorld {
std::string Greet::SayHello(const std::string &name) {
    return "Hello, " + name + "!";
}
} // namespace HelloWorld
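
Before involving Emscripten at all, it can help to exercise the library natively. The following is only a minimal sketch and is not part of the repo: it repeats the Greet class from above inline so that it compiles on its own, and CheckGreet is a hypothetical helper name introduced purely for illustration.

```cpp
#include <cassert>
#include <string>

namespace HelloWorld {
// Same logic as cpp/greet.hpp and cpp/greet.cpp, repeated inline so this
// sketch is self-contained.
class Greet {
public:
    static std::string SayHello(const std::string &name) {
        return "Hello, " + name + "!";
    }
};
} // namespace HelloWorld

// Hypothetical native smoke test (not in the repo). Call it from a test
// main() built with the regular host compiler.
void CheckGreet() {
    assert(HelloWorld::Greet::SayHello("World") == "Hello, World!");
}
```

If a check like this fails natively, the problem lies in the C++ itself rather than in the bindings or the toolchain.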

Emscripten bindings 🦾

The bindings are quite short for our example. We make use of the powerful embind which lets us talk to C++ classes.

You may notice that LocalTime::Now outputs directly to stdout. Emscripten is intelligent enough to redirect our output to console.log so we don’t need to do anything else 😎. Greet::SayHello returns a std::string, which embind converts to a JavaScript string that we will need to send to console.log ourselves.

//////// javascript/bindings/hello-world.cpp ////////
#include <emscripten/bind.h>
#include "hello-world/cpp/greet.hpp"
#include "hello-world/cpp/localtime.hpp"
using namespace emscripten;
using namespace HelloWorld;
EMSCRIPTEN_BINDINGS(Hello_World) {
    class_<Greet>("Greet")
        .constructor<>()
        .class_function("SayHello", &Greet::SayHello);
      
    class_<LocalTime>("LocalTime")
        .constructor<>()
        .class_function("Now", &LocalTime::Now);
}
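
The two bound functions cross the boundary in different ways: one prints, the other returns a value. As a sketch of the second style applied to the time example, the hypothetical function NowString below (not part of the repo) returns the formatted time instead of printing it. The format string is an arbitrary choice, and the binding is guarded by __EMSCRIPTEN__ so that the same file also compiles with a native compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

namespace HelloWorld {
// Hypothetical variant of LocalTime::Now (not in the repo): returns the
// formatted local time so the JavaScript caller decides where it goes.
inline std::string NowString() {
    const std::time_t now = std::time(nullptr);
    const std::tm *local = std::localtime(&now);
    if (local == nullptr) {
        return std::string();
    }
    char buffer[32];
    const std::size_t written =
        std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", local);
    // strftime returns 0 only if the buffer is too small for the result.
    assert(written > 0);
    return std::string(buffer, written);
}
} // namespace HelloWorld

#ifdef __EMSCRIPTEN__
// Free functions are bound with emscripten::function; embind converts the
// returned std::string to a JavaScript string.
EMSCRIPTEN_BINDINGS(Hello_World_Extras) {
    emscripten::function("NowString", &HelloWorld::NowString);
}
#endif
```

Returning the value keeps the C++ side free of any assumption about where output should end up, at the cost of one extra line on the JavaScript side.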

Now that we’ve defined our bindings, we’re ready to build!

Building 🏗

You may build the native libraries, but they’re quite useless by themselves…

bazel build -c opt //hello-world/cpp/...

I’ve configured the .bazelrc file to build with two different options: JS or WASM.

JS: Specifies flags to emscripten to output a single asm.js file that does not contain any WebAssembly. This is useful for environments that can’t work with WebAssembly such as React-Native, but the output is significantly larger and slower.

WASM: Specifies flags to emscripten to output a single JavaScript file containing the WebAssembly as a base64 encoded string. This means we don’t need to manage a separate .wasm file in our bundles or figure out how to properly serve this file in the browser. The drawback is a larger file size due to the base64 encoding.
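
To put a number on that drawback: base64 maps every 3 bytes of input to 4 output characters, so the embedded binary is roughly one third larger than the raw .wasm file. The sketch below does the arithmetic at compile time. Base64EncodedSize is a name made up for this illustration, and the 300,000-byte figure is a hypothetical example size, not a measurement of this project.

```cpp
#include <cstddef>

// Base64 maps every 3 input bytes to 4 output characters and pads the
// final partial group up to a full 4 characters.
constexpr std::size_t Base64EncodedSize(std::size_t raw_bytes) {
    return 4 * ((raw_bytes + 2) / 3);
}

// A hypothetical 300,000-byte .wasm binary becomes 400,000 characters.
static_assert(Base64EncodedSize(300000) == 400000, "about one third larger");
// Even a single byte costs a full 4-character group.
static_assert(Base64EncodedSize(1) == 4, "partial groups are padded");
```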

To make it simple, I’ve created some helper scripts so all you need to do is run the following:

yarn build:js
# or
yarn build:wasm
# or both
yarn build

There are some good and bad things about using emscripten here:

Good: It generates glue code for you automatically.

Bad: It generates glue code for you automatically.

Obviously, the glue code adds some bloat but keeps me from having to deal with the intricacies of initialization 👌.

Note: .bazelrc defines a few compiler flags, shared by the JS and WASM builds, that are geared towards production use. Feel free to modify them as necessary; I wanted to show what’s possible here.

If you do want full control over instantiating the WASM to reduce the bundle size, you may generate a pure WASM build by adding the link flag -s STANDALONE_WASM=1 inside the Starlark file hello-world/javascript/BUILD.
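
One caveat if you go down that road: embind bindings rely on the generated JavaScript glue, so a module you instantiate yourself typically exposes plain C functions instead, with strings passed through linear memory. The following is a rough sketch of what such an export might look like. The function name hello_say_hello and the HELLO_EXPORT macro are hypothetical and not part of the repo; EMSCRIPTEN_KEEPALIVE is the Emscripten macro that keeps a function from being stripped at link time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#define HELLO_EXPORT EMSCRIPTEN_KEEPALIVE
#else
#define HELLO_EXPORT  // native builds need no export annotation
#endif

extern "C" {
// Hypothetical C-style export (not in the repo). Writes "Hello, <name>!"
// into `out`, truncating to fit `out_size` bytes including the terminator.
// Returns the full greeting length so the caller can detect truncation.
HELLO_EXPORT std::size_t hello_say_hello(const char *name, char *out,
                                         std::size_t out_size) {
    assert(name != nullptr);
    const std::string greeting = "Hello, " + std::string(name) + "!";
    if (out != nullptr && out_size > 0) {
        const std::size_t copied = std::min(greeting.size(), out_size - 1);
        std::memcpy(out, greeting.data(), copied);
        out[copied] = '\0';
    }
    return greeting.size();
}
} // extern "C"
```

The caller owns the buffer here, which is exactly the kind of memory management that the glue code and embind otherwise hide.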

Bundling 📦

You may have seen the javascript/src/implementation files which wrap the emscripten output. Do we really need these files? No, we don’t. However, I like my APIs to be abstracted from the output of emscripten. This allows for more flexibility when there are potentially breaking changes to the C++ core.

An important thing to note is that the outputs are quite a bit larger than you would expect. A common culprit is code that includes <iostream>, which pulls in a lot of code for the static constructors that initialize the iostream system even if it is never used, but our builds don’t have this problem. What remains is the auto-generated glue code that manages initialization and provides helpers for memory allocation, resizing, and the like.

Generate the bundles

yarn rollup

This gathers the files in hello-world/javascript/bin/* and hello-world/javascript/src/* and produces a few output bundles in hello-world/javascript/dist/.

You will notice minified bundles for js and for wasm, each built for two targets, ES6 modules and UMD (for browser and NodeJS), in hello-world/javascript/dist/<js|wasm>/<es|umd>/*.

Details of the rollup configuration are in rollup.config.js.

Let’s run 🏃

So we’ve compiled our C++ to JS and WASM — what’s next?

Run the JS bundle in NodeJS

yarn demo:js

Or run the WASM bundle in NodeJS

yarn demo:wasm

Or open javascript/html/index_wasm.html to run the WASM bundle in the browser.

Conclusion

By spending a little time with bazel, you can create a nice build system that works for many languages without breaking your other targets.

We can now drive a core C++ application with bindings in several different languages all while simplifying the interoperability between them.

Stay tuned for part 2 where I show a real C++ library converted to JS and WASM!

Hope you enjoyed and thanks for reading!

Credits

  1. schoppmp for help with optimizing the bazel configuration
  2. TensorFlow.js for the initial bazel configuration
