This article describes my first practical experience with WebAssembly and a few useful techniques I picked up while creating the vmsg library.
I had some free time recently, so I decided to try the new WebAssembly standard and implement a simple but useful library with it.
A WebAssembly binary can be parsed and compiled as fast as it comes over the network, making it a perfect build target for libraries written in C that you need to use inside a web page.
The idea for the library came pretty quickly: I spend quite a lot of time on web forums, discussing various things via text messages and images. Recently, with the rise of HTML5 video and the WebM/VPx formats, it has become quite common to attach small videos to posts, increasing the possibilities for self-expression even more. What about voice? What if you could literally speak your message and send it as part of the post? Sounds great, let’s try it!
So first we need to grab audio samples from the microphone, then encode them, then return the resulting file to the library user. Looks pretty simple.
In 2018 the Web Audio API is widely supported, so no real headaches here. getUserMedia in conjunction with ScriptProcessorNode takes care of the first step, and a WebAssembly module is responsible for the second. Because the onaudioprocess callback of ScriptProcessorNode is executed in the main thread, and to keep the interface of the web page responsive, the WebAssembly module is instantiated in a Web Worker that communicates with the main thread via messages.
ScriptProcessorNode has been deprecated and is soon to be replaced by AudioWorklet, but that is only implemented in Chrome 64+ behind a flag at the moment, and for compatibility we will have to use the old API for the near future anyway. Moreover, since we process samples in a worker, don’t output them to the speakers and can use a large buffer, a Worklet isn’t required in our particular case. ScriptProcessorNode should work just fine: all it needs to do is send samples to the Web Worker, which is a very fast and lightweight operation.
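To make the capture step more concrete, here is a minimal sketch of how samples could travel from the microphone to the worker. It is not the actual vmsg code: the names makeAudioHandler and startCapture, the message shape and the 4096-sample buffer are assumptions chosen for illustration.

```javascript
// Builds the onaudioprocess callback: copy the mono samples and pass them to the worker.
// The message shape { type: "data", samples } is a hypothetical convention.
function makeAudioHandler(worker) {
  return function onaudioprocess(event) {
    // getChannelData returns a view the browser may reuse, so take a copy.
    const samples = new Float32Array(event.inputBuffer.getChannelData(0));
    // Transfer the copy's buffer instead of cloning it.
    worker.postMessage({ type: "data", samples }, [samples.buffer]);
  };
}

// Browser-only wiring: microphone -> ScriptProcessorNode -> worker.
function startCapture(worker, bufferSize = 4096) {
  return navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const Ctx = window.AudioContext || window.webkitAudioContext;
    const ctx = new Ctx();
    const source = ctx.createMediaStreamSource(stream);
    // One input channel, one output channel; the output is never written, so it stays silent.
    const processor = ctx.createScriptProcessor(bufferSize, 1, 1);
    processor.onaudioprocess = makeAudioHandler(worker);
    source.connect(processor);
    // Some browsers only fire onaudioprocess when the node is connected to a destination.
    processor.connect(ctx.destination);
    return { stream, ctx, processor };
  });
}
```

Keeping the callback this small is the point: the main thread only copies and forwards, and everything expensive happens in the worker.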
We are also going to create a simple interface that asks for permission to use the mic and displays a recording form with start/stop/close buttons. Below you can see a schematic overview of the library’s components:
Now we need to decide which audio format to encode the received samples into. Requirements: it should work in all browsers that support WebAssembly, give reasonable compression, and be widespread across all platforms.
My initial desire was to grab Opus because it’s the best that you can use for speech compression. Unfortunately it’s not supported by
Even though it’s possible, it’s too impractical for real use in my opinion. It’s OK to say: “If you want to support voice messages, add this library to your project”. But now you dictate which player to use to listen to the resulting audio, or require some non-trivial code to handle playback. I don’t like that, so I had to abandon Opus. Maybe a few years from now the choice will be much easier.
Side note: Chrome and Firefox support the MediaStream Recording API and can encode MediaStream data with the Opus codec right out of the box. Safari and Edge can’t, though, and I really want my library to work in all four of them, so no luck here again.
Next, there is the WAV/PCM format, available in all browsers. Creating a WAV file from raw samples is dead simple, and there is already a library for that. It has one little drawback though: there is no compression at all. So whether you sing some beautiful song into your mic or stay silent, a 30-second recording (48 kHz, mono) will always weigh about 2.7 megabytes. This is way too wasteful.
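The size figure can be checked with a bit of arithmetic. The sketch below assumes 16-bit samples and the usual 44-byte PCM WAV header, neither of which is stated above; wavSizeBytes is a hypothetical helper.

```javascript
// Size of an uncompressed PCM WAV file.
// Assumptions: 16-bit (2-byte) samples and a canonical 44-byte header.
function wavSizeBytes(seconds, sampleRate = 48000, channels = 1, bytesPerSample = 2) {
  const WAV_HEADER_BYTES = 44;
  return WAV_HEADER_BYTES + seconds * sampleRate * channels * bytesPerSample;
}

const bytes = wavSizeBytes(30);
console.log(bytes, (bytes / (1024 * 1024)).toFixed(2) + " MiB");
// prints: 2880044 2.75 MiB
```

The result depends only on the duration and the sample format, never on what was recorded, which is exactly the problem.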
What about MP3? It’s supported everywhere, has decent compression and a great encoder, LAME. Historically FOSS projects avoided it because of software patents, but all of them expired last year. So it seems like we have a winner.
There are also AAC and Vorbis, but neither of them fits. The former forbids distribution of a codec implementation in binary form, which is effectively what our WebAssembly module will be. (It’s also questionable whether the free implementations are as good as the proprietary ones.) The latter isn’t suitable for speech compression.
There are tons of asm.js ports of LAME and maybe even WASM ports but I decided to make a new one from scratch in order to focus on build size optimizations.
For compiling we use the de facto standard Emscripten toolchain, nothing new here. It has been actively developed for many years and is designed to port C/C++ libraries to the web, which is exactly what we need. I won’t go into details; you can read more about Emscripten at the official site.
Emscripten’s asm.js compiler is powered by an LLVM backend called fastcomp. For WebAssembly you have two options: compile to asm.js first and translate to WASM with Binaryen, or use LLVM’s in-tree WebAssembly backend, which is capable of producing WebAssembly binaries by itself (almost: you still need Binaryen for the final step). I chose the second because it seems set to become the preferred one in the near future. Emscripten also recently gained support for the standard LLVM linker (LLD), which is again going to be preferred soon.
Side note: I’m not going to describe the process of compiling LLVM with the WASM backend. It’s generally recommended to use the latest SVN version. You can check out for the starting point. It’s also possible to compile the WASM backend with emsdk by providing the --enable-wasm flag, but it uses a pretty old LLVM (the base for the fastcomp patches), so the resulting module might be bigger or slower than with SVN LLVM. It also doesn’t build LLD.
Let’s create a stub of our library. I will use Linux shell commands, YMMV.
$ cd ~
$ git init vmsg && cd vmsg
$ npm init -y
Now we need the sources of the LAME encoder; git submodules are really handy for that:
$ git submodule add https://github.com/Kagami/lame-svn.git
$ cd lame-svn && git checkout RELEASE__3_100 && cd ..
So far so good. Let’s compile libmp3lame.so (the shared LAME library) so that we can later call its functions from the WebAssembly module. I use a GNU Makefile even though modern build tools like webpack and Parcel are getting support for WASM, because that support is not mature yet and I want to experiment with compiler flags and other optimizations. Build tools would only get in the way here.
Create a Makefile with the following text (make sure to use tabs for indentation):
export EMCC_WASM_BACKEND = 1
export EMCC_EXPERIMENTAL_USE_LLD = 1

lame-svn/lame/dist/lib/libmp3lame.so:
	cd lame-svn/lame && \
	git reset --hard && \
	patch -p2 < ../../lame-svn.patch && \
	emconfigure ./configure \
		CFLAGS="-DNDEBUG -Oz" \
		--prefix="$$(pwd)/dist" \
		--host=x86-none-linux \
		--disable-static \
		--disable-gtktest \
		--disable-analyzer-hooks \
		--disable-decoder \
		--disable-frontend \
		&& \
	emmake make -j8 && \
	emmake make install
I told Emscripten to use the WASM backend and LLD, enabled aggressive size optimizations, disabled asserts and disabled some extra stuff in LAME that we don’t need. The patch fixes the strtol check in the configure script and disables LAME’s default reporters to shrink the build size (otherwise Emscripten would include an implementation of the printf function and other stuff).
$ source /path/to/emsdk/emsdk_env.sh
$ make
This activates Emscripten environment and creates LAME library at
vmsg structure is being used which stores the current state of encoding. It’s also possible to encode multiple files in parallel because we don’t have global variables.
Let’s finally compile our WebAssembly module. Add this to
vmsg.wasm: lame-svn/lame/dist/lib/libmp3lame.so vmsg.c
	emcc $^ \
		-DNDEBUG -Oz --llvm-lto 3 \
		-Ilame-svn/lame/dist/include \
		-s WASM=1 \
		-s "EXPORTED_FUNCTIONS=['_vmsg_init','_vmsg_encode','_vmsg_flush','_vmsg_free']" \
		-o _vmsg.js
	cp _vmsg.wasm $@
Run make vmsg.wasm and that’s it. We’ve ported a fully functional MP3 encoder to the web, and it weighs only about 70 KB gzipped:
$ wc -c < vmsg.wasm
152799
$ gzip -6 -c vmsg.wasm | wc -c
74152
Date object so the date/time functions of musl can work properly. Unfortunately it comes with a cost: even minified by Closure library it would weigh about 10 KB, so I was curious whether I could do a better job than Emscripten in my particular case.
Let’s first look at what the module actually needs, using wasm-dis from the Binaryen toolchain:
$ wasm-dis vmsg.wasm | grep '(import'
(import "env" "memory" (memory $0 3))
(import "env" "pow" (func $import$1 (param f64 f64) (result f64)))
(import "env" "exit" (func $import$2 (param i32)))
(import "env" "powf" (func $import$3 (param f32 f32) (result f32)))
(import "env" "exp" (func $import$4 (param f64) (result f64)))
(import "env" "sqrtf" (func $import$5 (param f32) (result f32)))
(import "env" "cos" (func $import$6 (param f64) (result f64)))
(import "env" "log" (func $import$7 (param f64) (result f64)))
(import "env" "sin" (func $import$8 (param f64) (result f64)))
(import "env" "sbrk" (func $import$9 (param i32) (result i32)))
Only 10 imports, and most of them can be mapped directly to the Math object! There is also exit, which is called when the module decides to, well, exit; memory, which is the module’s linear memory and is created with new WebAssembly.Memory; and sbrk, which musl calls when it needs to allocate more memory. Here you can see my implementation of all those functions, which takes only 30 lines and works perfectly fine.
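As an illustration of how small such a runtime can be, here is a sketch that satisfies the import list above. It is not the linked implementation: createRuntime, the heapBase default and the grow-on-demand policy are assumptions, and the real heap start depends on how the module lays out its static data.

```javascript
const PAGE_SIZE = 65536; // size of one WebAssembly memory page in bytes

// Hypothetical minimal runtime providing the "env" imports listed by wasm-dis.
// heapBase (where dynamic allocation starts) is an assumed value for illustration.
function createRuntime({ initialPages = 3, heapBase = 2 * PAGE_SIZE } = {}) {
  const memory = new WebAssembly.Memory({ initial: initialPages });
  let brk = heapBase; // current end of the heap

  // sbrk(increment): move the heap end and return the previous one,
  // growing the linear memory when the new end does not fit.
  function sbrk(increment) {
    const oldBrk = brk;
    const newBrk = oldBrk + increment;
    const missing = newBrk - memory.buffer.byteLength;
    if (missing > 0) {
      // Throws a RangeError if the engine cannot grow the memory.
      memory.grow(Math.ceil(missing / PAGE_SIZE));
    }
    brk = newBrk;
    return oldBrk;
  }

  // exit(status): nothing to clean up here, so just surface it as an error.
  function exit(status) {
    throw new Error("module exited with status " + status);
  }

  const env = {
    memory,
    pow: Math.pow,
    powf: Math.pow, // f32 arguments and result are converted by the engine
    exp: Math.exp,
    sqrtf: Math.sqrt,
    cos: Math.cos,
    log: Math.log,
    sin: Math.sin,
    exit,
    sbrk,
  };
  return { env, memory };
}
```

The returned env object would be passed as { env } in the imports argument when instantiating the module.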
It’s a great thing that WebAssembly is supported by all four major browsers (Chrome, Firefox, Safari and Edge), but not all users on the web have access to the latest browser versions. So it’s reasonable to make your application support as many versions as you can, as long as it doesn’t hurt readability, performance or maintenance much. For example, I intentionally use XHR in browsers without WebAssembly.instantiateStreaming because it makes the code only 5 lines longer and lets us support browsers without the Fetch API, e.g. Edge 12–14.
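The fallback could look roughly like this. It is a sketch rather than the vmsg code: instantiateWasm is a hypothetical helper, and the scope parameter exists only so the function can be exercised outside a browser.

```javascript
// Instantiate a module from a URL, streaming when possible and
// falling back to XHR + WebAssembly.instantiate otherwise.
// `scope` defaults to the global object; it is injectable only for testing.
function instantiateWasm(url, imports, scope = globalThis) {
  const wasm = scope.WebAssembly;
  if (typeof wasm.instantiateStreaming === "function" && typeof scope.fetch === "function") {
    return wasm.instantiateStreaming(scope.fetch(url), imports);
  }
  return new Promise((resolve, reject) => {
    const xhr = new scope.XMLHttpRequest();
    xhr.open("GET", url);
    xhr.responseType = "arraybuffer";
    xhr.onload = () => {
      if (xhr.status === 200) resolve(xhr.response);
      else reject(new Error("failed to load " + url + ": HTTP " + xhr.status));
    };
    xhr.onerror = () => reject(new Error("failed to load " + url));
    xhr.send();
  }).then((bytes) => wasm.instantiate(bytes, imports));
}
```

Both branches resolve to an object with module and instance properties, so the calling code does not need to know which path was taken.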
Right now the recommended way of “polyfilling” WebAssembly is to make a separate asm.js build of the same code. It works pretty well with Emscripten’s runtime because it abstracts away the differences between these two technologies and provides a single Module interface to interact with the compiled code. Because we use our own runtime, and because it feels more natural to use the WebAssembly API if available and emulate it if not, I decided to go down the “true polyfill” path.
A quick search for “WebAssembly polyfill” returned several projects; the most promising of them was by Ryan Kelly. It works by emulating WebAssembly browser APIs such as WebAssembly.Table, parsing the binary module and generating asm.js-like code on the fly. Exactly what I wanted! Unfortunately it was no longer maintained, so I had to fork it, slightly refactor it, fix obvious issues and tests, and publish it to NPM. The most horrible bug was in the code generation for the i64.store instruction, but eventually I kinda fixed it. Here is my fork; I think it might be useful for other projects too.
I’ve also found a polyfill in the Binaryen repo, but it was too large (2.5 MB vs 95 KB in the case of wasm-polyfill) and not complete: it doesn’t emulate the WebAssembly browser API. Finally, the official polyfill prototype looks abandoned, so wasm-polyfill is probably the best option we have now. It’s not ideal though: the generated code is not as efficient as it could be, and a lot of extra bounds checks are generated to be fully semantically correct. See the last section for possible improvements in that area.
Usage of the polyfill is straightforward: include the minified wasm-polyfill.js build with a <script> tag, or call importScripts in the case of a Worker. A piece of cake.
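Inside a worker, the conditional load might be sketched like this. The helper name ensureWebAssembly and the file path are hypothetical, and the sketch assumes the polyfill installs a global WebAssembly object when it is loaded.

```javascript
// Load the polyfill only when the environment lacks native WebAssembly.
// Assumption: the polyfill script defines a global `WebAssembly` once executed.
// Returns true when the polyfill was loaded, false when native support exists.
function ensureWebAssembly(scope, polyfillUrl) {
  if (typeof scope.WebAssembly === "object") {
    return false;
  }
  scope.importScripts(polyfillUrl); // synchronous inside a Worker
  return true;
}

// Inside the worker one would call, for example:
//   ensureWebAssembly(self, "wasm-polyfill.js");
```

Browsers with native support never download the polyfill this way, so they pay nothing for the compatibility layer.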
We also need to build some UI so users won’t have to reimplement it every time. At first I leaned towards React because it’s an extremely popular and powerful library for creating composable UI components. It comes with a cost though: not everyone uses React (Angular and Vue.js are widespread too), and by sticking to React only you leave a lot of potential users of your library out. Given that I planned to keep the interface pretty simple, React wouldn’t help much here, so it’s better to use the standard DOM API. Moreover, it’s always possible to include such a library in a site powered by any framework, but not the other way around.
I won’t annotate all the code I ended up with; most of it is self-explanatory. Check out the resulting vmsg.js. Interaction with Web Audio and Web Workers is already documented quite well on the web. The only interesting part is that I don’t use a separate file for the worker source but create a Blob URL instead. This makes the library a bit more pleasant to use: you don’t have to care about an extra file.
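The Blob URL trick can be sketched in a few lines. This is an illustration rather than the code from vmsg.js: createInlineWorker is a hypothetical name, and the WorkerCtor parameter is there only to make the function testable without a browser.

```javascript
// Create a Worker from a string of source code instead of a separate file.
// WorkerCtor defaults to the browser's Worker; it is injectable for testing.
function createInlineWorker(source, WorkerCtor = Worker) {
  const blob = new Blob([source], { type: "application/javascript" });
  const url = URL.createObjectURL(blob);
  const worker = new WorkerCtor(url);
  // The caller may release the URL with URL.revokeObjectURL(url) when done.
  return { worker, url };
}

// Example usage in a browser:
//   const { worker } = createInlineWorker("onmessage = (e) => postMessage(e.data);");
```

One thing to keep in mind with this approach is that relative URLs inside the worker no longer resolve against the library’s own location, so any path the worker loads has to be passed in explicitly.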
The full demo is available here.
What now? The library works pretty well, and I already use it on my forum for voice message support. But there are a few more interesting areas worth trying: