
Start Using Superior Image Compression Today

by Kagami Hiiragi, April 11th, 2019


This article proposes use cases for a new image codec and presents libraries for working with it on both the front end and the back end.

Compression overview

First of all, why use better compression at all? Doesn’t network bandwidth increase every year? Why does the media industry keep pushing for codec standards with ever higher compression ratios?

There are three major reasons for that:

Network coverage. While you might reach 20 Gbps next to a 5G cell, there are many places in town where the speed isn’t great at all.
Traffic cost. Even if your downloads are fast, most telcos charge you for every byte of data.
Higher resolution and quality. A 12.9” tablet needs at least 8K to match human-eye resolution, ideally lossless. We are not there yet.

For example, I browse the web on the underground a lot, and inside a tunnel fantastic LTE Advanced turns into a ridiculous 100 kbit/s. On the forum that I host and regularly visit, users upload photos mostly without any preprocessing, so a typical picture takes 1–4 MB, which is quite a lot if you open a bunch of them every now and then. And it would be strange to prefer 800×1200 over 2000×3000 (typical resolutions of exported photos), because the latter looks much nicer on a high-DPI display.

So higher compression is essential, but don’t we already have tons of solutions for that? What about WebP, based on the VP8 format, which is meant to replace JPEG and has only recently gained support in most browsers? There are also many other candidates, e.g. JPEG 2000, JPEG XR, and finally HEIF, based on the HEVC video codec standard.

Another thing worth noting: 800×1200 photos taking 4 MB are clearly not optimized. Even with plain JPEG we can compress them down to ~500 KB without losing much visual quality, and even further with state-of-the-art JPEG encoders such as mozjpeg and guetzli.

MSU Codec Comparison (April 4, 2019)

Well, it seems that the coding tools of AV1, the new video coding format developed by the Alliance for Open Media, are currently the most promising way to get the highest compression possible, especially at low bit rates. See, for example, these two benchmarks. Since we would need to re-compress images anyway to get the benefits of a new format, why not choose the best one?

If you’re interested in leveraging AV1 for video compression, take a look at my previous article dedicated to this subject.

Another advantage of AV1 is that it’s a royalty-free format, which means you don’t have to pay patent holders. For what it’s worth, software patents are unfortunately still a thing: formats like JPEG XR didn’t achieve wide adoption mostly because of the patents involved. So AV1-based solutions are attractive from both technical and legal points of view.

Meet AVIF

AVIF (AV1 Still Image File Format) is a new still image format based on AV1. Its specification was recently approved as version 1.0.0, meaning it’s ready to be used in production. An AVIF file is basically an AV1 key frame packed inside an ISOBMFF container, almost identical to the HEIF structure except that the AV1 video format is used instead of HEVC.

In theory it looks great, but what about practice? Well, tooling support is currently not as good as it could be, given how new the format is. For example, the latest versions of Chrome and Firefox support AV1 video decoding but still can’t display AVIF images; it usually takes some time before a new format is added. See, e.g., the Firefox issue.

The same goes for encoding. Most existing software isn’t even aware of the format, so I had to implement both encoding and decoding libraries myself. See the next sections for details.

Encoding AVIF

As said earlier, an AVIF file is nothing more than an AV1 intra frame in an ISOBMFF container, so we may use any available AV1 encoder to produce the actual picture.

Which one should we choose among libaom, rav1e and SVT-AV1, the three currently available open-source AV1 encoders?

Comparison of intra coding efficiency

This graph, produced by av1-bench, shows libaom as a clear winner: it has the best VMAF score, and its slowest encoding preset is actually faster than its competitors’, at least on my pre-AVX2 CPU. (libjpeg results are provided for reference.) This could be explained by the speed-over-quality trade-offs chosen in SVT-AV1 and rav1e. Those trade-offs aren’t bad in themselves, but still images are represented as single-frame videos, and encoding one frame doesn’t take long even with the slowest compression settings, so speed matters less here. So libaom should be a good choice, and we can always make it faster with its speed controls if needed.

I’ve also compared libaom and SVT-AV1 encodes by eye, because objective metrics are not the only source of truth. From my subjective perspective, the results correlated pretty well with VMAF, though sometimes it was hard to choose the better of the two.

So the AV1 encoder is chosen; what’s next? The forum back end where I’m going to use AVIF is written in Go, so I needed a library for that language. After some searching I found the libavif C library mentioned in the official AVIF wiki. It probably works fine and would allow writing Go bindings, but I decided to write my own library to better understand the format.

Since we won’t implement an encoder from scratch, the entire library boils down to a cgo wrapper around libaom and a pure Go ISOBMFF muxer.

libaom provides a typical encoder library C API: we need to prepare a frame, i.e. wrap the pixel data into the library’s structures, run the encode function on it and get the results back.

Most encoders operate on the Y’CbCr color model, and 4:2:0 subsampling is the most common. I’m using the image package from the Go standard library to get RGB pixel values from the image provided by the user; it supports decoding the popular JPEG and PNG formats out of the box. Pixels in .png files are already stored as RGB, and for .jpg Go converts them to RGB automatically. We just need to convert RGB to Y’CbCr (BT.709, 4:2:0, limited range), and then we can pass it to the encoder. If that sounds scary, don’t worry: the operation boils down to multiplying the R, G and B components of every pixel by some coefficients plus a few additions.

Now we need to pass that data to libaom. I’m using a small C wrapper, av1.c, for easier interoperability between C and Go. libaom’s API is pretty straightforward, but there are a few things worth noting:

We use 2-pass encoding even though it’s a single picture; libaom (and libvpx) are known to produce better results that way.
CRF (Q mode in libaom’s terminology) maps pretty well to the quality slider we’re used to in JPEG converters. It just uses a different scale, from 0 to 63, where 0 means lossless and 63 is the worst quality. Well, codecs can be strange 😉
We can control the encoding speed in the range from 0 to 8; 4 is the default and seems reasonable.
Multithreading is enabled by default, otherwise encoding would be too slow. row-mt parallelism and 4 tiles give pretty good results.
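The inverted 0–63 scale is easy to bridge for users who think in JPEG terms. The sketch below is a hypothetical helper, not part of go-avif or libaom: it maps a 0–100 JPEG-style quality linearly onto Q values and clamps the speed preset to the 0–8 range described above. The linear mapping is only an assumption; perceived quality is not linear on either scale.

```go
package main

import "fmt"

// Ranges described in the article for libaom's Q mode and speed presets.
const (
	maxQ               = 63 // worst quality; 0 is the best (lossless) end
	minSpeed, maxSpeed = 0, 8
)

// jpegQualityToQ maps a JPEG-style quality (0 = worst, 100 = best) onto
// the inverted 0..63 Q scale, rounding to the nearest step.
func jpegQualityToQ(quality int) int {
	if quality < 0 {
		quality = 0
	}
	if quality > 100 {
		quality = 100
	}
	return ((100-quality)*maxQ + 50) / 100
}

// clampSpeed keeps a requested speed preset inside 0..8.
func clampSpeed(speed int) int {
	if speed < minSpeed {
		return minSpeed
	}
	if speed > maxSpeed {
		return maxSpeed
	}
	return speed
}

func main() {
	for _, q := range []int{100, 75, 50, 0} {
		fmt.Printf("JPEG-style %d -> Q %d\n", q, jpegQualityToQ(q))
	}
	fmt.Println("speed 12 ->", clampSpeed(12))
}
```

With this mapping a JPEG-style 75 becomes Q 16, 50 becomes Q 32, and out-of-range speeds are pulled back to the nearest valid preset.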

The Go part is available in avif.go.

The encoder part is done; now for the muxer. Specs for the container are freely available here; here is the HEIF extension and here is the AVIF extension. I won’t go into details; you can check the final implementation in mp4.go, and the code should be self-explanatory. Implementing that dozen or so ISOBMFF boxes looks tedious, but it was actually real fun.

The entire Go library is published here. Usage comes down to an avif.Encode call plus a bit of preparation and error-handling boilerplate:

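package main

import (
	"image"
	_ "image/jpeg" // registers the JPEG decoder for image.Decode
	_ "image/png"  // registers the PNG decoder as well
	"log"
	"os"

	"github.com/Kagami/go-avif"
)

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("Usage: %s src.jpg dst.avif", os.Args[0])
	}

	srcPath := os.Args[1]
	src, err := os.Open(srcPath)
	if err != nil {
		log.Fatalf("Can't open source file: %v", err)
	}
	defer src.Close()

	// Decode before creating the destination so that a bad input
	// doesn't leave an empty .avif file behind.
	img, _, err := image.Decode(src)
	if err != nil {
		log.Fatalf("Can't decode source file: %v", err)
	}

	dstPath := os.Args[2]
	dst, err := os.Create(dstPath)
	if err != nil {
		log.Fatalf("Can't create destination file: %v", err)
	}

	// nil options: encode with the library's default settings.
	err = avif.Encode(dst, img, nil)
	if err != nil {
		log.Fatalf("Can't encode source image: %v", err)
	}
	if err := dst.Close(); err != nil {
		log.Fatalf("Can't close destination file: %v", err)
	}

	log.Printf("Encoded AVIF at %s", dstPath)
}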

See the GoDoc documentation for further details. go-avif also provides a simple CLI utility for converting images to AVIF with a single command from the console. You can download binaries for Windows, Linux and macOS here. Usage cheat sheet:

# Encode JPEG to AVIF with default settings
avif -e cat.jpg -o kitty.avif

# Encode PNG with slowest speed and quality 15
avif -e dog.png -o doggy.avif --best -q 15

# Fastest encoding
avif -e pig.png -o piggy.avif --fast

Decoding AVIF

Now we can produce AVIF files, but how do we display them? As said earlier, browsers don’t support the format yet, and we can’t simply ship a new decoder library to the browser: we’re limited to JavaScript and the various web APIs.

Fortunately, browser vendors have implemented so many of those APIs over the last decade that it has become possible to add a new image format with almost native-level integration. I’m talking about Service Workers and WebAssembly in particular.

Service Worker fetch interceptor (MDN docs)

The first API provides a way to intercept any fetch request made by the page and respond with a custom, JavaScript-processed answer. Pretty impressive, isn’t it? Without this feature we would be limited to an imperative, decode-this-file-and-paint-it-here style of library API, which is of course usable too, but ugly. We web developers are used to transparent polyfills that hide implementation details under the hood and provide neat, clean APIs. For an image format, that means being able to display it with the <IMG> tag, CSS properties and so on. As you might have guessed, the library I’m going to propose uses exactly that embedding mechanism.

What about the second API? Since the format is not yet supported natively, we also need to decode it somehow, i.e. transform bytes into actual pixels to display. Well, JavaScript is (obviously) a Turing-complete language, so it’s perfectly possible to write a decoder of any complexity in pure JS. That has actually been demonstrated in the past; see, e.g., the JavaScript MPEG-1 decoder.

Unfortunately, decoders tend to be very computationally expensive, especially for new formats such as AV1. Even native code written in low-level languages such as C and assembly, with full access to SIMD instructions, threads and so on, tends to be slow.

Recently we got a useful instrument in JS land that helps with this type of task: WebAssembly, a new binary format designed to execute code at close to native speed.

I’ve already written an article about using it on the web, so I won’t go into much detail here. All we need to know is that it allows a library written in C/C++ to be compiled into a form that can be executed inside a web page without any changes to the code. This is especially useful since a great AV1 decoder already exists: dav1d, the current state of the art.

One thing worth noting: with WebAssembly we won’t get the full speed of native code (<7 ms per Full HD frame), for several reasons. It’s currently 32-bit, has no SIMD and no threads (or only behind a flag), and sandboxing also brings some overhead. But for still images, a delay of 100–200 ms should be fine.

The only code we need to write is a pair of small C and JavaScript wrappers that glue everything together. You can see the implementation in the dav1d.c and dav1d.js files, respectively. The entire polyfill is available here and also on npm.

WebAssembly is cool, but can we do better? The most keen-eyed readers may have noticed that the AV1 data in AVIF is exactly the same as in video, so we should be able to decode it using the AV1 codec that browsers already ship for HTML5 video. It turns out we can! Well, at least in browsers that support AV1; it’s still bleeding-edge technology.

The tricky part is that we can’t insert an AVIF file as-is into a <video> tag; that simply won’t work. We need to write a parser (demuxer) for the ISOBMFF container format to extract the actual content of the frame (OBUs), and then a muxer to wrap that frame into a playable .mp4 video, which also uses the ISOBMFF container.
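Demuxing starts with walking boxes. Here is a minimal Go sketch of that first step (not the author’s JavaScript demuxer): it lists the sibling boxes in a buffer and handles the two special size values, 1 (a 64-bit size follows the type) and 0 (the box runs to the end of the buffer). It ignores the extended type of uuid boxes, and a complete demuxer would still have to descend into meta and read iloc to find where the AV1 item’s bytes actually live. Box and parseBoxes are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Box describes one box found while scanning a byte slice.
type Box struct {
	Type    string
	Payload []byte // box contents after the header
}

// parseBoxes walks the sibling boxes in buf. It handles the ISOBMFF
// special sizes: 1 (64-bit largesize follows the type) and 0 (the box
// extends to the end of the buffer).
func parseBoxes(buf []byte) ([]Box, error) {
	var boxes []Box
	for len(buf) > 0 {
		if len(buf) < 8 {
			return nil, errors.New("truncated box header")
		}
		size := uint64(binary.BigEndian.Uint32(buf))
		typ := string(buf[4:8])
		hdr := uint64(8)
		switch size {
		case 0:
			size = uint64(len(buf))
		case 1:
			if len(buf) < 16 {
				return nil, errors.New("truncated largesize")
			}
			size = binary.BigEndian.Uint64(buf[8:16])
			hdr = 16
		}
		if size < hdr || size > uint64(len(buf)) {
			return nil, fmt.Errorf("box %q has invalid size %d", typ, size)
		}
		boxes = append(boxes, Box{Type: typ, Payload: buf[hdr:size]})
		buf = buf[size:]
	}
	return boxes, nil
}

func main() {
	file := []byte{
		0, 0, 0, 16, 'f', 't', 'y', 'p', 'a', 'v', 'i', 'f', 0, 0, 0, 0,
		0, 0, 0, 0, 'm', 'd', 'a', 't', 0xde, 0xad, 0xbe, // placeholder bytes
	}
	boxes, err := parseBoxes(file)
	if err != nil {
		panic(err)
	}
	for _, b := range boxes {
		fmt.Printf("%s: %d payload bytes\n", b.Type, len(b.Payload))
	}
}
```

It prints `ftyp: 8 payload bytes` and `mdat: 3 payload bytes`. Calling parseBoxes again on a container box’s payload (such as meta, after skipping its 4-byte version and flags) is how one would descend the tree.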

AVIF file in ISOBMFF Box Structure Viewer

It turned out to be a pretty fun and exciting task; my implementation can be found here. I suggest you also scroll through the ISOBMFF spec. An MP4 file is like XML, i.e. nested tags with some attributes and content, but binary. The design is clean and simple; I really like it.

Once we have the .mp4 file in a typed array, we need to convert it to a blob and pass it to a standard video element. Believe it or not, that turned out to be a really hard task, because inside a Service Worker you don’t have access to the DOM and can’t create new HTML elements with document.createElement. Fail.

After some thinking I came up with a solution, though it made the library’s architecture rather clumsy. Since we have to respond to an intercepted fetch event in the same Service Worker that received it, but can only do the processing in the main thread, we use message passing to hand off the decoding task and get the results back. It turned out to work pretty well, and it’s also a bit faster than the WebAssembly version of dav1d thanks to access to SIMD and the like.

There is one small thing left to do. Browsers won’t understand the uncompressed Y’CbCr frame returned by the decoder; we can only respond with image data supported by the standard <IMG> tag, i.e. JPEG, PNG, BMP and similar formats. The simplest solution would be to use the canvas element’s standard toDataURL("image/jpeg") method to get JPEG data as a string. However, that leads to quality and performance losses, so instead I’ve implemented a small .bmp muxer in pure JS: bmp.js. BMP can contain uncompressed RGB pixel data, so it’s only a matter of writing a header and reordering RGB triplets into BGR.

avif.js demo in Chrome for Android

The entire library is published here and also on npm. Usage is dead simple:

// Install the library
npm install avif.js

// Put this into reg.js and serve avif-sw.js from the web root.
// Both scripts should be transpiled (either manually, e.g. with
// browserify, or automatically by parcel)
require("avif.js").register("/avif-sw.js");

// HTML
<body>
  <!-- Register worker -->
  <script src="reg.js"></script>

  <!-- Can embed AVIF with IMG tag now -->
  <img src="image.avif">

  <!-- Or via CSS property -->
  <div style="background: url(image2.avif)">
    some content
  </div>
</body>

You can also see the demo here.

The only thing I’m not satisfied with is that Service Workers are a complex, fragile and error-prone API. See, e.g., here for details about the service worker update mechanism. You can easily mess up and break the entire site, or users won’t get the updated version, or fetch requests will hang forever. Needless to say, all of these things happened to me while developing avif.js. Hopefully there won’t be any more issues like that, since the code has stabilized. Let’s also hope the web standard authors will improve the situation in future iterations of the Service Workers API.

Future ideas

Most of the work needed to start using AVIF right now is done, but there is always more to improve:

Better compression is great, but how do we know that the quality of the original image is preserved when we re-encode? Various objective metrics exist, such as PSNR, SSIM or the more modern VMAF, but they simply don’t define the “visual identity” score we should target. And even if they did, how should we map the encoder’s QP to that score?
I’m going to collect some statistics about AVIF usage on my site and may share them in a future article.
Let’s hope Chrome, Firefox, Edge or even Safari will add native AVIF support soon. avif.js should still be useful for supporting older versions, though.
AVIF tooling could be much better. Right now we’re limited to this small list of AVIF-aware software. It would be great to have support in popular image viewers and converters too.
The AV1 decoder polyfill (dav1d.js) could be smaller and faster; it’s not fully tuned yet. But since the polyfill is only needed for old browsers, that’s not very important.