

This article proposes use cases for a new image codec and presents libraries for working with it on both the front end and the back end.
First of all, why use better compression? Doesn't network bandwidth increase every year? And why does the media industry keep pushing codec standards toward ever higher compression ratios?
There are three major reasons for that:
For example, I browse the web on the underground a lot, and inside a tunnel the fantastic LTE Advanced turns into a ridiculous 100 Kbit/s. On the forum that I host and regularly visit, users mostly upload photos without any preprocessing, so a typical picture takes 1–4 MB, which is quite a lot if you open a bunch of them every now and then. And it would be strange to prefer 800×1200 over 2000×3000 (typical resolutions of exported photos), because the latter looks much nicer on a high-DPI display.
So higher compression is essential, but don't we already have tons of solutions for that? What about WebP, based on the VP8 format, which is meant to replace JPEG and just recently gained support in most browsers? There are also many other candidates, e.g. JPEG 2000, JPEG XR, and finally HEIF, based on the HEVC video codec standard.
Another thing worth noting: 800×1200 photos taking 4 MB are clearly not optimized; even with plain JPEG we can compress them down to ~500 KB without losing much visual quality, and even further with state-of-the-art JPEG encoders such as mozjpeg and guetzli.
Well, it seems that the coding tools of AV1, the new video coding format developed by the Alliance for Open Media, are currently the most promising way to get the highest compression possible, especially at low bitrates; see e.g. this and this benchmark. Since we would need to re-compress images anyway to get the benefits of a new format, why not choose the best one?
If you're interested in leveraging AV1 for video compression, take a look at my previous article dedicated to this subject.
Another advantage of AV1 is that it's a royalty-free format, which means you don't have to pay patent holders. For what it's worth, software patents are unfortunately still a thing. Formats like JPEG XR didn't achieve wide adoption mostly because of the patents involved. So AV1-based solutions are attractive from both technical and legal points of view.
AVIF (AV1 Still Image File Format) is a new still image format based on AV1. The specification was approved as 1.0.0 just recently, meaning it's ready to be used in production. An AVIF file is basically an AV1 key frame packed inside an ISOBMFF container, almost identical to the HEIF structure, except that the AV1 video format is used instead of HEVC.
It looks great in theory, but what about practice? Well, tooling support is currently not as good as it could be, given how new the format is. For example, the latest versions of Chrome and Firefox support AV1 video decoding but still can't display AVIF images; it usually takes some time before a new format gets added. See e.g. the Firefox issue.
The same goes for encoding: most existing software isn't even aware of the format. So I had to implement both the encoding and decoding libraries myself. See the next sections for the details.
As mentioned earlier, an AVIF file is nothing more than an AV1 intra frame in an ISOBMFF container, so we can use any available AV1 encoder to produce the actual picture.
Which one should we choose among libaom, rav1e, and SVT-AV1, the three currently available open-source AV1 encoders?
This graphic, produced by av1-bench, shows libaom as the clear winner: it has the best VMAF score, and its slowest encoding preset is actually faster than its competitors', at least on my pre-AVX2 CPU. (libjpeg results are provided for reference.) That can be explained by the speed-over-quality trade-offs chosen in SVT-AV1 and rav1e, which is not a bad thing, but still images are encoded as single-frame videos, and a single frame doesn't take that long to encode even with the slowest compression settings. So libaom should be a good choice, and we can always make it faster with its speed controls if needed.
I've also compared libaom and SVT-AV1 encodes by eye, because objective metrics are not the single source of truth. My subjective impressions correlated pretty well with the VMAF results, though sometimes it was hard to pick the better of the two.
So the AV1 encoder is chosen; what's next? The forum back-end where I'm going to use AVIF is written in Go, so I needed a library for that language. After some searching I found the libavif C library mentioned in the official AVIF wiki. It probably works fine and would allow writing Go bindings, but I decided to write my own library to understand the format better.
Since we won't implement an encoder from scratch, the entire library boils down to a libaom cgo wrapper and a pure Go ISOBMFF muxer.
libaom provides a typical encoder C API: we need to prepare a frame, i.e. wrap the pixel data in the library's structures, run the encode function on it, and get the results back.
Most encoders operate on the Y′CbCr color model, and 4:2:0 subsampling is the most common. I'm using the image package from the Go standard library to get RGB pixel values from the image provided by the user; it supports decoding the most popular formats, JPEG and PNG, out of the box. Pixels in .png files are already stored as RGB, and for .jpg Go will convert them to RGB automatically. We just need to convert RGB to Y′CbCr BT.709 4:2:0 limited range and can then pass it to the encoder. If that sounds scary, don't worry: the operation boils down to multiplying the R, G, and B components of every pixel by a few coefficients plus a few additions, as the sketch below shows.
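To make that concrete, here is a minimal sketch of the per-pixel arithmetic using the standard BT.709 limited-range formulas, written in JavaScript purely for illustration (go-avif does the equivalent conversion in Go):

// BT.709 limited-range conversion for a single pixel, r/g/b in 0..255.
function rgbToYCbCr709(r, g, b) {
  const rf = r / 255, gf = g / 255, bf = b / 255;
  const y = 0.2126 * rf + 0.7152 * gf + 0.0722 * bf;
  return [
    Math.round(16 + 219 * y),                   // Y': 16..235
    Math.round(128 + 224 * (bf - y) / 1.8556),  // Cb: 16..240
    Math.round(128 + 224 * (rf - y) / 1.5748),  // Cr: 16..240
  ];
}

For 4:2:0 subsampling the Cb and Cr planes are then stored at half resolution in both dimensions, typically by averaging each 2×2 block of pixels.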
Now we need to pass that data to libaom. I'm using a small C wrapper, av1.c, for easier interoperability between C and Go. libaom's API is pretty straightforward, but there are a few things worth noting:
The Go part is available in avif.go.
The encoder part is done; now for the muxer. The specs for the container are freely available here; here is the HEIF extension and here is the AVIF extension. I won't go into details, you can check the final implementation in mp4.go, the code should be self-explanatory. Implementing the dozen or so ISOBMFF boxes looks tedious, but in fact it was real fun.
The entire Go library is published here. Usage comes down to an avif.Encode call and a bit of preparation/error-handling boilerplate:
package main

import (
    "image"
    _ "image/jpeg"
    "log"
    "os"

    "github.com/Kagami/go-avif"
)

func main() {
    if len(os.Args) != 3 {
        log.Fatalf("Usage: %s src.jpg dst.avif", os.Args[0])
    }

    srcPath := os.Args[1]
    src, err := os.Open(srcPath)
    if err != nil {
        log.Fatalf("Can't open source file: %v", err)
    }
    defer src.Close()

    dstPath := os.Args[2]
    dst, err := os.Create(dstPath)
    if err != nil {
        log.Fatalf("Can't create destination file: %v", err)
    }
    defer dst.Close()

    img, _, err := image.Decode(src)
    if err != nil {
        log.Fatalf("Can't decode source file: %v", err)
    }

    err = avif.Encode(dst, img, nil)
    if err != nil {
        log.Fatalf("Can't encode source image: %v", err)
    }

    log.Printf("Encoded AVIF at %s", dstPath)
}
See the GoDoc documentation for further details. go-avif also provides a simple CLI utility for converting images to AVIF with a single console command. You can download binaries for Windows, Linux, and macOS here. Usage cheat sheet:
# Encode JPEG to AVIF with default settings
avif -e cat.jpg -o kitty.avif
# Encode PNG with slowest speed and quality 15
avif -e dog.png -o doggy.avif --best -q 15
# Fastest encoding
avif -e pig.png -o piggy.avif --fast
Now we can produce AVIF files, but how do we display them? As mentioned earlier, browsers don't support the format yet, and we can't simply ship a new decoder library to the browser; we're limited to JavaScript and various web APIs.
Fortunately, over the last decade browser vendors have implemented so many of them that it has become possible to add a new image format with an almost native level of integration. I'm talking about Service Workers and WebAssembly in particular.
The first API provides a way to intercept any fetch request that occurs on the page and respond with a custom JavaScript-processed answer. Pretty impressive, isn't it? Without that feature we would be limited to an imperative decode-me-that-file-and-paint-it-here style of library API, which is usable too, of course, but ugly. We web developers are used to transparent polyfills that hide implementation details under the hood and expose clean APIs. For an image format that means the ability to display it with the <IMG> tag, via CSS properties, and so on. As you might have guessed, the library I'm going to propose uses exactly that embedding mechanism.
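Conceptually the interception looks something like this (a simplified sketch, not the actual avif.js code; decodeToBmp here is a placeholder for the decoding pipeline described below):

// avif-sw.js: intercept requests for .avif resources and answer with
// something the browser already understands.
self.addEventListener("fetch", (event) => {
  const url = event.request.url;
  if (!url.endsWith(".avif")) return;  // let the browser handle everything else
  event.respondWith(
    fetch(url)
      .then((res) => res.arrayBuffer())
      .then((avifData) => decodeToBmp(avifData))  // placeholder for the real decoder
      .then((bmpData) => new Response(bmpData, {
        headers: {"Content-Type": "image/bmp"},
      }))
  );
});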
What about the second API? Since the format is not yet supported natively, we also need to decode it somehow, i.e. transform bytes into actual pixels to display. Well, JavaScript is (obviously) a Turing-complete language, so it's perfectly possible to write a decoder of any complexity in pure JS. That has actually been demonstrated in the past; see e.g. the JavaScript MPEG-1 decoder.
Unfortunately, decoders tend to be very computationally expensive, especially in the case of new formats such as AV1. Even native implementations in the computer's best-friend languages such as C and assembly tend to be slow, despite full access to SIMD instructions, threads, and whatnot.
Recently we got a useful instrument in JS land that helps with this type of task: WebAssembly, a new binary format designed to execute code at close to native speed.
I've already written an article about using it on the web, so I won't go into much detail here. All we need to know is that it allows converting a library written in C/C++ into a form that can be executed inside a web page without any changes to the code. This is especially useful since a great AV1 decoder already exists: dav1d, the current state of the art.
One thing worth noting: we won't get the full speed of native code (<7 ms per Full HD frame) with WebAssembly, for several reasons: it's currently 32-bit, there is no SIMD, no threads (or only behind a flag), and sandboxing adds some overhead too. But for still images a 100–200 ms delay should be fine.
The only code we need to write is a pair of small C and JavaScript wrappers to glue everything together. You can see the implementation in the dav1d.c and dav1d.js files, respectively. The entire polyfill is available here and also at npm.
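For a rough idea of what the JS side of such glue usually looks like, here is a hypothetical sketch built on standard Emscripten helpers; the exported function name and its signature are assumptions for illustration, not the actual dav1d.c wrapper API:

// Assumed export: int davif_decode(const uint8_t *in, int in_len, uint8_t *out)
function decodeAvif(Module, avifBytes, width, height) {
  const inPtr = Module._malloc(avifBytes.length);
  Module.HEAPU8.set(avifBytes, inPtr);            // copy AVIF bytes into wasm memory
  const outPtr = Module._malloc(width * height * 4);  // RGBA output buffer
  Module.ccall("davif_decode", "number",
               ["number", "number", "number"],
               [inPtr, avifBytes.length, outPtr]);
  const rgba = Module.HEAPU8.slice(outPtr, outPtr + width * height * 4);
  Module._free(inPtr);
  Module._free(outPtr);
  return rgba;  // raw RGBA pixels ready for further packaging
}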
WebAssembly is cool, but can we do better? The most keen-eyed readers may have noticed that the AV1 in AVIF is exactly the same as in video, so we should be able to decode it with the AV1 codec already shipped for HTML5 video. It turns out we can! Well, at least in browsers that support AV1; it's still bleeding-edge technology.
The tricky part is that we can't feed an AVIF file as is into the <video> tag; that simply won't work. We need to write a parser (demuxer) for the ISOBMFF container format in order to extract the actual content of the frame (the OBUs), and then also a muxer to wrap that frame into a playable .mp4 video, which uses the ISOBMFF container as well.
It turned out to be a pretty fun and exciting task; my implementation can be found here. I suggest you also scroll through the ISOBMFF spec. An MP4 file is like XML, i.e. nested tags with attributes and content, but binary. The design is clean and simple, and I really like it.
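Walking the top-level boxes takes only a few lines. This is a simplified sketch rather than the actual parser (it ignores 64-bit and to-end-of-file box sizes):

// Every ISOBMFF box starts with a 32-bit big-endian size followed by a
// 4-character type; child boxes are simply nested inside the payload.
function* boxes(buf) {
  const view = new DataView(buf);
  let offset = 0;
  while (offset + 8 <= buf.byteLength) {
    const size = view.getUint32(offset);
    if (size < 8) break;  // extended sizes are not handled in this sketch
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    yield {type, offset, size};
    offset += size;
  }
}

At the top level an AVIF file exposes boxes like ftyp, meta, and mdat; the OBUs we need to extract live inside mdat, and the meta box tells us where exactly.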
Once we've got the .mp4 file in a typed array, we need to convert it to a blob and pass it to a standard video element. Believe it or not, this turned out to be a really hard task. That's because inside a Service Worker you don't have access to the DOM and can't create new HTML elements with document.createElement. Fail.
After some thinking I came to a solution that makes the architecture of the library rather clumsy. Since we have to respond to the intercepted fetch event in the same Service Worker that received it, but can only do the actual decoding in the main thread, we use message passing to hand off the decoding task and get the results back. It turned out to work pretty well, and it's even a bit faster than the WebAssembly build of dav1d, because the native decoder has access to SIMD and the like.
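Schematically, the hand-off looks like this (names and details are illustrative, not the actual avif.js internals):

// avif-sw.js (service worker side): ask an open page to decode for us.
async function decodeOnPage(mp4Bytes) {
  const [client] = await self.clients.matchAll();
  return new Promise((resolve) => {
    const channel = new MessageChannel();
    channel.port1.onmessage = (e) => resolve(e.data);  // decoded pixels come back
    client.postMessage({type: "avif-decode", mp4: mp4Bytes}, [channel.port2]);
  });
}

// reg.js (main thread side): decode through <video> + <canvas> and reply.
navigator.serviceWorker.addEventListener("message", async (e) => {
  if (e.data.type !== "avif-decode") return;
  const video = document.createElement("video");
  video.muted = true;
  video.src = URL.createObjectURL(new Blob([e.data.mp4], {type: "video/mp4"}));
  await video.play();  // in practice we'd also wait until a frame is ready
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(video, 0, 0);
  URL.revokeObjectURL(video.src);
  e.ports[0].postMessage(ctx.getImageData(0, 0, canvas.width, canvas.height));
});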
There is one small thing left to do. Browsers won't understand the uncompressed Y′CbCr frame returned by the decoder; we can only respond with image data that the standard <IMG> tag supports, i.e. JPEG, PNG, BMP, and the like. The simplest solution would be to use the canvas element's standard toDataURL("image/jpeg") method to get JPEG data as a string, but that costs both quality and performance, so I've implemented a small .bmp muxer in pure JS instead: bmp.js. BMP can contain uncompressed RGB pixel data, so it's only a matter of writing a header and reordering the RGB triplets to BGR.
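For reference, here is a simplified version of such a writer, assuming RGBA input as produced by canvas ImageData (the real bmp.js differs in the details):

// 14-byte file header + 40-byte BITMAPINFOHEADER, then bottom-up rows of
// BGR triplets, each row padded to a multiple of 4 bytes.
function encodeBmp(rgba, width, height) {
  const rowSize = Math.ceil(width * 3 / 4) * 4;
  const dataSize = rowSize * height;
  const buf = new ArrayBuffer(54 + dataSize);
  const v = new DataView(buf);
  v.setUint8(0, 0x42); v.setUint8(1, 0x4d);  // "BM" magic
  v.setUint32(2, 54 + dataSize, true);       // total file size
  v.setUint32(10, 54, true);                 // offset to pixel data
  v.setUint32(14, 40, true);                 // BITMAPINFOHEADER size
  v.setInt32(18, width, true);
  v.setInt32(22, height, true);              // positive height = bottom-up rows
  v.setUint16(26, 1, true);                  // color planes
  v.setUint16(28, 24, true);                 // bits per pixel
  v.setUint32(34, dataSize, true);           // size of pixel data
  const out = new Uint8Array(buf);
  for (let y = 0; y < height; y++) {
    let dst = 54 + (height - 1 - y) * rowSize;
    for (let x = 0; x < width; x++) {
      const src = (y * width + x) * 4;
      out[dst++] = rgba[src + 2];  // B
      out[dst++] = rgba[src + 1];  // G
      out[dst++] = rgba[src];      // R
    }
  }
  return out;
}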
The entire library is published here and also at npm. Usage is dead simple:
// Install library
npm install avif.js
// Put this in reg.js and serve avif-sw.js from the web root
// Both scripts should be transpiled (either manually with e.g.
// browserify or automatically by parcel)
require("avif.js").register("/avif-sw.js");
// HTML
<body>
<!-- Register worker -->
<script src="reg.js"></script>
<!-- Can embed AVIF with IMG tag now -->
<img src="image.avif">
<!-- Or via CSS property -->
<div style="background: url(image2.avif)">
some content
</div>
</body>
You can also see the demo here.
There is only one particular thing I'm not satisfied with: Service Workers are a complex, fragile, and error-prone API. See here, for example, for details about the service worker update mechanism; a small sketch of what update handling involves follows below. You can easily mess up and break the entire site, or not get the updated version, or have fetch requests hang forever. Needless to say, all of these happened to me while developing avif.js. Hopefully there won't be issues like that anymore now that the code has stabilized. Let's also hope the web standards authors improve the situation in future iterations of the Service Workers API.
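A generic sketch (not avif.js-specific) of the update dance that makes this API feel fragile:

navigator.serviceWorker.register("/avif-sw.js").then((reg) => {
  reg.addEventListener("updatefound", () => {
    const sw = reg.installing;
    sw.addEventListener("statechange", () => {
      // A new worker sits in "waiting" until every old tab is gone,
      // so users may keep running stale code without noticing.
      if (sw.state === "installed" && navigator.serviceWorker.controller) {
        console.log("New avif-sw.js installed; reload to activate it");
      }
    });
  });
});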
Most of the work needed to start using AVIF right now is done, but there are always things to improve: