In this short article, I walk through the main steps of video transcoding and optimization for the web. I share basic tips and examples to automate these processes consistently in-house using ffmpeg, a powerful open-source package.
When it comes to deploying a video processing and delivery pipeline for a small or medium media-intensive web business, such as an e-commerce site, the first decision is whether to do it in-house or to use an external service (and which one). Aspects to consider include the need to customise and brand the video experience, the number of videos and their duration, the regional coverage sought, the cost, and the complexity of the associated workflows, among other factors.
Although I make a living from building and deploying video processing and delivery services, here I'll play devil's advocate: if progressive video is enough for you and the following steps cover your expectations, you likely don't need anything else.
The following tips are intended to deliver progressive videos in HTML5 without the need for any JavaScript player. I also comment on HLS and per-title encoding, a powerful alternative that addresses a number of issues with progressive videos.
I assume you start from a first version of a finished video provided by a photographer or media agency. This is the pristine video. The first thing to be aware of is that the pristine sets the maximum resolution and the maximum quality.
A good practice for a website is to set a format, resolution, and quality for pristines and stick to it. For instance: H264 in mp4, 4K resolution, and maximum visual quality (crf=17). Requiring providers to stick to this definition will avoid last-minute (bad) surprises. You might need to define more than one resolution with different aspect ratios, depending on the web layout.
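You can check that incoming files actually match this definition before accepting them. A minimal sketch with ffprobe, which ships with ffmpeg (the file name is just a placeholder):
# Inspect the codec, resolution, and pixel format of an incoming pristine
ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,pix_fmt -of default=noprint_wrappers=1 pristine.mp4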
From the pristine video you'll need to derive the variants with the formats, resolutions, and quality that you intend to deliver. As with images, progressive videos should be delivered under a responsive approach: it doesn't make sense to deliver a heavy FullHD version only to render it in a small viewport. Breakpoints should be defined to ensure a proper coverage of your audience.
Regarding formats, you should serve at least a version based on H264, the universally supported coding standard. You may also consider serving an H265 version for iOS and Safari users and a VP9 version for Android and Chrome users, as both look better than H264 at even lower weight. Still, you should always keep H264 as a fallback. For instance,
<video width="100%" controls>
  <source src="Sintel_Trailer.1080p.DivX_Plus_HD.mp4" type="video/mp4; codecs=hevc">
  <source src="Sintel_Trailer.1080p.DivX_Plus_HD.webm" type="video/webm; codecs=vp9">
  <source src="Sintel_Trailer.1080p.DivX_Plus_HD.m4v" type="video/mp4">
</video>
Note that the browser picks the first source it can play, which is why the universal H264 fallback goes last. This short article may help in the decision regarding format selection.
If you decide to serve several formats, then you need transcoding. Assuming a pristine in H264, you may use ffmpeg to convert to H265 (HEVC) and to VP9. For instance
# H265 (-tag:v hvc1 lets Apple players recognise the stream; -an drops the audio)
ffmpeg -i input.mp4 -c:v libx265 -crf 23 -tag:v hvc1 -pix_fmt yuv420p -color_primaries 1 -color_trc 1 -colorspace 1 -movflags +faststart -an output.mp4
# VP9 (with libvpx-vp9, -b:v 0 is needed so that -crf works in constant quality mode)
ffmpeg -i input.mp4 -c:v libvpx-vp9 -crf 30 -b:v 0 -speed 3 -pix_fmt yuv420p -color_primaries 1 -color_trc 1 -colorspace 1 -an output.webm
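In a pipeline you'll rarely run these commands by hand. Here is a minimal batch sketch, assuming pristines sit in a pristine/ folder and variants go to out/ (both folder names are placeholders):
# Transcode every pristine to an H265 and a VP9 variant
mkdir -p out
for f in pristine/*.mp4; do
  name=$(basename "$f" .mp4)
  ffmpeg -y -i "$f" -c:v libx265 -crf 23 -tag:v hvc1 -pix_fmt yuv420p -movflags +faststart -an "out/${name}.mp4"
  ffmpeg -y -i "$f" -c:v libvpx-vp9 -crf 30 -b:v 0 -speed 3 -pix_fmt yuv420p -an "out/${name}.webm"
done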
As noted, to be responsive you'd serve several resolutions. To define your breakpoints, look at the page layout and the traffic analytics (devices and screens). Once they are defined, ffmpeg allows you to easily rescale a video. For instance:
# Rescale to full HD
ffmpeg -y -i input.mp4 -vf scale=1920:-2 -c:v libx264 -crf 23 -profile:v high -pix_fmt yuv420p -color_primaries 1 -color_trc 1 -colorspace 1 -movflags +faststart -an output.mp4
Remember to always downscale from the pristine and avoid upscaling. Upscaling invents pixels that add weight and use bandwidth, while the browser can do the same scaling without that cost. Moreover, upscaling produces artifacts, especially blur. Just define the pristine and the breakpoints with enough resolution.
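Once the breakpoints are set, generating all the renditions is a simple loop. A sketch assuming breakpoints at 1920, 1280, and 854 pixels wide (pick your own from your layout and analytics):
# Derive one H264 rendition per breakpoint, always downscaling from the pristine
for w in 1920 1280 854; do
  ffmpeg -y -i pristine.mp4 -vf "scale=${w}:-2" -c:v libx264 -crf 23 -profile:v high -pix_fmt yuv420p -movflags +faststart -an "video_${w}.mp4"
done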
Video compression for the web is lossy, and lossy means discarding information to reduce weight. Depending on how much information is discarded, the content of the video, and the coding standard you're using, at some point you'll notice increasing visual artifacts (blur, blocking, mosquito noise).
When choosing compression settings, there are two main approaches: setting a visual quality or setting a bitrate.
To target a visual quality, you should set the crf parameter, which is a proxy for visual quality. This is the default configuration in ffmpeg.
The underlying metric behind the crf is different for each coding standard, so you'll need to find the value that meets your expectations for each video format. A good starting point is the default value in ffmpeg:
# Example of quality definition based on crf
ffmpeg -y -i input.mp4 -c:v libx264 -crf 23 -profile:v high -pix_fmt yuv420p -color_primaries 1 -color_trc 1 -colorspace 1 -movflags +faststart -an output.mp4
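Rather than just eyeballing the result, you can score a compressed variant against the pristine with an objective metric such as VMAF. A sketch, assuming your ffmpeg build includes libvmaf (not all builds do) and both files share the same resolution:
# Score the compressed variant (first input) against the pristine (second input)
ffmpeg -i output.mp4 -i pristine.mp4 -lavfi libvmaf -f null -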
While the benefit of a policy based on crf is the assurance of visual quality, the main risk is bitrate peaks in videos with rich textures and fast movements.
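A common middle ground is a capped crf: keep the quality target but bound the peaks with a maximum rate and a buffer size. For instance (the maxrate and bufsize values here are just placeholders to tune):
# Quality based on crf, with a cap on bitrate peaks
ffmpeg -y -i input.mp4 -c:v libx264 -crf 23 -maxrate 4000k -bufsize 8000k -profile:v high -pix_fmt yuv420p -movflags +faststart -an output.mp4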
If you want to guarantee the bandwidth used by the video, then you need to set the bitrate instead.
# Example of compression with a target bitrate
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2500k -profile:v high -pix_fmt yuv420p -color_primaries 1 -color_trc 1 -colorspace 1 -movflags +faststart -an output.mp4
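If you do go the bitrate route, a two-pass encode spends the bit budget more evenly across the video. A minimal sketch:
# First pass gathers statistics, second pass does the actual encode
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2500k -pass 1 -an -f mp4 /dev/null
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2500k -pass 2 -movflags +faststart -an output.mp4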
The downside of bitrate-based compression is that, to avoid a number of videos with very poor visual quality, you'll have to set a fairly high bitrate, which means delivering many other videos much heavier than needed. I'd rather use a compression policy based on crf than on bitrate.
While the approach described should fairly cover your basic needs with short videos of a few seconds, I'd like to comment on HLS. It is an adaptive bitrate (ABR) protocol created by Apple with killer properties and widespread support thanks to open-source players. It adapts the resolution and quality of the delivered video not only to the viewport size but also to the network bandwidth, even if they change after the video has started. For mobile users, HLS combined with per-title encoding (optimization of the bitrate ladder based on the content of the specific video) provides the best possible experience, even for quite short clips.
Although the transcoding can certainly be done using ffmpeg, HLS involves more complex aspects (quality selection, renditions, player) that are out of the scope of this basic tips article.
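Just to give a flavour of the packaging step, this sketch segments a single already-encoded rendition into an HLS playlist (a real deployment would produce several renditions plus a master playlist):
# Package one rendition as HLS: 6-second segments and a VOD playlist
ffmpeg -i video_1920.mp4 -c copy -hls_time 6 -hls_playlist_type vod -hls_segment_filename 'seg_%03d.ts' playlist.m3u8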
Here I have gone through some basic tips that cover a simple pipeline for transcoding progressive videos for web delivery.
I have just used ffmpeg, a very powerful package that allows you to deploy almost any pipeline you can imagine to manipulate and process videos for the web. For some more tips on basic video editing, you may also want to check this. For a bird's-eye view of what a custom deployment for video publishing entails, you may check this other post.