An Intro to Colorspaces and FFmpeg with InVideo

by InVideo.io, June 23rd, 2021

The rendering process of a .mp4 video created on InVideo involves piping a sequence of virtual framebuffers, containing the information of each frame of the composition, to the FFmpeg encoder, which sequentially stitches them into a video with standard H.264 encoding. A virtual framebuffer is just a fancy term for the array of RGB values corresponding to the pixels of a frame of the video. It's a 'virtual image' of sorts.

Let’s also note here that all InVideos are rendered in HD resolution.

Now, this operation is very similar to taking a sequence of .png images and piping them to FFmpeg; instead of .png images, we are just piping 'virtual images'. (There might be minute differences, but for the scope of this discussion, we can assume it's the same.)
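For the curious, here is a minimal sketch of what that piping looks like, assuming a hypothetical renderer writing raw 1280x720 RGB frames to stdout at 25 frames per second (the resolution, pixel format and renderer name are illustrative, not our actual production setup) -

your_renderer | ffmpeg -f rawvideo -pix_fmt rgb24 -s 1280x720 -framerate 25 -i - -pix_fmt yuv420p video.mp4

The -f rawvideo flag tells FFmpeg to treat stdin as a headerless stream of frames, and -i - reads from the pipe instead of a file.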

The simplified FFmpeg command for stitching a video from a sequence of images is —

ffmpeg -i img%03d.png -pix_fmt yuv420p video.mp4

The above converts a sequence of images (img001.png, img002.png, and so on) into a yuv420p chroma-subsampled H.264 video at FFmpeg's default 25 frames per second. (Replace the sequence of real .png images with the virtual framebuffers and the result is the same.)

Simple, no?

Well, we thought so too, until we started getting complaints that the colours in the video seemed 'washed out', 'faded', 'dull'. So we did a visual comparison of our InVideo output against the output of the same composition from After Effects (via Adobe Media Encoder), using its export settings for High Definition videos.

Visual Differences

While the difference is subtle (as you may see in the cover image of this blog), the colours in the AE render are noticeably richer than those in the InVideo render, which looks more washed out.

It took some research and a close inspection of the video metadata using FFprobe and MediaInfo to figure out that the two renders used different colour spaces.
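For reference, FFprobe can pull up the relevant colour metadata with a one-liner along these lines (the field names are those exposed by FFprobe's show_entries option) -

ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt,color_space,color_primaries,color_transfer -of default=noprint_wrappers=1 video.mp4

A properly tagged HD render will typically report bt709 for all three colour fields, while an untagged one reports them as unknown.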

Colour Spaces

Before going further, it is important to touch upon the concept of colour spaces -

Colour spaces are the mathematical profiles/equations which transform a pixel value of a colour model (such as (255,255,255) in RGB) into a corresponding wavelength of light. These equations have been derived and re-derived from an empirical understanding of how we humans perceive light. In other words, colour spaces are responsible for creating the physical representation of the colour red (or green, or blue, or a combination) as we see it on a device like an LCD or CRT screen.

Without colour spaces, different display devices and screens, or even applications and browsers, would render the colours of the same image differently.

To make this uniform, colour spaces were formalized by an international committee (the CIE). Today, it is incomplete to save an image or video with just the pixel information; a colour space should be attached to the media file's metadata to ensure uniform colour reproduction across the board.

To read more about colour spaces, do check out the following article. To read more about colour space support in FFmpeg, check out the following article.

Among others, the 3 popular colour spaces for YUV-encoded videos generated from FFmpeg are -

- BT.601 (“Standard-Definition” or SD)
- BT.709 (“High-Definition” or HD)
- BT.2020 (“Ultra-High-Definition” or UHD)

Converting an RGB signal to YUV is done via a YUV/RGB transformation matrix, and this matrix is different for each of the colour spaces mentioned above. BT.601 is an old format for SD TV from the days of PAL and NTSC, while BT.709 is the newer and most widely used format for HD videos.
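To make the difference concrete, here are the luma equations of the two standards, with coefficients taken straight from the respective ITU-R recommendations -

Y' = 0.299 R' + 0.587 G' + 0.114 B' (BT.601)

Y' = 0.2126 R' + 0.7152 G' + 0.0722 B' (BT.709)

Encode with one set of coefficients and decode with the other, and the reconstructed RGB values shift; this is exactly the kind of colour drift we were chasing.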

(It is interesting to note that BT.709 shares the same primaries and white point chromaticity as the sRGB space — which is the go-to colourspace for most images and web browsers.

BT.2020 is the latest format for UltraHD, with a wider and richer gamut. However, we can leave it out of scope for this discussion until we start creating UltraHD videos with richer colours for large screens.)

Coming back to the renders…

Which Colour Space was the AE video rendered in? BT.709!

This makes sense, since the video was in HD resolution and BT.709 is the widely accepted colour space for most HD videos today.

Which Colour Space was the InVideo video rendered in? BT.601!

After much digging, we realised that FFmpeg, if nothing is specified, defaults to converting RGB to YUV using the BT.601 matrices, irrespective of the resolution.

So which one is better?

Hmm, hard question. They were meant for different devices (BT.601 for CRTs and BT.709 for HD LCD screens), so we should not really compare them.

However, BT.709, being a newer format, has a slightly wider colour gamut and is arguably closer to human perception of colour and luminance than BT.601. It was also built for HD content, and thus maps better to most HD formats used today.

And just based on the colour differences in the screenshots we saw above — we like BT.709 better.

Rendering InVideos in the BT.709 Colour Space

Now that we had identified the problem, we just had to figure out how to get FFmpeg to generate the video in the BT.709 colourspace.

Attempt 1

A hasty attempt led us to run the following modified command:

ffmpeg -i img%03d.png -pix_fmt yuv420p -colorspace 1 -color_primaries 1 -color_trc 1 video.mp4

We added the following parameters to the original command -

-colorspace 1 -color_primaries 1 -color_trc 1

The “1” corresponds to BT.709.
(More on the relevant FFmpeg documentation here.)
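(As an aside, FFmpeg also accepts named constants for these options, which read a little better than the numeric codes; the following should be equivalent to the numeric form above.)

-colorspace bt709 -color_primaries bt709 -color_trc bt709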

However, we soon realised that all this did was assign the BT.709 tags to the video metadata; FFmpeg still encoded the video using the BT.601 transformation matrix.

This was wrong and a double whammy!

The video was still encoded using the BT.601 transformation matrix but was assigned BT.709 metadata tags. So a media player like VLC would decode the video using the reverse BT.709 transformation matrix, causing all sorts of unwanted discolouration.

Attempt 2

Instead of simply assigning the metadata tags, we also needed to ensure that the BT.709 transformation matrix was actually used for encoding, or alternatively perform a YUV-to-YUV colour space conversion from BT.601 to BT.709.

For this, we used the FFmpeg filters feature (“-vf”). It can all be done implicitly in a single FFmpeg pass using the command below -

ffmpeg -i img%03d.png -pix_fmt yuv420p -vf colorspace=all=bt709:iall=bt601-6-625:fast=1 -colorspace 1 -color_primaries 1 -color_trc 1 video.mp4

The extra filters parameter takes care of the conversion to BT.709 -

-vf colorspace=all=bt709:iall=bt601-6-625:fast=1

(fast=1 performs a faster conversion that skips the gamma/primaries correction. Since BT.601 and BT.709 share the same transfer function and have very similar primaries, the visual impact of skipping it is negligible.)

The metadata tags were still kept as good practice. Many media players decode an HD video using the BT.709 colour space by default if nothing is specified, but some still need the metadata tags for proper decoding.

And voila! We had an output which was almost identical in colour to the AE render.
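As a closing aside, the YUV-to-YUV route mentioned earlier can also be sketched with the scale filter's matrix options, for builds where the colorspace filter is unavailable. Note that in_color_matrix/out_color_matrix only converts the transformation matrix, not the primaries or the transfer function, and the file names below are placeholders -

ffmpeg -i input_601.mp4 -vf scale=in_color_matrix=bt601:out_color_matrix=bt709 -colorspace 1 -color_primaries 1 -color_trc 1 output_709.mp4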

Also published on: https://medium.com/invideo-io/talking-about-colorspaces-and-ffmpeg-f6d0b037cc2f