Unless you have been living under a rock, I am pretty sure you have heard something about codecs. Even if you have not come across the term itself, you must have seen file formats such as MP3, MP4, Windows Media Audio, and Windows Media Video. Whatever you do on your screen, whether you play online games, process huge videos with multiple filters, or stream through Netflix, Prime, or YouTube, none of it can be achieved without codecs.
Yet very little gets said about codecs. This guide is intended for non-technical readers who wish to understand what a codec is and what it does. It is meant to be a complete guide to codecs written in plain English.
Unlike most posts on the Internet, which start by defining a codec, let’s first understand the problem it solves.
Imagine that Peter has a cart which can carry only 100 clothes of different types piled on top of each other, something similar to the picture below.
Since these clothes are all in disorder, they take up a massive volume and fill up the cart quickly. The store, located in the city, then arranges the clothes properly for its customers. The more clothes there are, the more time Peter spends traveling. It frustrates him, and he looks for a way out.
One way that he devises with his wife is to fold the clothes and put them into the cart properly, something like the image below.
By folding the clothes this way, he ends up killing two birds with one stone. The number of clothes that he can transport in one go doubles from 100 to 200, which reduces his trips to town for the same number of clothes. Moreover, since the store no longer has to fold the clothes, it decides to pay Peter more.
Now that the money flows in, Peter comes up with one more idea and gets a cabinet installed in his cart, something like the one shown in the image below.
With this idea, the number of clothes that he can carry in one go increases to a whopping four times the original, from 100 clothes to 400 clothes. Peter’s financial condition improves, and he continues to live a happy life with his family.
Everything was just the same with Peter. Throughout the story, his cart was the same. The owner of the store who used to purchase the clothes was the same. His wife was the same, and the quality of the clothes that were manufactured was also the same.
The only thing that changed throughout the story was the way he transported the clothes. The more organized and compressed his clothes were, the easier it was to transport them to town. Different compression techniques brought different benefits, and the benefit was greater in the second case, with the cabinet, than in the first one.
The same applies to audio and video. In the form in which they are originally created, audio and video are huge. Hence, these files need to be compressed before they are written to CDs or published on the Internet so that they are easy to transport or download.
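To get a feel for how huge "huge" is, here is a small back-of-the-envelope sketch. It assumes uncompressed video at 3 bytes per pixel and CD-style audio at 2 bytes per sample; the function names and the example numbers are only for illustration.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Uncompressed video: every pixel of every frame is stored."""
    return width * height * bytes_per_pixel * fps * seconds


def raw_audio_bytes(sample_rate, channels, seconds, bytes_per_sample=2):
    """Uncompressed audio: every sample of every channel is stored."""
    return sample_rate * channels * bytes_per_sample * seconds


# One minute of 1080p video at 30 frames per second, uncompressed.
video = raw_video_bytes(1920, 1080, 30, 60)
# One minute of stereo audio sampled 44,100 times per second.
audio = raw_audio_bytes(44_100, 2, 60)

print(f"video: {video / 1e9:.1f} GB per minute")
print(f"audio: {audio / 1e6:.1f} MB per minute")
```

A single minute of raw 1080p video comes out at roughly 11 GB under these assumptions, which is why nobody ships video without compressing it first.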
For any video streaming service like Netflix, Prime, or YouTube, the encoded files are stored on its servers. Whenever we play a video, JavaScript code on the page helps determine which browser the file will play on and what that browser can decode. Once the browser is identified, the server sends encoded packets, which the browser then decodes.
To ensure seamless playback, video decoding usually takes place with the help of the GPU (graphics processing unit, or graphics card, now available in almost every PC). The browser keeps audio, video, and subtitles in sync using timing information from networking protocols like RTCP and from the metadata. Once decoded, the video plays smoothly on your machine and you get to enjoy your day.
For instance, if the browser is Chrome, the compressed video packets sent by the server may be encoded in VP8 or VP9, while for Firefox the same video may be encoded in H.264 or H.265. The exact choice depends on which codecs the browser and the device support. In both cases, the audio packets are typically in the Opus format.
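One way to picture this choice is as a simple matching exercise: the server has an order of preference, the browser reports what it can decode, and the first match wins. The sketch below is only an illustration; `SERVER_PREFERENCE` and `pick_codec` are made-up names, and real streaming services use far more involved logic.

```python
# Hypothetical order in which an imaginary server prefers to send video.
SERVER_PREFERENCE = ["vp9", "vp8", "h264"]


def pick_codec(client_supported, server_preference=SERVER_PREFERENCE):
    """Return the first codec the server prefers that the client can decode."""
    supported = {name.lower() for name in client_supported}
    for codec in server_preference:
        if codec in supported:
            return codec
    return None  # nothing in common: the video cannot be played as-is


print(pick_codec(["H264", "VP9"]))
print(pick_codec(["h264"]))
print(pick_codec(["av1"]))
```

When nothing matches, the function returns `None`, which corresponds to the "missing codec" situation discussed later in this post.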
So remember: behind playing a downloaded movie or streaming a YouTube video, a huge amount of decoding work is performed every fraction of a second.
In certain cases where filters appear on the screen (like in Zoom), they can be applied with an image-processing library such as OpenCV.
From a noob’s perspective, a codec is basically a method of storing multimedia content so that it’s easier to transport.
These storing methods are actually compression methods, and there is a wide range of them. The compression method used for Blu-ray is different from the one used for an MP4 video.
What we actually see on our screen (or hear from our speakers) is always the raw format, which is the uncompressed video or audio.
Whenever we play a video file in VLC, VLC first parses the metadata of that file (a summary of the content present in it); then it attempts to identify the format with which the file was compressed. Once the format has been identified, it searches for that format in its list of supported formats. If the format is supported, VLC decompresses the payload of the file using the decompression technique of that format and then simply plays it.
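The steps above can be sketched in a few lines. This is not how VLC is actually written; it is a toy that recognizes a file by the first few bytes of its header (a handful of well-known signatures) and checks the result against a made-up `SUPPORTED` list. Note that these signatures identify the file container; a real player then digs further to find the exact codec inside.

```python
# (offset, signature bytes, format name) for a few well-known file types.
SIGNATURES = [
    (0, b"\x1a\x45\xdf\xa3", "matroska/webm"),
    (0, b"OggS", "ogg"),
    (0, b"fLaC", "flac"),
    (0, b"ID3", "mp3"),   # MP3 file that starts with an ID3v2 tag
    (4, b"ftyp", "mp4"),  # MP4 family: "ftyp" follows a 4-byte size field
]

# Hypothetical list of formats our imaginary player can decode.
SUPPORTED = {"mp4", "mp3", "ogg"}


def identify_format(header):
    """Step 1: look at the start of the file and identify its format."""
    for offset, magic, name in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return name
    return None


def try_to_play(header, supported=SUPPORTED):
    """Steps 2 and 3: look the format up in the list, then decode or give up."""
    fmt = identify_format(header)
    if fmt is None:
        return "unknown format"
    if fmt not in supported:
        return f"missing codec for {fmt}"
    return f"decoding {fmt}"


print(try_to_play(b"\x00\x00\x00\x18ftypisom"))
print(try_to_play(b"fLaC\x00\x00\x00\x22"))
```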
So that I don’t miss out on anything, here is also a fairly technical definition, taken from this link:
A codec compresses or decompresses media files such as songs or videos. Windows Media Player and other apps use codecs to play and create media files. A codec can consist of two parts: an encoder that compresses the media file (encoding) and a decoder that decompresses the file (decoding). Some codecs include both parts, and other codecs only include one of them.
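To see the two parts side by side, here is a toy codec based on run-length encoding, a very simple technique that stores "how many times" followed by "which value". It is only a sketch for illustration; real audio and video codecs are far more sophisticated, and the names `encode` and `decode` are my own.

```python
def encode(raw):
    """Encoder: compress runs of identical bytes into (count, value) pairs."""
    out = bytearray()
    i = 0
    while i < len(raw):
        run = 1
        # Extend the run while the next byte is the same (a count fits in one byte).
        while i + run < len(raw) and raw[i + run] == raw[i] and run < 255:
            run += 1
        out += bytes([run, raw[i]])
        i += run
    return bytes(out)


def decode(packed):
    """Decoder: expand each (count, value) pair back into the original bytes."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        count, value = packed[k], packed[k + 1]
        out += bytes([value]) * count
    return bytes(out)


# A "picture" made of 300 black pixels followed by 20 white ones.
raw = b"\x00" * 300 + b"\xff" * 20
packed = encode(raw)
print(len(raw), "bytes before,", len(packed), "bytes after")
assert decode(packed) == raw
```

An app that only plays media needs just the `decode` half, while an app that only records needs just `encode`, which is why some codecs ship with only one of the two parts.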
Naturally, multiple codecs come into the picture as you go through your day. Let’s take the daily routine of Bob, a middle-aged man working in an MNC.
1- Bob has requested a wake-up alarm from his fixed-line phone company. At 6:30 AM, he gets a wake-up call on his phone, and the wake-up announcement he hears is encoded with the PCMA codec.
2- Bob starts his day by listening to some downloaded MP3 meditation songs on his mobile phone.
3- Then he watches some news on his brand new Samsung TV, where the video is in the H.264 codec and the audio is in the Opus codec.
4- In the times of COVID, he starts his work by attending a meeting on Zoom through Google Chrome, where he uses VP8/VP9 for video and Opus for audio.
5- If for some reason his Internet does not work correctly, he calls the “Guy” (the telecom service provider guy) on his mobile and ends up using the AMR, AMR-WB, or EVS codec while talking. If they decide to switch on video, they additionally use H.264 or H.263.
6- Netflix, Google Duo, WhatsApp video/voice calls, and Skype use certain proprietary codecs of their own, which have been optimized over time to take less bandwidth and provide a good quality of service to end users.
7- In the evening, Bob decides to watch Lord of the Rings on Blu-ray, which uses the Blu-ray codecs for audio and video. If he decides to watch a downloaded movie on his laptop instead, then WebM in Windows Media Player comes into the picture.
There are tons of scenarios for Bob that I have failed to recall while writing this article. But you get the idea of the importance codecs have in our lives.
Had codecs never been invented, we could forget about streaming and about practically every digital audio file, video, and image we use today.
If the bandwidth is not good, advanced codecs and players automatically adjust the bitrate of the data to suit your circumstances. A professional would call this transrating, which means changing the bitrate, such as taking a 4K video input stream at 13 Mbps and converting it into one or more lower-bitrate streams.
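To make the idea of adjusting the bitrate concrete, here is a minimal sketch of how a player might choose among several versions of the same video. The list of bitrates and the 80% safety margin are assumptions made up for this example, not values from any real service.

```python
# Hypothetical versions of the same video, in kilobits per second (high to low).
AVAILABLE_KBPS = [13000, 8000, 5000, 2500]


def pick_bitrate(bandwidth_kbps, available=AVAILABLE_KBPS, headroom=0.8):
    """Pick the highest bitrate that fits within a share of the measured bandwidth."""
    budget = bandwidth_kbps * headroom  # leave some room for network hiccups
    for kbps in available:
        if kbps <= budget:
            return kbps
    return available[-1]  # very poor connection: fall back to the lowest version


print(pick_bitrate(20000))  # fast connection
print(pick_bitrate(7000))   # average connection
print(pick_bitrate(1000))   # poor connection
```

Producing those lower-bitrate versions from the original 13 Mbps stream is the transrating step; the function above only shows the choice made at playback time.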
Suppose the file was downloaded for the laptop, but Bob, without listening to anyone, attempts to play it on the TV using a pen drive. Since the screen size is very different, the TV will automatically adjust the frame size of the video to its own dimensions. A professional would call this transsizing, which specifically refers to resizing the video frame, say, from a resolution of 1920×1080 (1080p) up to 3840×2160 (4K UHD).
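The arithmetic behind resizing a frame is simple enough to sketch. The function below (the name `fit_frame` is my own) scales a frame so that it fits a screen without being stretched, which is one common way to do it, not the only one.

```python
def fit_frame(src_w, src_h, screen_w, screen_h):
    """Scale a frame to fit the screen while keeping its width-to-height ratio."""
    scale = min(screen_w / src_w, screen_h / src_h)
    return round(src_w * scale), round(src_h * scale)


# 1080p video shown on a 4K UHD television: both sides simply double.
print(fit_frame(1920, 1080, 3840, 2160))
# Old 4:3 video on a 1080p screen: the height fills the screen, the width does not.
print(fit_frame(640, 480, 1920, 1080))
```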
We will discuss transrating and transsizing in more depth in the next blog post.
If the device does not support a codec, it will usually show a prompt saying that certain codecs are missing. If they are available, it will download and install them automatically, or you can find and install them yourself with the help of Google.
As codecs are all about compressing and decompressing data, there is a chance that a compressed file cannot be restored exactly to its original content. A codec that behaves this way is called a lossy codec. There are also codecs whose output, when decompressed, is restored exactly to the original, with no loss at all. This type of codec is called a lossless codec. The same can be easily explained using the image below.
Of course, just as all of us humans, along with all life forms, consist of stardust at our very core, these codecs consist of digital signals at their very core. There is no point in going that far down the line. It is enough to know that there are different codecs for audio, video, and text.
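The difference is easy to demonstrate with a few numbers. In this sketch, Python's built-in `zlib` module stands in for a lossless codec, and a crude "round every value down to a multiple of 4" trick stands in for a lossy one. Both are stand-ins chosen for illustration, not real media codecs.

```python
import zlib

samples = bytes([10, 11, 12, 13, 200, 201, 202, 203])


def lossless_roundtrip(data):
    """Compress and decompress with zlib: the result is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_encode(data, step=4):
    """Keep only coarse values; with step=4 each value fits in 6 bits, not 8."""
    return bytes(b // step for b in data)


def lossy_decode(data, step=4):
    """Scale back up; the fine detail thrown away by the encoder is gone."""
    return bytes(b * step for b in data)


print(list(lossless_roundtrip(samples)))
print(list(lossy_decode(lossy_encode(samples))))
```

The lossy version comes back close to the original but not identical, and in exchange each value needs fewer bits to store. Lossy media codecs make the same kind of trade, only far more cleverly, by discarding details our eyes and ears are unlikely to notice.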
Apart from that, nowadays nearly all codecs use a dynamic data format (payload in tech jargon).
[ Jargon ] Payload - It is the data format used for transmitting digital media streams of audio and video over the Internet (Internet Protocol networks) using RTP, the Real-time Transport Protocol. The details of the media encoding, such as signal sampling rate, frame size, and timing, are specified in an RTP payload format.
During the early stage of RTP’s development, it was necessary to use statically assigned payload types because no other mechanism had been specified to bind encodings to payload types. Codecs with such fixed numbers can be thought of as static codecs. Some famous codecs of this kind are A-law and μ-law (both part of G.711), GSM, and JPEG, among others.
However, the payload type number space is relatively small and cannot accommodate assignments for all existing and future encodings. It was therefore anticipated that mechanisms would be specified to establish a dynamic mapping between a payload type (a number identifying the data format of the codec) and an encoding. These mechanisms associate the registered name of the encoding/payload format, along with any additional required parameters, such as the RTP timestamp clock rate and the number of channels, with a payload type number.
This association is effective only for the duration of the RTP session in which the dynamic payload type binding is made. Thus, the numbers can be reused for different encodings in different sessions to avoid the number space limitation. Codecs bound this way are called dynamic codecs. Examples include AMR, AMR-WB, and EVS (all three are used for voice in mobile networks) and H.264, VP8, and VP9 (for video), among others.
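For the curious, here is roughly what this looks like in practice. A few payload type numbers are fixed for everyone (0 for μ-law, 3 for GSM, 8 for A-law, 26 for JPEG), while numbers from 96 to 127 are bound per session, commonly through `a=rtpmap` lines in a session description. The sketch below assumes that setup; the two sample sessions and the helper names are made up for illustration.

```python
# A few statically assigned payload types: number -> (name, clock rate, channels).
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000, 1),   # G.711 mu-law
    3: ("GSM", 8000, 1),
    8: ("PCMA", 8000, 1),   # G.711 A-law
    26: ("JPEG", 90000, 1),
}

# Two made-up sessions that bind the same number, 96, to different codecs.
VOICE_CALL = "m=audio 49170 RTP/AVP 96 97\na=rtpmap:96 AMR-WB/16000\na=rtpmap:97 opus/48000/2"
VIDEO_CALL = "m=video 51372 RTP/AVP 96\na=rtpmap:96 VP8/90000"


def parse_rtpmap(session_text):
    """Collect the dynamic bindings declared in 'a=rtpmap:' lines of one session."""
    mapping = {}
    for line in session_text.splitlines():
        line = line.strip()
        if not line.startswith("a=rtpmap:"):
            continue
        number, encoding = line[len("a=rtpmap:"):].split(" ", 1)
        parts = encoding.split("/")
        channels = int(parts[2]) if len(parts) > 2 else 1
        mapping[int(number)] = (parts[0], int(parts[1]), channels)
    return mapping


def resolve(payload_type, session_map):
    """Session bindings win; otherwise fall back to the static table."""
    if payload_type in session_map:
        return session_map[payload_type]
    return STATIC_PAYLOAD_TYPES.get(payload_type)


print(resolve(96, parse_rtpmap(VOICE_CALL)))
print(resolve(96, parse_rtpmap(VIDEO_CALL)))
print(resolve(8, parse_rtpmap(VOICE_CALL)))
```

Number 96 means AMR-WB in the first session and VP8 in the second, which is exactly the reuse that gets around the small number space.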
Yup, that’s totally right, and there is a very high chance that what works on an Apple tablet might not work on a Samsung TV. In this scenario, transcoding takes place.
Transcoding is converting media from one codec to another in order to ensure interoperability among systems.
To make communication possible in the presence of incompatibilities, intermediaries are needed that provide transcoding services to a session. More can be learned from this link.
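In its simplest form, transcoding is just "decode with the first codec, then encode with the second". The sketch below shows that shape using two general-purpose compressors from Python's standard library (`zlib` and `bz2`) as stand-ins for media codecs; the `CODECS` table and the `transcode` function are made up for illustration.

```python
import bz2
import zlib

# Each stand-in "codec" is an (encoder, decoder) pair.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def transcode(data, source, target):
    """Decode with the source codec, then re-encode with the target codec."""
    _, decode = CODECS[source]
    encode, _ = CODECS[target]
    raw = decode(data)   # back to the uncompressed content
    return encode(raw)   # packed again in the format the other side understands


original = b"frame" * 100
sent_by_tablet = zlib.compress(original)   # the tablet only speaks "zlib"
for_the_tv = transcode(sent_by_tablet, "zlib", "bz2")  # the TV only speaks "bz2"
assert bz2.decompress(for_the_tv) == original
```

The intermediary in the middle is the only one that needs to understand both codecs; the tablet and the TV each keep using the one they know.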
Needless to say, this blog does not describe specific codecs, and honestly speaking, I don’t believe there is enough room in this post to communicate the uniqueness of each codec. Each codec, be it for audio, video, or text, has its own beauty, characteristics, and an environment where it shines. Unless your work involves hardcore multimedia-specific software like GStreamer, there is not much to be gained from understanding each of these codecs.
This blog post is just for you to experience that “Aha” moment where you can appreciate the seamless work done by these codecs to make our lives simple and easy. I hope this blog was able to do justice to codecs.