Whether it’s a World Cup match, the Super Bowl, or the French Open finals, watching it with your friends on a Saturday night is #goals. Sadly, not all of us can get tickets and travel across cities, countries, or continents to attend them. Thankfully, live streaming makes it possible to watch all the action, close to real-time.
The only question is: how close to real time are we talking?
Video streaming is largely facilitated on the back of a video protocol called HLS (HTTP Live Streaming). While the origins and fundamentals of HLS are explained in another piece on our blog, the current piece will focus on how HLS resolved one of its greatest shortcomings: latency.
To start with, let’s take a quick look at how HLS works and how it makes live streaming possible. This is what the typical flow of an HLS streaming system looks like:
The audio/video stream captured by input devices is encoded and ingested into a media server.
The media server transcodes the stream into an HLS-compatible format with multiple ABR variants and also creates a playlist file to be used by the video players.
Then, the media server serves the media and the playlist file to the clients, either directly or via CDNs by acting as an origin server.
The players, on the client end, make use of the playlist file to navigate through the video segments. These segments are typically “slices” of the video being generated, with a definite duration (called segment size, usually 2 to 6 seconds).
The playlist is refreshed at an interval based on the segment size, and players select the segments listed in it based on the order of playback and the video quality they require.
Even though HLS offers a reliable way of streaming video, its high latency can be a serious obstacle for many streamers and video distributors. According to the initial specification, a player should load media files in advance before playing them. This makes HLS an inherently high-latency protocol, with a latency of about 30 to 60 seconds.
Everyone was interested in implementing HLS but the high latency was a serious roadblock. So, devs and enthusiasts started to find workarounds to reduce latency and refine the protocol for effective usage. Some of these practices offered such positive results that they started becoming a silent standard along with the HLS specification. Two of these practices are listed below:
When Apple introduced HLS, the typical segment size was 10 seconds. Most HLS implementers found this too long, so Apple decided to reduce it to 6 seconds. The overall latency can be reduced by reducing the segment size and the buffer size of the player.
However, this carries some issues. Some of them include increased overall bitrate, buffering or jitter for devices with inferior network conditions. The ideal segment size should be decided based on the target audience and could be in the range of 2 to 4 seconds.
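As a rough back-of-the-envelope illustration of why segment size matters so much, the sketch below multiplies the segment duration by the number of segments a player buffers before it starts playing. The three-segment buffer and the function name `estimate_latency_seconds` are assumptions made for illustration only, and encoding, CDN and network delays are ignored.

```python
def estimate_latency_seconds(segment_duration: float, buffered_segments: int = 3) -> float:
    """Rough latency caused by player buffering alone.

    Assumption: the player waits for `buffered_segments` full segments
    before starting playback. Encoding and delivery delays are ignored.
    """
    return segment_duration * buffered_segments


for duration in (10, 6, 2):
    latency = estimate_latency_seconds(duration)
    print(f"{duration}s segments -> roughly {latency:.0f}s behind live")
```

Under these assumptions, 10-second segments put the viewer about 30 seconds behind live, while 2-second segments bring that down to about 6 seconds, at the cost of the issues described above.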
The main reason HLS is used for live streaming is the scalability, reliability and player compatibility it provides across all platforms, especially when compared to other protocols. This has made HLS irreplaceable for video delivery so far.
But the first mile contribution (also known as ingest) from the HLS stack can be replaced with lower latency protocols to reduce overall latency.
The HLS ingest is usually replaced by RTMP ingest, which enjoys wide support among encoders and services and has proved to be a cost-effective solution. The stream ingested with RTMP is then transcoded into HLS by a media server before the content is served. Even though there have been experiments with other protocols such as WebRTC and SRT for the ingest part, RTMP remains the most popular option.
The latency in HLS started posing a significant hurdle, leading to less than stellar user experiences. This was becoming more frequent since HLS was being widely adopted around the world. Tuning HLS wasn’t enough and everyone was looking for better and more sustainable solutions.
It was in 2016 that Twitter’s Periscope engineering team made some major changes to their implementation in order to achieve low latency with HLS. This proprietary version of HLS, often referred to as LHLS, offered latency of 2 to 5 seconds.
DASH, the main competitor for HLS, came up with a low latency solution based on chunked CMAF in 2017, following which a community-based low latency HLS solution (L-HLS) was drafted in 2018. This variant was heavily inspired by Periscope’s LHLS and leveraged Chunked Transfer Encoding (CTE) to reduce latency. It is often referred to as Community Low Latency HLS (CL-HLS).
While this version of HLS was gaining popularity, Apple decided to release its own extension of the protocol, called Low Latency HLS (LL-HLS), in 2019. This is often referred to as Apple Low Latency HLS (ALHLS). It offered low latency comparable to CL-HLS and promised compatibility with Apple devices. Since then, LL-HLS has been merged into the HLS specification, so the two are now technically a single protocol.
In this section, we’ll explore the changes LL-HLS brings to HLS, making low latency streaming possible. This protocol came with 2 main changes in spec, responsible for its low latency nature. One is to divide the segments into parts and deliver them as soon as they’re available. The other is to inform the player about the data to be loaded next before said data is even available.
The video segments are further divided into parts (similar to the chunks used in CMAF). These parts are just “smaller segments” with a definite duration, represented with the EXT-X-PART tag in the media playlist.
Because the server publishes the parts while the segment is still being generated, players can fill up their buffers more efficiently. This lets the player work with a smaller buffer, which results in reduced latency. Once a segment is complete, its parts are collectively replaced by the full segment, which remains available for a longer period of time.
When LL-HLS was first introduced, it had HTTP/2 push specified as a requirement on the server side for sending new data to clients. Many commercial CDN providers were not supporting this feature at the time, which resulted in a lot of confusion.
This issue was addressed by Apple in a subsequent update, replacing the HTTP/2 push with preload hints. They decided to include support for preload hints by adding a new tag, EXT-X-PRELOAD-HINT, to the playlist, reducing overhead.
With the help of a preload hint, a video player can anticipate the data to be loaded next and send a request to the URI from the hint to gain faster access to the next part. The server should hold (block) any request for the hinted data and respond as soon as the data becomes available, thus reducing latency.
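The blocking behaviour described above can be sketched on the server side roughly as follows. This is a minimal in-memory illustration and not how any particular media server implements it: the `PartStore` class, its `publish` and `get` methods, and the 5-second timeout are all hypothetical names and values chosen for the example.

```python
import asyncio


class PartStore:
    """Holds finished parts and parks requests for parts not yet published."""

    def __init__(self) -> None:
        self._parts: dict[str, bytes] = {}
        self._ready: dict[str, asyncio.Event] = {}

    def _event(self, uri: str) -> asyncio.Event:
        # One event per part URI, created on first use by either side.
        return self._ready.setdefault(uri, asyncio.Event())

    def publish(self, uri: str, data: bytes) -> None:
        """Called by the packager once a part is complete."""
        self._parts[uri] = data
        self._event(uri).set()

    async def get(self, uri: str, timeout: float = 5.0) -> bytes:
        """Block the request until the hinted part exists, then return it."""
        await asyncio.wait_for(self._event(uri).wait(), timeout)
        return self._parts[uri]


async def demo() -> bytes:
    store = PartStore()
    # The player requests the hinted part before it has been generated.
    request = asyncio.create_task(store.get("filePartC.2.mp4"))
    await asyncio.sleep(0.05)
    assert not request.done()  # the request is still being held open
    store.publish("filePartC.2.mp4", b"part-bytes")
    return await request


result = asyncio.run(demo())
print(result)
```

The point of the design is that the player's request is already waiting at the server when the part is finished, so no extra round trip is spent discovering that new data exists.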
Now, let’s take a look at how these tags are specified in the media playlist file, using an example. We will assume the segment size to be 6 seconds and the part size to be 200 milliseconds. We will also assume that 2 segments (segment A and B) have been completely generated, while the 3rd segment (segment C) is still being generated. Because it has not yet been completed, this segment is published as a list of parts in the order of playback.
The following is a sample media playlist (M3U8 file).
#EXTM3U
# Other tags
#
# The following tags are used for accessing the segments
# that are completely generated and can be loaded without any
# delay.
# These segments are specified by their duration, followed
# by a unique URI.
#EXTINF:6.0,
fileSegmentA.mp4
#EXTINF:6.0,
fileSegmentB.mp4
#
# The following tags are used for accessing the parts of
# the segment currently being generated.
# These parts are specified by their duration, followed
# either by a unique URI for that specific part or the segment
# URI with a byte-range.
#EXT-X-PART:DURATION=0.200,URI="filePartC.0.mp4"
#EXT-X-PART:DURATION=0.200,URI="filePartC.1.mp4"
# or
#EXT-X-PART:DURATION=0.200,URI="fileSegmentC.mp4",BYTERANGE="20000@21000"
#
# The following tag is used to inform the player about the
# most likely part to be fetched next, before it becomes available,
# to be used for playback.
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePartC.2.mp4"
Players that don’t support LL-HLS yet tend to ignore tags like EXT-X-PART and EXT-X-PRELOAD-HINT, which lets them treat the playlist as traditional HLS and load full segments at a higher latency.
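To make the fallback behaviour concrete, here is a small sketch of how a player might read the sample playlist. It is a simplified parser written for this example only, not a full M3U8 implementation: the function names and the `low_latency` flag are hypothetical, and the flag simply simulates a legacy player that skips the tags it does not understand.

```python
import re

PLAYLIST = """\
#EXTM3U
#EXTINF:6.0,
fileSegmentA.mp4
#EXTINF:6.0,
fileSegmentB.mp4
#EXT-X-PART:DURATION=0.200,URI="filePartC.0.mp4"
#EXT-X-PART:DURATION=0.200,URI="filePartC.1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePartC.2.mp4"
"""

# Matches NAME=value pairs where the value is either quoted or bare.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Turn 'A=1,B="x"' into {'A': '1', 'B': 'x'}."""
    return {name: value.strip('"') for name, value in ATTRIBUTE.findall(text)}


def parse_media_playlist(text, low_latency=True):
    """Collect segments, parts and the preload hint from a media playlist.

    With low_latency=False the LL-HLS tags are skipped, the way a legacy
    player would treat tags it does not recognise.
    """
    segments, parts, hint = [], [], None
    pending_duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            pending_duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line.startswith("#EXT-X-PART:") and low_latency:
            attrs = parse_attributes(line.split(":", 1)[1])
            parts.append((attrs["URI"], float(attrs["DURATION"])))
        elif line.startswith("#EXT-X-PRELOAD-HINT:") and low_latency:
            hint = parse_attributes(line.split(":", 1)[1]).get("URI")
        elif not line.startswith("#"):
            # A non-comment line is the URI of the segment announced by EXTINF.
            segments.append((line, pending_duration))
            pending_duration = None
    return {"segments": segments, "parts": parts, "preload_hint": hint}


modern = parse_media_playlist(PLAYLIST)
legacy = parse_media_playlist(PLAYLIST, low_latency=False)
print(modern["parts"], modern["preload_hint"])
print(legacy["segments"], legacy["parts"])
```

Both players see segments A and B, but only the LL-HLS-aware one sees the two parts of segment C and the hinted third part. The legacy player has to wait until segment C is published as a whole.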
The new and improved HLS has a latency of about 3 seconds or less. The only reasonable competition for this protocol is LL-DASH. But Apple does not support DASH on all of its devices. This makes LL-HLS the only low latency live streaming protocol that has wide client-side support including Apple devices.
One of the main advantages of using LL-HLS is its backward compatibility with legacy HLS players. Players that don’t support this variant can fall back to standard HLS and still work, at a higher latency. However, because the protocol requires players to start loading unfinished media segments instead of waiting until they are fully available, it was difficult for all players to adopt it quickly.
It took a while for most non-Apple devices to start supporting LL-HLS. Now, it is widely supported across almost all platforms in relatively recent versions of players. Although some players have planned support for the protocol since its inception, most implementations are new and are still improving their compatibility.
Here are some popular players from different platforms that support LL-HLS in its entirety:
Here, we compare three protocols LL-HLS, LL-DASH and WebRTC on six parameters: compatibility, delivery method, support for ABR, security, latency, best use case.
First, let’s go through a few relevant terms used with CMAF.
Chunked Encoding (CE) is a technique used for making publishable “chunks”. When added together, these chunks create a video segment. Chunks have a set duration and are the smallest unit that can be published.
Chunked Transfer Encoding (CTE) is a technique used to deliver the “chunks” as they are created in a sequential order. With CTE, one request for a segment is enough to receive all its chunks. The transmission ends once a zero-length chunk is sent. This method allows even small chunks to be used for transfer.
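For readers unfamiliar with the wire format, the sketch below frames a few chunks the way HTTP/1.1 chunked transfer coding does: each chunk is preceded by its size in hexadecimal, and a zero-length chunk ends the transmission. The payload strings are placeholders rather than real media, and the decoder is deliberately simplified (it assumes no chunk extensions or trailers).

```python
def encode_chunked(chunks):
    """Frame byte chunks using HTTP/1.1 chunked transfer coding."""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the transmission early
            yield f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # the terminating zero-length chunk


def decode_chunked(body: bytes) -> list[bytes]:
    """Simplified decoder: assumes no chunk extensions and no trailers."""
    chunks, pos = [], 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)
        if size == 0:
            return chunks
        start = line_end + 2
        chunks.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data


media_chunks = [b"moof+mdat #1", b"moof+mdat #2"]  # placeholder payloads
wire = b"".join(encode_chunked(media_chunks))
print(wire)
print(decode_chunked(wire))
```

Because `encode_chunked` is a generator, each chunk can be written to the connection as soon as it is produced, which is what lets a single segment request receive data while the segment is still being created.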
Adaptive Bitrate (ABR) is a technique for dynamically adjusting the compression level and video quality of a stream to match bandwidth availability. It heavily impacts the video streaming experience for the viewer.
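A very simple version of the decision an ABR player makes can be sketched as follows. This is only one possible heuristic, not the algorithm any specific player uses: the function name `pick_variant`, the example bitrates and the 0.8 safety factor are assumptions chosen for illustration.

```python
def pick_variant(variant_bitrates, measured_bps, safety=0.8):
    """Pick the highest-bitrate variant that fits within the bandwidth budget.

    Only a fraction (`safety`) of the measured bandwidth is used, to leave
    headroom for fluctuations. If nothing fits, fall back to the lowest variant.
    """
    budget = measured_bps * safety
    affordable = [bitrate for bitrate in variant_bitrates if bitrate <= budget]
    return max(affordable) if affordable else min(variant_bitrates)


variants = [800_000, 2_500_000, 5_000_000]  # hypothetical ABR ladder, bits per second
print(pick_variant(variants, measured_bps=4_000_000))
print(pick_variant(variants, measured_bps=500_000))
```

With about 4 Mbps of measured bandwidth the 2.5 Mbps variant is chosen, and on a 500 kbps connection the player drops to the lowest variant rather than stalling.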
Both LL-HLS and LL-DASH support media encryption and benefit from security features such as token authentication and digital rights management (DRM).
WebRTC supports end-to-end encryption of media for transfer, user, file, and round-trip authentication. This is often sufficient for DRM purposes.
Both LL-HLS and LL-DASH have a latency of 2 to 5 seconds.
WebRTC, on the other hand, has a sub-second latency of ~500 milliseconds.
Both LL-HLS and LL-DASH are best suited for live streaming events that need to be delivered to millions of viewers. They are often used for streaming sporting events live.
WebRTC is very frequently used for solutions such as video conferencing, which require minimal latency and are not expected to scale to a large number of participants.
Now that HLS supports low latency streaming, it is all set to conquer the video streaming space, ready to serve millions of fans watching their favourite team play a crucial match without any issues. Whether you want to start live streaming yourself or build an app that facilitates live streaming, LL-HLS remains your best friend.