Some popular examples of on-demand video streaming services today are:
- YouTube
- Netflix
- Vimeo
- TikTok
Requirements
- The user can upload video files
- The user can stream video content
- The user can search for videos based on the video title
Data storage
Database schema
- The primary entities are the videos, the users, and the comments tables
- The relationship between the users and the videos is 1-to-many
- The relationship between the users and the comments table is 1-to-many
- The relationship between the videos and the comments table is 1-to-many
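The entities and relationships above can be sketched as a relational schema. This is a minimal illustration using SQLite; the table and column names are assumptions, not taken from the original design.

```python
import sqlite3

# Hypothetical minimal schema for the users, videos, and comments entities.
# Foreign keys encode the three 1-to-many relationships described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE videos (
    video_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),   -- 1 user : many videos
    title       TEXT NOT NULL,
    description TEXT
);
CREATE TABLE comments (
    comment_id  INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),   -- 1 user : many comments
    video_id    INTEGER NOT NULL REFERENCES videos(video_id), -- 1 video : many comments
    body        TEXT NOT NULL
);
""")
```

In a production system the videos table would hold only metadata (title, description, object-store key), while the video files themselves live in object storage.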
Type of data stored
- A wide-column data store (LSM tree-based) such as Apache HBase is used to persist thumbnail images because it clumps small files together and provides fault tolerance and replication
- A cache server such as Redis is used to store the metadata of popular video content
- A message queue such as Apache Kafka is used for the asynchronous processing (encoding) of videos
- A relational database such as MySQL stores the metadata of the users and the videos
- The video files are stored in a managed object storage such as AWS S3
- A Lucene-based inverted-index data store such as Apache Solr is used to persist the video index data and provide search functionality
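The search requirement is served by an inverted index: a map from each title token to the set of videos containing it. The following toy index sketches the idea behind what a Lucene-based store such as Solr maintains; the sample titles are illustrative.

```python
from collections import defaultdict

def build_index(videos):
    """Build a toy inverted index mapping lowercase title tokens to video IDs."""
    index = defaultdict(set)
    for video_id, title in videos.items():
        for token in title.lower().split():
            index[token].add(video_id)
    return index

def search(index, query):
    """Return the IDs of videos whose titles contain every query token."""
    token_sets = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*token_sets) if token_sets else set()

# Hypothetical sample data: video ID -> title.
videos = {1: "Cat Compilation", 2: "Cooking with Cat", 3: "Cooking Basics"}
idx = build_index(videos)
```

A real search engine adds tokenization rules, stemming, ranking, and index sharding on top of this core structure.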
High-level design
- Popular video content is streamed from CDN
- Video encoding (transcoding) is the process of converting a video into other formats (MPEG-DASH, HLS) to provide the best possible stream across multiple devices and bandwidths
- A message queue can be configured between services for parallelism and improved fault tolerance
- Codecs (H.264, VP9, HEVC) are compression and decompression algorithms used to reduce video file size while preserving video quality
- The popular video streaming protocols (data transfer standard) are MPEG-DASH (Moving Pictures Experts Group — Dynamic Adaptive Streaming over HTTP), Apple HLS (HTTP Live Streaming), Microsoft Smooth Streaming, and Adobe HDS (HTTP Dynamic Streaming)
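Placing a message queue between the upload service and the transcoding workers decouples the two and lets them scale independently. This is a minimal in-process sketch using Python's standard `queue` module standing in for a broker such as Kafka; the video IDs and the "transcoding" step are placeholders.

```python
import queue
import threading

upload_queue = queue.Queue()  # stands in for a durable broker such as Kafka
transcoded = []

def transcoding_worker():
    # A consumer pulls uploaded video IDs and processes them asynchronously.
    while True:
        video_id = upload_queue.get()
        if video_id is None:                   # sentinel: shut the worker down
            break
        transcoded.append(f"{video_id}.hls")   # placeholder for real encoding work
        upload_queue.task_done()

worker = threading.Thread(target=transcoding_worker)
worker.start()

for video_id in ("v1", "v2", "v3"):            # the upload service publishes IDs
    upload_queue.put(video_id)

upload_queue.join()                            # wait until every message is processed
upload_queue.put(None)                         # signal shutdown
worker.join()
```

Because producers and consumers only share the queue, adding more worker threads (or, with a real broker, more consumer processes) increases parallelism without changing the upload path.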
Video upload workflow
- The user (client) executes a DNS query to identify the server
- The client makes an HTTP connection to the load balancer
- The video upload requests are rate limited to protect against malicious clients
- The load balancer delegates the client's request to an API server (web server) with free capacity
- The web server delegates the client's request to an app server that handles the API endpoint
- The ID of the uploaded video is put on the message queue for asynchronous processing of the video file
- The title and description (metadata) of the video are stored in the metadata database
- The app server queries the object store service to generate a pre-signed URL for storing the raw video file
- The client uploads the raw video file directly to the object store using the pre-signed URL to save system network bandwidth
- The transcoding servers subscribe to the message queue (publish-subscribe pattern) to get notified of uploaded videos
- The transcoding server fetches the raw video file by querying the raw object store
- The transcoding server transcodes the raw video file into multiple codecs and stores the transcoded content in the transcoded object store
- The thumbnail server generates on average five thumbnail images for each video file and stores the generated images in the thumbnail store
- The transcoding server persists the ID of the transcoded video on the message queue for further processing
- The upload handler service subscribes to the message queue (publish-subscribe pattern) to get notified of transcoded video files
- The upload handler service updates the metadata database with the metadata of transcoded video files
- The upload handler service queries the notification service to notify the client of the video processing status
- The database can be partitioned through consistent hashing (key = user ID or video ID)
- Block-matching or phase-correlation algorithms can be used to detect duplicate video content
- The web server (API server) must be kept stateless to scale out through replication
- The video file is stored in multiple resolutions and formats to support multiple devices and bandwidths
- The video can be split into smaller chunks by the client before upload to support resuming broken uploads
- Watermarking and encryption can be used to protect video content
- Data centers are added to improve latency and data recovery at the expense of increased maintenance workflows
- A dead letter queue can be used to improve fault tolerance and error handling
- Chaos engineering is used to identify failures in networks, servers, and applications
- Load testing and chaos engineering are used to improve fault tolerance
- RAID configurations improve hardware throughput
- The data store is partitioned to spread out writes and reads at the expense of difficult joins, transactions, and a fat client
- Federation and sharding are used to scale out the database
- Write requests are redirected to the leader, while read requests are redirected to the followers of the database
- Vitess is a storage middleware for scaling out MySQL
- Vitess redirects read requests that require fresh data to the leader (for example, the update-user-profile operation)
- Vitess uses a lock server (Apache ZooKeeper) for automatic sharding and leader election on the database layer
- Vitess supports RPC-based joins, indexing, and transactions on SQL databases
- Vitess offloads partitioning logic from the application and improves database queries through caching
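The consistent-hashing partitioning mentioned above can be sketched with a hash ring. Each shard is placed on the ring at many virtual-node positions, and a key (user ID or video ID) maps to the first virtual node clockwise from its hash. The shard names here are illustrative.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring for partitioning by user ID or video ID.
    Virtual nodes smooth the key distribution across shards."""

    def __init__(self, shards, vnodes=100):
        self._ring = []
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        # Any stable hash works; MD5 is used here only for its even spread.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        pos = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[pos % len(self._ring)][1]

# Hypothetical shard names for a partitioned metadata database.
ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
```

The benefit over `hash(key) % n` is that adding or removing a shard remaps only the keys owned by the affected ring segments instead of reshuffling almost every key.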
Video streaming workflow
- The client executes a DNS query to identify the server
- The client makes an HTTP connection to the load balancer
- The CDN is queried to verify whether the requested video content is in the CDN cache
- The CDN queries the transcoded object store on a cache miss
- The load balancer delegates the client's request to a web server with free capacity using the weighted round-robin algorithm
- The web server delegates the client's request to an app server using consistent hashing
- The app server queries the metadata cache to fetch the metadata of the video
- The app server queries the metadata database on a cache miss
- The app server queries the thumbnail store to fetch the relevant thumbnail images of the video
- The app server queries the transcoded object store to fetch the video content
- The app server delegates the client's search queries to the inverted-index store
- Read and write traffic are segregated for high throughput
- Popular video content is streamed from the CDN (in memory)
- The push CDN model can be used for caching videos uploaded by users with a significant number of subscribers
- Moderately streamed video content can be served from the video server directly (disk I/O)
- Consistent hashing can be used to load balance cache servers
- Caching can be implemented on multiple levels to improve latency
- The LRU cache eviction policy can be used
- Entropy (jitter) is added to cache expiration times to prevent thundering herd failures
- The video files are distributed to the data centers closest to the client when the client starts streaming
- Traffic should be prioritized between the video streaming cluster (higher priority) and the general cluster to improve reliability
- Videos can be recommended to the client based on geography, watch history (KNN algorithm), and A/B testing results
- The video file is split into chunks for streaming and improved fault tolerance
- The chunks of a video file are joined together when the client starts streaming
- Video chunks allow adaptive streaming by switching to lower-quality chunks when higher-quality chunks are slow to download
- Different streaming protocols support different video encodings and playback players
- The video content is streamed over TCP with buffering
- The video offset is stored on the server to resume playback from any client device
- The video resolution on playback depends on the client's device
- The video file should be encoded into compatible bitrates (quality) and formats for smoother streaming and compatibility
- The counter for likes on a video can be approximate to improve performance (transactions executed at sampled intervals)
- Comments on a video are shown to the comment owner by fetching from the leader database, while other users fetch from followers with a slight delay
- Services can be prescaled for extremely popular channels, as autoscaling might not meet the requirements of concurrent clients
- Autoscaling requirements can be predicted by performing machine learning on traffic patterns
- The autoscaling service should keep a time buffer due to anticipated delays in the provisioning of resources
- Fully baked container images (no additional script execution required after provisioning) improve the startup time of services
- The infrastructure can be prewarmed before peak hours using benchmark data from load testing
- The emergency mode should shut down non-critical services to free up resources, and skip the execution of failed services for improved reliability
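The LRU eviction and expiration jitter mentioned above can be combined in one cache: entries expire after the base TTL plus or minus a random factor, so keys cached in the same burst do not all expire (and trigger a refill stampede) at the same instant. A minimal sketch, assuming an in-process cache; a real metadata cache would sit in Redis.

```python
import random
import time
from collections import OrderedDict

class JitteredLRUCache:
    """LRU cache whose entries expire after ttl +/- jitter seconds, spreading
    expirations to avoid the thundering herd on the backing database."""

    def __init__(self, capacity, ttl, jitter=0.1):
        self._data = OrderedDict()   # key -> (value, expires_at); order = recency
        self._capacity = capacity
        self._ttl = ttl
        self._jitter = jitter

    def put(self, key, value):
        # Randomize each entry's TTL around the base value (entropy/jitter).
        ttl = self._ttl * (1 + random.uniform(-self._jitter, self._jitter))
        self._data[key] = (value, time.monotonic() + ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)     # evict the least recently used key

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]                # lazily drop expired entries
            return None
        self._data.move_to_end(key)            # mark the key as recently used
        return value
```

On a miss, the caller fetches from the metadata database and calls `put`, so only the first request after an expiration hits the database rather than every concurrent reader.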
References
- Cuong Do, YouTube Scalability, Google Talks
- Vitess Architecture, vitess.io