In an age where video streaming platforms are ubiquitous, personalized recommendations matter a great deal. The ever-changing suggestions on YouTube and the rapidly shifting recommendations on Amazon are both produced by recommendation systems. These systems, like the one we are about to delve into, tailor content suggestions to individual users based on their past interactions.
My design for this streaming service recommendation system is built on real-time recommendations. This approach offers several advantages, including ease of implementation and lower computational costs. Real-time recommendations are strongest when the system relies on the user's current context, such as the videos they interacted with most recently.
To develop an effective recommendation system, we need to break down the process into four key stages:
In the initial stage, we transform the videos in our catalog into dense embeddings. For multi-modal use cases, we can leverage OpenAI's CLIP (Contrastive Language-Image Pre-Training). CLIP encodes both text and images into the same vector space, which lets us compare visual and textual content directly. It is known for strong zero-shot performance and can be fine-tuned if necessary. Each video can be broken down into frames, with the key frames converted into image embeddings using CLIP.
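The text above does not say how per-frame embeddings become a single video vector. One simple option, shown here as a minimal sketch, is to mean-pool the key-frame embeddings and L2-normalize the result. The CLIP call itself is omitted; the frame embeddings are assumed to be already computed, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def video_embedding(frame_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Assumption: one video vector = normalized mean of its key-frame vectors."""
    return l2_normalize(mean_pool(frame_embeddings))


# Toy 2-dimensional "frame embeddings"; real CLIP vectors have hundreds of dimensions.
frames = [[1.0, 0.0], [0.0, 1.0]]
print(video_embedding(frames))
```

Mean pooling discards frame order, which is usually acceptable for similarity search but is a design choice worth revisiting if temporal structure matters.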
Once we have generated vector representations of our videos, we need to store and retrieve them efficiently. Milvus, an open-source vector database designed for embedding similarity search and AI applications, is a strong choice for this task. It simplifies unstructured data search and offers a consistent user experience regardless of the deployment environment. With Milvus, we can efficiently retrieve similar videos using approximate nearest-neighbour (ANN) search.
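To make the retrieval step concrete, here is a minimal sketch of what a similarity query returns. It uses exact brute-force cosine similarity over an in-memory dictionary, not Milvus and not an ANN index; a vector database replaces this linear scan with an approximate index so that the query stays fast on large catalogs. The function names and the toy catalog are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact top-k search: score every catalog vector, keep the k best."""
    scored = ((cosine_similarity(query, vec), vid) for vid, vec in catalog.items())
    best = heapq.nlargest(k, scored)
    return [(vid, score) for score, vid in best]


catalog = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for video_id, score in top_k_similar([1.0, 0.1], catalog, k=2):
    print(video_id, round(score, 3))
```

The exact scan costs time proportional to the catalog size per query, which is the cost an ANN index trades a small amount of recall to avoid.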
The ranking stage refines the initial candidate set for relevance. It considers a range of user and video features, such as age, gender, category affinity, engagement statistics, video duration, type, and language. These features are used to formulate a classification problem in which we predict whether a user will click on a video or watch it for a specified duration. Gradient-boosted tree libraries such as XGBoost or LightGBM are suitable choices for building the classification model.
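The classification framing can be sketched as follows: each logged interaction becomes one training row with a binary label. The record fields, the feature-name prefixes, and the 30-second watch threshold are all hypothetical choices for illustration; the resulting rows would then be fed to a classifier such as XGBoost or LightGBM, which is not shown here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Interaction:
    """One logged impression of a video shown to a user (hypothetical schema)."""
    user_id: str
    video_id: str
    clicked: bool
    watch_seconds: float


def make_label(event: Interaction, task: str = "click",
               min_watch_seconds: float = 30.0) -> int:
    """Binary target: 'click' -> was it clicked; 'watch' -> watched long enough."""
    if task == "click":
        return int(event.clicked)
    if task == "watch":
        return int(event.watch_seconds >= min_watch_seconds)
    raise ValueError(f"unknown task: {task}")


def make_feature_row(user_features: Dict[str, float],
                     video_features: Dict[str, float]) -> Dict[str, float]:
    """Merge user and video features into one row, prefixing names to avoid clashes."""
    row = {f"user_{name}": value for name, value in user_features.items()}
    row.update({f"video_{name}": value for name, value in video_features.items()})
    return row


event = Interaction("u1", "v9", clicked=True, watch_seconds=12.0)
row = make_feature_row({"age": 31.0}, {"duration_s": 240.0})
print(row, make_label(event, "click"), make_label(event, "watch"))
```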
Using the scores the model produces from user and video metadata, we can sort the candidates and select the top recommendations. Additional business logic can be incorporated if needed. To evaluate the quality of recommendations offline, metrics such as recall@k, precision@k, NDCG (normalized discounted cumulative gain), and MRR (mean reciprocal rank) can be used. Once we are satisfied with the results, an online A/B test can validate the system.
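The offline metrics named above can be sketched in a few lines. This version assumes binary relevance (a video is either relevant to the user or not); the function names are illustrative, and MRR is obtained by averaging `reciprocal_rank` over users.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is present."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """DCG of the top k divided by the DCG of an ideal ordering (binary gains)."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(precision_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant), ndcg_at_k(ranked, relevant, 2))
```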
The system's workflow starts with user interactions being streamed through Kafka. The IDs of the last 5 to 10 videos each user interacted with are stored in Redis. When an API call is made to fetch recommendations, the embeddings of these cached videos are averaged to form a single query embedding. Once the query embedding is obtained, a vector search is performed in the vector database to generate a candidate set.
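The caching and averaging steps can be sketched as below. An in-memory `deque` with a fixed maximum length stands in for the per-user list that would live in Redis, and a plain dictionary stands in for the embedding lookup; neither Kafka nor Redis is actually called, and the class and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence


class RecentInteractions:
    """Keeps only the most recent video IDs per user (stand-in for a Redis list)."""

    def __init__(self, max_items: int = 10) -> None:
        self._by_user: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_items)
        )

    def record(self, user_id: str, video_id: str) -> None:
        # Appending to a full deque silently drops the oldest entry.
        self._by_user[user_id].append(video_id)

    def recent(self, user_id: str) -> List[str]:
        return list(self._by_user.get(user_id, ()))


def query_embedding(video_ids: Sequence[str],
                    embeddings: Dict[str, Sequence[float]]) -> List[float]:
    """Element-wise mean of the embeddings of the given videos."""
    vectors = [embeddings[v] for v in video_ids if v in embeddings]
    if not vectors:
        raise ValueError("no known embeddings for these video IDs")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


cache = RecentInteractions(max_items=2)
for vid in ["v1", "v2", "v3"]:
    cache.record("u1", vid)

embeddings = {"v1": [1.0, 1.0], "v2": [1.0, 0.0], "v3": [0.0, 1.0]}
print(cache.recent("u1"), query_embedding(cache.recent("u1"), embeddings))
```

A user with no cached interactions has no query embedding, so a real system needs a fallback for cold-start users, such as popular or trending videos.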
At the API layer, user and video metadata must be stored in a feature store for quick access. Redis is a suitable choice for a feature store, but alternative solutions can be explored. The classification model uses this metadata to compute a probability for the defined task for each candidate. The final recommendation set is sorted by that probability and returned to the user. FastAPI can be used to build the underlying APIs at each step, and deployment on Kubernetes allows the services to scale horizontally.
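The ranking step of the API can be sketched as a single function: look up features, score each candidate, sort, and truncate. The dictionary used as a feature store, the `user:` and `video:` key prefixes, and the toy scoring function are all hypothetical; in the design above, the scorer would be the trained classifier and the lookups would go to Redis or another feature store.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]


def rank_candidates(
    user_id: str,
    candidate_ids: Sequence[str],
    feature_store: Dict[str, Features],
    score_fn: Callable[[Features, Features], float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Score every candidate for this user and return the top_n by score."""
    user_features = feature_store.get(f"user:{user_id}", {})
    scored = []
    for video_id in candidate_ids:
        video_features = feature_store.get(f"video:{video_id}", {})
        scored.append((video_id, score_fn(user_features, video_features)))
    # Highest predicted probability first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_score(user: Features, video: Features) -> float:
    """Placeholder for model.predict_proba: matches a user taste to a video tag."""
    return user.get("likes_sports", 0.0) * video.get("is_sports", 0.0)


store = {
    "user:u1": {"likes_sports": 1.0},
    "video:a": {"is_sports": 1.0},
    "video:b": {"is_sports": 0.0},
}
print(rank_candidates("u1", ["b", "a"], store, toy_score, top_n=2))
```

Passing the scorer in as a callable keeps the API code independent of the model library, so the classifier can be swapped or retrained without touching the endpoint.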
The crux of my recommendation system's algorithm lies in the artificial neural network. Here ANN stands for artificial neural network, not the approximate nearest-neighbour search used for retrieval. ANN brings significant benefits:
Understanding the impact of dataset size on computational time.
Identifying optimal scenarios for model performance.
Explaining why certain models excel in specific environments.
The ANN formulation is a multi-layered neural network, as shown below:
Pseudocode for ANN Algorithm:
Building a robust streaming recommendation system involves meticulously designing each stage, leveraging powerful algorithms, and implementing efficient data structures. By following these guidelines, you can create a real-time recommendation system that provides users with engaging and personalized content.