Behind the Scenes of Social Feeds: How AI Picks What You See Next

Table of Contents

- Overview
- System Architecture
- Components
  - Home Mixer
  - Thunder
  - Phoenix
  - Candidate Pipeline
- How It Works
  - Pipeline Stages
  - Scoring and Ranking
  - Filtering
- Key Design Decisions

Overview

The For You feed algorithm retrieves, ranks, and filters posts from two sources:

- In-Network (Thunder): posts from accounts you follow
- Out-of-Network (Phoenix Retrieval): posts discovered from a global corpus

Both sources are combined and ranked together using Phoenix, a Grok-based transformer model that predicts engagement probabilities for each post. The final score is a weighted combination of these predicted engagements.

We have eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting by understanding your engagement history (what you liked, replied to, shared, etc.) and using that to determine what content is relevant to you.
System Architecture

    FOR YOU FEED REQUEST
        │
        ▼
    HOME MIXER (Orchestration Layer)
        │
        ├── QUERY HYDRATION
        │     ├── User Action Sequence (engagement history)
        │     └── User Features (following list, preferences, etc.)
        │
        ├── CANDIDATE SOURCES
        │     ├── THUNDER (In-Network Posts): posts from accounts you follow
        │     └── PHOENIX RETRIEVAL (Out-of-Network Posts): ML-based similarity search across global corpus
        │
        ├── HYDRATION
        │     Fetch additional data: core post metadata, author info, media entities, etc.
        │
        ├── FILTERING
        │     Remove: duplicates, old posts, self-posts, blocked authors, muted keywords, etc.
        │
        ├── SCORING
        │     ├── Phoenix Scorer (ML Predictions): Grok-based transformer predicts P(like), P(reply), P(repost), P(click), ...
        │     ├── Weighted Scorer (Combine predictions): Weighted Score = Σ (weight × P(action))
        │     └── Author Diversity Scorer: attenuate repeated author scores to ensure feed diversity
        │
        ├── SELECTION
        │     Sort by final score, select top K candidates
        │
        ├── FILTERING (Post-Selection)
        │     Visibility filtering (deleted/spam/violence/gore, etc.)
        │
        ▼
    RANKED FEED RESPONSE
Components

Home Mixer

Location: home-mixer/

The orchestration layer that assembles the For You feed. It leverages the CandidatePipeline framework with the following stages:

| Stage | Description |
|---|---|
| Query Hydrators | Fetch user context (engagement history, following list) |
| Sources | Retrieve candidates from Thunder and Phoenix |
| Hydrators | Enrich candidates with additional data |
| Filters | Remove ineligible candidates |
| Scorers | Predict engagement and compute final scores |
| Selector | Sort by score and select top K |
| Post-Selection Filters | Final visibility and dedup checks |
| Side Effects | Cache request info for future use |
The server exposes a gRPC endpoint (ScoredPostsService) that returns ranked posts for a given user.

Thunder

Location: thunder/

An in-memory post store and real-time ingestion pipeline that tracks recent posts from all users. It:

- Consumes post create/delete events from Kafka
- Maintains per-user stores for original posts, replies/reposts, and video posts
- Serves "in-network" post candidates from accounts the requesting user follows
- Automatically trims posts older than the retention period

Thunder enables sub-millisecond lookups for in-network content without hitting an external database.

Phoenix

Location: phoenix/

The ML component, which has two main functions:

1. Retrieval (Two-Tower Model)

Finds relevant out-of-network posts:

- User Tower: encodes user features and engagement history into an embedding
- Candidate Tower: encodes all posts into embeddings
- Similarity Search: retrieves the top-K posts by dot-product similarity

2. Ranking (Transformer with Candidate Isolation)

Predicts engagement probabilities for each candidate:

- Takes user context (engagement history) and candidate posts as input
- Uses special attention masking so candidates cannot attend to each other
- Outputs probabilities for each action type (like, reply, repost, click, etc.)

See phoenix/README.md for detailed architecture documentation.

Candidate Pipeline

Location: candidate-pipeline/

A reusable framework for building recommendation pipelines. It defines traits for:

| Trait | Purpose |
|---|---|
| Source | Fetch candidates from a data source |
| Hydrator | Enrich candidates with additional features |
| Filter | Remove candidates that shouldn't be shown |
| Scorer | Compute scores for ranking |
| Selector | Sort and select top candidates |
| SideEffect | Run async side effects (caching, logging) |
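As an illustration of how the six traits fit together, the following Python sketch models each one as an abstract base class and chains them in a small runner. It is not the candidate-pipeline crate's API (the crate is Rust): the method names (`fetch`, `hydrate`, `keep`, `score`, `select`, `run`), the `Candidate` record, `run_pipeline`, and the toy implementations are all hypothetical. Sources are fetched concurrently on a thread pool to mimic parallel sourcing; every other stage runs sequentially in this sketch.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical candidate record; real candidates carry many hydrated fields.
    post_id: int
    author_id: int
    score: float = 0.0

# One abstract base class per trait in the table (method names are hypothetical).
class Source(ABC):
    @abstractmethod
    def fetch(self, user_id: int) -> list[Candidate]: ...

class Hydrator(ABC):
    @abstractmethod
    def hydrate(self, candidates: list[Candidate]) -> None: ...

class Filter(ABC):
    @abstractmethod
    def keep(self, candidate: Candidate) -> bool: ...

class Scorer(ABC):
    @abstractmethod
    def score(self, candidates: list[Candidate]) -> None: ...

class Selector(ABC):
    @abstractmethod
    def select(self, candidates: list[Candidate]) -> list[Candidate]: ...

class SideEffect(ABC):
    @abstractmethod
    def run(self, user_id: int, selected: list[Candidate]) -> None: ...

def run_pipeline(user_id, sources, hydrators, filters, scorers, selector, side_effects):
    # Sources are independent, so fetch them concurrently and concatenate the results.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda src: src.fetch(user_id), sources))
    candidates = [c for batch in batches for c in batch]
    for hydrator in hydrators:
        hydrator.hydrate(candidates)
    candidates = [c for c in candidates if all(f.keep(c) for f in filters)]
    # Scorers run in order, so a later scorer can adjust scores set by an earlier one.
    for scorer in scorers:
        scorer.score(candidates)
    selected = selector.select(candidates)
    for effect in side_effects:
        effect.run(user_id, selected)
    return selected

# Toy (hypothetical) implementations for a usage example.
class StaticSource(Source):
    def __init__(self, posts):
        self.posts = posts
    def fetch(self, user_id):
        return [Candidate(post_id, author_id) for post_id, author_id in self.posts]

class ExcludeViewerPosts(Filter):
    def __init__(self, viewer_id):
        self.viewer_id = viewer_id
    def keep(self, candidate):
        return candidate.author_id != self.viewer_id

class PostIdScorer(Scorer):
    # Placeholder scorer; a real one would call a model such as Phoenix.
    def score(self, candidates):
        for c in candidates:
            c.score = float(c.post_id)

class TopK(Selector):
    def __init__(self, k):
        self.k = k
    def select(self, candidates):
        return sorted(candidates, key=lambda c: c.score, reverse=True)[: self.k]

result = run_pipeline(
    user_id=1, sources=[StaticSource([(10, 2), (11, 1)]), StaticSource([(12, 3)])],
    hydrators=[], filters=[ExcludeViewerPosts(1)], scorers=[PostIdScorer()],
    selector=TopK(2), side_effects=[])
print([c.post_id for c in result])  # [12, 10]
```

Keeping each stage behind a narrow interface is what lets a new source, filter, or scorer be added without touching the runner.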
The framework runs sources and hydrators in parallel where possible, with configurable error handling and logging.

How It Works

Pipeline Stages

1. Query Hydration: Fetch the user's recent engagement history and metadata (e.g., following list).
2. Candidate Sourcing: Retrieve candidates from:
   - Thunder: recent posts from followed accounts (in-network)
   - Phoenix Retrieval: ML-discovered posts from the global corpus (out-of-network)
3. Candidate Hydration: Enrich candidates with:
   - Core post data (text, media, etc.)
   - Author information (username, verification status)
   - Video duration (for video posts)
   - Subscription status
4. Pre-Scoring Filters: Remove posts that are:
   - Duplicates
   - Too old
   - From the viewer themselves
   - From blocked/muted accounts
   - Containing muted keywords
   - Previously seen or recently served
   - Ineligible subscription content
5. Scoring: Apply multiple scorers sequentially:
   - Phoenix Scorer: get ML predictions from the Phoenix transformer model
   - Weighted Scorer: combine predictions into a final relevance score
   - Author Diversity Scorer: attenuate repeated author scores for diversity
   - OON Scorer: adjust scores for out-of-network content
6. Selection: Sort by score and select the top K candidates.
7. Post-Selection Processing: Final validation of post candidates to be served.
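The Author Diversity Scorer and Selection steps above can be illustrated with a sketch of author attenuation followed by top-K selection. This document states only that repeated authors' scores are attenuated; the geometric decay used here (an author's n-th post, in score order, is multiplied by `decay ** n`), the `decay=0.5` default, and the function names are assumptions for illustration, not the production formula.

```python
from collections import defaultdict


def attenuate_repeated_authors(scored, decay=0.5):
    """Scale down repeat posts from the same author.

    `scored` holds (post_id, author_id, score) tuples. Posts are visited from
    highest to lowest score, and an author's n-th post (n = 0, 1, 2, ...) has its
    score multiplied by decay ** n. The geometric decay is an assumption.
    """
    posts_seen = defaultdict(int)
    attenuated = []
    for post_id, author_id, score in sorted(scored, key=lambda t: t[2], reverse=True):
        attenuated.append((post_id, author_id, score * decay ** posts_seen[author_id]))
        posts_seen[author_id] += 1
    return attenuated


def select_top_k(scored, k):
    # Selection: sort by final score and keep the top K.
    return sorted(scored, key=lambda t: t[2], reverse=True)[:k]


candidates = [
    (1, "alice", 0.90),
    (2, "alice", 0.85),
    (3, "alice", 0.80),
    (4, "bob", 0.60),
    (5, "carol", 0.50),
]
feed = select_top_k(attenuate_repeated_authors(candidates), k=3)
print([post_id for post_id, _, _ in feed])  # [1, 4, 5]
```

Without attenuation, the top three would be all of alice's posts (`[1, 2, 3]`); with it, bob and carol move into the feed.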
Scoring and Ranking

The Phoenix Grok-based transformer model predicts probabilities for multiple engagement types:

    Predictions:
    ├── P(favorite)
    ├── P(reply)
    ├── P(repost)
    ├── P(quote)
    ├── P(click)
    ├── P(profile_click)
    ├── P(video_view)
    ├── P(photo_expand)
    ├── P(share)
    ├── P(dwell)
    ├── P(follow_author)
    ├── P(not_interested)
    ├── P(block_author)
    ├── P(mute_author)
    └── P(report)

The Weighted Scorer combines these into a final score:

    Final Score = Σ (weight_i × P(action_i))

Positive actions (like, repost, share) have positive weights. Negative actions (block, mute, report) have negative weights, pushing down content the user would likely dislike.
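A small sketch of the Weighted Scorer formula, `Final Score = Σ (weight_i × P(action_i))`. The action names come from the prediction list above, but the weights and example probabilities are invented for illustration; this document does not state the production weights.

```python
# Hypothetical weights: positive for engagements, negative for negative feedback.
# The production weights are not given in this document.
WEIGHTS = {
    "favorite": 1.0,
    "reply": 2.0,
    "repost": 1.5,
    "share": 1.5,
    "not_interested": -5.0,
    "mute_author": -8.0,
    "block_author": -10.0,
    "report": -20.0,
}


def weighted_score(predictions, weights=WEIGHTS):
    """Final Score = sum over i of weight_i * P(action_i).

    `predictions` maps action name -> predicted probability; actions without
    a weight contribute nothing.
    """
    return sum(weights.get(action, 0.0) * p for action, p in predictions.items())


# Made-up model outputs for two posts.
engaging = {"favorite": 0.30, "reply": 0.05, "repost": 0.04, "block_author": 0.001}
divisive = {"favorite": 0.40, "reply": 0.10, "report": 0.02, "block_author": 0.01}

print(round(weighted_score(engaging), 3), round(weighted_score(divisive), 3))  # 0.45 0.1
```

The second post has the higher P(favorite), but its report and block probabilities pull its final score well below the first.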
Filtering

Filters run at two stages.

Pre-Scoring Filters:

| Filter | Purpose |
|---|---|
| DropDuplicatesFilter | Remove duplicate post IDs |
| CoreDataHydrationFilter | Remove posts that failed to hydrate core metadata |
| AgeFilter | Remove posts older than a threshold |
| SelfpostFilter | Remove the user's own posts |
| RepostDeduplicationFilter | Dedupe reposts of the same content |
| IneligibleSubscriptionFilter | Remove paywalled content the user can't access |
| PreviouslySeenPostsFilter | Remove posts the user has already seen |
| PreviouslyServedPostsFilter | Remove posts already served in the session |
| MutedKeywordFilter | Remove posts containing the user's muted keywords |
| AuthorSocialgraphFilter | Remove posts from blocked/muted authors |

Post-Selection Filters:

| Filter | Purpose |
|---|---|
| VFFilter | Remove posts that are deleted, spam, violence, gore, etc. |
| DedupConversationFilter | Deduplicate multiple branches of the same conversation thread |
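Most pre-scoring filters reduce to a per-candidate predicate. The sketch below folds simplified stand-ins for six of them into one pass, with a comment naming the filter each check imitates. The `Post` and `Viewer` records, the two-day `max_age`, and case-insensitive substring matching for muted keywords are hypothetical choices, not the behavior of the real filters, which the table lists as separate components.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post record (real candidates carry hydrated metadata).
    post_id: int
    author_id: int
    text: str
    created_at: datetime


@dataclass
class Viewer:
    # Hypothetical viewer context, as gathered during query hydration.
    user_id: int
    blocked_or_muted: set = field(default_factory=set)
    muted_keywords: set = field(default_factory=set)
    seen_post_ids: set = field(default_factory=set)


def pre_scoring_filter(posts, viewer, now, max_age=timedelta(days=2)):
    """Keep posts that pass simplified versions of six pre-scoring filters.

    The two-day age threshold and substring keyword matching are assumptions.
    """
    kept, seen_ids = [], set()
    for post in posts:
        if post.post_id in seen_ids:  # stand-in for DropDuplicatesFilter
            continue
        seen_ids.add(post.post_id)
        if now - post.created_at > max_age:  # stand-in for AgeFilter
            continue
        if post.author_id == viewer.user_id:  # stand-in for SelfpostFilter
            continue
        if post.author_id in viewer.blocked_or_muted:  # stand-in for AuthorSocialgraphFilter
            continue
        text = post.text.lower()
        if any(kw.lower() in text for kw in viewer.muted_keywords):  # stand-in for MutedKeywordFilter
            continue
        if post.post_id in viewer.seen_post_ids:  # stand-in for PreviouslySeenPostsFilter
            continue
        kept.append(post)
    return kept


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
recent = now - timedelta(hours=1)
posts = [
    Post(1, 20, "Great match last night", recent),
    Post(1, 20, "Great match last night", recent),  # duplicate ID
    Post(2, 99, "My own post", recent),  # written by the viewer
    Post(3, 21, "Old news", now - timedelta(days=5)),  # too old
    Post(4, 22, "Spoilers ahead!", recent),  # contains a muted keyword
    Post(5, 23, "From a blocked account", recent),
    Post(6, 24, "Already seen", recent),
    Post(7, 25, "Fresh and relevant", recent),
]
viewer = Viewer(user_id=99, blocked_or_muted={23}, muted_keywords={"spoiler"}, seen_post_ids={6})
print([p.post_id for p in pre_scoring_filter(posts, viewer, now)])  # [1, 7]
```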
Key Design Decisions

1. No Hand-Engineered Features

The system relies entirely on the Grok-based transformer to learn relevance from user engagement sequences. There is no manual feature engineering for content relevance, which significantly reduces the complexity of our data pipelines and serving infrastructure.

2. Candidate Isolation in Ranking

During transformer inference, candidates cannot attend to each other, only to the user context. This ensures that the score for a post doesn't depend on which other posts are in the batch, making scores consistent and cacheable.

3. Hash-Based Embeddings

Both retrieval and ranking use multiple hash functions for embedding lookup.

4. Multi-Action Prediction

Rather than predicting a single "relevance" score, the model predicts probabilities for many actions.

5. Composable Pipeline Architecture

The candidate-pipeline crate provides a flexible framework for building recommendation pipelines with:

- Separation of pipeline execution and monitoring from business logic
- Parallel execution of independent stages and graceful error handling
- Easy addition of new sources, hydrators, filters, and scorers

Note: The transformer implementation is ported from the Grok-1 open source release by xAI, adapted for recommendation system use cases.
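Design decision 2 (candidate isolation) can be made concrete with a boolean attention mask over one sequence that holds the user context followed by the candidates. This is a minimal sketch under stated assumptions, not the Phoenix implementation: each candidate occupies a single position, context positions see the whole context (whether they attend causally is not specified in this document), and a masked mean stands in for attention. The final check shows the property the decision is after: a candidate's output is identical regardless of which other candidates share its batch.

```python
def candidate_isolation_mask(n_context, n_candidates):
    """Return mask where mask[i][j] is True if position i may attend to position j.

    Positions [0, n_context) hold the user context (engagement history); the
    remaining positions hold candidate posts, one position each (a simplification).
    """
    n = n_context + n_candidates
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            j_in_context = j < n_context
            if i < n_context:
                # Context never looks at candidates (assumed bidirectional within the context).
                mask[i][j] = j_in_context
            else:
                # A candidate sees the full context and itself, but no other candidate.
                mask[i][j] = j_in_context or i == j
    return mask


def masked_mean(values, mask):
    # Toy stand-in for attention: each position averages the values it may see.
    out = []
    for row in mask:
        visible = [v for v, allowed in zip(values, row) if allowed]
        out.append(sum(visible) / len(visible))
    return out


context = [0.2, 0.4, 0.6]  # toy values for three history positions
batch_a = context + [1.0, 5.0]  # candidate 1.0 batched with 5.0
batch_b = context + [1.0, -3.0, 9.0]  # the same candidate batched with two others
out_a = masked_mean(batch_a, candidate_isolation_mask(3, 2))
out_b = masked_mean(batch_b, candidate_isolation_mask(3, 3))
print(out_a[3] == out_b[3])  # True
```

Because no candidate's output depends on the other candidates, a post's score for a given user context can be cached and reused across batches.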
This project is licensed under the Apache License 2.0.