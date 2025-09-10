

Every morning, millions of people open their phones to the same thing: a flood of headlines. Global politics, tech announcements, market swings, and local stories all compete for attention. Most of it isn’t relevant, but buried somewhere are the few stories that truly matter.

You don’t need flashy “agentic AI” hype to solve this. What you need are well-designed tools with strong fundamentals: systems that can fetch information, process it, enrich it with structure, and deliver it in a way that fits your context. Large language models add value here, not by being the whole solution, but by refining, summarizing, and helping you iterate.

AI Agents

At its core, an agent is just a tool that connects a few dots. Think of simple functions that can make RPC/API calls, fetch data from a source, process it, and either pass it along to an LLM or hand it to other agents for more processing.

In the context of large language models, an agent usually:

- Perceives through inputs like search results, APIs, or user instructions.
- Reasons with the help of an LLM, deciding what to prioritize.
- Acts by calling tools, running code, or presenting results.

Let’s walk through this “agentic world”, the new hype in town, in the context of a personalized news feed. If you’ve ever prepped for a system design interview, you’ll know feed design always shows up: the Facebook News Feed, Twitter timeline, or (if you’re a 90s kid) RSS readers. This is the same challenge, reimagined for LLMs.

The Simple Personalized News Agent

Imagine you tell the agent you care about certain tags: AI, Apple, and Bay Area stories.
It does three things:

- Pulls the top news from the web.
- Filters the results by those keywords.
- Summarizes them into a quick digest.

On a given day, it might give you:

- Apple unveils new on-device AI model for Siri and iOS apps.
- Bay Area rail expansion project secures funding.
- Markets cool as AI chip demand slows after last quarter’s surge.

This is already helpful. The firehose is reduced to a manageable list. But it’s flat. You don’t know why a story matters, or how it connects to others.

Introducing Multiple Agents

Instead of relying on one monolithic agent that does everything end-to-end, we can split the workflow across specialist agents, each focused on a single responsibility. This is the same principle as a newsroom: reporters gather raw material, researchers annotate it, analysts provide context, and editors package it for readers.

In our news pipeline, that looks like this:

- Fetcher Agent: retrieves full news articles from feeds or APIs.
- Passage Extractor Agent: highlights the most relevant sections of each article.
- Named Entity Extractor Agent: pulls out people, companies, places, and products mentioned.
- Entity Disambiguation Agent: ensures “Apple” is Apple Inc., not the fruit.
- Entity Tagger Agent: assigns structured tags (e.g., Organization: Apple, Product: iPhone).
- Topic Classifier Agent: identifies broader themes such as AI, Finance, Bay Area.
- Sentiment & Stance Agent: determines whether coverage is positive, negative, or neutral.
- Tag Summarizer Agent: merges entities, topics, and sentiments into thematic sections.
- Fact-Checker Agent: validates claims against trusted sources.
- Personalization & Ranking Agent: prioritizes stories that match your interests and history.
- Digest Compiler Agent: assembles the polished digest in a reader-friendly format.
- Daily Digest Agent: delivers the final package (to your inbox, Slack, or app).
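
One way to picture this split, as a minimal sketch rather than a real implementation: each agent is a plain function that takes an article record and returns an enriched copy. The `Article` fields, the keyword tables, and the function names below are all hypothetical, and only two of the twelve agents are shown.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable


@dataclass(frozen=True)
class Article:
    # Hypothetical record handed from one agent to the next.
    title: str
    body: str
    entities: tuple[str, ...] = ()
    topics: tuple[str, ...] = ()


Agent = Callable[[Article], Article]

KNOWN_ENTITIES = ("Apple", "Siri")  # toy gazetteer (assumption)
TOPIC_KEYWORDS = {"AI": ("AI", "model"), "Finance": ("market", "shares")}


def extract_entities(article: Article) -> Article:
    """Toy Named Entity Extractor: gazetteer lookup instead of a real NER model."""
    found = tuple(e for e in KNOWN_ENTITIES if e in article.body)
    return replace(article, entities=found)


def classify_topics(article: Article) -> Article:
    """Toy Topic Classifier: keyword match instead of embeddings or an LLM."""
    topics = tuple(
        topic for topic, words in TOPIC_KEYWORDS.items()
        if any(w in article.body for w in words)
    )
    return replace(article, topics=topics)


def run_pipeline(article: Article, agents: list[Agent]) -> Article:
    """Apply each agent in order, feeding the output of one into the next."""
    return reduce(lambda acc, agent: agent(acc), agents, article)


story = Article(
    title="Apple unveils new on-device AI model",
    body="Apple unveils new on-device AI model for Siri and iOS apps.",
)
result = run_pipeline(story, [extract_entities, classify_topics])
print(result.entities, result.topics)
```

Because every agent has the same signature, adding or reordering specialists is a change to the list passed to `run_pipeline`, not to the agents themselves.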

Some of these agents operate sequentially (e.g., disambiguation must follow extraction), while others can run in parallel (topic classification, sentiment analysis, and entity extraction can all work on the same passage at once). The result is a coordinated pipeline of specialists, producing a far richer and more structured digest than any single agent could.

What Comes In and What Goes Out: Agent Interfaces

The table below summarizes what each agent expects and what it gives back. It also shows where an agent might call an LLM for help.

| Agent | Inputs | Outputs | LLM Needed? |
|---|---|---|---|
| Fetcher | News feed URL, RSS, API query | Full article text, metadata (title, URL, timestamp, source) | ❌ No: HTTP/API call |
| Passage Extractor | Full article text | Key passages, passage embeddings | ✅ Optional: LLM for salience, or embeddings/TF-IDF |
| Named Entity Extractor | Passages | Entity list, spans, embeddings | ❌/✅: NER models are faster, an LLM can catch novel entities |
| Entity Disambiguation | Entity list, context embeddings | Resolved entities with canonical IDs (e.g., Wikidata Q312) | ✅ Yes: reasoning helps resolve ambiguous names |
| Entity Tagger | Disambiguated entities | Entities with categories (Org, Person, Product, Location) | ❌ No: deterministic classification |
| Topic Classifier | Passages, embeddings | Topic labels (AI, Finance, Bay Area) | ❌/✅: embeddings + clustering, or LLM for nuance |
| Sentiment & Stance Analyzer | Passages, entities | Sentiment score, stance (supportive/critical/neutral) | ✅ Optional: LLM for nuance, or sentiment models for speed |
| Tag Summarizer | Tagged entities, topics, sentiment | Structured summaries grouped by tag | ✅ Yes: summarization requires LLM |
| Fact-Checker | Summaries, claims | Verified/Unverified claims, supporting references | ✅ Yes: requires claim extraction + retrieval reasoning |
| Personalization & Ranking | Validated summaries, user profile | Ranked/weighted story list | ❌ No: ML heuristics suffice |
| Digest Compiler | Ranked summaries | Final formatted digest (Markdown, HTML, JSON) | ❌/✅: deterministic formatting, LLM optional for tone |
| Daily Digest | Compiled digest | Delivery package (email, Slack, app notification) | ❌ No: just delivery |

Some agents need an LLM; others are lightweight and deterministic. This split matters: in production, you’ll want as few LLM calls as possible (to save cost and latency), reserving them for reasoning-heavy tasks like disambiguation, summarization, and fact-checking. The table shows one way to make that split.

A Concrete Example: Bay Area Earthquake

Let’s run a real article through our pipeline.

The story:

- Title: Magnitude 3.2 earthquake hits near Pleasanton
- Source: CBS Bay Area, Sept 7, 2025
- Snippet: “A magnitude 3.2 earthquake struck near Pleasanton on Sunday morning, according to the United States Geological Survey. The quake hit just after 10 a.m., about 3 miles north of Pleasanton. Residents across the East Bay reported weak shaking. No immediate reports of damage.”

Each agent’s contribution is summarized below:

- Fetcher Agent: pulls the article text.
- Passage Extractor: highlights quake magnitude, timing, location, and shaking.
- Entity Extractor: identifies Pleasanton, USGS, East Bay.
- Entity Disambiguation: resolves to Pleasanton, CA, and the United States Geological Survey.
- Entity Tagger: classifies Pleasanton → Location; USGS → Organization.
- Topic Classifier: tags as Natural Disaster, Local News, Seismology.
- Sentiment & Stance: neutral, informational.
- Tag Summarizer:
  - Local News: “A 3.2-magnitude quake hit Pleasanton; residents felt weak shaking.”
  - Natural Disaster: “USGS confirmed the quake’s magnitude; no damage reported.”
- Fact-Checker: confirms magnitude via USGS and shaking reports via Patch.
- Personalization & Ranking: emphasizes Local News (user profile weighted to the Bay Area).
- Digest Compiler + Delivery: sends email with subject “Your Bay Area Update: Earthquake Alert.”
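
To make the walkthrough concrete, here is a rough sketch that runs the snippet through toy stand-ins for a few of the agents. Entity extraction, topic classification, and sentiment analysis only read the passage, so they are submitted to a thread pool together; tagging runs afterwards because it depends on the extracted entities. The lookup tables and function names are hypothetical, and real agents would call NER models or an LLM instead of keyword rules.

```python
from concurrent.futures import ThreadPoolExecutor

SNIPPET = (
    "A magnitude 3.2 earthquake struck near Pleasanton on Sunday morning, "
    "according to the United States Geological Survey. Residents across "
    "the East Bay reported weak shaking. No immediate reports of damage."
)

# Hypothetical lookup tables standing in for real models.
ENTITY_TYPES = {
    "Pleasanton": "Location",
    "United States Geological Survey": "Organization",
    "East Bay": "Location",
}
TOPIC_KEYWORDS = {
    "Natural Disaster": ("earthquake",),
    "Local News": ("pleasanton", "east bay"),
    "Seismology": ("magnitude",),
}


def extract_entities(text: str) -> list[str]:
    return [name for name in ENTITY_TYPES if name in text]


def classify_topics(text: str) -> list[str]:
    lowered = text.lower()
    return [
        topic for topic, words in TOPIC_KEYWORDS.items()
        if any(w in lowered for w in words)
    ]


def analyze_sentiment(text: str) -> str:
    # Toy rule: flag one alarming word, otherwise treat the report as neutral.
    return "negative" if "destroyed" in text.lower() else "neutral"


def tag_entities(entities: list[str]) -> dict[str, str]:
    # Sequential step: needs the extractor's output first.
    return {name: ENTITY_TYPES[name] for name in entities}


# The three independent analyses run side by side.
with ThreadPoolExecutor() as pool:
    entities_future = pool.submit(extract_entities, SNIPPET)
    topics_future = pool.submit(classify_topics, SNIPPET)
    sentiment_future = pool.submit(analyze_sentiment, SNIPPET)
    entities = entities_future.result()
    topics = topics_future.result()
    sentiment = sentiment_future.result()

tags = tag_entities(entities)
print(tags, topics, sentiment)
```

Threads suit this sketch because real agents would mostly wait on network calls (model endpoints, APIs); CPU-bound models would call for a process pool or a separate service instead.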

What started as a raw headline became a structured, ranked, fact-checked digest.

Beyond News: Generalizing to Other Feeds

What’s powerful about this agent pipeline is that nothing in it is tied only to news. It’s really a framework for taking any content feed → extracting structure → producing a personalized digest.

Let’s take another example: arXiv papers.

Every day, hundreds of research papers drop across categories like Machine Learning, Computer Vision, or Quantum Computing.
For a researcher, the challenge is the same as with news: too much volume, too little time, and only a few papers that are truly relevant.

How the Same Agents Apply

- Fetcher Agent
  - Input: arXiv RSS feed or API query.
  - Output: Paper metadata (title, authors, abstract, category).
- Passage Extractor Agent
  - Input: Abstract text.
  - Output: Key sentences (problem statement, method, result).
- Named Entity Extractor Agent
  - Input: Abstract.
  - Output: Entities like “transformer,” “federated learning,” “TPU v5e.”
- Entity Disambiguation Agent
  - Input: Entities + context.
  - Output: Links to canonical IDs (e.g., arXiv subject codes, Wikipedia entries).
- Entity Tagger Agent
  - Input: Resolved entities.
  - Output: Categories: Algorithm, Dataset, Hardware, Domain.
- Topic Classifier Agent
  - Input: Abstract embeddings.
  - Output: Tags like {Deep Learning, Reinforcement Learning, Distributed Systems}.
- Sentiment & Stance Agent
  - Input: Abstract.
  - Output: “Positive result” (model beats SOTA by 2%), “Critical” (paper refutes prior claim).
- Tag Summarizer Agent
  - Input: Entities + topics.
  - Output:
    - Distributed Training: “New optimizer reduces GPU communication overhead by 30%.”
    - NLP: “Transformer variant improves long-context understanding.”
- Fact-Checker Agent
  - Input: Claims in abstract.
  - Output: Basic validation against cited benchmarks, prior arXiv papers.
- Personalization & Ranking Agent
  - Input: Summaries + user profile.
  - Output: Weighted list, e.g., ML (0.9), Systems (0.7), Theory (0.2).
- Digest Compiler Agent
  - Output: A daily “Research Digest” grouped by topics you care about.
- Daily Digest Agent
  - Output: Email / Slack message titled “Your Research Updates: Sept 7, 2025.”

Example Output

Machine Learning

- “A new optimizer for distributed training reduces GPU communication overhead by 30%.”
- “Transformer variant improves long-context understanding.”

Systems

- “Novel checkpointing approach for TPU workloads improves reliability.”

Theory

- “Paper refutes prior bounds on sparse recovery in high-dimensional settings.”

The General Principle

Whether it’s:

- News articles (politics, finance, Bay Area local updates),
- Academic papers (arXiv, PubMed),
- Internal company reports (logs, metrics dashboards),

…the same agent pipeline applies.

You’re always doing:

1. Fetch content.
2. Extract passages.
3. Identify entities, disambiguate them.
4. Tag and classify.
5. Summarize and fact-check.
6. Rank based on user profile.
7. Deliver as a digest.
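
Of these steps, ranking is the easiest to show in a few lines. The sketch below scores each summary by the highest-weighted topic it carries, using the example weights from the arXiv profile (ML 0.9, Systems 0.7, Theory 0.2). The `Summary` type and the max-weight scoring rule are assumptions for illustration, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    text: str
    topics: list[str]


# Example weights from the arXiv profile; keys are topic tags.
USER_PROFILE = {"ML": 0.9, "Systems": 0.7, "Theory": 0.2}


def score(summary: Summary, profile: dict[str, float]) -> float:
    """Score a summary by its best-matching topic; unknown topics score 0."""
    return max((profile.get(t, 0.0) for t in summary.topics), default=0.0)


def rank(summaries: list[Summary], profile: dict[str, float]) -> list[Summary]:
    """Return summaries ordered from most to least relevant for this profile."""
    return sorted(summaries, key=lambda s: score(s, profile), reverse=True)


summaries = [
    Summary("Paper refutes prior bounds on sparse recovery.", ["Theory"]),
    Summary("Novel checkpointing approach for TPU workloads.", ["Systems"]),
    Summary("New optimizer reduces GPU communication overhead.", ["ML", "Systems"]),
]
ranked = rank(summaries, USER_PROFILE)
for s in ranked:
    print(f"{score(s, USER_PROFILE):.1f}  {s.text}")
```

Swapping the feed (news, papers, logs) changes the topic tags and the profile, but not this function, which is why no LLM call is needed at this stage.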

That’s the feed-to-digest pattern, and agents are a natural way to implement it.

MCP: The Protocol That Lets Agents Talk

When you chain multiple agents together, two big challenges show up:

- Inter-agent communication: How does the Passage Extractor know how to hand results to the Entity Disambiguation Agent?
- External integrations: How do agents fetch data from APIs (like arXiv, USGS, or RSS feeds) without each agent reinventing its own protocol?

This is where MCP (Model Context Protocol) comes in.

What is MCP?

Think of MCP as the USB standard for AI agents.

- It defines interfaces for tools and services.
- It specifies how agents pass context (inputs, outputs, metadata).
- It allows interoperability, meaning you can swap one agent out for another without breaking the pipeline.

With MCP, the Passage Extractor doesn’t need to “know” the implementation details of the Entity Tagger. It just sends structured data (text + embeddings + tags) in a format MCP understands.
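
For a feel of what “a format MCP understands” looks like on the wire: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below only builds and parses such a request with the standard library. The tool name `extract_entities` and its argument fields are hypothetical, and a real deployment would use an MCP SDK and transport rather than hand-built JSON.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical hand-off of extracted passages to an entity tool.
wire = make_tool_call(
    request_id=1,
    tool="extract_entities",  # assumed tool name, not defined by MCP itself
    arguments={"passages": ["A magnitude 3.2 earthquake struck near Pleasanton."]},
)

decoded = json.loads(wire)
print(decoded["method"], decoded["params"]["name"])
```

The sender only needs the tool’s name and argument schema, which is the sense in which it does not need to know how the receiving side is implemented.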

Internal Communication

Inside our pipeline:

- Fetcher Agent outputs {title, body, url, timestamp} in MCP format.
- Passage Extractor takes {body} and returns {passages, embeddings}.
- Named Entity Extractor consumes {passages} and produces {entities}.
- Entity Disambiguation consumes {entities, context} and produces {entity_id}.

Each agent talks the same “language” thanks to MCP.

External Communication

MCP also works outward. For example:

- The Fetcher Agent uses MCP to call an arXiv API or an RSS feed.
- The Fact-Checker Agent uses MCP to query Wikipedia or a news database.
- The Daily Digest Agent uses MCP to deliver results via email or Slack.

The benefit is that agents can integrate with any external tool as long as that tool speaks MCP, just like plugging any USB device into your laptop.

Why This Matters

Without MCP, every agent would need custom adapters, a brittle mess of one-off integrations. With MCP:

- Standardized contracts → each agent’s input/output is predictable.
- Plug-and-play architecture → you can replace the Sentiment Agent with a better one tomorrow.
- Scalability → dozens of agents can coordinate without spaghetti code.
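
The plug-and-play point follows from agreeing on a contract, whatever carries it. As an illustration in plain Python, with `typing.Protocol` standing in for the shared contract and both analyzers being hypothetical toys, the pipeline step stays untouched when the Sentiment Agent is replaced:

```python
from typing import Protocol


class SentimentAgent(Protocol):
    """Contract: take a passage, return 'positive', 'negative', or 'neutral'."""

    def analyze(self, passage: str) -> str: ...


class KeywordSentiment:
    """Toy first version: looks for a single negative cue."""

    def analyze(self, passage: str) -> str:
        return "negative" if "damage" in passage.lower() else "neutral"


class NegationAwareSentiment:
    """Toy replacement: treats 'no ... damage' as neutral rather than negative."""

    def analyze(self, passage: str) -> str:
        lowered = passage.lower()
        if "damage" in lowered and "no " not in lowered:
            return "negative"
        return "neutral"


def annotate(passage: str, agent: SentimentAgent) -> dict[str, str]:
    """Pipeline step that depends only on the contract, not the implementation."""
    return {"passage": passage, "sentiment": agent.analyze(passage)}


text = "No immediate reports of damage."
before = annotate(text, KeywordSentiment())
after = annotate(text, NegationAwareSentiment())
print(before["sentiment"], after["sentiment"])
```

Here the first analyzer mislabels the earthquake sentence and the replacement fixes it, while `annotate` and everything downstream of it are unchanged.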

In other words, MCP is what turns a collection of scripts into a modular, extensible agent platform.

Closing Thoughts

The journey from a flat, keyword-based feed, to a newsroom of agents, to a generalized digesting platform mirrors how software evolves: from scripts to systems to ecosystems.

News today, arXiv tomorrow, logs and dashboards the day after. The pattern is the same: feed-to-digest, powered by agents. And with MCP providing the glue, these agents stop being isolated hacks and start working as part of a larger, interoperable system.

Don’t get caught up in the “agentic AI” hype. Write better tools with strong fundamentals, and leverage LLMs where they add value: to refine, summarize, and iterate.

In the next part, I’ll dive into how you can implement multi-agent systems with MCP.