In my previous article I tried to explain how we could leverage agents to make our news feed personalized. In this article I'll try to show how we can implement such a system.
Pre-requisite
If you want to understand what MCP is and how it works, refer here. In short, MCP is to Agents + RAG + LLMs what HTTP was to the internet. MCP makes agents adhere to a fixed verbiage when they interact with the outside world, with LLMs, or among themselves, i.e. it's the language the agents use to communicate. This is good because it helps standardize things. It's also wise not to build entire castles and kingdoms on MCP just yet, both because MCP is new and because things change at light speed in LLM-land.
Recap
Here are the different agents I mentioned in my previous article, now showing the tool each will implement. And, just for fun, let's name each agent.
Agent | Inputs | Outputs | LLM Needed? | MCP Role |
---|---|---|---|---|
Fetcher (Harriet 🕵️) | News feed URL, RSS, API query | Full article text, metadata (title, URL, timestamp, source) | ❌ No — plain HTTP/API | MCP Tool → fetch_articles(source, since) |
Passage Extractor (Clarence ✂️) | Full article text | Key passages, passage embeddings | ✅ Optional — LLM for salience, or embeddings/TF-IDF | MCP Tool → extract_passages(article) |
Named Entity Extractor (Fiona 🔍) | Passages | Entity list, spans, embeddings | ❌/✅ NER models are fast, LLMs catch novel entities | MCP Tool → extract_entities(passages) |
Entity Disambiguator (Dexter 🧩) | Entity list, context embeddings | Resolved entities with canonical IDs (e.g. Wikidata Q312) | ✅ Yes — reasoning for “Apple” the 🍎 vs. Apple the company | MCP Workflow → disambiguate_entities(entity, context) |
Entity Tagger (Tess 🏷️) | Disambiguated entities | Entities + categories (Org, Person, Product, Location) | ❌ No — deterministic taxonomy | MCP Tool → tag_entities(resolved_entities) |
Topic Classifier (Theo 📚) | Passages, embeddings | Topic labels (AI, Finance, Bay Area) | ❌/✅ Embeddings + clustering or LLM for nuance | MCP Tool → classify_topic(passages) |
Sentiment & Stance Analyzer (Sana 💬) | Passages, entities | Sentiment score + stance (supportive / critical / neutral) | ✅ Optional — LLM for subtlety | MCP Tool → analyze_sentiment(passage) |
Tag Summarizer (Sumi ✨) | Tagged entities, topics, sentiment | Structured summaries grouped by tag | ✅ Yes — summarization is LLM-heavy | MCP Workflow → summarize(tags) |
Fact-Checker (Frank ✅) | Summaries, claims | Verified/Unverified claims + references | ✅ Yes — retrieval + reasoning | MCP Workflow → fact_check(claims) |
Personalization & Ranking (Loretta 🎯) | Validated summaries, user profile | Ranked/weighted story list | ❌ No — ML heuristics / rules | MCP Tool → rank(user_profile, summaries) |
Digest Compiler (Daisy 📄) | Ranked summaries | Final digest (Markdown, HTML, JSON) | ❌/✅ Formatting deterministic; LLM optional for tone | MCP Tool → compile_digest(rankings) |
Daily Digest (Courier Carl 📬) | Compiled digest | Delivery package (email, Slack, app notification) | ❌ No — just delivery | MCP Client → pushes via chosen channel |
For the sake of brevity I'll only show how to define some of these tools and how agents can use them.
MCP Tool: fetcher.py
Every newsroom starts with a reporter. Here, that’s Harriet (Fetcher), who pulls in articles from RSS feeds or sample JSON.
def fetch_articles(source: str, since: Optional[str] = None, limit: int = 10) -> Dict[str, List[Article]]:
"""Fetch the latest news articles from a given source.
URL sources are fetched live with ``httpx``. All other values fall back to the demo
corpus stored in ``resources/sample_articles.json`` so the server remains usable
offline.
"""
if _looks_like_url(source):
try:
transport = httpx.HTTPTransport(retries=2)
with httpx.Client(
timeout=10.0,
headers={"User-Agent": "newsroom-server/0.1"},
follow_redirects=True,
http2=False,
transport=transport,
) as client:
response = client.get(source)
response.raise_for_status()
except httpx.HTTPError as exc:
raise RuntimeError(f"Failed to fetch RSS feed '{source}': {exc}") from exc
articles = _parse_rss_feed(response.text, source=source, limit=limit)
articles = _filter_since(articles, since)
articles = sorted(articles, key=lambda item: item["timestamp"], reverse=True)
return {"articles": articles[:limit]}
articles_by_source = _load_articles()
if source not in articles_by_source:
raise ValueError(
f"Unknown news source '{source}'. Available sources: {sorted(articles_by_source)}"
)
articles = _filter_since(articles_by_source[source], since)
articles = sorted(articles, key=lambda item: item["timestamp"], reverse=True)
return {"articles": articles[:limit]}
Harriet normalizes the source, fetches, filters by date, and returns a consistent JSON of articles.
MCP Tool: passage_extractor.py
Long articles overwhelm downstream tools. Clarence (Passage Extractor) chops them into short, coherent passages.
def extract_passages(
article_id: str,
content: str,
max_length: int = 320,
llm_mode: bool = False,
model: Optional[str] = None,
fallback_on_error: bool = True,
) -> Dict[str, List[Passage]]:
"""Split full article text into coherent passages.
The helper keeps passages short enough for downstream tools while preserving the
original order. When ``llm_mode`` is enabled, passage splitting is delegated to an
LLM and falls back to the rule-based strategy if necessary.
"""
if llm_mode and content.strip():
try:
llm_passages = extract_passages_with_llm(
article_id=article_id,
content=content,
max_length=max_length,
model=model,
)
except RuntimeError as exc:
if not fallback_on_error:
raise
print(f"[newsroom] llm passage extraction fallback: {exc}", file=sys.stderr)
else:
return {"passages": llm_passages} # type: ignore[return-value]
By default, Clarence uses a rule-based splitter, but can call an LLM for smarter boundaries. When we involve an LLM here, the prompt needs to be explicit: split text into coherent sections, don’t cut mid-sentence, and keep each segment under the max_length
limit. The temperature is pinned low for determinism, and any malformed outputs are caught by schema checks so we can fall back to the rule-based splitter. In other words, Clarence gets more fluent passages with LLM help, but only because we constrain him carefully.
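To make that concrete, here is a minimal sketch of what the _PASSAGE_SYSTEM_PROMPT referenced later in llm.py could contain; the exact wording is illustrative, not the prompt from the actual repo.
# Illustrative system prompt for Clarence; wording is an assumption, not the repo's verbatim prompt.
_PASSAGE_SYSTEM_PROMPT = [
    {
        "role": "system",
        "content": (
            "You split news articles into coherent passages. "
            "Never cut a sentence in half, keep every passage under the provided max_length, "
            "preserve the original order, and respond ONLY with JSON of the form "
            '{"passages": [{"text": "..."}, ...]}.'
        ),
    }
]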
MCP Tool: entity_extractor.py
Once passages are available, Fiona (Entity Extractor) identifies names, places, and organizations.
def extract_entities(
passages: List[Passage],
llm_mode: bool = False,
embeddings: bool = False,
model: Optional[str] = None,
fallback_on_error: bool = True,
) -> Dict[str, List[EntityMention]]:
"""Identify named entities in passages with optional LLM support."""
if llm_mode:
try:
llm_entities = extract_entities_with_llm(passages, model=model)
except RuntimeError as exc:
if not fallback_on_error:
raise
print(f"[newsroom] llm entity extraction fallback: {exc}", file=sys.stderr)
else:
return {"entities": llm_entities} # type: ignore[return-value]
mentions = _rule_based_entities(passages)
return {"entities": mentions}
Like Clarence, Fiona can operate rule-based or lean on an LLM for tricky cases. If Fiona uses an LLM, the prompt must demand structured JSON like { "entity": "OpenAI", "type": "Org" }. By keeping randomness close to zero, we avoid the model drifting into free prose. Schema validation enforces the contract, and a fallback NER model ensures we don’t stall when the LLM fails.
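As a sketch of that schema enforcement, assuming Pydantic (field names mirror the JSON shape above and are illustrative; the real EntityMention type lives in newsroom/types.py):
from typing import List, Literal

from pydantic import BaseModel, ValidationError

class EntityRecord(BaseModel):
    # Illustrative schema, not the project's actual EntityMention definition.
    entity: str
    type: Literal["Person", "Org", "Product", "Location"]
    passage_id: str = ""

def validate_entities(raw: List[dict]) -> List[EntityRecord]:
    """Keep only records that match the schema; malformed output triggers the fallback path."""
    valid: List[EntityRecord] = []
    for item in raw:
        try:
            valid.append(EntityRecord(**item))
        except ValidationError:
            continue  # skip malformed records instead of stalling the pipeline
    return valid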
MCP Tool: disambiguator.py
Is “Apple” a fruit 🍎 or a company? That’s where Dexter (Disambiguator) comes in.
def disambiguate_entities(
entities: List[EntityMention],
context: str = "",
llm_mode: bool = False,
model: Optional[str] = None,
fallback_on_error: bool = True,
) -> Dict[str, List[ResolvedEntity]]:
"""Resolve ambiguous entities to canonical IDs with optional LLM assistance."""
if llm_mode and entities:
try:
resolved_llm = resolve_entities_with_llm(entities, context=context, model=model)
except RuntimeError as exc:
if not fallback_on_error:
raise
print(f"[newsroom] llm disambiguation fallback: {exc}", file=sys.stderr)
else:
return {"resolved_entities": resolved_llm} # type: ignore[return-value]
resolved = _rule_based_disambiguation(entities)
return {"resolved_entities": resolved}
Dexter uses heuristics for common cases but can call an LLM to reason with context. Prompting here is all about context: we explicitly provide the passage and metadata, then ask the model to map each entity to a canonical ID such as a Wikidata QID. The LLM is also asked to return a confidence score and a short justification. If it can’t comply, we drop back to rule-based linking or a cached mapping.
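The record we ask the LLM to emit for each mention could look roughly like this (field names and values are illustrative; the real ResolvedEntity type is defined in newsroom/types.py):
# Illustrative example of one resolved entity; values are made up.
resolved_entity = {
    "entity": "Apple",
    "canonical_id": "Q312",  # Wikidata QID for Apple Inc.
    "label": "Apple Inc.",
    "confidence": 0.92,      # model-reported confidence
    "justification": "The passage discusses iPhone revenue, so the company, not the fruit.",
}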
MCP Tool: tagger.py
With entities resolved, we can categorize and classify.
- Tess (Tagger) assigns categories like Person, Org, or Location.
def tag_entities(
resolved_entities: List[ResolvedEntity],
llm_mode: bool = False,
model: Optional[str] = None,
fallback_on_error: bool = True,
) -> Dict[str, List[TaggedEntity]]:
"""Assign newsroom-specific categories to entities."""
if llm_mode and resolved_entities:
try:
llm_tags = tag_entities_with_llm(resolved_entities, model=model)
except RuntimeError as exc:
if not fallback_on_error:
raise
print(f"[newsroom] llm tagging fallback: {exc}", file=sys.stderr)
else:
return {"tagged_entities": llm_tags} # type: ignore[return-value]
When Tess uses an LLM, the prompt lists the taxonomy explicitly and instructs: “Use only these categories.”
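A minimal sketch of such a menu prompt (wording illustrative):
# Illustrative system prompt for Tess; the allowed categories come from the taxonomy above.
_TAGGING_PROMPT_TEXT = (
    "Assign each resolved entity exactly one category from this list: "
    "Person, Org, Product, Location. Use only these categories. "
    'Respond ONLY with JSON: {"tagged_entities": [{"entity": "...", "category": "..."}]}.'
)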
MCP Tool: topic_classifier.py
- Theo (Topic Classifier) clusters passages into topics like AI, Finance, or Bay Area.
def classify_topic(
passages: List[Passage],
llm_mode: bool = False,
model: Optional[str] = None,
fallback_on_error: bool = True,
) -> Dict[str, List[TopicPrediction]]:
"""Classify passages into newsroom beats with optional LLM assistance."""
if llm_mode and passages:
try:
llm_topics = classify_topics_with_llm(passages, model=model)
except RuntimeError as exc:
if not fallback_on_error:
raise
print(f"[newsroom] llm topic classification fallback: {exc}", file=sys.stderr)
else:
return {"topics": llm_topics} # type: ignore[return-value]
For Theo, we either give a whitelist of newsroom topics or ask for a probability distribution across them. Without such constraints, the model might invent new labels, which would break consistency downstream.
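For example, a closed-set check on the model's output might look like this (the topic list and helper are illustrative):
from typing import Dict, List

# Illustrative post-processing: drop any label the LLM invents outside the newsroom beats.
NEWSROOM_TOPICS = {"AI", "Finance", "Bay Area", "Sports", "Politics"}

def keep_known_topics(predictions: List[Dict]) -> List[Dict]:
    return [p for p in predictions if p.get("topic") in NEWSROOM_TOPICS]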
MCP Tool: fact_checker.py
We want trustworthy information. This is where Frank (Fact Checker) steps in. His job is to validate claims by cross-checking references or delegating to an LLM when reasoning is required.
def fact_check(
claims: List[str],
llm_mode: bool = False,
model: Optional[str] = None,
fallback_on_error: bool = True,
) -> Dict[str, List[Dict]]:
"""Verify claims with canned references suitable for the demo."""
if llm_mode and claims:
try:
checked = fact_check_with_llm(claims, model=model)
except RuntimeError as exc:
if not fallback_on_error:
raise
print(f"[newsroom] llm fact-check fallback: {exc}", file=sys.stderr)
else:
return {"checked_claims": checked}
Here, prompts must force the LLM to decompose text into atomic claims, check each claim against evidence, and return { claim, status, reference }. Temperature stays near zero, and "Unverified" is always an acceptable outcome. That safeguard is especially important for breaking news: if the world hasn’t published a source yet, the model must not fabricate. Instead, we display “Unverified” and queue the claim for re-checking once retrieval catches up.
Handling 0-Day Facts (Breaking News)
Here’s the real challenge: what happens if the claim is so fresh that no retrieval source has indexed it yet?
- Fallback status: For breaking news, the safest output is "Unverified" with a note like “No reliable references found within the current knowledge window.”
- Incremental updates: The claim could be queued for re-verification after X minutes/hours once external knowledge bases refresh.
- Source prioritization: Prefer live sources (wire services, APIs like Associated Press/Reuters) over static knowledge bases for emerging events.
- Transparency to users: Instead of faking certainty, the digest should surface this clearly:
“⚠️ This claim is unverified — it may relate to breaking news. Check back later for updates.”
In other words: an LLM is useful for structuring and reasoning about the fact-check, but truth ultimately depends on retrieval freshness. If the world hasn’t published it yet, the best answer is “we don’t know yet.”
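A sketch of that policy in code (the statuses, delay, and in-memory queue are illustrative; a real pipeline would persist re-check jobs somewhere durable):
import time
from collections import deque
from typing import Dict

RECHECK_DELAY_SECONDS = 30 * 60  # illustrative: retry unverified claims after 30 minutes
recheck_queue: deque = deque()

def handle_checked_claim(claim: str, status: str, reference: str = "") -> Dict:
    """Pass verified claims through; park unverified ones for a later retrieval pass."""
    if status == "Unverified":
        recheck_queue.append({"claim": claim, "recheck_at": time.time() + RECHECK_DELAY_SECONDS})
        note = "No reliable references found within the current knowledge window."
        return {"claim": claim, "status": status, "reference": "", "note": note}
    return {"claim": claim, "status": status, "reference": reference}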
MCP Tool: ranker.py
Finally, we need to tailor results for each user. Loretta (Ranker) scores stories based on profile preferences, blocked sources, and topic matches.
def rank_stories(
user_profile: Dict,
summaries: List[TagSummary],
articles: Optional[List[Article]] = None,
) -> Dict[str, List[RankedStory]]:
"""Rank and personalise stories based on the user's interests."""
preferred_topics = {topic.lower() for topic in user_profile.get("preferred_topics", [])}
blocked_sources = {source.lower() for source in user_profile.get("blocked_sources", [])}
article_lookup = {article["id"]: article for article in articles or []}
ranked: List[RankedStory] = []
for summary in summaries:
article_ids = [article_id for article_id in summary["article_ids"] if article_id]
for article_id in article_ids:
article = article_lookup.get(article_id)
if article and article["source"].lower() in blocked_sources:
continue
score = 1.0
reason_parts = [f"Entity: {summary['tag']}"]
if any(topic in summary["category"].lower() for topic in preferred_topics):
score += 1.0
reason_parts.append("Matches preferred topic")
title = summary["tag"]
url = ""
if article:
title = article["title"]
url = article["url"]
reason_parts.append(f"Source: {article['source']}")
ranked.append(
{
"article_id": article_id,
"title": title,
"url": url,
"score": score,
"reason": ", ".join(reason_parts),
}
)
ranked.sort(key=lambda record: record["score"], reverse=True)
return {"ranked_summaries": ranked}
Loretta makes sure your digest isn’t just all news, but your news. If Loretta uses an LLM, the prompt explicitly requires numeric scores and reasons tied to the user profile — e.g., “Matched topic: Finance.” Daisy, when compiling digests, is told to output in strict Markdown or JSON formats. This prevents the LLM from adding “creative” sections that don’t integrate cleanly with delivery.
Again, my personal experience tells me that rather than relying on an LLM alone for ranking, you might be better off using a dedicated custom ranker model. For example, the custom ranker could be a linear or gradient-boosted scoring model with a formula like:
Score(article, user) = w1·ProfileMatch + w2·Recency + w3·SourceCredibility + w4·Novelty
Weights can be tuned manually at first, then learned from click/log data as feedback accumulates.
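A minimal sketch of that scoring function (the weights are illustrative starting points, and each feature is assumed to be pre-computed and normalized to the range 0 to 1):
# Illustrative weighted-sum ranker; features are assumed to be normalized to [0, 1].
WEIGHTS = {"profile_match": 0.4, "recency": 0.3, "source_credibility": 0.2, "novelty": 0.1}

def score_article(features: dict) -> float:
    return sum(weight * features.get(name, 0.0) for name, weight in WEIGHTS.items())

# Example: a fresh article from a credible source that matches the user's interests.
score = score_article({"profile_match": 1.0, "recency": 0.8, "source_credibility": 0.9, "novelty": 0.5})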
The biggest advantage over LLM-based ranking is stability: every article gets a reproducible score, and every ranking is explainable. Instead of “the model said so,” you can show the user: “This story ranked higher because it matched your interest in AI, came from a preferred source, and was published recently.”
In practice, this hybrid works best: use embeddings or lightweight ML to compute similarities and preferences, and only call an LLM to generate natural language justifications after the deterministic ranker has made its decision. That way, hallucination never affects ranking itself — it only affects how nicely we explain it.
✅ Now you’ve got the following MCP-compliant tools:
- fetch_articles → Harriet 🕵️ (Fetcher)
- extract_passages → Clarence ✂️ (Passage Extractor)
- extract_entities → Fiona 🔍 (Entity Extractor)
- disambiguate_entities → Dexter 🧩 (Entity Disambiguator)
- tag_entities → Tess 🏷️ (Entity Tagger)
- classify_topic → Theo 📚 (Topic Classifier)
- fact_check → Frank ✅ (Fact Checker)
- rank_stories → Loretta 🎯 (Ranker)
The LLM helper functions for each of them could be defined in llm.py as shown below.
def extract_passages_with_llm(
article_id: str,
content: str,
max_length: int = 320,
model: Optional[str] = None,
) -> List[Dict[str, Any]]:
if not content.strip():
return []
messages: List[PromptMessage] = list(_PASSAGE_SYSTEM_PROMPT)
messages.append(
_message(
"user",
json.dumps(
{
"article_id": article_id,
"max_length": max_length,
"content": content,
}
),
)
)
parsed = _call_json_response(messages, model=model)
raw_passages = parsed.get("passages", [])
if not isinstance(raw_passages, list):
raise RuntimeError("LLM returned passages in unexpected format")
passages: List[Dict[str, Any]] = []
order = 0
for raw in raw_passages:
if isinstance(raw, dict):
text = str(raw.get("text", "")).strip()
elif isinstance(raw, str):
text = raw.strip()
else:
continue
if not text:
continue
for chunk in _chunk_text(text, max_length) or [text]:
cleaned = chunk.strip()
if not cleaned:
continue
order += 1
passages.append(
{
"id": f"{article_id}-p{order}",
"article_id": article_id,
"order": order,
"text": cleaned,
}
)
return passages
def tag_entities_with_llm(
resolved_entities: List[Dict[str, Any]],
model: Optional[str] = None,
) -> List[Dict[str, Any]]:
if not resolved_entities:
return []
messages: List[PromptMessage] = list(_TAGGING_SYSTEM_PROMPT)
messages.append(
_message("user", json.dumps({"resolved_entities": resolved_entities}))
)
parsed = _call_json_response(messages, model=model)
tagged = parsed.get("tagged_entities", [])
return [record for record in tagged if isinstance(record, dict)]
def extract_entities_with_llm(passages: List[Dict[str, Any]], model: Optional[str] = None) -> List[Dict[str, Any]]:
messages: List[PromptMessage] = list(_ENTITY_SYSTEM_PROMPT)
for passage in passages:
messages.append(
_message(
"user",
json.dumps(
{
"passage_id": passage.get("id", ""),
"article_id": passage.get("article_id", ""),
"text": passage.get("text", ""),
}
),
)
)
parsed = _call_json_response(messages, model=model)
entities = parsed.get("entities", [])
return [entity for entity in entities if isinstance(entity, dict)]
def classify_topics_with_llm(
passages: List[Dict[str, Any]],
model: Optional[str] = None,
) -> List[Dict[str, Any]]:
messages: List[PromptMessage] = list(_TOPIC_SYSTEM_PROMPT)
for passage in passages:
messages.append(
_message(
"user",
json.dumps(
{
"passage_id": passage.get("id", ""),
"article_id": passage.get("article_id", ""),
"text": passage.get("text", ""),
}
),
)
)
parsed = _call_json_response(messages, model=model)
topics = parsed.get("topics", [])
return [topic for topic in topics if isinstance(topic, dict)]
def resolve_entities_with_llm(
entities: List[Dict[str, Any]],
context: str,
model: Optional[str] = None,
) -> List[Dict[str, Any]]:
messages: List[PromptMessage] = list(_DISAMBIGUATION_SYSTEM_PROMPT)
messages.append(_message("user", json.dumps({"context": context})))
messages.append(_message("user", json.dumps({"entities": entities})))
parsed = _call_json_response(messages, model=model)
resolved = parsed.get("resolved_entities", [])
return [record for record in resolved if isinstance(record, dict)]
def summarize_tags_with_llm(
tags: List[Dict[str, Any]],
passages: List[Dict[str, Any]],
model: Optional[str] = None,
) -> List[Dict[str, Any]]:
messages: List[PromptMessage] = list(_TAG_SUMMARY_SYSTEM_PROMPT)
messages.append(_message("user", json.dumps({"tags": tags, "passages": passages})))
parsed = _call_json_response(messages, model=model, temperature=0.2)
summaries = parsed.get("tag_summaries", [])
return [summary for summary in summaries if isinstance(summary, dict)]
def fact_check_with_llm(
claims: List[str],
model: Optional[str] = None,
) -> List[Dict[str, Any]]:
messages: List[PromptMessage] = list(_FACT_CHECK_SYSTEM_PROMPT)
messages.append(_message("user", json.dumps({"claims": claims})))
parsed = _call_json_response(messages, model=model, temperature=0.1)
checked = parsed.get("checked_claims", [])
return [item for item in checked if isinstance(item, dict)]
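All of these helpers lean on _call_json_response. Its implementation is provider-specific; here is a minimal sketch assuming an OpenAI-compatible client with JSON mode, and assuming PromptMessage is a plain role/content dict (swap in whichever client your stack uses):
import json
import os
from typing import Any, Dict, List, Optional

from openai import OpenAI  # assumption: OpenAI-compatible client

_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def _call_json_response(
    messages: List[Dict[str, str]],
    model: Optional[str] = None,
    temperature: float = 0.0,
) -> Dict[str, Any]:
    """Send the prompt, force JSON output, parse it, and raise RuntimeError on failure."""
    response = _client.chat.completions.create(
        model=model or os.environ.get("NEWSROOM_LLM_MODEL", "gpt-4o-mini"),  # default model is illustrative
        messages=messages,
        temperature=temperature,
        response_format={"type": "json_object"},
    )
    try:
        return json.loads(response.choices[0].message.content or "{}")
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"LLM returned non-JSON output: {exc}") from exc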
Running MCP server locally
You can use uv to run the MCP server and then use the tools interactively as shown below.
uv run mcp dev ./server.py
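If you haven't written server.py yet, a minimal sketch using the Python MCP SDK's FastMCP helper could look like this (only a few tools are wired up here; the import paths assume the folder structure shown next):
# server.py (sketch): register the newsroom functions as MCP tools.
from mcp.server.fastmcp import FastMCP

from tools.fetcher import fetch_articles
from tools.passage_extractor import extract_passages
from tools.ranker import rank_stories

mcp = FastMCP("newsroom")

# Wrap the plain Python functions as MCP tools; repeat for the remaining agents.
mcp.tool()(fetch_articles)
mcp.tool()(extract_passages)
mcp.tool()(rank_stories)

if __name__ == "__main__":
    mcp.run()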
Once you have implemented the rest of the agents, you'll have a folder structure like this:
mcp-newsroom/
├── main.py
├── newsroom
│ ├── llm.py
│ └── types.py
├── pyproject.toml
├── README.md
├── resources
│ ├── sample_articles.json
│ └── user_profile_store.py
├── server.py
├── tools
│ ├── compiler.py
│ ├── deliverer.py
│ ├── disambiguator.py
│ ├── entity_extractor.py
│ ├── fact_checker.py
│ ├── fetcher.py
│ ├── passage_extractor.py
│ ├── personalizer.py
│ ├── ranker.py
│ ├── sentiment_analyzer.py
│ ├── tag_summarizer.py
│ ├── tagger.py
│ └── topic_classifier.py
└── uv.lock
Prompts are structured templates (with placeholders + descriptions) that clients can call, just like tools. They’re defined in prompts/ and discoverable via prompts/list.
Prompt: daily_digest
{
"name": "daily_digest",
"description": "Generate and deliver a personalized daily digest of news articles for a given user.",
"arguments": {
"user_id": {
"type": "string",
"description": "The ID of the user requesting the digest."
},
"topic_filter": {
"type": "string",
"description": "Optional topic filter (e.g. 'AI', 'Finance', 'Sports')."
},
"delivery_channel": {
"type": "string",
"enum": ["email", "slack", "app"],
"description": "Where to send the digest."
}
}
}
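With the Python MCP SDK the same prompt can also be registered in code; a sketch (the prompt body wording is illustrative) might look like this:
# Sketch: exposing the daily_digest prompt via FastMCP; the returned wording is illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("newsroom")

@mcp.prompt()
def daily_digest(user_id: str, topic_filter: str = "", delivery_channel: str = "email") -> str:
    """Generate and deliver a personalized daily digest of news articles for a given user."""
    return (
        f"Compile today's digest for user {user_id}. "
        f"Apply the topic filter '{topic_filter}' if provided, "
        f"and prepare the digest for delivery via {delivery_channel}."
    )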
The complete article is available here.
General Guardrails Across Agents
- Low temperature (0–0.3) → keeps output consistent and schema-friendly.
- Schema enforcement → JSON schema, Pydantic models, regex checks.
- Fallbacks → deterministic backup logic when LLM fails validation.
- Chain-of-thought steering → instead of free prose, explicitly request:
- Decompose → Analyze → Decide → Output JSON.
- Transparency → store both input prompt + output for audits/debugging.
Prompting Playbook
Not all prompts are created equal. Each agent in our newsroom requires a different style of instruction depending on the problem it is solving. For something deterministic like passage extraction, prompts work best when they are strict and mechanical: “Split this text into passages of no more than 320 characters. Do not cut mid-sentence. Return JSON with id, article_id, order, and text.” This kind of rigid scaffolding keeps the model from wandering and ensures consistency across runs. Below I've summarized the prompting style that might work best for each of our agents. YMMV.
Agent | Prompting Style | Why it Works |
---|---|---|
Passage Extractor (Clarence) | Rigid, rule-based instructions (“Split into ≤320 chars, don’t cut mid-sentence, return JSON…”) | Prevents rambling, ensures passages are uniform and machine-usable |
Entity Extractor (Fiona) | Schema-driven prompts (“Always return JSON with entity, type, span”) | Enforces structure, easy to validate, avoids prose output |
Disambiguator (Dexter) | Reasoning-oriented prompts (“Explain choice, then map to canonical ID”) | Encourages chain-of-thought, reduces blind guessing, still structured |
Tagger (Tess) | Menu prompts (“Choose only from [Person, Org, Product, Location]”) | Restricts creativity, guarantees consistency across runs |
Topic Classifier (Theo) | Closed-set classification (“Pick from predefined newsroom beats”) | Keeps labels stable, avoids taxonomy drift |
Sentiment Analyzer (Sana) | Calibration prompts (“Give stance label + numeric score −1..+1”) | Produces both qualitative and quantitative outputs, stabilizes judgments |
Tag Summarizer (Sumi) | Constrained summarization (“≤3 sentences, max 60 words, only use provided text”) | Prevents hallucinations, keeps summaries concise |
Fact Checker (Frank) | Decomposition prompts (“Break into atomic claims, check each, return status + reference, allow ‘Unverified’”) | Increases reliability, avoids forcing premature verdicts, handles 0-day news |
Ranker (Loretta) | Structured ranking (“Return top N with numeric scores + reasons tied to user profile”) | Transparent ranking, easy to debug and audit |
Compiler (Daisy) | Strict formatting (“Output Markdown/JSON only, no free prose”) | Guarantees digest is parsable and delivery-ready |
Wrapping Up
We started with a vision — personalized news through agents. With MCP, each role became a composable tool, stitched together into a full pipeline: fetching, splitting, extracting, disambiguating, tagging, classifying, ranking, and delivering.