I originally got asked to write about `app/azure_integrations/azure_maps.py`, specifically the cache key scheme (tile vs geohash), TTL heuristics by zoom level, and the fallthrough behavior to the upstream geocoder with backoff. That's exactly the post I wanted to write, because that file is real and non-trivial in this codebase: it defines an Azure Maps client (async, `httpx`-backed), exposes geocoding methods (including `geocode_address`, referenced from our LangGraph manager), and it's wired into the enrichment graph as the "last mile" location normalizer.

The part that matters isn't "we call Azure Maps." The part that matters is: we learned, painfully, that the fastest, cheapest API call is the one you never make. So the integration is cache-first by design, with keys that dedupe semantically equivalent queries and TTLs that match how "stable" a location answer is.

Below is the design, the incident that forced it, and the concrete implementation patterns that keep Azure Maps reliable under burst load.

## What went wrong first (the real trigger)

The first failure mode wasn't an exception trace. It was a behavioral bug that quietly multiplied upstream calls.

In `app/langgraph_manager.py` we literally left ourselves a trail of corrective surgery in comments and log messages:

- "Fix 2: Address Detection Heuristic - route to correct endpoint"
- "Fix 7: Short-circuit when both fields set (skip Azure Maps if Firecrawl already set both)"

Those aren't stylistic refactors. They're scar tissue. Here's the scenario that caused the bleed:

- We'd run Firecrawl research for an entity.
- Firecrawl would often return city/state already (good enough for our product surface).
- The graph would still call Azure Maps anyway, sometimes multiple times, because downstream steps treated geocoding as a mandatory enrichment, not a conditional completion step.
- Worse: we were sometimes routing an address-shaped query ("123 Main St, Seattle") into a POI search endpoint (which tends to be broader and less deterministic), getting no results, then falling back to address geocoding: two calls where one would do.

That's why the LangGraph manager now has the explicit short-circuit: if both `city` and `state` are already present, we skip Azure Maps.

And it's why we added the address detection heuristic: if the query looks like an address, go straight to `geocode_address`.
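Here is a minimal sketch of what those two checks look like as a graph node. This is not the actual `langgraph_manager.py` code; the state fields (`city`, `state`, `location_query`) and the helper names are illustrative.

```python
# Sketch only: simplified stand-in for the two fixes in the LangGraph manager.
import re
from typing import Optional


def looks_like_address(query: str) -> bool:
    """Fix 2 (sketch): cheap heuristic so address-shaped strings skip POI search."""
    q = query.lower()
    return bool(re.search(r"\d", q)) and bool(
        re.search(r"\b(st|street|ave|avenue|rd|road|blvd|dr|drive|ln|lane)\b", q)
    )


async def maybe_geocode(state: dict, maps_client) -> dict:
    # Fix 7 (sketch): short-circuit when Firecrawl already filled both fields.
    if state.get("city") and state.get("state"):
        return state  # no Azure Maps call at all

    query: Optional[str] = state.get("location_query")
    if not query:
        return state

    # Fix 2 (sketch): route address-shaped strings straight to geocode_address.
    if looks_like_address(query):
        result = await maps_client.geocode_address(query)
    else:
        result = await maps_client.poi_search(query)

    state["geocode_result"] = result
    return state
```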
Those two fixes were the moment the geocoding integration stopped being "a convenience" and became a budgeted service with strong opinions about when it is allowed to run. Once you accept that premise, cache-first isn't an optimization; it's table stakes.

## The integration boundary: how Azure Maps actually fits the pipeline

Azure Maps isn't the source of truth for location in this platform. The platform is an enrichment system. It pulls signals from messy sources (XLSX-imported advisor records, [REDACTED] CRM fields, Firecrawl research) and then normalizes them into something we can query, sort, and reason about.

So the location pipeline is intentionally layered:

1. Prefer already-known structured fields (DB/CRM).
2. Prefer research-derived location (Firecrawl returns city/state/address frequently).
3. Use Azure Maps only to fill gaps or normalize ambiguous strings.

That is consistent with the logging we left in `langgraph_manager.py` (short-circuit if Firecrawl already completed the fields) and consistent with the broader system discipline you can see in the advisor enrichment worker:

- cost is returned (`credits_used`, `search_duration_seconds`)
- the workflow executor carries `total_credits` as a loop invariant
- feature flags exist as kill switches

Geocoding follows the same philosophy: it's a bounded dependency, not a magical oracle.

## Cache-first geocoding: the three decisions that matter

When people say "cache geocoding," they usually mean "store the JSON response keyed by the query string." That's not enough. In practice, these are the decisions that determine whether your cache saves you money or just stores junk:

1. Key topology: what is the canonical representation of the request?
2. TTL heuristics: how long is the answer valid for this type of query?
3. Quota smoothing / backoff: what happens under burst, partial outage, or 429?

I'll go through each, then show a working reference implementation that mirrors what we ship: async client, cache wrapper, TTL policy, and backoff.

### Cache key topology: query strings are lies

A location lookup's "meaning" is not the raw string. Users (and upstream systems) produce semantically identical queries with wildly different spelling and formatting:

- "New York, NY" vs "New York NY" vs "new york, new york"
- "St. Louis" vs "Saint Louis"
- "Seattle WA" vs "Seattle, Washington"

If your cache key is the raw query, you miss most hits.
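A minimal sketch of the fix, assuming only case, whitespace, and punctuation variation. Abbreviation handling ("St. Louis" vs "Saint Louis") needs a real alias table and is deliberately not shown here; the full reference implementation later in the post uses the same normalization idea.

```python
import re


def canonical_key(query: str) -> str:
    q = query.strip().lower()
    q = q.replace(",", " ")             # commas carry no meaning for geocoding
    q = re.sub(r"\s+", " ", q).strip()  # collapse runs of whitespace
    return q


if __name__ == "__main__":
    variants = ["New York, NY", "New York NY", "  new   york, ny "]
    # All three collapse to one cache key instead of three upstream calls.
    assert len({canonical_key(v) for v in variants}) == 1
    print(canonical_key(variants[0]))  # -> "new york ny"
```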
**My rule: keys must be stable under harmless variation.** For our integration, I treat a geocode request as a tuple of:

- operation: `geocode_address`, `geocode_place`, `reverse_geocode`, `poi_search`
- normalized query: case-folded, whitespace collapsed, punctuation simplified
- country filter (and any other parameters that materially change results)
- resolution: a "zoom class" or precision tier that drives TTL

### Tile vs geohash (and why I don't pick only one)

This is the part people argue about.

- Tile-based keys (Web Mercator tiles like `z/x/y`) are great for map rendering workloads and reverse geocoding around a viewport. They align with "what the user sees."
- Geohash keys are great for deduping point-like lookups and clustering nearby requests across different zoom levels.

In our enrichment workload, we do both kinds of lookups:

- Address / place queries: string → coordinates + components
- Reverse lookups: lat/lon → locality + admin regions

So the key topology is mixed:

- For string-based geocoding, the key is derived from the normalized string (plus country filter, plus operation).
- For reverse geocoding, the key is derived from a geohash (or equivalently a rounded lat/lon bucket) so nearby points reuse results.

The reason to bucket reverse geocoding is simple: downstream tasks don't need "the city boundary accurate to 1 meter." They need a stable city/state answer.

### TTL heuristics: store stable answers longer than volatile ones

TTL is not "one number." TTL is a policy. Location answers change at different rates:

- A city/state for a point is stable for months.
- A POI search query can change quickly (businesses open/close, rankings shift).
- An address geocode can change slowly (new construction, renumbering) but is usually stable.

So we use TTL buckets:

- Reverse geocode locality: long TTL
- Address geocode: medium TTL
- POI search: shorter TTL

We also condition TTL on "resolution": the more coarse the query, the longer we can cache, because the answer is inherently less sensitive. A sketch of that policy follows.
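A sketch of TTL-as-policy, with illustrative base TTLs and a hypothetical resolution class. The reference client later in this post varies TTL per operation only; the resolution multiplier here is the additional idea described above, not its code.

```python
from enum import Enum


class Resolution(Enum):
    COARSE = "coarse"   # city / locality level answers
    MEDIUM = "medium"   # street level answers
    FINE = "fine"       # rooftop / POI level answers


BASE_TTL_SECONDS = {
    "reverse_geocode": 60 * 60 * 24 * 30,  # 30 days
    "geocode_address": 60 * 60 * 24 * 7,   # 7 days
    "poi_search": 60 * 60 * 12,            # 12 hours
}

RESOLUTION_MULTIPLIER = {
    Resolution.COARSE: 2.0,   # coarse answers are cheap to be wrong about
    Resolution.MEDIUM: 1.0,
    Resolution.FINE: 0.5,     # fine-grained answers go stale faster
}


def ttl_for(operation: str, resolution: Resolution) -> int:
    return int(BASE_TTL_SECONDS[operation] * RESOLUTION_MULTIPLIER[resolution])


if __name__ == "__main__":
    assert ttl_for("reverse_geocode", Resolution.COARSE) == 60 * 60 * 24 * 60
    assert ttl_for("poi_search", Resolution.FINE) == 60 * 60 * 6
    print("ok")
```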
The key idea is: TTL encodes how expensive it is to be wrong.

If a reverse geocode returns the wrong city once a year, no one cares. If POI search returns a stale ranking for "pizza near me," people notice.

### Quota smoothing + backoff: the upstream will say no

Even with caching, you can spike:

- a batch enrichment run
- a user action that fans out to many lookups
- a retry storm during transient network failure

If you treat Azure Maps as "just another HTTP call," you'll eventually create your own 429 incident. So we shape traffic:

- Client-side concurrency limits (don't fire 500 calls at once)
- Backoff on 429/503 (respect `Retry-After` when present)
- Negative caching for empty results (short TTL so you don't re-query nonsense every time)

This is the same mindset as the Firecrawl side:

- we pass `max_credits` into the Firecrawl agent
- we accumulate `credits_used`

For Azure Maps, the equivalent is "max requests per second" plus cache. A concurrency-cap sketch follows; backoff itself shows up in the reference implementation below.
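A minimal sketch of a client-side concurrency cap, assuming an asyncio semaphore per process. The limit of 5 is illustrative, not our production value.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

_MAX_CONCURRENT_GEOCODE_CALLS = 5
_geocode_semaphore = asyncio.Semaphore(_MAX_CONCURRENT_GEOCODE_CALLS)


async def with_geocode_slot(call: Callable[[], Awaitable[T]]) -> T:
    """Run an outbound geocoding call only when a slot is free."""
    async with _geocode_semaphore:
        return await call()


async def demo() -> None:
    async def fake_lookup(i: int) -> int:
        await asyncio.sleep(0.05)  # stand-in for the HTTP round trip
        return i

    # 50 lookups are requested, but at most 5 are in flight at any moment.
    results = await asyncio.gather(
        *(with_geocode_slot(lambda i=i: fake_lookup(i)) for i in range(50))
    )
    assert results == list(range(50))
    print("ok")


if __name__ == "__main__":
    asyncio.run(demo())
```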
## A complete, runnable reference implementation

The real `app/azure_integrations/azure_maps.py` in this project is integrated into our settings, auth, and cache stack. For the blog, I'm providing a minimal but fully runnable version that demonstrates the exact patterns we use:

- async `httpx` client
- cache-first wrapper with an interface you can back by Redis/memory
- stable key generation
- TTL heuristics
- backoff with `Retry-After`

You can paste this into a file and run it. (It uses a fake upstream call so it doesn't require Azure credentials.)

```python
import asyncio
import json
import re
import time
from dataclasses import dataclass
from typing import Any, Optional, Dict, Tuple

import httpx


class AsyncCache:
    """Minimal async cache interface."""

    async def get(self, key: str) -> Optional[str]:
        raise NotImplementedError

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        raise NotImplementedError


class InMemoryTTLCache(AsyncCache):
    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    async def get(self, key: str) -> Optional[str]:
        now = time.time()
        item = self._store.get(key)
        if not item:
            return None
        expires_at, value = item
        if expires_at < now:
            self._store.pop(key, None)
            return None
        return value

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        self._store[key] = (time.time() + ttl_seconds, value)


def _normalize_query(q: str) -> str:
    q = q.strip().lower()
    # Collapse whitespace
    q = re.sub(r"\s+", " ", q)
    # Normalize common punctuation
    q = q.replace(",", " ")
    q = re.sub(r"\s+", " ", q).strip()
    return q


def _is_likely_address(q: str) -> bool:
    """Cheap heuristic: number + street-ish token."""
    qn = _normalize_query(q)
    return bool(re.search(r"\b\d{1,6}\b", qn)) and bool(
        re.search(r"\b(st|street|ave|avenue|rd|road|blvd|lane|ln|dr|drive)\b", qn)
    )


def _geohash_bucket(lat: float, lon: float, precision_digits: int = 2) -> str:
    """Not a real geohash; a bucketed coordinate key (good enough for cache keys)."""
    return f"{round(lat, precision_digits)}:{round(lon, precision_digits)}"


@dataclass
class TTLPolicy:
    reverse_geocode_seconds: int = 60 * 60 * 24 * 30  # 30 days
    address_geocode_seconds: int = 60 * 60 * 24 * 7   # 7 days
    poi_search_seconds: int = 60 * 60 * 12            # 12 hours
    negative_cache_seconds: int = 60 * 10             # 10 minutes


class AzureMapsClient:
    def __init__(
        self,
        *,
        subscription_key: str,
        cache: AsyncCache,
        ttl: TTLPolicy = TTLPolicy(),
        timeout_seconds: float = 10.0,
        max_retries: int = 3,
    ) -> None:
        self.subscription_key = subscription_key
        self.cache = cache
        self.ttl = ttl
        self.max_retries = max_retries
        self._client = httpx.AsyncClient(timeout=timeout_seconds)

    async def aclose(self) -> None:
        await self._client.aclose()

    def _cache_key_geocode(self, op: str, query: str, country_filter: Optional[str]) -> str:
        qn = _normalize_query(query)
        cf = (country_filter or "").upper()
        return f"azure_maps:{op}:q={qn}:country={cf}"

    def _cache_key_reverse(self, lat: float, lon: float) -> str:
        bucket = _geohash_bucket(lat, lon, precision_digits=2)
        return f"azure_maps:reverse:bucket={bucket}"

    async def _cached_json(self, key: str) -> Optional[dict]:
        raw = await self.cache.get(key)
        return json.loads(raw) if raw else None

    async def _store_json(self, key: str, obj: dict, ttl_seconds: int) -> None:
        await self.cache.set(key, json.dumps(obj, separators=(",", ":")), ttl_seconds)

    async def _request_with_backoff(self, method: str, url: str, params: dict) -> dict:
        # This method is written for Azure Maps style endpoints but uses a fake upstream
        # to keep the snippet runnable.
        for attempt in range(self.max_retries + 1):
            try:
                # Simulated upstream behavior: no network call.
                # Replace with:
                #   resp = await self._client.request(method, url, params=params)
                #   resp.raise_for_status(); return resp.json()
                await asyncio.sleep(0.02)
                return {"ok": True, "url": url, "params": params}
            except httpx.HTTPStatusError as e:
                status = e.response.status_code
                if status in (429, 503) and attempt < self.max_retries:
                    retry_after = e.response.headers.get("Retry-After")
                    delay = float(retry_after) if retry_after else (0.5 * (2 ** attempt))
                    await asyncio.sleep(delay)
                    continue
                raise
        raise RuntimeError("unreachable")

    async def geocode_address(self, query: str, *, country_filter: Optional[str] = None) -> dict:
        key = self._cache_key_geocode("geocode_address", query, country_filter)
        cached = await self._cached_json(key)
        if cached:
            return cached
        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/address/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": query,
                "countrySet": country_filter,
            },
        )
        # Negative caching: if upstream yields no useful content, keep a short TTL.
        ttl = self.ttl.address_geocode_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload

    async def poi_search(self, query: str, *, country_filter: Optional[str] = None) -> dict:
        key = self._cache_key_geocode("poi_search", query, country_filter)
        cached = await self._cached_json(key)
        if cached:
            return cached
        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/poi/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": query,
                "countrySet": country_filter,
            },
        )
        ttl = self.ttl.poi_search_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload

    async def reverse_geocode(self, lat: float, lon: float) -> dict:
        key = self._cache_key_reverse(lat, lon)
        cached = await self._cached_json(key)
        if cached:
            return cached
        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/address/reverse/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": f"{lat},{lon}",
            },
        )
        ttl = self.ttl.reverse_geocode_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload


async def demo() -> None:
    cache = InMemoryTTLCache()
    client = AzureMapsClient(subscription_key="REDACTED", cache=cache)

    q = "123 Main St, Seattle, WA"
    if _is_likely_address(q):
        a = await client.geocode_address(q, country_filter="US")
        b = await client.geocode_address(" 123 Main St Seattle WA ", country_filter="US")
        assert a == b  # cache hit due to normalization

    r1 = await client.reverse_geocode(47.6062, -122.3321)
    r2 = await client.reverse_geocode(47.60621, -122.33209)
    assert r1 == r2  # cache hit due to bucketing

    await client.aclose()
    print("ok")


if __name__ == "__main__":
    asyncio.run(demo())
```

That demo captures the core behavior we rely on in production:

- address detection routes to the correct method
- normalization dedupes string variants
- reverse geocode buckets nearby points
- TTL varies by operation
- retries don't create a thundering herd

## The "POI then address" fallthrough (and why it exists)

The LangGraph manager shows a practical fallback sequence:

1. try POI search
2. if it returns nothing, try address geocoding

That sounds redundant until you see the inputs we actually get from upstream systems:

- Sometimes the query is a company name + city: POI search is better.
- Sometimes it's a literal address: address geocoding is better.
- Sometimes it's a half-address ("Main St Seattle"): POI might find a canonical thing when address geocode can't.

The mistake we fixed was letting this fallback happen blindly. The correct behavior is:

1. If the string is likely an address, skip POI entirely.
2. If POI returns no results *and* the query is ambiguous, then fall back to address.
3. Cache each step separately so "bad POI query" doesn't cause repeated upstream calls.

That third point is easy to miss: if you store only the final "best effort," you'll keep retrying the losing branch. A sketch of the guarded fallthrough follows.
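A sketch of that guarded fallthrough, meant to sit below the reference implementation above (it assumes `AzureMapsClient` and `_is_likely_address` from that snippet). The `_has_results()` check is illustrative: the fake upstream always returns ok, so against real Azure Maps you would inspect the response's results array instead.

```python
from typing import Optional


def _has_results(payload: dict) -> bool:
    # Illustrative emptiness check; adapt to the real response shape.
    return bool(payload.get("ok")) and bool(payload.get("results", True))


async def resolve_location(client, query: str, country_filter: Optional[str] = "US") -> dict:
    # 1) Address-shaped strings skip POI entirely.
    if _is_likely_address(query):
        return await client.geocode_address(query, country_filter=country_filter)

    # 2) Ambiguous strings try POI first...
    poi = await client.poi_search(query, country_filter=country_filter)
    if _has_results(poi):
        return poi

    # 3) ...then fall back to address geocoding. Both steps were cached
    #    independently, so a "bad POI query" is not retried upstream on the
    #    next call; the negative-cached empty result is returned instead.
    return await client.geocode_address(query, country_filter=country_filter)
```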
## Quota smoothing in the real system: shaping bursts, not just retrying

Backoff only helps after the upstream is already unhappy. The better move is to avoid bursts in the first place. In the production integration, I enforce this at two layers:

- per-process concurrency caps (an async semaphore around outbound calls, as sketched earlier)
- cache-first with negative caching so repeated bad queries don't hammer the API

This is the same mental model as the advisor enrichment worker's credit accounting: you don't "audit later," you design the flow so it can't exceed its budget accidentally.

## How the discipline shows up elsewhere: Firecrawl credits and feature flags

Even though this post is about Azure Maps, the codebase has a consistent theme: make cost and control explicit. You can see it in the advisor enrichment worker.

### Credit accounting as a loop invariant

In `advisor-enrichment-worker/workflow_executor.py`, we iterate advisors, call a Firecrawl agent with a `max_credits` cap, and accumulate `total_credits`.

The tell is the comment about attribute access:

- the result is a Pydantic model, so reading `.credits_used` is correct
- the earlier bug was treating it like a dict (`.get()`), which silently breaks accounting

Here's a minimal, runnable reproduction of that exact class of bug:

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    credits_used: float
    payload: dict


def broken_accounting(result: AgentResult) -> float:
    # This is the mistake: treating a model like a dict.
    # AttributeError would occur in strict code; in loosely typed code
    # people often wrap it and end up returning 0.
    try:
        return result.get("credits_used", 0.0)  # type: ignore[attr-defined]
    except Exception:
        return 0.0


def correct_accounting(result: AgentResult) -> float:
    return float(result.credits_used)


if __name__ == "__main__":
    r = AgentResult(credits_used=1.25, payload={"ok": True})
    assert broken_accounting(r) == 0.0
    assert correct_accounting(r) == 1.25
    print("ok")
```

That's why I like having "credits used" flow through the code as a first-class value: you can unit test it. You can put assertions around it. You can build a budget gate, as sketched below.
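A sketch of that budget gate as a loop invariant. The `run_agent()` stub and the advisor list are illustrative; only the accounting pattern mirrors the real `workflow_executor.py` behavior described above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentResult:
    credits_used: float
    payload: dict


async def run_agent(advisor: str, *, max_credits: float) -> AgentResult:
    # Stand-in for the Firecrawl agent call; pretend each advisor costs 1.25 credits.
    await asyncio.sleep(0)
    return AgentResult(credits_used=min(1.25, max_credits), payload={"advisor": advisor})


async def run_with_budget(advisors: list, max_credits: float) -> float:
    total_credits = 0.0  # loop invariant: total_credits never exceeds max_credits
    for advisor in advisors:
        remaining = max_credits - total_credits
        if remaining <= 0:
            break  # the gate: stop before the budget is blown, not after
        result = await run_agent(advisor, max_credits=remaining)
        total_credits += float(result.credits_used)  # attribute access, never .get()
    return total_credits


if __name__ == "__main__":
    spent = asyncio.run(run_with_budget([f"advisor-{i}" for i in range(10)], max_credits=5.0))
    assert spent <= 5.0
    print(spent)
```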
### Feature flags as operational kill switches

`advisor-enrichment-worker/app/api/v1/settings_routes.py` defines in-memory feature flags (with a note that production should use Redis/DB). The important part isn't the storage mechanism; it's the existence of a fast kill switch.

Here's a complete, runnable example of the pattern (matching the shape in that file: `{name: (enabled, description)}` and a Pydantic update model):

```python
from typing import Dict, Tuple

from pydantic import BaseModel


class FeatureFlagUpdate(BaseModel):
    enabled: bool


FEATURE_FLAGS: Dict[str, Tuple[bool, str]] = {
    "linkedin_matching": (True, "Auto-match advisors to LinkedIn profiles"),
    "brokercheck_enrichment": (True, "Enable FINRA BrokerCheck lookups via Firecrawl"),
    "firecrawl_research": (True, "Enable Firecrawl research jobs and SSE streaming"),
    "azure_maps_geocoding": (True, "Enable Azure Maps geocoding for city/state normalization"),
}


def set_flag(name: str, update: FeatureFlagUpdate) -> None:
    if name not in FEATURE_FLAGS:
        raise KeyError(name)
    _, desc = FEATURE_FLAGS[name]
    FEATURE_FLAGS[name] = (bool(update.enabled), desc)


if __name__ == "__main__":
    assert FEATURE_FLAGS["azure_maps_geocoding"][0] is True
    set_flag("azure_maps_geocoding", FeatureFlagUpdate(enabled=False))
    assert FEATURE_FLAGS["azure_maps_geocoding"][0] is False
    print("ok")
```

When you're running enrichment at scale, feature flags aren't "nice to have." They're how you avoid turning a partial outage into a full outage.

## One diagram: geocoding in the enrichment graph (as it actually behaves)

This is the dataflow I carry in my head when I touch this system: prefer existing structured signals, accept Firecrawl if it already solved it, and only then call Azure Maps, through a cache boundary.

## Operational edge cases (the stuff that bites you at 2 a.m.)

### 1) Empty results can be more expensive than good results

When an upstream returns "no matches," your system is tempted to keep trying:

- alternate spelling
- removing punctuation
- widening the query

That's fine, once. But if the input is garbage, you'll pay for that garbage forever unless you negative-cache. So we cache empty-ish results briefly. Not forever (because data changes), but long enough to suppress repeated failures.

### 2) Timeouts must be budgeted, not defaulted

Geocoding calls often happen inside larger workflows. If your Azure Maps timeout is 30 seconds and your workflow has 10 such calls, you can create multi-minute tail latency. The fix is simple: keep geocoding timeouts tight, and rely on cache + retries with backoff for resilience. A sketch of a tight timeout budget follows.
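A minimal sketch of that budget, assuming `httpx`. The numbers are illustrative; the point is that connect and total limits are explicit and small, so ten sequential geocoding calls cannot add minutes of tail latency.

```python
import httpx

# Fail fast: 2s to connect, 5s total per request. Resilience comes from the
# cache and from backoff on retryable statuses, not from waiting longer.
GEOCODING_TIMEOUT = httpx.Timeout(5.0, connect=2.0)


def make_geocoding_client() -> httpx.AsyncClient:
    return httpx.AsyncClient(
        timeout=GEOCODING_TIMEOUT,
        limits=httpx.Limits(max_connections=10, max_keepalive_connections=5),
    )
```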
### 3) Caching must include parameters, or you will lie to yourself

If `country_filter` changes the answer, it must be in the key. Same for anything that changes the result set (language, typeahead bias, etc.). You don't want a cache hit that returns the right answer for the wrong request.

## Closing

The moment we added "Fix 2" and "Fix 7" in the graph (routing address queries correctly and short-circuiting Azure Maps when Firecrawl already delivered city/state), geocoding stopped being a background detail and became a budgeted, testable subsystem. Cache-first keys, TTL policy, and quota smoothing aren't performance tricks; they're how you prevent one fuzzy location string from turning into a thousand identical upstream calls.