From monitoring vans scanning 9 kHz–8.5 GHz with readiness dashboards to instrument-first refactors at GEICO: how observability before change prevents outages and makes modernizations predictable.

When reliability is non-negotiable, you don't start by changing code; you begin by measuring reality. That was true when Sheriff Adepoju helped bring a national spectrum monitoring program online, and it proved just as true when he later led open-source modernizations inside a major U.S. insurer. The connective tissue across those worlds is an instrument-then-migrate discipline: make the system observable, set explicit reliability targets, classify failure modes, and only then touch the architecture. Done right, migrations stop feeling like cliff jumps and start looking like controlled crossings.

Measure everything

On the spectrum side, readiness hinged on what the vans and the National Control Center behind them could see. Field teams commissioned mobile monitoring units, verified antenna chains, and stood up dashboards that told operators whether each link in the 9 kHz–8.5 GHz chain was healthy before they tried to enforce policy at scale. Readiness KPIs were concrete: uptime of RF front-ends, calibration drift, GPS lock quality, data latency from vans to the control room, and alert-to-acknowledge times. Those numbers didn't just please auditors; they told engineers which changes were safe.

Translate that mindset to enterprise apps and the first move becomes equally obvious: define service SLOs, track P99 latencies, and establish error budgets before a line of migration code lands. At GEICO, Adepoju's team instrumented the existing stack to develop a behavioral baseline: endpoint P99s and P999s, request and job throughput, cold-start vs. warm-path timings, and saturation on data stores. Only after the numbers settled did the team begin the React/Django/PostgreSQL cutover, with SLOs and error budgets serving as the migration guardrails. The result was a migration plan that optimized what users felt instead of what diagrams promised, and cost and throughput deltas could be tied back to SLO movement, not vibes.

Build a fault taxonomy

Every "the radio doesn't work" page in spectrum operations hides a specific physics problem: front-end overload, adjacent-channel interference, intermodulation, antenna feed issues, or plain old operator error. The team wrote a fault taxonomy that turned those phenomena into typed diagnoses with standard runbooks and escalations. Operators weren't just told that something broke; they saw what kind of break it was and what first steps to take.

Enterprise apps need the same precision. Instead of an undifferentiated "500," build a typed error system that separates policy errors (validation, authorization), resource errors (timeouts, back pressure), integration errors (schema drift, contract failures), and unknowns. Adepoju's migration work paired typed exceptions with automated triage: logs enriched with correlation IDs, request context, and a fault-class label; alert routes keyed to the class (SRE on call for resource saturation, product owner for policy rejections). That classification didn't just shrink MTTR; it also kept error budgets honest by preventing policy bugs from masquerading as availability issues. It's one reason the team reported thousands of avoided manual "app passes" and, paired with the stack change, six-figure annual savings once the new path stabilized.
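To make the idea concrete, here is a minimal sketch of such a typed error system in Python (the Django side of the stack). The class names, fault labels, and routing table are illustrative assumptions, not the team's actual code.

```python
# Illustrative sketch of a typed error system with automated triage.
# Class names, fault labels, and routes are hypothetical, not production code.
import enum
import logging
import uuid
from typing import Optional

log = logging.getLogger("triage")

class FaultClass(enum.Enum):
    POLICY = "policy"            # validation, authorization
    RESOURCE = "resource"        # timeouts, back pressure, saturation
    INTEGRATION = "integration"  # schema drift, contract failures
    UNKNOWN = "unknown"

class AppError(Exception):
    fault_class = FaultClass.UNKNOWN

class PolicyError(AppError):
    fault_class = FaultClass.POLICY

class ResourceError(AppError):
    fault_class = FaultClass.RESOURCE

class IntegrationError(AppError):
    fault_class = FaultClass.INTEGRATION

# Alert routing keyed to the fault class, not the raw HTTP status code.
ROUTES = {
    FaultClass.RESOURCE: "sre-oncall",
    FaultClass.POLICY: "product-owner",
    FaultClass.INTEGRATION: "platform-team",
    FaultClass.UNKNOWN: "sre-oncall",
}

def triage(exc: Exception, correlation_id: Optional[str] = None) -> dict:
    """Label the failure, enrich the log record, and pick an alert route."""
    correlation_id = correlation_id or str(uuid.uuid4())
    fault = exc.fault_class if isinstance(exc, AppError) else FaultClass.UNKNOWN
    record = {
        "correlation_id": correlation_id,
        "fault_class": fault.value,
        "error": repr(exc),
        "route": ROUTES[fault],
    }
    log.error("request failed: %s", fault.value, extra=record)
    return record
```

Keyed this way, a spike in policy rejections pages the product owner and leaves the availability error budget untouched, while resource saturation goes straight to the SRE on call.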
Operational readiness comes first

Before any van rolled, spectrum teams ran factory inspections and commissioning: witness tests, acceptance criteria, documentation checks, and operator training. The habits transfer neatly to software:

Pre-prod canaries. Route a sliver of real traffic to the new path while keeping the blast radius bounded.

Smoke tests. Automate end-to-end probes that mimic the highest-value user journeys; fail fast if anything regresses.

Rollback drills. Practice "reversions" with production-like data so the playbook is muscle memory, not a PDF.

Cutovers were scheduled against explicit "no-downtime" windows, periods where the team had enough headroom to fail safe, and the go/no-go was tied to SLO burn rate, not feelings (a minimal burn-rate check is sketched at the end of this section). When a guardrail tripped, the rollout paused, the fault class told you who should act, and the rollback drill made reversal routine. It's operational theater without the drama.

What to show your stakeholders (and your future self)

Instrument, then migrate: the approach pays off in credibility because you can show the system getting better.

Pre/post SLO dashboards. Graph the user-facing SLOs (availability, latency) for old and new paths, with explicit annotation of change windows.

Error budget graphs. Visualize budget burn during canaries and the full cutover; prove you never exceeded the allowed risk.

Cutover timeline. A one-pager with decision points, defense in depth (canaries, feature flags), and no-downtime windows.

Cost and throughput deltas. Present the dollars and requests per second side by side; tie savings and capacity gains to the observability record rather than architecture hand-waving.

At GEICO, the modernization produced a well-documented net annual saving on the order of $590,000, an outcome made presentable precisely because the team could trace the gain to measurable changes.

Why this works across domains

It's tempting to treat spectrum monitoring, enterprise insurance apps, and cloud support automation as unrelated. They aren't. In each case, the system's most dangerous enemy is change under uncertainty. Instrumentation shrinks that uncertainty; taxonomies channel it; rehearsed operations fence it in.
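Here is the kind of burn-rate check a canary go/no-go can hang on. A minimal sketch, assuming a 99.9% availability SLO and the multi-window thresholds commonly cited in SRE practice; the window sizes and numbers are assumptions for illustration, not the team's actual guardrail values.

```python
# Illustrative go/no-go check for a canary rollout. The SLO target, window
# sizes, and burn-rate thresholds are assumptions, not actual guardrail values.

SLO_TARGET = 0.999              # 99.9% availability SLO (assumed)
ERROR_BUDGET = 1 - SLO_TARGET   # fraction of requests allowed to fail

def burn_rate(errors: int, requests: int) -> float:
    """How fast a window consumes error budget; 1.0 means exactly on budget."""
    if requests == 0:
        return 0.0
    return (errors / requests) / ERROR_BUDGET

def go_no_go(fast_window, slow_window) -> str:
    """Multi-window check: act only when both the short and long windows run hot."""
    fast = burn_rate(*fast_window)   # e.g. (errors, requests) over the last 5 minutes
    slow = burn_rate(*slow_window)   # e.g. (errors, requests) over the last hour
    if fast > 14 and slow > 14:      # thresholds in the spirit of common SRE guidance
        return "rollback"
    if fast > 6 and slow > 6:
        return "pause"
    return "proceed"

# Example: 12 errors in 18,000 requests (5 min); 40 in 200,000 (1 h) -> "proceed".
print(go_no_go((12, 18_000), (40, 200_000)))
```

The point of the two windows is that a brief blip does not trigger a rollback while a sustained burn does, which is exactly the behavior you want rehearsed before the real cutover.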
Adepoju's throughline, from LiviaSoft's national-scale spectrum program to GEICO's regulated, customer-facing workloads, shows that the method travels well: build the measuring sticks first, then move.

A note to teams planning their cutover

If you inherit a legacy line-of-business app and feel pressure to "just rewrite it," resist. Put the monitors first, name your failures, and set your budgets. You'll discover problems you can fix without a rewrite and, when a rewrite is warranted, you'll execute in weeks what would otherwise sprawl across quarters. Most importantly, you'll migrate safely, with users noticing only that things have gotten faster.
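If "put the monitors first" feels abstract, a baseline can start from nothing fancier than the request logs you already have. The sketch below computes a P99 latency and a remaining error budget from a newline-delimited JSON log; the field names and the 99.9% target are assumptions for illustration, not a prescription.

```python
# Minimal baseline pass over a newline-delimited JSON request log.
# Field names ("latency_ms", "status") and the 99.9% target are assumptions.
import json
import math

def baseline(log_path: str, slo_target: float = 0.999) -> dict:
    latencies, errors, total = [], 0, 0
    with open(log_path) as fh:
        for line in fh:
            event = json.loads(line)
            latencies.append(event["latency_ms"])
            errors += 1 if event["status"] >= 500 else 0
            total += 1
    latencies.sort()
    # P99: the latency below which 99% of observed requests fall.
    p99 = latencies[max(0, math.ceil(0.99 * len(latencies)) - 1)] if latencies else 0.0
    return {
        "requests": total,
        "p99_ms": p99,
        "availability": (1 - errors / total) if total else 1.0,
        # How many more failures this volume of traffic could absorb within the SLO.
        "remaining_error_budget": max(0, int(total * (1 - slo_target)) - errors),
    }
```

Run something like this over a representative window of traffic before the first migration change lands, and the "pre" half of your pre/post SLO dashboards is already in hand.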