A dashboard was the reason I started this project. We had two “trusted” views of the same student-loan population. One was in a KPI dashboard. The other was in a downstream report used for internal reviews. They disagreed—by enough that people started doing what teams always do in that moment: debating definitions, refreshing extracts, screenshotting filters, and arguing about which number “felt right.”

That’s when it clicked: we didn’t have a measurement problem. We had a control problem. The data wasn’t missing. It was untrustworthy. And in student loans, “untrustworthy” isn’t an analytics inconvenience. It can become borrower harm and compliance exposure fast.

So I built a controls layer in Snowflake and used Tableau as the triage console—not the brains.

## My thesis (say this upfront, because it’s the whole point)

Most data teams confuse dashboards with controls. Dashboards summarize. Controls produce evidence. If your “data quality” system can’t output which borrower / which record / which event sequence broke the rule, you don’t have a control—you have a metric that’s going to get ignored.

Two principles ran the whole design:

1. If you don’t output evidence rows, you have metrics—not a control.
2. If you don’t model effective dates and event order deterministically, you build a bug amplifier.

Everything else—Snowflake, Tableau, window functions—was implementation detail.

## Why student-loan data breaks “quietly”

Student-loan systems are fundamentally temporal.
“Truth” depends on *what was effective when*, not what’s currently in the table. A record can look perfectly reasonable in isolation and still be wrong in a timeline. The classic failures that kept showing up for us were:

- Benefit rates changing only because an enrollment status arrived late
- Duplicate events that looked harmless until jobs assumed uniqueness
- Payment adjustments applied twice during replay
- Status changes arriving via mixed vendor feeds

And the uncomfortable part: those issues often don’t move top-line KPIs right away. They just sit there until the wrong borrower gets the wrong outcome—or until an audit asks you to prove your logic.

That’s why I stopped trying to “perfect the dashboard.” I started building controls that could answer questions like:

- Did a borrower’s benefit rate change when it shouldn’t?
- Did enrollment status move backward without a valid transition?
- Do payment adjustments net to zero when policy requires?
- Do interest calculations match an independent recomputation?

Repeatable. Queryable. Auditable.

## The architecture (kept intentionally boring)

I treated controls like a product, not a report. The system had three layers:

- **Curated “truth” tables in Snowflake.** Temporal by design: effective windows, source timestamps, and deterministic tie-breakers.
- **Control queries that produce findings (rows, not counts).** Every control writes to a findings table: who failed, what failed, when, severity, and the evidence fields needed to debug. (A count isn’t a control.
It’s a symptom.)
- **Tableau views for monitoring and triage.** Tableau displays findings, trends, owners, SLAs, and drilldowns. It does *not* compute the truth.

This separation mattered because it prevented a failure mode I’ve seen repeatedly: someone “fixes” a control by editing a Tableau calculated field or filter logic. That’s not a fix. That’s sweeping the evidence under the rug.

## The Snowflake patterns that made the controls reliable

### 1) Deterministic “latest record” selection (stop trusting `max(timestamp)`)

Early false positives came from lazy `max(timestamp)` joins. When two rows share the same timestamp, your control becomes nondeterministic—meaning it can flip results between runs with no underlying data change. That destroys trust, and once a control loses trust, it stops being used.

What fixed it was treating “latest” as an ordering problem and always enforcing a stable tie-breaker:

```sql
SELECT *
FROM curated_enrollment_events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY borrower_id, school_id
  ORDER BY source_event_ts DESC, ingestion_ts DESC, event_id DESC
) = 1;
```

That last tie-breaker matters more than people want to admit. If you can’t guarantee stable output, you can’t guarantee stable controls.
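To make the nondeterminism concrete, here is a toy version of the scenario, reusing the column names from the query above (the inline rows and values are purely illustrative): two events share a `source_event_ts`, and only the `event_id` tie-breaker makes the pick stable across runs.

```sql
-- Toy illustration: two events with an identical source_event_ts.
-- A naive "latest by max(timestamp)" rule can surface either row (or both,
-- via a join); the full ORDER BY always returns the same single row.
WITH curated_enrollment_events AS (
  SELECT * FROM (VALUES
    ('B1', 'S1', 'E001', '2024-01-05 10:00:00', 'ENROLLED'),
    ('B1', 'S1', 'E002', '2024-01-05 10:00:00', 'WITHDRAWN')
  ) AS t(borrower_id, school_id, event_id, source_event_ts, status)
)
SELECT *
FROM curated_enrollment_events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY borrower_id, school_id
  ORDER BY source_event_ts DESC, event_id DESC
) = 1;  -- deterministically picks E002 on every run
```

Swap the two timestamps to differ and the tie-breaker never fires; keep them equal and it is the only thing standing between you and run-to-run flicker.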
(If you want the official reference for `QUALIFY`, Snowflake has a good one: https://docs.snowflake.com/en/sql-reference/constructs/qualify)

### 2) Temporal consistency checks (validate transitions, not snapshots)

Enrollment status was the cleanest example. Borrowers progress through statuses, but not every transition is valid. So instead of checking “what status is it today,” I checked *the timeline*: each event against the prior event.

```sql
WITH ordered AS (
  SELECT
    borrower_id,
    school_id,
    status,
    effective_dt,
    LAG(status) OVER (
      PARTITION BY borrower_id, school_id
      ORDER BY effective_dt, source_event_ts, event_id
    ) AS prev_status
  FROM curated_enrollment_events
)
SELECT *
FROM ordered
WHERE prev_status IS NOT NULL
  AND NOT is_allowed_transition(prev_status, status);
```

The key detail: `is_allowed_transition` was a mapping table joined in SQL—not a UDF—because policies change and you want the rule itself to be auditable.

This is the same instinct behind “unit tests for data.” Great Expectations is a well-known framework in that space: https://github.com/great-expectations/great_expectations

I wasn’t running GE inside Tableau; the point is the mindset: write explicit expectations, and produce inspectable failures.

### 3) Independent recomputation (because “matching” can still be wrong)

Financial data has a nasty property: your ledger and your reporting can agree and still both be wrong—because they’re downstream of the same flawed assumption.
So for interest and some payment adjustments, I recomputed expected values independently:

- `expected_interest = principal * rate * day_count_fraction`
- reconcile actual vs. expected within tolerance
- group exceptions by rule set / program / effective date

These controls are expensive, so I scoped them to high-risk windows (recent changes, known error periods, regulatory samples) and ran them incrementally with backfill to catch late events. I also used a known-exceptions table with expiry dates—no permanent ignore list. When an exception expires, it forces a re-review.

## Tableau: treat it like a triage console, not a presentation deck

A monitoring dashboard fails when it tries to be a “pretty summary.” What worked was a triage layout:

- Open findings by severity and SLA class
- Findings per day with markers for deploys / feed changes / rule updates
- Work-queue table: one finding per row, owner tags, link to evidence query
- Drilldown: before/after values, event sequence, raw source timestamps

One rule I enforced hard: Tableau filters should never change the logic. They change what you’re viewing, not what’s true.
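Returning to the recomputation control for a moment: the pieces described above (independent expected value, tolerance, high-risk window, expiring exceptions) compose into a single findings query. This is a sketch under assumed table and column names (`curated_interest_accruals`, `known_exceptions`, `expires_dt` are illustrative), not the production SQL:

```sql
-- Illustrative only: table and column names are assumptions, not the real schema.
WITH recomputed AS (
  SELECT
    borrower_id,
    loan_id,
    accrual_dt,
    actual_interest,
    principal * rate * day_count_fraction AS expected_interest
  FROM curated_interest_accruals
  WHERE accrual_dt >= DATEADD(day, -90, CURRENT_DATE)   -- high-risk window only
)
SELECT r.*                                              -- evidence rows, not a count
FROM recomputed r
LEFT JOIN known_exceptions k
  ON  k.loan_id    = r.loan_id
  AND k.expires_dt >= CURRENT_DATE                      -- expired rows stop suppressing
WHERE ABS(r.actual_interest - r.expected_interest) > 0.01
  AND k.loan_id IS NULL;
```

Note the expiry check: once an exception’s `expires_dt` passes, the join stops matching and the row reappears in the findings, which is exactly the forced re-review described above.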
## The loop that made it compound over time

Every time we found a real edge case, we did two things:

1. Write a control that would have caught it earlier
2. Write a regression query proving the control works

That’s how the system stopped being “a nice monitoring dashboard” and became a capability: a growing set of enforceable, testable truths.

## Results (what actually changed)

Over the project lifetime (07/2022 to 09/2024), the controls reports became how the team tracked student-loan data health.

- Rolled out across 60+ monitoring views/tools with a ~10-person team
- Helped handle 50+ enterprise problem tickets

I’m not going to claim “dashboards saved millions.” That’s not how controls create value. Controls matter when they reduce time-to-detect and make root cause clear enough that engineering fixes the right thing. And if you’ve worked in financial data governance, you know the real bar isn’t “did we have a chart.” It’s detect, document, remediate, prove.

## The takeaway I wish more teams internalized

Dashboards measure performance. Controls protect people.

If you want your monitoring to survive production—and survive scrutiny—make it evidence-first:

- Evidence rows over counts.
- Deterministic temporal truth over “latest.”
- Independent recomputation where “matching” isn’t proof.

That’s how you stop arguing about numbers and start shipping fixes.