Half the questions I got asked when I interviewed in 2022 are irrelevant now. Here are the ones I'd ask today.
The BI engineering interview hasn't caught up with the job.
I've been on both sides of the table. I've answered questions about star schemas, slowly changing dimensions, and the difference between OLAP and OLTP. I've also asked those questions. And here's what I've realized: most BI interview question lists were written for a world where the job was "build dashboards and write SQL." That world is shrinking fast.
In 2026, a BI engineer doesn't just build dashboards. You govern metrics for AI consumption. You design semantic layers that agents query autonomously. You sit between raw data and business decisions, and increasingly, an AI system sits between you and the stakeholder. The questions should reflect that.
I've organized these into six categories. The first three are foundational (you still need to know them). The last three are what separate a BI engineer who can survive the AI era from one who gets replaced by it.
SQL and Data Modeling (The Basics That Still Matter)
1. Write a query to find the second-highest revenue day per product category in the last 90 days.
This tests window functions, filtering, and whether you think about edge cases like ties. The strong approach uses DENSE_RANK() partitioned by category and ordered by daily revenue descending, then filters to rank = 2. The edge cases that matter: what happens when two days tie for first place (DENSE_RANK gives you the actual second-highest, ROW_NUMBER would arbitrarily pick one), what happens when a category has fewer than 90 days of data, and whether "revenue" means gross or net. That last point sounds pedantic in an interview, but in production, it's the kind of assumption that creates a wrong dashboard nobody catches for weeks. A strong candidate will write the query and then ask what "revenue" means before finalizing it.
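A runnable sketch of that approach, using SQLite in memory. The `daily_revenue` table and its columns are invented for illustration; a real warehouse query would also apply the 90-day date filter noted in the comment.

```python
import sqlite3

# Hypothetical daily_revenue table -- names and values are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daily_revenue (category TEXT, day TEXT, revenue REAL);
INSERT INTO daily_revenue VALUES
  ('toys',  '2026-01-01', 100), ('toys',  '2026-01-02', 300),
  ('toys',  '2026-01-03', 300), ('toys',  '2026-01-04', 250),
  ('books', '2026-01-01', 80),  ('books', '2026-01-02', 120);
""")

# DENSE_RANK() so a tie for first place still yields the true
# second-highest value (ROW_NUMBER would arbitrarily pick one of the tied days).
rows = con.execute("""
  SELECT category, day, revenue
  FROM (
    SELECT category, day, revenue,
           DENSE_RANK() OVER (
             PARTITION BY category ORDER BY revenue DESC
           ) AS rnk
    FROM daily_revenue
    -- In production, also filter: WHERE day >= DATE('now', '-90 days')
  )
  WHERE rnk = 2
  ORDER BY category, day
""").fetchall()
print(rows)
```

Note that for 'toys', two days tie for first at 300, so the second-highest revenue is 250, exactly the case where ROW_NUMBER would give the wrong answer.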
2. Explain the difference between a star schema and a snowflake schema. When would you choose one over the other?
A star schema has a central fact table connected directly to denormalized dimension tables. A snowflake schema normalizes those dimensions into sub-dimensions, reducing redundancy but adding joins. The standard answer is "star for query performance, snowflake for storage efficiency." That's correct but incomplete. The deeper answer: star schemas are dramatically easier for a semantic layer to sit on top of, because the join paths are simple and predictable. If you're building governed metrics in dbt, Cube, or LookML, a clean star schema means your metric definitions stay readable. Snowflake schemas create join chains that make semantic layer definitions brittle and harder to maintain. In 2026, the modeling decision isn't just about query speed. It's about how easily machines (AI agents, semantic layers, automated testing) can navigate your schema without human intervention.
3. What is a slowly changing dimension? Walk me through how you'd implement a Type 2 SCD for a customer table where the address changes.
A slowly changing dimension tracks how attributes of a dimension entity change over time. Type 1 overwrites the old value (you lose history). Type 2 creates a new row with effective dates (you keep full history). Type 3 adds a column for the previous value (limited history). For Type 2 implementation: when a customer's address changes, you close the current record by setting its end_date to the change date and marking is_current as false. Then you insert a new record with the new address, a start_date of the change date, a null end_date, and is_current as true. The join from your fact table to the dimension uses a half-open date range: WHERE fact.event_date >= dim.start_date AND fact.event_date < COALESCE(dim.end_date, '9999-12-31'). (An inclusive BETWEEN on both endpoints would match a transaction dated exactly on the change date to both the old and new rows, double-counting it.) The reason this matters more now than ever: AI agents querying historical data cannot intuit that a customer was in a different region six months ago. If the SCD logic is wrong, the agent will attribute historical revenue to the current address, not the address at the time of the transaction. Every historical analysis an AI generates depends on this being implemented correctly.
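The close-and-insert pattern plus the point-in-time join can be demonstrated end to end. The schema below is invented, and the join uses a half-open interval (event_date >= start_date AND event_date < end_date) so a transaction on the change date matches exactly one dimension row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical Type 2 customer dimension -- all names are illustrative.
con.executescript("""
CREATE TABLE dim_customer (
  customer_id INT, address TEXT,
  start_date TEXT, end_date TEXT, is_current INT
);
-- Customer 1 moved on 2026-02-01: old row closed, new row opened.
INSERT INTO dim_customer VALUES
  (1, 'Old Town', '2025-01-01', '2026-02-01', 0),
  (1, 'New City', '2026-02-01', NULL,         1);

CREATE TABLE fact_orders (customer_id INT, event_date TEXT, amount REAL);
INSERT INTO fact_orders VALUES
  (1, '2025-12-15', 50),   -- before the move
  (1, '2026-03-10', 75);   -- after the move
""")

# Point-in-time join: each fact row matches the dimension row that was
# effective on the event date, not the current one.
rows = con.execute("""
  SELECT f.event_date, d.address, f.amount
  FROM fact_orders f
  JOIN dim_customer d
    ON f.customer_id = d.customer_id
   AND f.event_date >= d.start_date
   AND f.event_date <  COALESCE(d.end_date, '9999-12-31')
  ORDER BY f.event_date
""").fetchall()
print(rows)
```

The December order correctly lands on 'Old Town' even though the customer's current address is 'New City', which is exactly the distinction an AI agent cannot recover if the dimension only stores current values.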
4. You have a fact table with 500 million rows and a dashboard that takes 45 seconds to load. Walk me through how you'd diagnose and fix the performance issue.
Start with the query profile, not the table. Run the dashboard's underlying query with EXPLAIN or your warehouse's query profiler. Look for full table scans, expensive joins, and sort operations. The most common culprits, in order of likelihood: the query scans the entire table because there's no date filter or partition pruning (fix: add a partition on date and ensure the dashboard applies a date filter). The query joins to a large dimension without a filter pushdown (fix: restructure the join or add clustering on the join key). The query computes an expensive aggregation at query time that could be pre-computed (fix: create an aggregate table or materialized view that pre-calculates the metric at the grain the dashboard needs). The dashboard fires multiple queries simultaneously when it loads (fix: consolidate into fewer queries or stagger the loads). A strong candidate will also mention that the right long-term fix depends on the query pattern. If the dashboard is accessed 50 times a day with the same filters, caching or pre-aggregation is the answer. If it's accessed once a week with varying filters, optimizing the underlying table structure is more impactful.
5. What's the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()? Give a scenario where using the wrong one produces incorrect results.
All three assign a number to rows within a partition based on an ORDER BY clause. ROW_NUMBER() assigns a unique sequential number (1, 2, 3) with no gaps and no ties. RANK() allows ties but creates gaps (1, 1, 3). DENSE_RANK() allows ties with no gaps (1, 1, 2). The scenario where this matters: you're building a "top 3 products by revenue" report. Two products tie for second place. With DENSE_RANK(), you'd get ranks 1, 2, 2, 3 and show four products (correct, because there are legitimately four products in the top 3 ranks). With RANK(), you'd get 1, 2, 2, 4, and filtering to rank <= 3 would keep the tied pair but silently drop the third-highest product, whose rank jumped to 4 (misleading). With ROW_NUMBER(), you'd get 1, 2, 3, 4, arbitrarily breaking the tie and hiding the fact that two products performed equally (incorrect). I've seen a production dashboard at a previous company that used ROW_NUMBER() for a "top accounts" view. It silently dropped tied accounts, and one of the dropped accounts happened to be one the sales team was actively pursuing. Nobody noticed for three months.
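All three behaviors are easy to see side by side on a toy table with a tie for second place (products and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_revenue (product TEXT, revenue REAL);
INSERT INTO product_revenue VALUES
  ('A', 900), ('B', 700), ('C', 700), ('D', 500);
""")

rows = con.execute("""
  SELECT product,
         ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
         RANK()       OVER (ORDER BY revenue DESC) AS rnk,
         DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk
  FROM product_revenue
  ORDER BY revenue DESC, product
""").fetchall()
for r in rows:
    print(r)
# ROW_NUMBER breaks the B/C tie arbitrarily (no tie-breaker is specified).
# RANK yields 1, 2, 2, 4 -- a "rank <= 3" filter drops D, the third-highest value.
# DENSE_RANK yields 1, 2, 2, 3 and keeps all four products.
```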
BI Tools and Visualization (Still Relevant, Differently)
6. You're building a dashboard for a VP who checks it once a week for 5 minutes. How does that change your design decisions compared to a dashboard an analyst uses daily?
For the VP: fewer charts, larger numbers, clear directional indicators (up/down arrows, red/yellow/green), no filters they need to manipulate, and a layout that tells a story top-to-bottom in under two minutes. The dashboard should answer "are we on track?" without requiring any interaction. Every element should have a comparison (week over week, versus target, versus forecast). If the VP has to click or scroll to understand the state of the business, the dashboard has failed. For the analyst: expose the filters, include Explore links, provide drill-down paths, and optimize for flexibility over polish. The analyst's dashboard is a starting point for investigation, not a finished answer. They need to be able to change the date range, swap dimensions, and click into underlying data. Building the same dashboard for both audiences is how you end up with something the VP finds overwhelming and the analyst finds constraining. I've seen teams try to split the difference with "simple" and "advanced" tabs. In practice, the VP never clicks the second tab, and the analyst resents the first one.
7. A stakeholder says the dashboard numbers "don't look right" but can't explain why. How do you respond?
First, take it seriously. "Don't look right" is an imprecise observation, but it's not an invalid one. People who look at their numbers every day develop an intuition for what's normal. When something feels off, they're often right even if they can't articulate why. The diagnostic process: ask what specifically looks different from what they expected. Is the number too high, too low, or just unfamiliar? Pull up the same view from last week or last month and compare. Check the data freshness (is the dashboard showing today's data or yesterday's?). Check for known upstream issues (did a pipeline fail?). Run the underlying query manually and verify the result matches the dashboard. If the data is correct and the stakeholder's intuition is wrong, that's still a conversation worth having. Maybe their mental model of the business has drifted from reality. Maybe a recent change (new product launch, pricing update, seasonal effect) explains the shift. Walking through the "why" together builds more trust than just saying "the data is correct."
8. Explain what a semantic layer is and why it matters more in 2026 than it did in 2022.
A semantic layer is an abstraction that sits between your raw data and the tools that consume it. It defines what metrics mean (net revenue = gross order value minus refunds), how tables relate to each other (orders join to customers on customer_id), and what access rules apply (only finance can see margin data). In 2022, the primary consumers of the semantic layer were BI tools and human analysts. The semantic layer existed for consistency: making sure the Looker dashboard and the Tableau report showed the same revenue number. In 2026, the primary consumer is increasingly an AI agent. When a stakeholder asks an NL-to-SQL interface "what was revenue last quarter?" the semantic layer is what prevents the agent from inventing its own definition of revenue, picking the wrong table, or ignoring access restrictions. Without a semantic layer, an AI agent will generate SQL based on whatever schema it can see, and it will define metrics however it seems most reasonable based on column names and context. That's not governed analytics. That's automated guessing. The semantic layer became the governance enforcement point for AI, not just a convenience for humans, and that shift is the single biggest change in BI engineering in the past four years.
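One way to picture the enforcement role is a minimal metric registry. Nothing below is a real semantic layer API (dbt, Cube, and LookML each have their own); it's a sketch of the contract such a layer imposes: the consumer names a metric and a grouping, and never chooses the table, the join, or the calculation.

```python
# Minimal sketch of a semantic layer as a governed metric registry.
# Every name and expression here is illustrative, not a real API.
METRICS = {
    "net_revenue": {
        "sql": "SUM(gross_order_value) - SUM(refunds)",
        "table": "analytics.orders",
        "allowed_roles": {"finance", "exec", "analyst"},
    },
}

def compile_metric(metric: str, role: str, group_by: str) -> str:
    """Translate a governed metric request into canonical SQL."""
    spec = METRICS[metric]  # an unknown metric raises KeyError -- no guessing
    if role not in spec["allowed_roles"]:
        raise PermissionError(f"{role} may not query {metric}")
    return (f"SELECT {group_by}, {spec['sql']} AS {metric} "
            f"FROM {spec['table']} GROUP BY {group_by}")

print(compile_metric("net_revenue", "analyst", "region"))
```

An agent routed through `compile_metric` cannot invent its own revenue definition or bypass access rules; an agent writing raw SQL can do both.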
9. When would you recommend a BI tool (Looker, Tableau, Power BI) versus a code-based approach (Python notebooks, Streamlit)?
BI tools are best when: the audience is non-technical, the analysis follows a repeatable pattern, the data is well-modeled, and ongoing maintenance needs to be low. A Looker dashboard that shows weekly KPIs to a marketing team is a perfect use case. The governance is built in (LookML definitions, access controls), the refresh is automatic, and the consumer doesn't need to know SQL. Code-based approaches are best when: the analysis is exploratory or one-off, the output needs custom logic that a BI tool can't express, or the consumer is technical and wants to modify the analysis themselves. A Python notebook that runs a cohort analysis with custom statistical tests is a better fit than trying to force Tableau to do something it wasn't designed for. Streamlit or similar frameworks fit the middle ground: you need something interactive and shareable, but the logic is too custom for a BI tool. The trap is defaulting to one approach for everything. I've seen teams build Streamlit apps for simple KPI dashboards (overengineered, hard to maintain) and try to build complex ML-driven analyses in Looker (underpowered, frustrating). Match the tool to the problem, not the other way around.
10. A team exports your dashboard data to Excel every week and builds their own reports. Is this a problem? What would you do?
Before deciding it's a problem, ask why they do it. In my experience, there are three common reasons. First, the dashboard doesn't answer their actual question. It shows the right data, but not in the format they need for their workflow (maybe they need to add their own calculations, run scenarios, or combine it with data from another source). Second, they need to manipulate the data in ways the BI tool doesn't support (sorting, conditional formatting, pivot tables with custom groupings). Third, they don't have direct access to the BI tool, and the CSV is how the data reaches them. Each reason has a different solution. For the first, build them a better dashboard or a parameterized Explore that answers their actual question. For the second, consider whether a spreadsheet template with live data via API is a better solution than a dashboard. For the third, give them access or build a scheduled email delivery. The worst response is to block the export and tell them to use the dashboard. That treats the symptom (data leaving the governed environment) without addressing the cause (the governed environment doesn't meet their needs). It also creates shadow analytics, where people find workarounds that are even less governed than the CSV export was.
Data Governance and Quality (The New Baseline)
11. How would you define and enforce a canonical metric (like "net revenue") across an organization where three teams already calculate it differently?
This is a people problem before it's a technical problem. Start by documenting how each team currently calculates the metric and why their version differs. Usually, the differences are defensible: finance excludes pending orders because they follow accounting standards, product includes them because they want to measure demand, and sales uses a different currency conversion because they report in the customer's local currency. The alignment step is a meeting (yes, a meeting) where stakeholders agree on a single canonical definition for shared reporting. This doesn't mean killing the other calculations. It means designating one as the official version and clearly labeling the others as variants with specific names (like "pipeline revenue" versus "recognized revenue"). Then implement the canonical definition in the semantic layer so that every downstream tool and AI agent uses it by default. Deprecate the old versions over a defined timeline with clear communication. Monitor for drift by checking whether anyone creates new tables or queries that recalculate the metric outside the semantic layer. The technical implementation takes a day. The stakeholder alignment takes a quarter. Most BI engineers underestimate the second part.
12. What's a data contract? How would you use one to protect a table that multiple downstream teams depend on?
A data contract is a formal agreement between a data producer and its consumers that specifies what the data will look like and how it will behave. A typical contract includes: the schema (column names, types, nullable constraints), freshness guarantees (updated daily by 6 am), quality thresholds (null rate under 2% for key fields, no duplicate primary keys), ownership (which team is responsible, who to contact when something breaks), and semantic descriptions (what each column means in business terms). To protect a critical table, you'd implement the contract as automated checks in your pipeline. dbt contracts can enforce schema expectations at build time. Data quality tools like Great Expectations or Soda can validate quality thresholds on every refresh. Alerting triggers when the freshness SLA is breached. The contract lives in version control alongside the code, not in a Confluence page nobody reads. The key nuance most candidates miss: a data contract only works if both sides honor it. If the upstream team changes the schema without updating the contract, or if the downstream team ignores breach notifications, the contract is theater. The enforcement mechanism matters as much as the specification. In the AI era, there's an additional dimension: the contract should be machine-readable so that an AI agent querying the table can programmatically check whether the freshness SLA is being met.
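A hedged sketch of the "contract as automated checks" idea. Field names and thresholds are invented, and a real implementation would live in dbt contracts or a tool like Great Expectations or Soda rather than hand-rolled Python; the point is that the contract is machine-readable and the checks are mechanical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical machine-readable contract for a critical table.
CONTRACT = {
    "table": "analytics.orders",
    "schema": {"order_id": "int", "customer_id": "int", "amount": "float"},
    "max_null_rate": {"customer_id": 0.02},
    "freshness_hours": 24,
    "owner": "data-platform@example.com",
}

def check_contract(rows, last_refresh, now):
    """Return a list of contract violations (empty list = contract honored)."""
    violations = []
    # Schema check: every contracted column must be present.
    for col in CONTRACT["schema"]:
        if rows and col not in rows[0]:
            violations.append(f"missing column: {col}")
    # Quality check: null rates against the contracted thresholds.
    for col, max_rate in CONTRACT["max_null_rate"].items():
        nulls = sum(1 for r in rows if r.get(col) is None)
        if rows and nulls / len(rows) > max_rate:
            violations.append(f"null rate breach on {col}")
    # Freshness check against the SLA.
    if now - last_refresh > timedelta(hours=CONTRACT["freshness_hours"]):
        violations.append("freshness SLA breached")
    return violations

now = datetime(2026, 3, 2, 7, 0, tzinfo=timezone.utc)
rows = [{"order_id": 1, "customer_id": None, "amount": 9.99},
        {"order_id": 2, "customer_id": 7,    "amount": 4.50}]
print(check_contract(rows, now - timedelta(hours=30), now))
```

Because the contract is plain data, an AI agent can read the same dict a human reads and check the freshness SLA before trusting the table.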
13. You discover that a key metric has been wrong for the past two weeks, but nobody noticed. What do you do?
The immediate steps: verify the error independently (don't trust a single observation), identify the root cause (pipeline change, upstream schema drift, logic error in the model, bad source data), determine the scope (which dashboards, reports, and downstream models are affected), and fix the underlying issue. Then the harder steps. Notify every team that consumed the affected metric during the two-week window. Be specific: "Revenue on the executive dashboard was understated by approximately 8% between March 1 and March 14 due to a filter that excluded a new product category." Don't bury the impact. Assess whether any decisions were made based on the incorrect data. If the board saw understated revenue numbers, that matters. If a marketing team paused a campaign because conversion rates looked low, that matters. Document how the error wasn't caught earlier. Were there no data quality tests on this metric? Were there tests that passed but didn't check for this specific failure mode? Did anyone look at the metric during the two weeks and not notice the change? The systemic fix is more important than the technical fix. Add a test that would have caught this specific error. Add a monitoring alert that flags unexpected metric changes above a threshold. And if nobody noticed for two weeks, ask whether the metric is actually being used to make decisions, or just displayed.
14. Explain the difference between data quality and data trust. Can you have one without the other?
Data quality is an objective property of the data: is it accurate, complete, timely, and consistent? You can measure data quality with tests (null rate, freshness, schema conformance, referential integrity). Data trust is a subjective property of the relationship between people and the data: do the consumers believe the data is reliable enough to base decisions on? You can measure trust by observing behavior (do people check the dashboard or do they pull their own numbers?). You can absolutely have quality without trust. I've worked on platforms where every table had extensive dbt tests, 99.9% pipeline uptime, and freshness SLAs that were never breached. Analysts still pulled their own data from source systems and reconciled it manually because they'd been burned in the past and nobody had shown them the testing infrastructure. You can also have trust without quality. Dashboards that load quickly and look polished create confidence regardless of whether the underlying data is correct. I've seen stakeholders make quarterly decisions based on a dashboard that had a silent calculation error for months, simply because the dashboard felt reliable. Building trust requires more than building quality. It requires visibility into the quality (showing people the tests exist and pass), communication when issues occur (proactively, not after someone discovers the problem), and consistency over time (trust is built in drops and lost in buckets).
15. How would you design a freshness SLA for a critical business table? What happens when it's breached?
The SLA should define four things. The expected refresh cadence (this table is updated daily). The freshness threshold (the table should reflect data no older than 24 hours as of 7 am local time). The monitoring mechanism (an automated check runs at 7:15 am and verifies the latest record's timestamp). The escalation path (if the check fails, alert the pipeline owner on Slack, then the data engineering on-call if not resolved within 30 minutes, then the data platform manager if not resolved within 2 hours). When the SLA is breached, three things should happen. First, an automated notification goes to the affected consumers, not just the engineering team, but the business teams that depend on this table. They need to know that today's numbers may be incomplete, so they don't make decisions on stale data. Second, the root cause is investigated and documented (even if the fix is simple, the pattern matters over time). Third, the table metadata should reflect the breach status so that any system querying it (including AI agents) can check programmatically whether the data is current. That last point is the one most people miss. A freshness SLA only protects humans if humans check it. If an AI agent queries the table at 9 am and the data hasn't refreshed, the agent will return stale results with full confidence unless there's a machine-readable signal that says "this data is not current."
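A sketch of the two machine-facing pieces described above: a freshness signal that any consumer, human or agent, can check before trusting results, and the escalation path as code. All role names and thresholds are illustrative.

```python
from datetime import datetime, timedelta, timezone

SLA_MAX_AGE_HOURS = 24  # illustrative threshold from the SLA above

def freshness_status(latest_record_ts, now):
    """Machine-readable freshness signal. An AI agent should check
    is_current before returning results built on this table."""
    age = now - latest_record_ts
    breached = age > timedelta(hours=SLA_MAX_AGE_HOURS)
    return {
        "is_current": not breached,
        "data_age_hours": round(age.total_seconds() / 3600, 1),
        "sla_hours": SLA_MAX_AGE_HOURS,
    }

def escalation_targets(minutes_since_breach):
    """Who gets alerted at each stage of the escalation path."""
    targets = ["pipeline-owner"]          # alerted immediately on breach
    if minutes_since_breach >= 30:
        targets.append("data-eng-oncall")
    if minutes_since_breach >= 120:
        targets.append("data-platform-manager")
    return targets

now = datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc)
status = freshness_status(now - timedelta(hours=26), now)
print(status)
print(escalation_targets(45))
```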
AI and LLM Integration (The New Frontier)
16. An AI agent generates a SQL query against your warehouse and returns an answer to a stakeholder. How do you ensure that the answer is trustworthy?
There are five checkpoints, and most implementations miss at least three. First, semantic governance: did the agent use canonical metric definitions from the semantic layer, or did it generate its own SQL from the raw schema? If the semantic layer defines net revenue as gross minus refunds, and the agent wrote SUM(amount) against the orders table, the answer might be close, but it's not governed. Second, access control: should the agent (or the user it's acting on behalf of) have access to the tables it queried? Many NL-to-SQL implementations use a service account with broad read permissions, bypassing the row-level and column-level security that would apply to a human user. Third, data freshness: was the underlying data current at the time of the query? The agent doesn't know the SLA or whether it's been breached. It just queries whatever is there. Fourth, result validation: is the returned number within a plausible range? A simple bounds check (revenue should be between X and Y for this time period based on historical patterns) can catch catastrophic errors before they reach the stakeholder. Fifth, auditability: is there a log of the original question, the generated SQL, the tables accessed, and the result returned? Without this, you can't debug incorrect answers or investigate compliance questions. The trustworthy answer isn't just a correct answer. It's an answer you can prove is correct, trace to its sources, and verify was produced within governance boundaries.
17. What's the difference between an AI agent querying your semantic layer versus writing raw SQL against your warehouse? Why does it matter?
When an agent queries a semantic layer, it's constrained to the metric definitions, join paths, and access rules that your data team has approved. The agent says, "Give me net revenue by region", and the semantic layer translates that into the canonical SQL expression. The agent doesn't choose the table, the join logic, or the calculation. It consumes a governed metric. When an agent writes raw SQL against the warehouse, it sees the full schema and makes its own decisions. It might pick the right table, or it might pick a staging table that hasn't been deduplicated. It might define revenue correctly, or it might use a column that includes pending orders when your finance team excludes them. It will make these choices based on column names, table descriptions, and whatever context the prompt provides, not based on your governance policies. The practical difference: with a semantic layer, you govern once, and every consumer (human or AI) gets the same governed answer. Without one, every AI query is an ungoverned experiment. At enterprise scale, where dozens of stakeholders are asking questions through an NL interface every day, ungoverned AI queries create metric sprawl at machine speed.
18. How would you evaluate whether an LLM-generated SQL query is correct before it runs against production data?
Layer the checks from cheapest to most expensive. First, syntax validation: does the query parse? This catches basic errors and is effectively free. Second, schema validation: do the tables and columns referenced in the query actually exist? Does the query respect column types (not comparing a string to an integer)? Third, semantic validation: does the query use the tables and columns that your governance policies designate as canonical for this type of question? If someone asks about revenue and the generated query hits a staging table instead of the production revenue table, the syntax is fine, but the semantics are wrong. This requires a mapping between business concepts and approved data sources, which is exactly what a semantic layer provides. Fourth, access validation: should the requesting user have access to the tables and columns in the query? Check the generated query against the user's permission set, not the service account's. Fifth, result validation: after execution, does the output fall within expected ranges? If quarterly revenue is typically between $3M and $5M and the query returns $47, something is wrong. For production systems, I'd also add a human-in-the-loop option for high-stakes queries: anything touching financial data, PII, or executive reporting gets flagged for review before the result is returned.
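The layered checks can be sketched as ordered functions. Every table, grant, and range below is an invented stand-in, and a real system would use a proper SQL parser for table extraction rather than the regex shortcut here.

```python
import re

# Hypothetical governance metadata -- every name here is an assumption.
KNOWN_TABLES = {"analytics.revenue", "analytics.customers"}
CANONICAL_SOURCE = {"revenue": "analytics.revenue"}
USER_GRANTS = {"pm_alice": {"analytics.revenue"}}
EXPECTED_RANGE = {"quarterly_revenue": (3_000_000, 5_000_000)}

def validate(sql, user, concept):
    """Pre-execution checks (schema, semantic, access); returns failures.
    Syntax validation is assumed to have already happened upstream."""
    failures = []
    # Regex shortcut for table extraction -- use a real SQL parser in production.
    tables = set(re.findall(r"\bFROM\s+([\w.]+)", sql, re.IGNORECASE))
    unknown = tables - KNOWN_TABLES
    if unknown:                                   # schema validation
        failures.append(f"unknown tables: {sorted(unknown)}")
    if CANONICAL_SOURCE[concept] not in tables:   # semantic validation
        failures.append(f"not using canonical source {CANONICAL_SOURCE[concept]}")
    ungranted = tables - USER_GRANTS.get(user, set())
    if ungranted:                                 # access validation (the USER's grants)
        failures.append(f"user lacks access: {sorted(ungranted)}")
    return failures

def validate_result(metric, value):
    """Post-execution bounds check against historical patterns."""
    lo, hi = EXPECTED_RANGE[metric]
    return lo <= value <= hi

good = validate("SELECT SUM(net_revenue) FROM analytics.revenue", "pm_alice", "revenue")
bad  = validate("SELECT SUM(amount) FROM staging.orders", "pm_alice", "revenue")
print(good)   # []
print(bad)    # three failures: unknown table, wrong source, no access
print(validate_result("quarterly_revenue", 47.0))  # the "$47" case fails the range check
```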
19. A stakeholder asks your NL-to-SQL interface: "What was our revenue last quarter?" The system returns an answer. What could go wrong?
I'll enumerate the failure modes from most to least obvious. "Revenue" is ambiguous: gross revenue, net revenue, recognized revenue, booked revenue, ARR, and MRR are all things people mean when they say "revenue." Without a semantic layer enforcing a default definition, the agent will guess. "Last quarter" is ambiguous: calendar quarter or fiscal quarter? If your fiscal year starts in February, Q4 means something different from what the stakeholder might assume. The agent defaults to the calendar unless told otherwise. The table could be stale: a pipeline failure means the last two days of the quarter are missing. The number is systematically understated but looks plausible. The agent has no mechanism to check freshness. The agent picked the wrong table: your warehouse has revenue_summary, revenue_staging, revenue_final, and revenue_v2. The agent chose based on the table name and recency. The "best" table by those signals might not be the canonical one. Currency conversion: international orders might be converted at the daily rate, the monthly average rate, or the rate at time of booking. Different tables use different conventions. The agent can't know which is correct for this stakeholder's context. Comparison confusion: the stakeholder will inevitably compare the AI's answer to a number they saw on a dashboard. If the dashboard uses a different revenue definition (common), the numbers won't match, and the stakeholder will lose trust in one or both systems. Every one of these failure modes has happened in production systems I've worked on.
20. How would you design an audit trail for AI-generated analytics queries?
Every AI-generated query should produce a log record containing: the original natural language prompt (what the user actually asked), the interpreted intent (how the system parsed the question, including which metric, time period, and dimensions it identified), the generated SQL (the exact query that ran), the tables and columns accessed (for compliance tracking and access auditing), the semantic layer definitions used (if applicable, which canonical metric definition was applied), the data freshness at time of query (when was the underlying table last refreshed), the result returned (the actual answer the user saw), a timestamp and user identifier (who asked, when), and any confidence or validation signals (did the result pass range checks, did the system flag any ambiguity). Store these logs in an append-only table that's queryable by the data governance team. Build alerting on anomalies: queries that access unusual tables, results that fall outside historical ranges, or high volumes of queries from a single user. This audit trail serves three purposes. Debugging: when a stakeholder says, "The AI gave me a wrong number last Tuesday," you can trace exactly what happened. Compliance: if a regulator asks how a specific number was produced, you can provide full lineage from prompt to result. Improvement: by analyzing which queries produce ambiguous or incorrect results, you can identify gaps in your semantic layer and improve the system over time.
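One possible shape for that log record, as a dataclass. The field names are illustrative, not a standard schema; map them onto whatever your logging infrastructure uses.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class QueryAuditRecord:
    """One append-only log row per AI-generated query (field names invented)."""
    user_id: str
    prompt: str                  # what the user actually asked
    interpreted_intent: dict     # metric, time period, dimensions identified
    generated_sql: str           # the exact query that ran
    tables_accessed: list        # for compliance and access auditing
    semantic_definitions: list   # canonical metric definitions applied
    data_freshness: str          # last refresh time of the underlying data
    result: str                  # the answer the user saw
    validation_signals: dict     # range checks passed, ambiguity flags
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = QueryAuditRecord(
    user_id="pm_alice",
    prompt="what was revenue last quarter?",
    interpreted_intent={"metric": "net_revenue", "period": "2025-Q4"},
    generated_sql="SELECT SUM(net_revenue) FROM analytics.revenue WHERE ...",
    tables_accessed=["analytics.revenue"],
    semantic_definitions=["net_revenue = gross_order_value - refunds"],
    data_freshness="2026-01-02T06:00:00Z",
    result="$4.2M",
    validation_signals={"range_check": "passed", "ambiguity": None},
)
print(asdict(record)["prompt"])
```

Because `asdict` flattens the record to plain data, the same row can be appended to a warehouse table and queried by the governance team like any other dataset.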
Metrics and Business Acumen (What AI Can't Replace)
21. A product manager asks you to build a metric for "user engagement." How do you scope this?
Don't start building. Start asking questions. Engagement with what? The whole product, a specific feature, a specific workflow? Measured how? Logins, time spent, actions taken, features used, value generated? Over what time period? Daily active users, weekly active users, 28-day rolling engagement? For which user segment? All users, paid users, new users in their first 30 days, enterprise accounts? Compared to what? A target, a previous period, a cohort benchmark, a control group? Each combination produces a different metric with different implications. "Daily active users" measures breadth. "Average actions per session" measures depth. "Percentage of users who complete the core workflow" measures value. They can all be called "engagement", and they can all move in different directions simultaneously. The strong answer proposes a metric that's specific, measurable, and tied to a business outcome. "Weekly active users who complete at least one core workflow within 7 days of signup" is a metric. "Engagement" is a wish. The BI engineer's job is to turn the wish into a metric, and the translation requires business context that no AI can infer from a schema.
22. How do you decide whether a metric belongs in a dashboard, an alert, or a scheduled report?
Match the delivery mechanism to the decision pattern. If the metric needs continuous monitoring and someone checks it daily or multiple times a day, put it on a dashboard. Examples: daily active users, pipeline health, real-time conversion rate. If the metric is stable most of the time but requires immediate action when it crosses a threshold, make it an alert. Examples: error rate exceeding 5%, revenue dropping more than 20% day over day, a critical pipeline missing its SLA. Nobody should stare at a dashboard waiting for these events. The system should tell them. If the metric is reviewed periodically for trend analysis and doesn't require immediate action, put it in a scheduled report. Examples: monthly churn rate, quarterly revenue by segment, and annual customer acquisition cost. These need context and narrative, not a real-time chart. The most common mistake is putting everything on a dashboard. A dashboard with 40 charts is a dashboard with zero insights. The second most common mistake is never alerting. If your pipeline breaks at 3 am and nobody knows until a stakeholder complains at 9 am, you have a monitoring gap, not a dashboard gap.
23. You've been asked to measure the ROI of a new feature launch. The feature affects user behavior over six months. How do you set up the analysis?
Start before the launch, not after. Define the success metrics in advance: what behavior are you expecting to change, by how much, over what time period? Without a pre-defined hypothesis, you'll end up cherry-picking metrics that make the feature look good after the fact. Set up a cohort comparison. Users who have access to the feature versus users who don't (if it's a staged rollout) or users before and after the launch (if it's a full rollout). Account for seasonality and external factors by using a difference-in-differences approach or a matched control group if possible. Define leading and lagging indicators. The leading indicator might be the feature adoption rate in the first two weeks. The lagging indicator might be retention improvement at 90 days. Track both, because the leading indicator tells you if the feature is being used, and the lagging indicator tells you if it's creating value. Build the measurement infrastructure before launch day. Create the tables, define the metrics, and set up the dashboards. If you wait until after launch to start measuring, you'll lose the baseline data you need for comparison. Report incrementally: a week-one read on adoption, a month-one read on engagement, a quarter-one read on business impact. Don't promise a single "ROI number" at the end, because ROI for features that change behavior over six months is a curve, not a point.
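The difference-in-differences arithmetic is simple enough to show directly; the retention numbers below are invented for illustration.

```python
# Difference-in-differences: the feature's effect is the change in the
# treated cohort minus the change in the control cohort, which nets out
# seasonality and trends that both cohorts share.
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical 90-day retention rates:
# treated cohort rose 62% -> 70%; control cohort rose 60% -> 63%.
effect = diff_in_diff(0.62, 0.70, 0.60, 0.63)
print(round(effect, 3))  # 0.05 -> ~5-point lift attributable to the feature
```

A naive before/after read on the treated cohort would claim an 8-point lift; the control cohort shows 3 of those points would have happened anyway.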
24. What's the difference between a vanity metric and an actionable metric? Give an example of each.
A vanity metric looks impressive but doesn't inform decisions. Total registered users is the classic example. It only goes up (unless you delete accounts), so it always looks like progress. You can't act on it: knowing you have 2 million registered users doesn't tell you what to build, what to fix, or what to invest in. An actionable metric connects directly to a decision. Weekly active users who complete the core workflow (like sending a message, making a purchase, or uploading a file) tell you whether your product is delivering value. If it drops, you investigate. If it drops for a specific segment, you know where to focus. If it rises after a feature launch, you have evidence that the feature worked. The deeper distinction: actionable metrics have a threshold that triggers a response. If weekly active users drop below X, we investigate. If conversion rate falls below Y, we pause the campaign. A metric that you publish but never act on, no matter what the number is, is functionally a vanity metric even if it sounds important. BI engineers who can't make this distinction will build dashboards full of numbers that look good in a screenshot and change nothing in the business.
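The "threshold that triggers a response" test can be made concrete. A toy sketch, where the WAU floor and the response are hypothetical examples, not recommended values:

```python
# A metric is actionable when crossing a threshold triggers a defined
# response. The floor value here is a hypothetical example.

WAU_FLOOR = 50_000  # investigate if weekly active users fall below this

def review_metric(weekly_active_users: int) -> str:
    """Return the agreed-upon response for the current reading."""
    if weekly_active_users < WAU_FLOOR:
        return "investigate: WAU below floor"
    return "no action"

review_metric(48_200)   # triggers an investigation
review_metric(61_500)   # no action required
```

If you can't write this function for a metric, because no number would change anyone's behavior, that's the tell that it's a vanity metric.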
25. A director asks: "Are we on track this quarter?" What do you need to know before you can answer?
On track against what? There are at least four things this could mean, and the director may not have distinguished between them. On track against the quarterly revenue target set during planning? On track against the growth rate from the same quarter last year? On track relative to the forecast model that projects where you should be based on the weekly run rate? On track relative to the budget that was approved by finance? Each of these comparisons produces a different answer. You might be ahead of last year, but behind the plan. You might be on pace with the forecast but behind the budget. The BI engineer's job is not to guess which comparison the director means. It's to surface all relevant comparisons and let the director choose the framing. The strong answer builds a single view that shows: actuals to date, the quarterly target, the year-over-year comparison, the run-rate projection for the remainder of the quarter, and the gap between each. That view doesn't answer "are we on track?" It answers every version of the question simultaneously.
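The single view described above reduces to a handful of comparisons. A minimal sketch with made-up numbers; a real version would pull actuals, targets, and prior-year figures from the warehouse:

```python
def on_track_view(actuals_to_date: float, target: float,
                  same_qtr_last_year: float,
                  days_elapsed: int, days_in_quarter: int) -> dict:
    """Project the quarter total from the run rate to date, then compute
    the gap against each reference point the director might mean."""
    projected = actuals_to_date / days_elapsed * days_in_quarter
    return {
        "actuals_to_date": actuals_to_date,
        "projected_quarter_total": projected,
        "gap_vs_target": projected - target,
        "gap_vs_last_year": projected - same_qtr_last_year,
    }

view = on_track_view(actuals_to_date=5_800_000, target=10_000_000,
                     same_qtr_last_year=8_500_000,
                     days_elapsed=55, days_in_quarter=90)
# One dataset, two answers: projected ahead of last year's quarter,
# projected behind the plan. That's why you show both gaps.
```

A naive day-count run rate ignores seasonality within the quarter (a SaaS business that closes most deals in the final two weeks will always look "behind" mid-quarter), so a production version would project from a forecast model rather than a straight line.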
Systems Thinking and Communication (What Gets You Hired)
26. Describe a time you discovered that a widely used metric was being calculated incorrectly. What did you do?
This is behavioral, but it's the most revealing question on the list because the technical fix is almost always easy. The hard part is the organizational response. A strong answer describes the discovery (how you found the error, what tipped you off), the investigation (how you verified it wasn't just a different definition), the impact assessment (which decisions were made on the wrong data, and over what time period), the communication (how you told stakeholders, in what order, with what level of detail), and the systemic fix (what you changed so this category of error can't happen again). A weak answer focuses only on the technical fix ("I found a bug in the SQL and updated the model"). That's the least interesting part. What the interviewer is really asking is: can you deliver bad news to senior stakeholders, take accountability without deflecting, and turn a failure into a governance improvement? That's leadership, and it's what separates a senior BI engineer from a junior one.
27. How would you onboard a new analyst to your data platform in one week?
Day 1: pair them with an experienced analyst for a tour, not a documentation dump. Walk through three to five core dashboards, explaining not just what they show but why they exist and who uses them. Introduce the semantic layer and explain which metrics are canonical. Give them read access to the warehouse and the BI tool.

Days 2 and 3: assign them a real but small question to answer using the platform. Something like "what was the conversion rate for new users last week, by channel?" The question should be answerable with existing dashboards or simple queries, but should require them to navigate the tools, find the right data, and verify their answer. Provide a known correct answer they can check against.

Day 4: introduce the governance policies. Walk through the data contracts, freshness SLAs, and quality checks. Show them how to verify data freshness before trusting a table. Introduce them to the data catalog or documentation (if it exists and is current).

Day 5: debrief. Ask what was confusing, what was missing, and what they had to ask someone else about. Their confusion is your documentation gap. Every pain point they hit is a pain point every future hire will also hit.

If your answer to "how do you onboard someone?" is "point them to the Confluence page," ask yourself when that page was last updated. At most organizations, the answer is "sometime last year," which means you're onboarding people with outdated information and hoping they figure out the gaps on their own.
28. Your data platform serves five different teams. Each team wants different things. How do you prioritize what to build?
This is a product management question applied to internal infrastructure, and BI engineers who can answer it well are the ones who get promoted. The framework I'd use: first, categorize requests by type. Is this a new metric, a new data source, a performance improvement, a governance fix, or a visualization change? Each type has a different cost and a different impact profile. Second, score each request on three dimensions: business impact (how much revenue, time, or risk does this affect?), consumer breadth (does this serve one person or fifty?), and strategic alignment (does this move us toward our data platform goals or is it a one-off?). Third, be transparent about the tradeoffs. Show the requesting teams the prioritized list and the reasoning. "We're building the revenue reconciliation fix first because it affects finance's quarterly close, which has a hard deadline. Your attribution model update is second." Teams are surprisingly reasonable when they can see the reasoning, even if their request isn't first. The worst approach: prioritize by who asked most recently, who is most senior, or who complained the loudest. That creates a platform shaped by organizational politics rather than organizational needs.
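The scoring step can be as simple as a weighted sum that makes the reasoning visible to the requesting teams. A hypothetical sketch; the weights and the 1-to-5 ratings are assumptions you'd tune to your own platform goals, and the request names echo the example above:

```python
# Hypothetical prioritization scoring over the three dimensions.
# Weights are assumptions; tune them to your platform's goals.

WEIGHTS = {"impact": 0.5, "breadth": 0.3, "alignment": 0.2}

def score(request: dict) -> float:
    """Weighted score from 1-5 ratings on each dimension."""
    return sum(request[dim] * w for dim, w in WEIGHTS.items())

requests = [
    {"name": "revenue reconciliation fix", "impact": 5, "breadth": 3, "alignment": 4},
    {"name": "attribution model update",   "impact": 4, "breadth": 2, "alignment": 3},
    {"name": "dashboard color refresh",    "impact": 1, "breadth": 5, "alignment": 1},
]

ranked = sorted(requests, key=score, reverse=True)
# The reconciliation fix scores 5*0.5 + 3*0.3 + 4*0.2 = 4.2 and ranks first.
```

The numbers themselves matter less than the fact that they're written down: a scored, published list is what lets you say "your request is second, and here's why" instead of arbitrating by seniority.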
29. How do you communicate a data limitation to a non-technical stakeholder without undermining their confidence in the platform?
The framing matters enormously. "The data is wrong" kills trust. "The data is wrong, and here's the Jira ticket to fix it" kills trust slightly less. "The data covers X accurately, doesn't yet cover Y, and here's our plan to add Y by next quarter" builds trust. The key is to lead with what the data can do, acknowledge the specific limitation clearly, and provide a timeline or workaround. Be concrete, not vague. "Our revenue numbers are accurate for domestic orders. International orders are not yet included because we're migrating the currency conversion logic. We expect to have international revenue in the dashboard by April 15. In the meantime, I can pull a manual report that includes international numbers with a 24-hour delay." That's a limitation the stakeholder can work with. Compare it to: "The revenue numbers might be a little off because of some international stuff." That's a limitation that makes the stakeholder wonder what else might be "a little off." Transparency builds trust. Vagueness destroys it. And if you don't proactively communicate limitations, stakeholders will discover them on their own, usually in a meeting, usually at the worst possible moment.
30. What does "governed, trustworthy analytics" mean to you? How would you build it?
This is the open-ended question I'd close every interview with. There's no single right answer, but the best answers will cover five pillars.

Centralized definitions: every key business metric is defined once, in a semantic layer, and every downstream consumer (dashboards, reports, notebooks, AI agents) uses that definition. No shadow metrics. No competing calculations.

Tested quality: every critical table has automated data quality checks that run on every refresh. Schema tests, freshness checks, null rate thresholds, referential integrity, and business logic validations. Tests that fail block the data from being served, not just log a warning.

Enforced access: row-level and column-level security that applies to every consumer, including AI agents and service accounts. Access policies are machine-readable and checked at query time, not written in a document and honored on the honor system.

Visible provenance: for any number on any dashboard, a user should be able to trace it back to its source table, its transformation logic, its metric definition, and its freshness status. If they can't, trust is a matter of faith, not evidence.

Maintained documentation: data catalogs, metric dictionaries, and governance policies that are current, discoverable, and actually read. Documentation that's 14 months out of date is worse than no documentation, because it creates false confidence.

The hardest part of governed analytics is not any single pillar. It's maintaining all five simultaneously while the underlying data, the tools, the team, and the business requirements all change underneath you. That's the job. And in 2026, with AI agents now consuming your analytics alongside humans, the cost of failure on any of these pillars scales at a speed that human-only systems never had to contend with.
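The "tested quality" pillar hinges on one design choice: checks that block rather than warn. A minimal Python sketch of that choice; the check names and thresholds are illustrative, not a specific tool's API:

```python
# Sketch of "tests that fail block the data from being served":
# each check raises instead of logging, so a failing refresh
# never publishes. Thresholds here are illustrative.

from datetime import datetime, timedelta, timezone

class DataQualityError(Exception):
    """Raised when a check fails; the refresh pipeline stops here."""

def check_freshness(last_loaded_at: datetime, max_age: timedelta) -> None:
    if datetime.now(timezone.utc) - last_loaded_at > max_age:
        raise DataQualityError("table is stale; refusing to serve")

def check_null_rate(null_count: int, row_count: int, threshold: float) -> None:
    if row_count and null_count / row_count > threshold:
        raise DataQualityError("null rate above threshold; refusing to serve")

# A refresh pipeline runs every check before publishing, e.g.:
check_null_rate(null_count=12, row_count=10_000, threshold=0.01)  # 0.12% passes
```

The difference between `raise` and `log.warning` here is the whole governance argument in miniature: a warning depends on a human noticing it, while an exception makes bad data structurally impossible to serve.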
What I'd Tell a Candidate Preparing for BI Engineering Interviews in 2026
The SQL and modeling questions will still be there. Practice them. But the interviews that will differentiate you are the ones asking about governance, AI integration, and business judgment. Anyone can learn window functions. Not everyone can explain why a semantic layer matters when an AI agent is the one writing the queries.
Know your tools. But more importantly, know why the tools matter and what happens when they fail. That's the job now.
