Ever tried to prove a marketing change was effective, but you're stuck without a clean A/B test? You're not alone. In many real-world settings (legal constraints, limited traffic, tight budgets) running a randomized experiment is simply impossible. In this post you'll learn how to turn those constraints into opportunities by applying causal-inference methods. We'll walk through Diff-in-Diff, Synthetic Control, and Meta's GeoLift, show how to prep your data, and provide ready-to-run code using libraries like `causalpy` and `GeoLift`.

## 1. When A/B Testing Is Off-Limits

| Scenario | Why A/B is hard | Typical legal or business constraint |
|---|---|---|
| Regulated industries | Personal data restrictions (GDPR, HIPAA) | Must avoid randomization of treatment that could expose private data |
| High-traffic campaigns | A/B would dilute spend across many users | Only a small share of traffic can be spared for the test |
| Rapid iterations | Budget or time constraints | A full-scale experiment takes weeks |
| Geographical targeting | Only certain regions are available for testing | Randomization at city level may violate compliance or yield insufficient power |

When you can't do a classic experiment, you can still estimate causal impact by leveraging existing variation in the data, provided you handle the assumptions carefully.
## 2. A Very Quick Primer on Causal Inference

| Term | Meaning |
|---|---|
| Counterfactual | "What would have happened if we had not applied the treatment?" |
| Average Treatment Effect (ATE) | Expected difference between treated and untreated outcomes, E[Y(1) − Y(0)]. |
| Parallel Trends | In Diff-in-Diff, the assumption that, absent treatment, treated and control would have followed the same trend. |
| Synthetic Control | Builds a weighted combination of control units to mimic the treated unit's pre-treatment trajectory. |
| GeoLift | A specialized synthetic-control variant for geo-targeted advertising, accounting for location-specific confounders. |

All of these methods rely on observational data, so the key challenge is to approximate the conditions of a randomized trial through clever modeling and careful data prep.

## 3. Data Preparation – The Common Thread

- **Panel structure** – you need observations over time for each unit (user, region, product, etc.).
- **Pre- and post-treatment periods** – at least 5–10 periods before and after the intervention.
- **Covariates** – variables that predict the outcome and are not affected by the treatment (e.g., seasonality, marketing spend).
- **Treatment flag** – a binary column marking which units belong to the treated group, plus a post indicator that is 1 in the periods after the intervention starts.
- **Balance diagnostics** – check that treated and control units are similar on covariates before the event (see the toy panel sketch below).
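To make the required shape concrete, here is a minimal simulated panel in pandas. The column names (`unit`, `time`, `treat`, `post`, `y`) are the ones the snippets in the rest of this post assume; the data itself is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy long-format panel: 6 units, 20 periods; negative `time` = pre-intervention.
# Simulated data only; `treat` marks the treated group, `post` the post-period.
rng = np.random.default_rng(42)
rows = []
for u in range(6):
    treat = int(u < 3)                      # units 0-2 form the treated group
    for t in range(-10, 10):
        post = int(t >= 0)
        effect = 2.0 * treat * post         # the true lift we hope to recover
        rows.append({'unit': u, 'time': t, 'treat': treat, 'post': post,
                     'y': 10 + 0.3 * t + effect + rng.normal()})
df = pd.DataFrame(rows)
print(df.head())
```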
Once you have a tidy dataframe like this, you can plug it into any of the libraries below.

## 4. Difference-in-Differences (Diff-in-Diff)

Diff-in-Diff is the workhorse of quasi-experimental design. It's simple, fast, and works well when you have a clear before/after signal and a suitable control group.

### Assumptions

- **Parallel trends**: absent treatment, the treated group's outcome would have followed the same trend as the control group's.
- **No spill-over**: treatment on one unit does not affect the outcome of other units.

### Implementation in Python (statsmodels)

```python
import statsmodels.formula.api as smf

# df: long format with columns `y`, `treat` (1 = treated group),
# `post` (1 = after the intervention), `time`, and `unit`
model = smf.ols(
    formula='y ~ treat * post',
    data=df
).fit(cov_type='cluster', cov_kwds={'groups': df['unit']})  # cluster SEs by unit
print(model.summary())
```

The coefficient on `treat:post` is the Diff-in-Diff estimate.

### Quick check: Parallel Trends

```python
import matplotlib.pyplot as plt

pre = df[df['time'] < 0]  # pre-treatment periods only
for flag, label in [(0, 'Control'), (1, 'Treated')]:
    series = pre[pre['treat'] == flag].groupby('time')['y'].mean()
    plt.plot(series.index, series.values, label=label)
plt.axvline(0, color='k', ls='--')
plt.legend()
plt.show()
```

If the lines are roughly parallel, your assumption is plausible.

## 5. Synthetic Control

When you have a *single* treated unit (e.g., a city that receives an ad campaign) and many potential controls, synthetic control offers a principled way to construct a counterfactual.

### Concept

1. Compute weights for each control unit so that the weighted average of their pre-treatment outcomes matches the treated unit's trajectory (sketched below).
2. After the treatment starts, the weighted control series becomes your counterfactual.
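To see what step 1 means mechanically, here is a bare-bones sketch of the weight-fitting problem: constrained least squares on pre-treatment outcomes, with weights that are non-negative and sum to one. The array names (`Y0_pre`, `y1_pre`) and the simulated data are hypothetical, and this is only an illustration of the idea, not what `causalpy` does internally (it fits the weights as a Bayesian model via PyMC).

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs: Y0_pre holds pre-treatment outcomes for 5 control units
# (one column each), y1_pre the treated unit's pre-treatment outcomes.
rng = np.random.default_rng(0)
Y0_pre = rng.normal(size=(40, 5)).cumsum(axis=0)      # 40 pre-periods, 5 controls
true_w = np.array([0.6, 0.3, 0.1, 0.0, 0.0])
y1_pre = Y0_pre @ true_w + rng.normal(scale=0.1, size=40)

def sse(w):
    # Squared error between the treated series and the weighted controls
    return np.sum((y1_pre - Y0_pre @ w) ** 2)

n = Y0_pre.shape[1]
res = minimize(
    sse,
    x0=np.full(n, 1 / n),
    bounds=[(0, 1)] * n,                                       # w_j >= 0
    constraints={'type': 'eq', 'fun': lambda w: w.sum() - 1},  # weights sum to 1
)
print(np.round(res.x, 3))  # should roughly recover [0.6, 0.3, 0.1, 0, 0]
```

In the post-period, `Y0_post @ res.x` would then serve as the counterfactual series for the treated unit.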
### Library: causalpy

The snippet below shows how to run a synthetic-control analysis with `causalpy`. It follows the example from CausalPy's own documentation (the example dataset ships with the library); note that the exact entry point has moved between releases, so check the docs for your installed version.

```python
import causalpy as cp

# Example dataset bundled with CausalPy: a treated series ('actual')
# plus control series 'a' through 'g', observed over time
df_sc = cp.load_data("sc")
treatment_time = 70  # index of the first treated period

result_sc = cp.pymc_experiments.SyntheticControl(
    df_sc,
    treatment_time,
    formula="actual ~ 0 + a + b + c + d + e + f + g",
    model=cp.pymc_models.WeightedSumFitter(
        sample_kwargs={"target_accept": 0.95, "random_seed": 42}
    ),
)

fig, ax = result_sc.plot()
result_sc.summary()
```

### Interpreting the Output

| Metric | Meaning |
|---|---|
| ATT | Average treatment effect on the treated. |
| p-value | Probability of seeing an effect at least this large if the true effect were zero. |
| Weights | How much each control unit contributes to the synthetic counterfactual. |

Synthetic control is powerful but requires many pre-treatment observations and a decent pool of control units.

## 6. GeoLift – Geo-Targeted Incrementality

Meta's GeoLift is an open-source implementation of an *augmented synthetic control* tailored for advertising campaigns that target whole regions (cities, states, etc.). It adds a ridge-regularized forecasting layer to improve out-of-sample performance.

### Key Features

- **Geographical focus** – works at city or state level.
- **Power analysis** – helps you pick the right number of test markets.
- **Augmented Synthetic Control** – adds a ridge-regularized regression on pre-treatment data (sketched below).
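For intuition on the augmentation, the augmented synthetic control method (Ben-Michael, Feller and Rothstein, which GeoLift's approach builds on) fits an outcome model $\hat{m}$ (e.g., ridge regression) on pre-treatment data and uses it to correct whatever bias the weighting leaves behind. Roughly, with weights $w_j$ and control outcomes $Y_{j,t}$:

$$
\hat{Y}_{\text{treated},\,t}(0) \;=\; \underbrace{\sum_{j \in \text{controls}} w_j\, Y_{j,t}}_{\text{plain synthetic control}} \;+\; \underbrace{\Big(\hat{m}_{\text{treated},\,t} \;-\; \sum_{j \in \text{controls}} w_j\, \hat{m}_{j,t}\Big)}_{\text{outcome-model correction}}
$$

If the weighted controls already match the treated unit's pre-period perfectly, the correction term vanishes and you recover plain synthetic control.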
### Quickstart in R

```r
# Install GeoLift
install.packages('remotes')
remotes::install_github('facebookincubator/GeoLift')
library(GeoLift)

# Load example data
data(GeoLift_PreTest)  # 40 cities, 90 days pre-campaign
data(GeoLift_Test)     # same cities, now including the 15 campaign days (periods 91-105)

# Convert to GeoLift format
pre <- GeoDataRead(
  data = GeoLift_PreTest,
  date_id = 'date',
  location_id = 'location',
  Y_id = 'Y',
  format = 'yyyy-mm-dd',
  summary = TRUE
)
post <- GeoDataRead(
  data = GeoLift_Test,
  date_id = 'date',
  location_id = 'location',
  Y_id = 'Y',
  format = 'yyyy-mm-dd',
  summary = TRUE
)

# Market selection & power analysis
sel <- GeoLiftMarketSelection(
  data = pre,
  treatment_periods = c(10, 15),
  N = 2:4,
  effect_size = seq(0, 0.25, 0.05),
  cpic = 7.5,
  budget = 100000,
  alpha = 0.1
)

# Run GeoLift inference
gl_test <- GeoLift(
  Y_id = 'Y',
  data = post,
  locations = c('chicago', 'portland'),
  treatment_start_time = 91,
  treatment_end_time = 105
)
print(gl_test)
plot(gl_test, type = 'Lift')
plot(gl_test, type = 'ATT')
```

### What the output tells you

- **Lift** – percent increase over the synthetic control (see the back-of-envelope below).
- **p-value** – statistical significance of the lift.
- **Weights** – which control cities contributed most to the counterfactual.

The `GeoLift()` call also accepts a `model` argument (e.g., `model = "best"`) that automatically augments the synthetic control with a ridge- or lasso-regularized regression, typically giving tighter confidence bounds; check the GeoLift docs for the exact option names in your version.
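To sanity-check what a lift number means for the business, it helps to back out incremental conversions and cost per incremental conversion. Here is a toy back-of-envelope in Python; all inputs are hypothetical placeholders, not outputs of the R example above.

```python
# Toy translation of a geo-test lift estimate into business terms.
baseline_conversions = 50_000   # counterfactual conversions in test markets over the test window
lift = 0.08                     # 8% estimated lift over the synthetic control
spend = 100_000                 # campaign budget

incremental = baseline_conversions * lift
print(f"Incremental conversions: {incremental:,.0f}")
print(f"Cost per incremental conversion: ${spend / incremental:,.2f}")
```

Comparing that cost per incremental conversion against the `cpic` you assumed in the power analysis tells you whether the campaign cleared your bar.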
## 7. Choosing the Right Tool

| Situation | Best Fit |
|---|---|
| Multiple treated units, clear before/after | Diff-in-Diff |
| Single treated unit, many potential controls | Synthetic Control (causalpy) |
| Geographically targeted ad campaign | GeoLift |
| Need a Bayesian posterior | CausalImpact (R) or causalpy Bayesian models |

### Libraries you'll need

| Library | Language | What it does |
|---|---|---|
| causalpy | Python | Diff-in-Diff, RD, Synthetic Control, Bayesian models |
| GeoLift | R | Geo-level augmented synthetic control for ad lift |
| CausalImpact | R | Bayesian structural time-series causal inference |
| statsmodels | Python | Quick Diff-in-Diff via OLS |
| pymc | Python | Bayesian modeling backend for causalpy |

## 8. Pitfalls and Best Practices

1. **Check assumptions** – always plot pre-treatment trends and balance tables.
2. **Avoid contamination** – make sure control units are truly unaffected by the treatment.
3. **Power matters** – for GeoLift, run the built-in market selection to avoid under-powered tests.
4. **Robustness checks** – try alternative control sets, add covariates, or use placebo periods (see the sketch after this list).
5. **Document everything** – store the exact data version, code, and results in a reproducible notebook or script.
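As one concrete robustness check, here is a minimal placebo-test sketch reusing the toy `df` from section 3: pretend the intervention happened earlier, inside the pre-period, and re-run the Diff-in-Diff. A sizeable "effect" at a date when nothing happened is a red flag for parallel trends. The placebo date of `-5` is arbitrary; in practice you would try several.

```python
import statsmodels.formula.api as smf

# Placebo DiD: keep only true pre-treatment data, then invent a fake
# treatment date at time -5 and re-estimate the interaction.
placebo = df[df['time'] < 0].copy()
placebo['post'] = (placebo['time'] >= -5).astype(int)

m = smf.ols('y ~ treat * post', data=placebo).fit(
    cov_type='cluster', cov_kwds={'groups': placebo['unit']}
)
print(m.params['treat:post'])  # should be close to zero
```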
## 9. Actionable Next Steps

1. **Audit your data** – do you have a panel? How many pre-treatment periods?
2. **Pick a method** – start with Diff-in-Diff if you have multiple treated units; otherwise, go with synthetic control or GeoLift.
3. **Run a quick prototype** – use the code snippets above with your own data.
4. **Validate assumptions** – visualize trends, check balance, and run placebo tests.
5. **Interpret** – translate the statistical output into business terms (e.g., $X lift per 1,000 impressions).
6. **Iterate** – if results are inconclusive, adjust the control set or gather more data.

You can now estimate causal impact without a clean A/B test, turning observational variation into actionable insight.