Prerequisite: Know All About A/B testing

What to do when you can’t A/B Test

A/B testing is one of the most important skills of a data professional. All major tech giants use this method for experimentation at scale. A/B testing has proven itself many times. Some popular case studies:

- The 41 Shades of Blue Experiment by Google helped them with a $200M revenue gain.
- The EA SimCity Case Study by Electronic Arts helped them with a 43% conversion growth.

However, A/B testing has limitations as well. Let’s understand the cases when A/B testing should not or cannot be used:

- When we can’t establish independence between the two groups involved in the A/B test, i.e., adding someone to the “A” group impacts the “B” group and vice versa. [Cannot be used]
- The cost of a “bad” version is very high: A/B testing can cause users exposed to the “bad” version to churn. [Should not be used]
- For early startups with insufficient user traffic, collecting the required sample size or reaching significance will take a lot more time. [Should not be used]

[A/B Testing should not be used] cases can be solved by adopting the MultiArmBandit experimentation technique (won’t discuss in this newsletter).

[A/B Testing cannot be used] case of spillover effect, where we are not able to establish independence between the two groups involved in A/B testing, can be solved with Switchbacking or Counterfactuals (Causal Impact or Synthetic control group techniques), which will be discussed in this newsletter in detail.

Beyond AB Testing

I) Naive Method: Pre and Post Full Release Analysis

The most naive approach one can think of when A/B testing is not possible is to do a full release and then compare the metrics before and after the release.

There is no control group 🤔, so doing this is not science at all: the outside world often affects users a lot more than the released product changes. Pre-Post or Before-After Analysis does not account for external factors like weather, holidays, lockdowns, etc.
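To make the problem with the pre/post comparison concrete, here is a minimal sketch with made-up numbers. The baseline, the true lift, and the holiday bump are all assumptions chosen for illustration, not data from any real release. It shows that the naive estimate equals the true effect plus whatever the outside world did in the same period.

```python
# Hypothetical daily conversions; every number here is invented for illustration.
BASELINE = 100.0        # average daily conversions before the release
TRUE_EFFECT = 2.0       # lift actually caused by the product change
HOLIDAY_EFFECT = 10.0   # lift caused by a holiday that starts on release day


def daily_metric(released: bool, holiday: bool) -> float:
    """Metric for one day: baseline plus whichever effects are active."""
    value = BASELINE
    if released:
        value += TRUE_EFFECT
    if holiday:
        value += HOLIDAY_EFFECT
    return value


def mean(values):
    return sum(values) / len(values)


# Seven days before the release, seven days after it.
# The holiday overlaps the whole "after" week.
pre_period = [daily_metric(released=False, holiday=False) for _ in range(7)]
post_period = [daily_metric(released=True, holiday=True) for _ in range(7)]

naive_estimate = mean(post_period) - mean(pre_period)
bias = naive_estimate - TRUE_EFFECT

print(f"naive pre/post estimate: {naive_estimate}")
print(f"true effect: {TRUE_EFFECT}, bias from the holiday: {bias}")
```

With no control group there is nothing in the data that separates the two lifts, which is exactly the gap the methods in the next sections try to close.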
II) Switchback Experiments or Time Split Experiments

Researchers at MIT and Harvard wrote a paper that outlines a theoretical framework for the optimal design and analysis of switchback experiments. Switchback experiments, also known as time split experiments, employ sequential reshuffling of control/treatments to remove bias inherent to certain data.

These methods are popular in 2-sided marketplaces, such as Doordash, Uber, Ola, Zomato, Swiggy, and Lyft, because they allow for robust experimentation on data with finite resources (drivers, riders/customers, etc.).

Case Study of Doordash: In marketplace experimentation, problems of Network Effect or Spillover Effect often make traditional A/B testing ineffectual.

Use Switchbacking: If the treatment impacts a shared pool of resources, the control group will be affected as well, thereby invalidating a traditional A/B experiment.

Switchback Method: Splits a fixed group of users into treatment and control versions over time (illustration below). Every 30 minutes we randomly assign all users in User Group A to either the control or treatment group.

This method can apply to experiments with any number of treatments. The duration of each time split is fairly arbitrary. However, the guiding principle is that the duration should be small enough to show useful insights into our data, but not so small that computation becomes a problem. Doordash uses 30-minute windows.

Limitation of Switchback

Switchback experimentation can only be used when experimenting with different algorithms which are not user-facing: we cannot keep switching what a user sees on the User Interface, as it would be a bad user experience. Switchbacks are perfect for experimenting with algorithms like Driver-Rider Matching or Surge Pricing, etc.
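The switchback assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`build_schedule`, `arm_at`, `estimate_effect`), the fixed seed, and the flat per-window metric are all hypothetical, and this is not Doordash's actual implementation. The point it shows is that the unit of randomization and analysis is the time window, not the user.

```python
import random
from datetime import datetime, timedelta

ARMS = ("control", "treatment")


def build_schedule(start, n_windows, window_minutes=30, seed=0):
    """Randomly assign each time window to one arm; all users share that arm."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    return [
        (start + timedelta(minutes=i * window_minutes), rng.choice(ARMS))
        for i in range(n_windows)
    ]


def arm_at(timestamp, schedule, window_minutes=30):
    """Return the arm that was active at `timestamp`."""
    start = schedule[0][0]
    # Floor division of two timedeltas gives the integer window index.
    index = (timestamp - start) // timedelta(minutes=window_minutes)
    if not 0 <= index < len(schedule):
        raise ValueError("timestamp is outside the experiment")
    return schedule[index][1]


def mean(values):
    return sum(values) / len(values)


def estimate_effect(window_metrics, schedule):
    """Difference in mean per-window metric: treatment windows minus control."""
    by_arm = {arm: [] for arm in ARMS}
    for (_, arm), value in zip(schedule, window_metrics):
        by_arm[arm].append(value)
    return mean(by_arm["treatment"]) - mean(by_arm["control"])


start = datetime(2022, 1, 1, 0, 0)
schedule = build_schedule(start, n_windows=48)  # one day of 30-minute windows

# Hypothetical metric per window: flat baseline of 10.0, plus 1.5 under treatment.
window_metrics = [10.0 + (1.5 if arm == "treatment" else 0.0) for _, arm in schedule]

print(arm_at(start + timedelta(minutes=45), schedule))  # arm of the second window
print(estimate_effect(window_metrics, schedule))
```

A real analysis would also need to handle carry-over between adjacent windows and time-of-day patterns, which this sketch deliberately leaves out.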
III) Synthetic Control or Causal Impact Inferencing

In 2015, Google released the paper “Inferring causal impact using Bayesian structural time-series” (595+ citations), and in 2016 “The State of Applied Econometrics: Causality and Policy” (927+ citations) introduced us to Synthetic Control, which has been described as the “most important development in program evaluation in the last decade” (Athey and Imbens 2016).

The synthetic control method is a statistical method used to evaluate the effect of an intervention in comparative case studies. It involves the construction of a weighted combination of groups used as controls, to which the treatment group is compared. This comparison is used to estimate what would have happened to the treatment group if it had not received the treatment.

In the above image: the dark blue line is the metric we use to conclude the impact of the experiment, and the dotted black line is the synthetic control, i.e., our prediction from a time series model of what the metric would have been if we hadn’t rolled out the treatment. The difference between the dotted line and the dark blue line is the treatment effect.

To build a model that predicts the synthetic control, below are the factors we consider:

- Control City Data: The control group is used to train a weight vector that predicts the synthetic control values. Note that the control group must not be influenced by the treatment in any way. Example: If Delhi is the Treatment City, then the Control Cities can be Mumbai, Bangalore, Kolkata, Chennai, etc.
- Treatment City Pattern in Previous Week/Month/Years: Past data points to be taken into consideration to capture seasonal trend patterns.
- Treatment City Data in Control Period: Last 30 days of data (evaluation metric: conversion rate) of the treatment city (let’s say Delhi).
- Other Factors: Weather, holidays, etc.

The loss function minimizes the difference between the synthetic control and the treatment group before the start of treatment.
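The weight fitting and the loss function described above can be sketched as follows. This is a minimal illustration under loud assumptions: the city series are invented, the weights are fitted with plain unconstrained least squares (the classic synthetic control method additionally constrains weights to be non-negative and sum to one, and Google's paper uses a Bayesian structural time-series model instead), and seasonality and other factors such as weather are left out.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_weights(controls, treated):
    """Weights minimizing the squared pre-treatment gap (normal equations)."""
    k = len(controls)
    gram = [[sum(x * y for x, y in zip(controls[i], controls[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(x * y for x, y in zip(controls[i], treated)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def synthetic_control(controls, weights):
    """Weighted combination of the control series, one value per day."""
    return [sum(w * series[t] for w, series in zip(weights, controls))
            for t in range(len(controls[0]))]


# Invented daily conversion metrics for two control cities.
mumbai = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
bangalore = [20, 19, 21, 20, 22, 21, 23, 22, 24, 23]
TREATMENT_START = 6   # treatment begins on day index 6
TRUE_LIFT = 2.0       # lift injected into the invented Delhi series

# Invented treatment city: a blend of the controls, plus the lift after launch.
delhi = [0.6 * m + 0.4 * b + (TRUE_LIFT if t >= TREATMENT_START else 0.0)
         for t, (m, b) in enumerate(zip(mumbai, bangalore))]

controls = [mumbai, bangalore]
# Fit only on the pre-treatment period, as the loss function requires.
pre_controls = [series[:TREATMENT_START] for series in controls]
weights = fit_weights(pre_controls, delhi[:TREATMENT_START])

# Project the weights over all days to get the counterfactual (dotted line).
counterfactual = synthetic_control(controls, weights)
post_gaps = [delhi[t] - counterfactual[t] for t in range(TREATMENT_START, len(delhi))]
estimated_effect = sum(post_gaps) / len(post_gaps)

print("weights:", [round(w, 3) for w in weights])
print("estimated effect:", round(estimated_effect, 3))
```

Because the invented Delhi series is an exact blend of the two controls before launch, the fit recovers that blend and the post-launch gap matches the injected lift. Real data is noisier, so the pre-treatment fit quality should be checked before trusting the estimate.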
Limitation of Synthetic Control

- Exogenous shocks like lockdowns, war, etc. can still invalidate results.
- It is difficult to detect small effects.
- We can’t dig into user-level heterogeneous effects, as we are experimenting at the city level.

Conclusion and Key Takeaways

- We discussed A/B testing case studies and the importance of A/B tests in the experimentation world.
- We went through cases when A/B testing is not possible, particularly in marketplaces.
- We discussed the Switchback or Time Split experimentation method and its drawbacks.
- Finally, we discussed Synthetic Control, its importance, how to create a synthetic control, and its drawbacks.

References

- Michael Berk: Causal Inference using Synthetic Controls
- Nick Jones, Sam Barrows: Uber’s Synthetic Control | PyData Amsterdam 2019
- Inferring causal impact using Bayesian structural time-series
- The State of Applied Econometrics: Causality and Policy