Disclosure: mabl, the ML-driven test automation service, has previously sponsored Hacker Noon.
Finding regressions in an application is one of the most frustrating experiences for developers. You’re building a new feature, you have a CI/CD pipeline, and you’re just trying to ship code fast. Unfortunately, the new feature you just shipped broke something that previously worked. Now it’s time to find the needle in the haystack, fix it, and make sure the fix doesn’t break the new feature you just built. Regression testing is a painful process.
That being said, regression testing is a very important step, especially in modern developer workflows where changes are pushed several times a day. What matters most in software development is whether customers can use your product successfully, period. We call this the end-to-end customer journey at mabl. Regression tests ensure developers can keep shipping features without negatively impacting that end-to-end journey.
Part of this user journey revolves around performance, because even if a feature works, a performance regression is still a problem. Facebook is a prime example of a company where performance impacts users at scale: with 2 billion monthly active users (MAUs), even a slight performance regression can have a massive impact. In 2013, Facebook embarked on a project to link code changes to regression testing of performance. They call this system CT-Scan.
CT-Scan focuses on two components:
The Facebook CT-Scan system
As described in Facebook’s blog, CT-Scan is specifically designed to detect and help remediate performance regressions. However, several of the design principles Facebook used for CT-Scan are applicable to regression testing in general. We’ll highlight some of them below.
Oftentimes, it’s hard to design QA or staging environments that are similar enough to production to surface all the issues a code change could introduce, especially when it comes to performance characteristics. One of Facebook’s architectural decisions was to run regression tests in every stage, including production. Production regressions are less desirable for Facebook because they require collecting data from user devices, but that disadvantage may not apply to the many developers who don’t need to collect end-user data.

One way we at mabl have adopted this design approach is the ability to easily repurpose or extend test plans across any and all environments and to run those tests continuously. By designing test plans for portability, dev and ops teams can work off the same predictions, or expected test results, for regression analysis. For example, developers can run a series of regression tests — page loads, login, or a complete end-to-end journey — in a QA environment, ops teams can run those same tests in production, and the time these tests take should be comparable across both environments. Both teams work off the same tests and measure results in exactly the same format, just as Facebook does. Ultimately, this portability ensures that the performance you observe in pre-prod matches what the user actually experiences in production, which is the measurement that matters most.
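As a rough illustration of this portability idea (a minimal sketch, not mabl’s actual API — all names and URLs below are hypothetical), the same test plan can be parameterized only by a base URL, so it runs unchanged in any environment, and the resulting durations can be compared directly:

```python
# Hypothetical sketch: one test plan, many environments.
TEST_PLAN = {
    "name": "login journey",
    "steps": ["load login page", "submit credentials", "verify dashboard"],
}

ENVIRONMENTS = {
    "qa": "https://qa.example.com",     # placeholder URLs
    "prod": "https://www.example.com",
}

def durations_comparable(qa_seconds, prod_seconds, tolerance=0.25):
    """Flag runs whose wall-clock times diverge by more than `tolerance` (25%)."""
    return abs(qa_seconds - prod_seconds) / max(qa_seconds, prod_seconds) <= tolerance

# A QA run of 12s versus a production run of 13s is within range;
# 12s versus 30s would signal an environment-specific regression.
print(durations_comparable(12.0, 13.0))  # → True
print(durations_comparable(12.0, 30.0))  # → False
```

The key design choice is that the plan itself carries no environment-specific logic, so both teams really are measuring the same thing.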
The most important reason Facebook developed CT-Scan is that regressions can impact performance, not just functionality. For example, take the customer journey of posting a picture on Facebook. Imagine it takes you 15 seconds to upload a photo, add a comment, and tag a friend in it. Now imagine that after a code push, a regression test runs and succeeds — but a second look at the performance details shows the same journey took three times longer. As a customer, I wouldn’t be happy. This is why CT-Scan is so important: it’s not just about functionality, it’s about performance. Mabl has adopted a similar approach using machine learning. As Facebook does, we build a specific machine learning model for each journey, which is used to predict the expected range for future test runs of that journey. If any test run falls outside the expected range, we alert the user, just as Facebook alerts the developer.
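To make the idea of an “expected range” concrete, here is a deliberately simplified sketch (our illustration, not mabl’s or Facebook’s actual model) that derives a band from historical run durations and flags runs that land above it:

```python
import statistics

def expected_range(history, k=3.0):
    """Expected-duration band from past runs: mean ± k standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (mean - k * stdev, mean + k * stdev)

def is_regression(duration_seconds, history, k=3.0):
    """Only slower-than-expected runs count as performance regressions."""
    _, high = expected_range(history, k)
    return duration_seconds > high

# Hypothetical past durations (seconds) for the "post a photo" journey.
history = [15.2, 14.8, 15.5, 15.1, 14.9]

# The functional test still passes, but a 3x-slower run falls outside the band.
print(is_regression(45.0, history))  # → True
print(is_regression(15.0, history))  # → False
```

A real model would account for trends, environment noise, and per-step timings, but the core contract is the same: predict a range per journey, alert when a run escapes it.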
Below is an example of a test run which took significantly longer than the expected range based on previous runs.
In addition to detecting regressions with CT-Scan, Facebook collects as much data as possible to help diagnose what’s causing them, so developers have what they need to fix the issues. One example scenario they point out: perhaps memory usage increased after a certain code change. Linking that increase back to a specific object type in a specific call stack gives the engineer the data they need to diagnose and fix it. This approach — collecting several different diagnostic metrics to help with regression resolution — is one we’ve taken with mabl as well. For example, we automatically detect and record JavaScript errors and visual changes during our regression runs, and take screenshots of where they happen within a test run. In most development, these kinds of diagnostics are collected and traced back to regressions by humans (if tracked at all) after the fact. Collecting this data inline with the tests allows the system to automate the correlation and speeds up resolution.
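The inline-diagnostics idea can be sketched as a per-step recorder (a hypothetical illustration — the class and field names below are ours, not mabl’s): each test step carries its own captured errors and screenshot reference, so a failing or slow step is already correlated with its evidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StepDiagnostics:
    """Diagnostics captured inline with a single test step."""
    step: str
    js_errors: List[str] = field(default_factory=list)
    screenshot: Optional[str] = None  # e.g. a path or URL to the captured image

class DiagnosticRecorder:
    """Collects per-step diagnostics so failures correlate to steps automatically."""
    def __init__(self):
        self.steps = []

    def record(self, step, js_errors=None, screenshot=None):
        self.steps.append(StepDiagnostics(step, js_errors or [], screenshot))

    def steps_with_errors(self):
        return [s.step for s in self.steps if s.js_errors]

rec = DiagnosticRecorder()
rec.record("load login page")
rec.record("submit credentials",
           js_errors=["TypeError: auth is undefined"],
           screenshot="login-failure.png")
print(rec.steps_with_errors())  # → ['submit credentials']
```

Because the diagnostics are attached at capture time rather than reconstructed afterward, no human has to trace an error log back to the step that produced it.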
Regression testing isn’t the only type of test we can apply machine learning to. There’s a lot more we can try to automate, and like Facebook, mabl and other companies are working to make developers’ lives easier.
Originally published at www.mabl.com on November 19, 2017.