End-to-End tests are at the tip of the testing pyramid, they are supposed to give the most confidence that the system under test works, but on most End to End testing frameworks we find ourselves fighting “flaky” tests, and ending up not trusting the test suite. We hope we can change that with Detox.
Our story start with the Wix App — our official native iOS/Android app
In terms of engineering efforts:
Having Google Play and Apple App Store as our means of distribution, our releases are inherently not continuous deployment. So we have a release train (2 platform every week). But the distribution mechanism is not the real reason we don’t do true CD.
In fact, if we’d run the entire suite on both platforms, and just two OS versions on each platform, we’ll end up with 56 (2 platforms x 2 OS versions x 14) person-days for a full regression. But it gets even worse.
This ice cream cone is a software testing anti-pattern.
Let’s take a simplified example, if development goes at the same pace of 2 features or bug fixes per week, than QA will have two additional tests each week, meaning that in week 1 they will have 2 tests, and in week 7 the suite will be 7 times larger.
Add a growing product to the mix, which needs to hire more developers and increase the development process rate, and the QA test suite just explodes…The higher rate of features meant that the QA regression suite grows even faster.
The graph on the bottom shows the number of new developed features every week, the graph on the top shows the number tests in a QA test suite.
That’s nothing new though…
We don’t want to hire an army of people to do manual QA, we want automated tests, with modern continuous workflow that runs on CI, with a very short development feedback loop, and if all tests are green we have all the confidence we need to release a new version.
This is the testing pyramid, since you already know why tests are so important, and understand the different types of tests, we’ll focus on how we test each type rather than explaining what they are. Let’s break down E2E into two parts, pure UI automation (meaning, not testing external services), and full E2E, mimicking a user with real server data. These must run on a device or a simulator.
Let’s focus on mobile development, and React Native specifically.
What do we know how to do?
Unit TestingBusiness logic is mostly in JS, it’s easy to test on node with Jest. React Native, like React, uses Flux architecture to control app data flow. One of the most popular Flux implementations is Redux, which we use app wide. Although Redux is widely popular, we never felt comfortable with unit testing Redux apps, so we’ve developed methodologies and a test-kit for testing redux apps, check out redux-testkit for more information. Another popular flux implementations is Mobx, which is much more unopinionated than Redux, and has great testing capabilities. We’ve created an opinionated flavor to make it easier for our engineers. Remx can be tested quite easily, unit tests can be vanilla JavaScript, totally unaware of the underlying implementation, we will add more information about Remx in the near future.
**Component testing**Also run on node, we rely on Enzyme by Airbnb, and use Enzyme Drivers to help with mocking.
**UI Automation/End to End**But what do you do about end-to-end tests? These tests give the most confidence because they’re pretty much a robot running your app on a device.Maintaining an End to End test suite is hard, and isn’t as reliable as the others. but why ?
E2E tests are often considered to be flaky, in all platforms, web, iOS, Android.
Manual synchronization is used so commonly that we incorporate it into our testing frameworks infrastructure, API calls are filled with loops containing sleep()
functions.This is an example I took from Aaron Greenwald’s talk in React Amsterdam, it’s an actual piece of code we used to test our React Native app on our previous testing framework.
sleep(a_lot);
In order to understand how big of a problem is flakiness, let’s calculate the probability of a test suite to fail.
q: probability of a test to failn: number of tests
1-q is the probability to succeed.(1-q)^n is the probability of the entire suite to succeed.1-(1-q)^n is the probability of at least one test to fail.
If a test is flaky 0.5% of times:
And we have 20 tests
50 tests
100 tests
You get the point, very unreliable…
So, this is a complex issue… and we’ve had experience with a few frameworks in the past.
AppiumThe most popular solution out there the de-facto standard in the industry. We also checked what other companies with mobile products do regarding End to End tests, and found out that many don’t even have automation, and those who do, use Appium. The internals of Appium, its driver, is implemented using Instruments (iOS) and UIAutomator (Android) which are essentially external ways to interact with the device, just like a user.
We used Appium for 2 years in general and for 8 months with React Native, and found that we invested an unreasonable portion of our time writing tests and petting the system than actually writing features.
We found that End to End testing is really hard:
Tests are flaky, we got different results on different machines, frequent failures in CI, which the only solution for was addition of sleeps, which slowed the tests down.
Tests were already slow since Apple UIAutomation tool is limited to performing one action per second, and there’s a hack which removes this cap Instruments without delays (which is already unmaintained), so after each release of a new Xcode we would have to wait for patch before upgrading.
MagnetoIt is also worth noting Magneto, an E2E testing framework for Android only,a solution by Everything.me, where I previously worked, built with UIAutomator as the main driver.
Other frameworks, like Robotium and Calabash are not under active development anymore.
The main resemblance between these frameworks is that they are blackbox testing frameworks.
A box
What is blackbox testing? It’s a method of testing stuff from the outside, without knowing what’s going on internally.In mobile, black box E2E frameworks essentially go over the view hierarchy, looks for an element (if it’s not found, sleeps, continues looping in this manner until a certain timeout), then interacts with that view. Same principles are applied in web black box E2E.
Now, think how unfair is asking the users to provide this timeout, they have no idea what’s going on inside the operating system, or even inside the application, and that is the main cause of flakiness.
E2E gets even more flaky when used on react-native apps…
On native apps, there’s only one thread responsible for rendering the UI (the main thread).With React Native it’s a bit trickier, React Native’s unique architecture adds complexity to the system, its UI rendering starts at the reconciler , which calculates which parts of the UI have changed, this is done on the JavaScript thread, which is then passed over an asynchronous bridge and translated into native instructions for the main thread to render a real layout.Due to this asynchronous rendering mechanism it uses, there are now two threads controlling the rendering, so blackbox testing frameworks have even greater trouble controlling React Native apps.
When a React Native app starts it loads a bundle, either from a local packager server or from an asset on the device, in any case this is an asynchronous process which takes an undetermined amount of time. A black box testing framework will need to sleep during this process as well, but for how long ? There is no real answer.
Black box was a dead end, we needed a different approach …
Detox does Gray box, not Black box, to allows the test framework to monitor the app from the inside and actually synchronize with it.
Gray box essentially uses a piece of code that is planted in the app, it can help us see what’s going on inside.
Unlike Black box, Gray box runs in the same process, has access to memory, and can monitor the execution process. Being able to read internal memory gives it the ability to detect what’s happening inside the process: if there are network requests in flight, when the main thread is idle, other threads are idle, Animations have ended, the react native bridge is idle. It can execute on main thread, to make sure that when it performs actions nothing in the UI hierarchy changes in the meantime.
But there are also downsides — Usually when testing with gray box testing frameworks the app goes through a different compilation/running process since it needs extra code that is executed from inside the process. For us it was worth sacrificing this point and get this huge value in return.
The leading native gray box drivers are developed by Google — EarlGrey for iOS and Espresso for Android. These frameworks can synchronize with the application, making sure to only interact with the app when it’s idle.
The underlying synchronization mechanism used in these gray box frameworks works in the following way.
Instead of retrying an action/expectation on the UI, they will query internal resources every few milliseconds or listen to callbacks from them telling that they have now switched to idle mode. The test will not continue until all of them return yes and only then, when the app is idle, it will interact with the UI.
Idling Resources
Detox does not rely on WebDriver, since this is not the web. Detox communicates with its native driver (which extends EarlGrey and Espresso) using a JSON-based reflection mechanism, this allows a common JavaScript implementation to invoke native methods directly on the device.
async-await
instead of putting everything in a promise queue means that breakpoints will work as expected.
A simple login flow test written with Detox
Detox is built from the ground up for native mobile and has deep first-class support for React Native apps.
We found out that React Native pretty much reimplements iOS and Android, so apart from the basic synchronization support of EarlGrey and Espresso for native apps, we had to create special synchronization mechanisms for React Native as well.
Traditionally, test frameworks evaluate expectations in the test script running on the computer. Detox evaluates expectations natively directly in the tested app running on a simulator. This enables operations that were impossible before due to different scope or performance reasons.
Let’s take a look at High level diagram, hopefully it will help us understand how Detox works.
For more in depth information on how Detox works visit the docs.
Let’s get back to our testing pyramid.
So we now have a stable End to End testing framework. But it may still be flaky due to network and server issues.
In order to do that we would need to remove the dependency of tests in network, with expected requests and responses in a consistent well-timed manner we will create pure UI automation (UI hermetic tests).
react-native-repackager is a mocking mechanism for our react-native JS code. Essentially it extends the packager’s ability to override bundled files with any other file, essentially creating an easy way to mock environments in react-native.
So you can create your own pre-packaged responses or set your endpoints to your local mock server, it can help a lot with separation of your testing concerns.
React-native-repackager turns Detox into a UI automation framework as well. Pyramid is all green, no excuses :) we can start testing!
The surprising thing is that not only gray box is more stable than blackbox, it’s also much faster. No more sleeps or waitUntil, code executes the millisecond the app becomes idle. so it’s about 5–10 times faster than black box solutions.In fact, it’s so fast it runs the full detox test project suite (79 tests) in 4 minutes.
Detox’s own test suite running on an iOS Simulator
Many have been asking about Detox support for Android. This is a feature that is being eagerly anticipated inside Wix as well.Now, open source is a wonderful thing, it can form collaborations that can take projects to the next level. A few months back we were contacted by Simon Rácz from KPN (a major telecom company in the Netherlands), he offered to help with Detox for Android. Since then he practically became a team member of Detox, implementing key features in our upcoming Android support.
Let’s see how it looks
Detox’s own test suite running on an Android Emulator
This is the same test suite we have been using to test our iOS implementation. It was virtually untouched, this way we could ensure our API is truly cross platform. Detox for Android is almost ready, in fact, there are very few things missing, for more details on our Android release, keep an eye on our releases page on github.
wix/detox_detox - Gray Box E2E Tests and Automation Library for Mobile Apps_github.com
I would like to thank the team members behind Detox: Leo Natan, Tal Kol, Sergey Ilyevsky, Simon Rácz, Elad Bogomolny, Daniel Schmidt, and to all of our other internal and external contributors, thank you guys, you’re awesome!
Our initial mission was to create a framework that we can trust, such that will give us confidence that when builds are green, we can release our new version, and with that to build a real continuous deployment workflow. In order to achieve that we needed to change our state of mind, there is no such thing as a flaky test, either there’s a bug in the app or the test framework is lacking, and our top priority is to fight flakiness, but this is not a a very easy task.We’re very excited about Detox, we hope that it will be useful to others as it is to us.