In this article, we will look at some possible solutions to the problems that appear when you adhere to the discipline of writing end-to-end (E2E) tests. As you continue developing an application, it will inevitably become more complicated and include more features. Maintaining the same E2E coverage will require more and more tests, and their execution time will add up. Let’s see when this becomes a problem and how we can address it.
Over the years, I’ve observed a consistent relationship between the total execution time of continuous integration (CI) and my perception of it.
I’ve gotten a similar impression from other developers I work with. A CI time of up to 20–25 minutes allows for a quick, shallow context switch: answering a message, doing a code review, taking a short break. Longer execution times force you to start working on another issue while you wait for feedback, which can lead to a much harder context switch if the CI run comes back with problems you need to resolve.
So if you see a similar pattern in your own work, a CI time threshold above which things get annoying, then at some point you will need to optimize. Among the many possible CI jobs, running E2E tests is one that takes a lot of time and is often on the critical path. Let’s analyze some ways of improving its performance.
The testing framework you use will have a significant impact on every aspect of your QA experience, including test speed. If you already have a test suite built, migrating to another framework will be difficult; but if you are starting from scratch, it’s worth investigating your options.
Besides using faster tools, pretty much the only way to speed up your tests is to run them in parallel. By default, tests often run in one long queue, one by one. If you manage to run them in two threads instead, you can cut total test time by up to 50%. The real-world gains will be smaller: both threads compete for the same computational power, and some initialization steps have to happen twice. Nonetheless, getting E2E tests to run in parallel is a great optimization tool in certain situations.
Let's see what options we have to achieve parallelization.
A straightforward approach is to run many tests at the same time on the same machine. From the test runner’s point of view, it’s a simple job: it starts more browsers and executes tests from different threads. For the maintainers of the test suite, it’s not easy at all, because running tests in parallel can introduce a lot of flakiness. There are a few things I found useful when using this approach.
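Before getting to those, here is roughly what turning on this kind of parallelism looks like at the runner level. The article doesn’t prescribe a specific framework, so treat this as a minimal sketch assuming Playwright and TypeScript:

```typescript
// playwright.config.ts -- a minimal sketch, assuming Playwright as the E2E runner
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Run test files in parallel instead of one long queue.
  fullyParallel: true,
  // Two worker processes on CI, each driving its own browser.
  // Raising this only pays off if the CI agent has the CPU and RAM to match.
  workers: process.env.CI ? 2 : undefined,
});
```

Most modern runners expose an equivalent knob; the hard part is not flipping it on, but keeping the tests independent once it is.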
To make it possible to run many tests at the same time, I needed to make sure there would be no collisions in the data used by each test. So in each test file, I added before and after scripts that generated new users and other entities directly in the database for that test to use. Some parts of these objects were predetermined; others were randomly generated. Because each thread runs tests from separate files, and each file defines its own data, they can talk to the same backend with no collisions, as long as there is no system-wide state that affects the sessions of all users.
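For concreteness, the per-file setup could look something like this. Again, this is a sketch assuming Playwright; createUserInDb and deleteUserFromDb are hypothetical helpers that write directly to the application’s database and are not part of any library:

```typescript
// billing.spec.ts -- a sketch of per-file test data, assuming Playwright.
// createUserInDb / deleteUserFromDb are hypothetical helpers that talk
// directly to the application's database.
import { test, expect } from '@playwright/test';
import { randomUUID } from 'crypto';
import { createUserInDb, deleteUserFromDb } from './helpers/db';

// Part of the user is predetermined, part is random, so parallel test
// files never compete for the same records.
const user = {
  role: 'admin',                             // predetermined
  email: `e2e-${randomUUID()}@example.com`,  // random per run
  password: randomUUID(),
};

test.beforeAll(async () => {
  await createUserInDb(user);
});

test.afterAll(async () => {
  await deleteUserFromDb(user.email);
});

test('freshly generated user can log in', async ({ page }) => {
  await page.goto('/login');
  await page.fill('input[name="email"]', user.email);
  await page.fill('input[name="password"]', user.password);
  await page.click('button[type="submit"]');
  await expect(page.getByText('Dashboard')).toBeVisible();
});
```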
For the application I work on, we use GitLab as our hosting and CI provider. The CI agents available in GitLab are not particularly strong machines. Even adding a single extra E2E thread made each test significantly slower, eating up the potential benefits of parallelization. For this approach to work, you need to scale up the CI agents, that is, provide machines with more CPUs and RAM. This is especially interesting if you already have some powerful machines available and want to use them for running CI.
In my project, after many years of running E2E tests as one job, we finally split that one big job into several smaller ones. Luckily, it was an easy task, because the whole system is split into applications, and each application has its own E2E suite. It was very natural to run each application’s E2E tests in a separate job; and thanks to the ad hoc data generated in the before scripts described earlier, tests run later in the queue were independent of the ones run before them.
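If your runner supports grouping tests, the split can even be expressed in its configuration. Here is another hedged sketch, again assuming Playwright, with application names and directories made up purely for illustration:

```typescript
// playwright.config.ts -- one project per application, assuming Playwright;
// the application names and directories are illustrative only.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'billing', testDir: './apps/billing/e2e' },
    { name: 'reports', testDir: './apps/reports/e2e' },
    { name: 'admin',   testDir: './apps/admin/e2e' },
  ],
});
```

Each CI job then runs only its own slice, for example with `npx playwright test --project=billing`, so a failing job immediately points at the affected application.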
It took me a long time to try this approach because the startup time for an E2E job is pretty long in our case. Due to all the Docker containers that have to be downloaded and started, it takes about 6–7 minutes before the first test can be run. It felt wasteful to me, but eventually, the benefits of this approach came to outweigh the downsides. The benefits are as follows.
I’m especially happy that, since we changed how the E2E tests are run, my colleagues have a much simpler interface for evaluating failures. Before, you had to dive into job logs or error summaries to see which tests were failing. Now, just by seeing which job fails, you know which app is affected; and even without knowing the code and implementation details, you can make an educated guess about whether the failure is related to the recent changes or is some random issue. This is especially important when the people in charge of deployment are not developers themselves and have to rely on CI to know whether a version is safe to deploy. It also leads to faster reruns: you only need to retry the tests in the job that failed, not all the tests in the system.
This approach also lets you leverage an “infinite” pool of cloud runners from your CI provider. Because each job runs only one thread of tests, the lightweight machines you get from the cloud will do just fine. For us, it removed a lot of the overhead of maintaining the auto-scaling cluster of runners we had been using before.
If you want to get updates about when I publish new content related to testing or other IT stuff, you can sign up here.