How to Fix Flaky End-to-End Tests with Playwright and Reflow by @reflowio

Strategies to reduce end-to-end test failures, why you should care, and a little promotion for a low-code SaaS testing tool.

Bootstrapped low-code SaaS helping teams with their end-to-end test automation.

End-to-end testing is a Quality Assurance technique that involves exercising and validating an application's workflow from start to finish.

This technique aims to reproduce real user scenarios and validate expectations over application output.

Unfortunately, end-to-end testing has problems that make tests difficult to maintain:

  1. The testing process must generally assume that the system is in a consistent starting state. This might require seeding or wiping test data before/after the test is run.

  2. When application changes are made, there may need to be changes to the test sequence to handle the application update.

  3. Even when nothing changes, sometimes tests will fail anyway. Such tests are denoted as flaky.

Over the last 18 months, I’ve been bootstrapping a SaaS tool that allows non-technical QA personnel to implement and maintain end-to-end tests using a no-code recording UI. I’ve accrued a toolkit of strategies to fix flaky tests, and automatically heal the test sequence when there are application changes.

This article is a summary of those techniques and how to apply them to your test suite, regardless of whether you use reflow or not.

Why does this matter?

Time is a precious resource for all product teams. It must be protected and spent wisely. Flaky tests cost time.

  • Every time a test flakes, someone will generally investigate whether the failure is a true failure or a false positive. This time cost can become very significant.

  • A test may have downstream dependencies that cannot run until the test is re-run, or fixed. This can often escalate the time cost.

  • The failed test may place your system into a non-deterministic state, which may then require manual effort to fix.

Emotional Damage

  • The very existence of an automated test implies that someone cared enough to automate away some manual effort.

  • When that manual effort returns through a test flake, there is an emotional toll.

  • If the flake cannot be removed, the team will be very aware that they are stuck maintaining the test in perpetuity.

Automation Trust

  • If the tests are flaky, the tests will not be trusted.
  • If the tests are not trusted, they will become gradually deprecated and eventually removed.
  • If the product team does not replace these tests, the codebase will lose test coverage, and may become less trusted.
  • If the codebase is less trusted, productivity will suffer.


Strategy 1: Generic pre-action stability

The most common cause of flakes that we see is interacting with the page too quickly. Page elements often get rendered before they can be successfully interacted with. This means just checking for their existence is often not enough.

There are a few generic actions that can be done to increase the likelihood that the page is fully loaded before it is interacted with.

In playwright, these stability checks are available through the waitForLoadState API.

await page.waitForLoadState('domcontentloaded', { timeout: 15000 });
await page.waitForLoadState('load', { timeout: 30000 });
await page.waitForLoadState('networkidle', { timeout: 5000 });

These 3 load events are:

  1. domcontentloaded - wait for the DOMContentLoaded event to be fired. This fires when the initial page document is loaded and parsed. Stylesheets, images and JavaScript (particularly for SPAs) will generally not have finished loading when this event fires. Hence it is usually preferable to rely on the load event instead.

  2. load - wait for the load event to be fired. This event fires when all markup, stylesheets, JavaScript and static assets like images and audio have been loaded. For many SPAs, which dynamically load data after the page renders its initial document, the load event may not be enough.

  3. networkidle - wait until there are no network connections for at least 500 ms. This is a useful event to hook into for SPAs which load data after they have their initial render. However, it may either be too early or too late for page interaction.
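
One way to combine the three states is a small helper that treats load as mandatory and networkidle as best-effort, since some pages keep polling and never go idle. This is a minimal sketch rather than an official Playwright recipe: the helper name waitForPageReady, the PageLike interface and the decision to swallow the networkidle timeout are all assumptions, and the timeouts simply reuse the values shown above.

```typescript
// Minimal structural type so the sketch compiles without Playwright installed;
// a real Playwright `Page` has a compatible `waitForLoadState` method.
type LoadState = 'domcontentloaded' | 'load' | 'networkidle';

interface PageLike {
  waitForLoadState(state: LoadState, options?: { timeout?: number }): Promise<void>;
}

// Hypothetical helper: resolves true if the network went idle,
// false if we gave up waiting for it.
async function waitForPageReady(page: PageLike): Promise<boolean> {
  // These two must succeed; a timeout here rejects and should fail the test.
  await page.waitForLoadState('domcontentloaded', { timeout: 15000 });
  await page.waitForLoadState('load', { timeout: 30000 });
  try {
    await page.waitForLoadState('networkidle', { timeout: 5000 });
    return true;
  } catch {
    // Assumption: a page that never goes idle is still worth interacting with.
    return false;
  }
}
```

A test would call `await waitForPageReady(page)` right after navigating, and could log the returned flag to spot pages that never settle.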

In reflow, there is an optional 4th event: screenshotstable. This event is fired when the page:

  1. Has not changed within 5s.

  2. Looks like the page in its most recent successful test.

We introduced this event as most applications, when loading, either show a loading animation or continuously re-render whilst they load data. It introduces a minor delay to test actions, but we believe the decrease in flakiness makes it worthwhile.

Similarly, for a deterministic end-to-end test sequence, the page almost always looks the same between test runs. Hence we can save a screenshot in S3 to track what the page looks like before an action executes, and compare it during the test execution to the previous successful run on the same device/browser/operating system.

As always, a timeout needs to be set to avoid this waiting forever when the application has changed, or is not expected to stop rendering. In reflow, these timeouts are automatically configured based on how the page behaved when it was originally recorded or, if it exists, the most recent successful test execution.
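
The "has not changed within 5s" half of this idea can be approximated in plain Playwright by polling screenshots. The sketch below is not reflow's implementation: the function name, the Screenshotter interface, the byte-for-byte comparison (a real tool would more likely tolerate small pixel differences) and the default timings are assumptions, and it does not attempt the comparison against a previous successful run.

```typescript
// Structural stand-in for the part of Playwright's `Page` (or `Locator`) used here;
// their `screenshot()` resolves to a Buffer, which is a Uint8Array.
interface Screenshotter {
  screenshot(): Promise<Uint8Array>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) return false;
  return true;
}

// Resolves true once consecutive screenshots have been identical for `quietMs`,
// or false if `timeoutMs` elapses first.
async function waitForStableScreenshot(
  target: Screenshotter,
  { quietMs = 5000, timeoutMs = 30000, intervalMs = 500 } = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let previous = await target.screenshot();
  let stableSince = Date.now();
  while (true) {
    if (Date.now() - stableSince >= quietMs) return true;
    if (Date.now() >= deadline) return false;
    await sleep(intervalMs);
    const current = await target.screenshot();
    if (!sameBytes(previous, current)) {
      stableSince = Date.now(); // the page changed, so restart the quiet period
      previous = current;
    }
  }
}
```

The explicit timeoutMs is what keeps a page with an endless animation from blocking the test forever.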

Strategy 2: Intelligent Waiting

If an action expects to be on a given page, we can wait until the navigation to that page is complete with page.waitForNavigation (newer Playwright releases deprecate it in favor of page.waitForURL). This matters because multiple load events may fire during a navigation sequence, so page.waitForLoadState alone will not always be enough.

If an action affects a specific element, we can do further action-specific steps. For example:

  1. Wait for the element to be Attached to the DOM.
  2. Wait for the element to be Visible.
  3. Wait for the element to be Stable, i.e. not animating, or with its animation completed.
  4. Wait for the element to be able to receive events.
  5. If the element is a clickable element, wait for the element to be enabled.
  6. If the element is a text-entry element, wait for the element to be editable.

In playwright, these pre-action stability statements are automatically applied based on the interaction type.
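
Because click already performs these checks, an explicit wait is optional; it mostly helps to fail earlier with a clearer error. A minimal sketch, assuming a hypothetical helper clickWhenReady and a LocatorLike interface that mirrors the waitFor and click methods of a Playwright Locator:

```typescript
// Structural stand-in for the two Playwright `Locator` methods used below.
interface LocatorLike {
  waitFor(options?: {
    state?: 'attached' | 'detached' | 'visible' | 'hidden';
    timeout?: number;
  }): Promise<void>;
  click(options?: { timeout?: number }): Promise<void>;
}

// Hypothetical helper: an explicit visibility checkpoint followed by the click.
async function clickWhenReady(target: LocatorLike, timeoutMs = 10000): Promise<void> {
  // Rejects if the element never becomes visible within the timeout.
  await target.waitFor({ state: 'visible', timeout: timeoutMs });
  // click() runs Playwright's own pre-action checks again before acting.
  await target.click({ timeout: timeoutMs });
}
```

With a real page this would be called as `await clickWhenReady(page.locator('button'))`.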

In reflow, stability is further enhanced with:

  1. Waiting for the element to look the same as it did in the last successful run. I.e. continuously take screenshots of the element, and wait until they look the same (or a timeout).
  2. Apply a specific timeout to the element based on how long it took for the last DOM attribute update to be applied during the last successful run. E.g. if the button turned green via a class attribute change after 7s, wait at least 7s for the element to get DOM attribute changes, before further timeouts.
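
The second point amounts to deriving a wait budget from recorded history. How reflow computes it is not described beyond the example above, so the following is only an illustrative sketch: the LastRunTiming shape, the function name and the 5s base timeout are assumptions.

```typescript
// Hypothetical record of what was observed during the last successful run.
interface LastRunTiming {
  lastAttributeChangeMs: number; // e.g. 7000 if the button turned green after 7s
}

// Returns how long to keep watching for attribute changes (settleMs)
// and the overall deadline once the usual timeout is added (totalMs).
function elementWaitBudget(
  last: LastRunTiming | undefined,
  baseTimeoutMs = 5000,
): { settleMs: number; totalMs: number } {
  // With no history there is nothing to wait for beyond the base timeout.
  const settleMs = last ? Math.max(0, last.lastAttributeChangeMs) : 0;
  return { settleMs, totalMs: settleMs + baseTimeoutMs };
}
```

For the button in the example, `elementWaitBudget({ lastAttributeChangeMs: 7000 })` yields a 7s settle window and a 12s overall deadline.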

Strategy 3: Pick Good Selectors

When an application changes, the locators used to identify elements often also change. These locators are generally known as selectors.

A smart development team will combat this with a strongly consistent element attribute, such as data-test-id, added to each element that a test needs to find, such as a button.

To click such a button, we can use the data-test-id attribute to locate the element.
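
As a minimal sketch, assume the button is rendered with data-test-id="submit-order" (a made-up id). The helper names byTestId and clickByTestId and the Clickable interface are assumptions for illustration; page.click with a CSS attribute selector is regular Playwright usage.

```typescript
// Builds a CSS attribute selector for a data-test-id value.
// Assumption: ids are simple tokens; anything else is rejected rather than escaped.
function byTestId(id: string): string {
  if (!/^[\w-]+$/.test(id)) throw new Error(`unsupported test id: ${id}`);
  return `[data-test-id="${id}"]`;
}

// Structural stand-in for the Playwright `Page` method used below.
interface Clickable {
  click(selector: string, options?: { timeout?: number }): Promise<void>;
}

// Hypothetical helper: locate the element by its test id and click it.
async function clickByTestId(page: Clickable, id: string): Promise<void> {
  await page.click(byTestId(id));
}
```

In a test this reads as `await clickByTestId(page, 'submit-order')`, and the selector keeps working when classes or layout change.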


Other good options, for scenarios where adding a dedicated test attribute is undesirable, are:

  • placeholder: placeholders are often unique to the element.

  • aria-label: an attribute used by assistive technologies to help identify an element (e.g. for a screen reader).

  • alt: an attribute used to denote alternate text for an image, if the image cannot be displayed.

  • role: a role attribute is used to add semantic meaning to an element for assistive technologies (e.g. screen readers).

  • type: an input often has a unique, unchanging type that identifies it in short forms.

  • nodeName: if an element node type is only used once, the nodeName (e.g. a, input, button) is often a good choice.

  • id: elements are often assigned unique id attributes to aid in locating them by JavaScript.
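
To make these options concrete, the sketch below turns a recorded element description into candidate selectors, one per attribute discussed above. The ElementInfo shape and the function name are hypothetical, and quoting values with JSON.stringify is an assumption that only holds for simple attribute values without unusual characters.

```typescript
// Hypothetical description of an element, e.g. captured while recording a test.
interface ElementInfo {
  nodeName: string;
  attributes: Record<string, string>;
}

// Attributes discussed above that tend to survive application changes.
const STABLE_ATTRIBUTES = ['placeholder', 'aria-label', 'alt', 'role', 'type', 'id'];

// Returns one CSS selector per stable attribute present, plus the bare node name.
function candidateSelectors(el: ElementInfo): string[] {
  const selectors: string[] = [];
  for (const name of STABLE_ATTRIBUTES) {
    const value = el.attributes[name];
    if (value !== undefined) selectors.push(`[${name}=${JSON.stringify(value)}]`);
  }
  // The node name is only a good selector when it is unique on the page.
  selectors.push(el.nodeName.toLowerCase());
  return selectors;
}
```

Each returned string can be passed to page.locator; whether it matches exactly one element still has to be checked against the live page.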

Reflow will pick up such selectors automatically during test recording, and score them by uniqueness and type. It will also:

  1. Combine parent selectors with their children to reduce ambiguity. E.g. [data-test-id="foo"] >> [data-test-id="bar"].
  2. Calculate all the possible sets of selectors, rank them, and inspect the page for the most likely element based on how close they are to the most recent successful run's element.
  3. Compare all page elements made from partial selector matches to a screenshot of the most recent successful run's element, to enable auto-healing the locator when it cannot be found or is ambiguous. We call this a Visual Selector.
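
The scoring and parent-chaining steps can be illustrated with a toy ranking. This is not reflow's algorithm: the selector kinds, the weights and the rule that uniqueness beats type are assumptions chosen only to show the shape of such a scorer.

```typescript
type SelectorKind = 'test-id' | 'aria' | 'id' | 'attribute' | 'node-name';

interface Candidate {
  selector: string;
  kind: SelectorKind;
  matchCount: number; // how many elements on the page match this selector
}

// Assumed weights: purpose-built test attributes first, bare node names last.
const KIND_WEIGHT: Record<SelectorKind, number> = {
  'test-id': 5,
  aria: 4,
  id: 3,
  attribute: 2,
  'node-name': 1,
};

// Unique selectors beat ambiguous ones; ties fall back to the kind weight.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => c.matchCount > 0) // drop selectors that no longer match anything
    .sort((a, b) => a.matchCount - b.matchCount || KIND_WEIGHT[b.kind] - KIND_WEIGHT[a.kind]);
}

// Scope a child selector to its parent using Playwright's `>>` chaining.
const within = (parent: string, child: string): string => `${parent} >> ${child}`;
```

An ambiguous child selector can often be rescued by scoping it, e.g. `within('[data-test-id="foo"]', 'button')`, and then re-counting its matches.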

Strategy 4: "Wait Until" checkpoints

This strategy relies on writing application-specific logic that ensures the application is in a specific state before the test continues.

For instance, if a page element represents a calculation, and we want to wait for that calculation to be completed before moving on, a developer could write an assertion like:

await page.waitForSelector('[aria-label="calculation"] >> text=29.76')
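
When the calculation can be slow, the checkpoint can be given its own, longer timeout instead of relying on Playwright's 30s default. A minimal sketch: the helper name waitForCalculation, the SelectorWaiter interface and the 60s default are assumptions, while the selector is the one shown above.

```typescript
// Structural stand-in for the Playwright `Page` method used below.
interface SelectorWaiter {
  waitForSelector(
    selector: string,
    options?: { timeout?: number; state?: 'attached' | 'detached' | 'visible' | 'hidden' },
  ): Promise<unknown>;
}

// Hypothetical checkpoint: resolves once the calculation element shows `expected`,
// and rejects if that has not happened within `timeoutMs`.
async function waitForCalculation(
  page: SelectorWaiter,
  expected: string,
  timeoutMs = 60000,
): Promise<void> {
  await page.waitForSelector(`[aria-label="calculation"] >> text=${expected}`, {
    timeout: timeoutMs,
  });
}
```

A test would then call `await waitForCalculation(page, '29.76')` before moving on to the next action.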

In reflow, such an assertion can be made with as long a timeout as desired. For instance, most tests in reflow's internal test suite (which runs on reflow itself) start by:

  1. Making a new reflow test
  2. Navigating to the recording UI of that test
  3. Waiting until the recording UI looks like a pre-recorded screenshot, up to 5 minutes.

The Visual Assertion on step [3] removes the flakiness caused by both server startup times and DNS propagation. It doesn't require any code, and works very reliably. When the starting URL has any visual changes, reflow will fail the test and offer an auto-healing option to quickly replace it with the new visual snapshot.

Strategy 5: Zero-dependency Data Seeding

This strategy helps combat the problem "The testing process must generally assume that the system is in a consistent starting state", by breaking out an initial test suite that resets application data.

In reflow, this is done by adding a user to a team, accepting the invite, then removing them. After this sequence all the user's end-to-end tests are associated with the team that they just left, and they are in an empty team.

In other applications, this might be done via invoking a REST endpoint directly in the test suite to reset application data. E.g. invoking a REST handler that resets the database:
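
A minimal sketch of such a call, under loudly stated assumptions: the /test-support/reset path, the bearer-token header and the function name are all hypothetical and must be replaced with whatever your application actually exposes. The fetch implementation is passed in so that the helper works with the global fetch of Node 18+ or with a stub.

```typescript
// Minimal shape of the fetch function this helper needs; the global `fetch` fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Hypothetical helper: asks the application to reset its test data.
async function resetTestData(
  fetchImpl: FetchLike,
  baseUrl: string,
  token: string,
): Promise<void> {
  // Hypothetical test-only endpoint; it should not exist in production builds.
  const response = await fetchImpl(`${baseUrl}/test-support/reset`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
  });
  // Fail loudly: continuing with stale data would just move the flake elsewhere.
  if (!response.ok) throw new Error(`reset failed with HTTP ${response.status}`);
}
```

Calling `await resetTestData(fetch, baseUrl, token)` in a setup step that runs before the rest of the suite gives every run the same starting state without depending on any other test.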


Conclusion: Is there a way to eliminate all flaky tests forever?


An engineering team can be smart and drastically reduce the amount of time spent maintaining end-to-end tests by applying these strategies. However, that time will never go to zero whilst an application is being actively developed.

Test Automation helps ensure that QA effort is placed on the boundaries of new feature development, rather than endlessly covering existing features on every application change.

Commercial low-code automation tools can allow product teams to work more effectively by reducing the cost of automation. They are not a magic pill, but can be a very valuable tool for teams where QA and development personnel want to work closer together on test automation, regardless of coding ability.
