End-to-end testing is a quality assurance technique that exercises and validates an application's workflow from start to finish. It aims to reproduce real user scenarios and validate expectations about the application's output.

Unfortunately, end-to-end testing has problems that make tests difficult to maintain:

- The testing process must generally assume that the system is in a consistent starting state. This might require seeding or wiping test data before or after the test is run.
- When application changes are made, the test sequence may need to change to handle the update.
- Even when nothing changes, tests will sometimes fail anyway. Such tests are called flaky.

Over the last 18 months, I've been bootstrapping reflow.io: a SaaS tool that allows non-technical QA personnel to implement and maintain end-to-end tests using a no-code recording UI. I've accrued a toolkit of strategies to fix flaky tests, and to automatically heal the test sequence when there are application changes.

This article is a summary of those techniques and how to apply them to your test suite, regardless of whether you use reflow or not.

Why does this matter?

Time is a precious resource for all product teams. It must be protected and spent wisely.

Flaky tests cost time

- Every time a test flakes, someone will generally investigate whether the failure is a true failure or a false positive. The time cost of this can become very significant.
- A test may have downstream dependencies that cannot run until the test is re-run or fixed. This can often escalate the time cost.
- The failed test may place your system into a non-deterministic state, which may then require manual effort to fix.

Emotional Damage

The very existence of an automated test implies that someone cared enough to automate away some manual effort. When that manual effort returns through a test flake, there is an emotional toll.
If the flake cannot be removed, the team will be very aware that they are stuck maintaining the test in perpetuity.

Automation Trust

- If the tests are flaky, the tests will not be trusted.
- If the tests are not trusted, they will gradually be deprecated and eventually removed.
- If the product team does not replace these tests, the codebase will lose test coverage, and may become less trusted.
- If the codebase is less trusted, productivity will suffer.

Mitigations

Strategy 1: Generic pre-action stability

The most common cause of flakes that we see is interacting with the page too quickly. Page elements are often rendered before they can be successfully interacted with, so just checking for their existence is often not enough.

There are a few generic actions that can be taken to increase the likelihood that the page is fully loaded before it is interacted with. In Playwright, these stability methods are available through the waitForLoadState API.

await page.waitForLoadState('domcontentloaded', { timeout: 15000 });
await page.waitForLoadState('load', { timeout: 30000 });
await page.waitForLoadState('networkidle', { timeout: 5000 });

These 3 load events are:

domcontentloaded - wait for the DOMContentLoaded event to be fired. This fires when the initial page document is loaded and parsed. For SPAs, stylesheets, images and JavaScript will generally not be loaded when this event fires. Hence it is usually preferable to rely on the load event instead.

load - wait for the load event to be fired. This event fires when all markup, stylesheets, JavaScript and static assets such as images and audio have been loaded. For many SPAs, which dynamically load data after the page renders its initial document, the load event may not be enough.

networkidle - wait until there are no network connections for at least 500 ms. This is a useful event to hook into for SPAs which load data after their initial render. However, it may fire either too early or too late for page interaction.

In reflow, there is an optional 4th event: screenshotstable. This event is fired when the page:

- Has not changed within the last 5s.
- Looks like the page in its most recent successful test.

We introduced this event because most applications, when loading, either show a loading animation or continuously re-render whilst they load data. It introduces a minor delay to test actions, but we believe the decrease in flakiness makes it worthwhile.

Similarly, for a deterministic end-to-end test sequence, the page almost always looks the same between test runs. Hence we can save a screenshot in S3 to track what the page looks like before an action executes, and compare it during test execution against the previous successful run on the same device, browser and operating system.

As always, a timeout needs to be set to avoid waiting forever when the application has changed, or is not expected to stop rendering. In reflow, these timeouts are automatically configured based on how the page behaved when it was originally recorded or, if it exists, the most recent successful test execution.
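As a rough sketch, the three Playwright waits from Strategy 1 can be folded into one helper that runs before each action. The helper name waitForPageStable, the shape of the budget list and the decision to treat a networkidle timeout as non-fatal are assumptions made for illustration; they are not part of Playwright or reflow. The only Playwright call used is page.waitForLoadState.

```javascript
// Sketch only: waitForPageStable and DEFAULT_BUDGETS are illustrative names,
// not part of Playwright or reflow. Timeouts mirror the snippet in the article.
const DEFAULT_BUDGETS = [
  { state: 'domcontentloaded', timeout: 15000, required: true },
  { state: 'load', timeout: 30000, required: true },
  // Assumption: an app that polls or streams data may never go idle, so a
  // networkidle timeout is tolerated instead of failing the step.
  { state: 'networkidle', timeout: 5000, required: false },
];

/**
 * Waits for each load state in order.
 * `page` is any object exposing Playwright's page.waitForLoadState(state, options).
 * Resolves with the list of states that were actually reached.
 */
async function waitForPageStable(page, budgets = DEFAULT_BUDGETS) {
  const reached = [];
  for (const { state, timeout, required } of budgets) {
    try {
      await page.waitForLoadState(state, { timeout });
      reached.push(state);
    } catch (error) {
      // A required state that times out fails the step; optional ones are skipped.
      if (required) throw error;
    }
  }
  return reached;
}
```

Calling await waitForPageStable(page) before an interaction keeps the timeouts in one place, and the returned list can be logged to see which load states a slow page never reached.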
Strategy 2: Intelligent Waiting

If an action expects to be on a given page, we can wait until the navigation to that page is complete with page.waitForNavigation (recent Playwright releases deprecate it in favor of page.waitForURL). This can be important, as multiple load events may fire during a navigation sequence: hence a page.waitForLoadState will not always be enough.

If an action affects a specific element, we can take further action-specific steps. For example:

- Wait for the element to be attached to the DOM.
- Wait for the element to be visible.
- Wait for the element to be stable, as in not animating or having completed its animation.
- Wait for the element to be able to receive events.
- If the element is clickable, wait for it to be enabled.
- If the element is a text-entry element, wait for it to be editable.

In Playwright, these pre-action stability statements are automatically applied based on the interaction type.

In reflow, stability is further enhanced by:

- Waiting for the element to look the same as it did in the last successful run, i.e. continuously taking screenshots of the element and waiting until they look the same (or a timeout is reached).
- Applying a specific timeout to the element based on how long it took for the last DOM attribute update to be applied during the last successful run. E.g. if the button turned green via a class attribute change after 7s, wait at least 7s for the element to receive DOM attribute changes, before further timeouts.

Strategy 3: Pick Good Selectors

When an application changes, the locators used to identify elements often change too. These locators are generally known as selectors.

A smart development team will combat this with a strongly consistent element attribute, such as data-test-id. E.g.

<button
  data-test-id={`test-actions-${testId}`}
/>

To click such a button we can use the data-test-id attribute to locate the element.

await page.click(`[data-test-id="test-actions-${testId}"]`);

Other good options, for scenarios where adding a specific element selector is undesirable, are:

Selector - Description
[placeholder="..."] - Placeholders are often unique to the element.
[aria-label="..."] - An attribute used by assistive technologies to help identify an element (e.g. for a screen reader).
img[alt="..."] - An attribute used to denote alternate text for an image, if the image cannot be displayed.
[role="..."] - A role attribute is used to add semantic meaning to an element for assistive technologies (e.g. screen readers).
input[type="..."] - An input often has a unique, unchanging type that identifies it in short forms.
nodeName - If an element node type is only used once, the nodeName (e.g. a, input, button) is often a good choice.
#... - Elements are often assigned unique id attributes to aid in locating them from JavaScript.

Reflow will pick up such selectors automatically during test recording, and score them by uniqueness and type. It will also:

- Combine parent selectors with their children to reduce ambiguity. E.g. [data-test-id="foo"] >> [data-test-id="bar"].
- Calculate all the possible sets of selectors, rank them, and inspect the page for the most likely element based on how close they are to the most recent successful run's element.
- Compare all page elements made from partial selector matches to a screenshot of the most recent successful run's element, to enable auto-healing the locator when it cannot be found or is ambiguous. We call this a Visual Selector.

Strategy 4: "Wait Until" checkpoints

This strategy relies on writing application-specific logic to ensure the application is in a specific state.
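In its most generic form, a checkpoint is just a condition that is polled until it holds or a deadline passes. The sketch below shows that idea in plain JavaScript; the helper name waitUntil and its timeout and interval options are hypothetical, and this is not how Playwright or reflow implement their own waiting.

```javascript
// Sketch only: waitUntil is an illustrative helper, not a Playwright or reflow API.

/**
 * Polls `check` until it returns a truthy value or `timeout` ms elapse.
 * A check that throws is treated as "not ready yet" and retried.
 * Resolves with the truthy value; rejects once the deadline has passed.
 */
async function waitUntil(check, { timeout = 30000, interval = 250 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  for (;;) {
    try {
      const result = await check();
      if (result) return result;
      lastError = undefined; // the check ran cleanly but the state is not there yet
    } catch (error) {
      lastError = error;
    }
    if (Date.now() >= deadline) {
      const reason = lastError ? `: ${lastError.message}` : '';
      throw new Error(`Checkpoint not reached within ${timeout} ms${reason}`);
    }
    // Pause before the next attempt so the application is not hammered.
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

A test could, for example, wrap a health check such as (await page.request.get(healthUrl)).ok() in waitUntil with a timeout of several minutes to absorb slow server startup, where healthUrl is a placeholder for an endpoint your application exposes.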
For instance, if a page element represents a calculation, and we want to wait for that calculation to complete before moving on, a developer could write an assertion like:

await page.waitForSelector('[aria-label="calculation"] >> text=29.76')

In reflow, such an assertion can be made with as long a timeout as desired. For instance, most of reflow's internal test suite (powered by reflow itself) starts by:

1. Making a new reflow test.
2. Navigating to the recording UI of that test.
3. Waiting until the recording UI looks like a pre-recorded screenshot, for up to 5 minutes.

The Visual Assertion on step [3] removes the flakiness of both server startup times and DNS propagation. It doesn't require any code, and works very reliably. When the starting URL has any visual changes, reflow will fail the test and offer an auto-healing option to quickly replace it with the new visual snapshot.

Strategy 5: Zero-dependency Data Seeding

This strategy helps combat the problem that "the testing process must generally assume that the system is in a consistent starting state", by breaking out an initial test suite that resets application data.

In reflow, this is done by adding a user to a team, accepting the invite, then removing them. After this sequence, all the user's end-to-end tests are associated with the team that they just left, and they are in an empty team.

In other applications, this might be done by invoking a REST endpoint directly in the test suite to reset application data. E.g. invoking a REST handler that resets the database:

await page.request.post(`${event.variables.url}/reset/scenario/empty?token=${event.variables.secret}`);

Conclusion: Is there a way to eliminate all flaky tests forever?

No.

An engineering team can be smart, and drastically reduce the amount of time spent maintaining end-to-end tests by applying these strategies. However, that time will never go to zero whilst an application is being actively developed.
Test automation helps ensure that QA effort is placed on the boundaries of new feature development, rather than endlessly covering existing features on every application change. Commercial low-code automation tools like reflow.io can allow product teams to work more effectively by reducing the cost of automation. They are not a magic pill, but can be a very valuable tool for teams where QA and development personnel want to work closer together on test automation, regardless of coding ability.
