Basic Knowledge

What is a Flaky Test?

Hugo Escafit

31 Aug 2023 — 5 min read

Do you know that feeling from math class? You face a problem and start writing equations, the same ones as on the classroom board and the same ones as your table neighbor. You do everything right, and you still end up with the wrong solution.

You will never know if it was the calculator's fault or yours. You don't have any choice but to do the equation again.

After a second try, the result is the right one, but you don't feel like you changed anything.

Frustrating, right? Welcome to the world of flakiness.

Flaky tests are hell for developers because they generate inconsistent results, making it challenging to discern whether a failure is due to a bug or a flaky test. This uncertainty can waste a lot of time and effort on unnecessary debugging and troubleshooting.

By the end of this article, you will know what a flaky test is and be able to shine among your peers by introducing them to the concept.

🧐 Flaky Test Definition

Many definitions exist, but I chose Datadog's:

A flaky test is a software test that yields both passing and failing results despite zero changes to the code or test. In other words, flaky tests fail to produce the same outcome with each individual test run. The nondeterministic nature of flaky tests makes debugging extremely difficult for developers and can translate to issues for your end users.

This definition of a flaky test is pretty good. What it highlights is how unpredictable, and seemingly random, this issue is. Every engineer will face a flaky test at some point in their career. The problem is that, most of the time, they decide to do nothing about it: they rerun the CI and merge the code.

This is the last thing you should do while facing a flaky test.

Flaky tests are problematic because their unpredictability creates uncertainty about whether a failure comes from a real issue in the code or simply from the test's flakiness.

A developer facing a flaky test. — Don't panic.

🥸 A Concrete Example

Imagine you're developing a web application, and you have an automated test that checks the functionality of a login form. The test automatically fills in a username and password, clicks the login button, and then verifies that the user is redirected to the home page.

The test usually passes, but sometimes it fails. After a thorough investigation, you find that the failure happens when the test runs more quickly than usual. When the test fills in the form and clicks the login button too quickly, the application hasn't finished loading a crucial piece of data from the server, and the login attempt fails.

In this case, the test is flaky because its result depends on timing, which can vary. The code it's testing is fine: the login form works correctly once the necessary data has loaded. But the test doesn't always allow for that loading time, so it sometimes fails.
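To make the race concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `FakeLoginPage` is a hypothetical stand-in for the real application, and the "crucial piece of data" is simulated with a timer. It only shows the shape of the problem, not how any particular test framework works.

```python
import threading


class FakeLoginPage:
    """Hypothetical stand-in for the login page: its data arrives after a delay."""

    def __init__(self, load_delay: float) -> None:
        self._ready = threading.Event()
        # Simulate the server data arriving `load_delay` seconds after page load.
        timer = threading.Timer(load_delay, self._ready.set)
        timer.daemon = True
        timer.start()

    def wait_until_ready(self, timeout: float) -> bool:
        """Block until the data has loaded, or give up after `timeout` seconds."""
        return self._ready.wait(timeout)

    def login(self, username: str, password: str) -> str:
        # Logging in before the data has loaded fails, as in the example above.
        return "home" if self._ready.is_set() else "error"


def racy_login_check(page: FakeLoginPage) -> bool:
    """Clicks 'login' immediately: passes only if the data happens to be loaded."""
    return page.login("alice", "secret") == "home"


def robust_login_check(page: FakeLoginPage) -> bool:
    """Waits for the page to be ready before clicking 'login'."""
    if not page.wait_until_ready(timeout=2.0):
        return False
    return page.login("alice", "secret") == "home"
```

The racy check depends on who wins the race between the test and the page load. The robust check removes the race by waiting for the condition it actually needs.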

🔍 How To Identify Flaky Tests?

Given all the reasons above, you should be interested in this question.

It is possible to identify flaky tests, but it often requires careful analysis and can be quite challenging because of their inherent unpredictability. Several strategies exist if you suspect a test is flaky.

Depending on your resources and what you are able to do, you should try to:

👉 Rerun Failed Tests

If a test fails, rerun it under the same conditions. If it passes on subsequent runs, it's likely flaky.
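As a rough sketch of this idea in Python: the helper below reruns a test function several times and labels it by the mix of outcomes. The names `classify_by_rerun` and `sometimes_fails` are made up for this example, and the "flaky" test fails on every third call only so that the demonstration is reproducible.

```python
from typing import Callable


def classify_by_rerun(test: Callable[[], None], runs: int = 10) -> str:
    """Run `test` several times and label it by the mix of outcomes."""
    passes = failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
        else:
            passes += 1
    if passes and failures:
        return "flaky"
    return "always passes" if passes else "always fails"


# Hypothetical test that fails on every third call, to imitate flakiness
# in a reproducible way.
_calls = {"count": 0}


def sometimes_fails() -> None:
    _calls["count"] += 1
    assert _calls["count"] % 3 != 0, "simulated intermittent failure"


def always_ok() -> None:
    assert 1 + 1 == 2
```

A real flaky test may need many more reruns before it shows both outcomes, so a clean result here is a hint, not a proof.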

👉 Run Tests in Different Orders

Sometimes a test is flaky because it's inadvertently dependent on the tests that run before it. By shuffling the order of tests, you can uncover these hidden dependencies.
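Here is a small Python sketch of that shuffling, under simplifying assumptions: the two tests and the `shared_cache` they communicate through are invented for the example, and the runner is a toy, not a real test framework. The shuffle uses a seeded generator so that a failing order can be replayed.

```python
import random
from typing import Callable, List, Optional, Tuple

shared_cache = {}  # module-level state that leaks between tests


def test_writes_cache() -> None:
    shared_cache["user"] = "alice"
    assert shared_cache["user"] == "alice"


def test_reads_cache() -> None:
    # Hidden dependency: only passes if test_writes_cache ran earlier.
    assert shared_cache.get("user") == "alice"


def run_in_order(tests: List[Callable[[], None]]) -> List[str]:
    """Run tests in the given order from a clean state; return names of failures."""
    shared_cache.clear()
    failed = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed


def find_failing_order(
    tests: List[Callable[[], None]], attempts: int = 20, seed: int = 0
) -> Optional[Tuple[List[str], List[str]]]:
    """Shuffle the suite repeatedly; return the first (order, failures) that breaks."""
    rng = random.Random(seed)  # seeded, so a failing order is reproducible
    for _ in range(attempts):
        order = list(tests)
        rng.shuffle(order)
        failed = run_in_order(order)
        if failed:
            return [test.__name__ for test in order], failed
    return None
```

Reporting the order alongside the failure matters: without it, you know a dependency exists but cannot replay it.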

👉 Parallel Execution

Running tests concurrently can reveal flakiness caused by shared state or resource contention that doesn't appear when tests are run individually.
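The sketch below illustrates the shared-state case in Python. It is deliberately artificial: `shared_session` is a made-up global, and a `threading.Barrier` is used only to force the unlucky interleaving on every run, so the effect is visible without relying on chance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

shared_session = {}  # state shared by every test: the source of the problem


def check_session_for(user: str, pause: Callable[[], object] = lambda: None) -> bool:
    """Log in as `user`, then verify the session still belongs to `user`."""
    shared_session["user"] = user
    pause()  # stands in for any work between the write and the read
    return shared_session["user"] == user


def run_sequentially(users: List[str]) -> List[bool]:
    return [check_session_for(user) for user in users]


def run_in_parallel(users: List[str]) -> List[bool]:
    # The barrier makes every check write before any check reads.
    barrier = threading.Barrier(len(users))

    def check(user: str) -> bool:
        return check_session_for(user, pause=lambda: barrier.wait(timeout=5))

    with ThreadPoolExecutor(max_workers=len(users)) as pool:
        return list(pool.map(check, users))
```

Run one after the other, both checks pass. Run together, only the check that wrote last still sees its own user, so one of the two fails.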

👉 Run Tests Under Different Conditions

Try running the tests on different machines, at different times, or under different network conditions. If the test results vary, the test may be flaky.

👉 Test History Analysis

Over time, if a test shows intermittent failures without associated code changes, it is likely flaky.
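One simple way to express this in Python, assuming you can export your CI history as records of test name, commit, and outcome (the `Run` record and its fields are assumptions for this sketch, not the format of any specific CI system):

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Run(NamedTuple):
    test: str     # test name
    commit: str   # commit the test ran against
    passed: bool  # outcome of that run


def find_flaky(history: Iterable[Run]) -> Set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for run in history:
        outcomes[(run.test, run.commit)].add(run.passed)
    return {
        test
        for (test, _commit), seen in outcomes.items()
        if seen == {True, False}
    }


history = [
    Run("test_login", "abc123", True),
    Run("test_login", "abc123", False),   # same commit, different outcome
    Run("test_logout", "abc123", True),
    Run("test_logout", "def456", False),  # failed only after a code change
]
```

With this sample data, `find_flaky(history)` flags `test_login` only. `test_logout` changed outcome together with the commit, which points to a real regression rather than flakiness.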

👉 Flakiness Dashboards/Tools

Some tools or CI/CD systems offer flakiness dashboards that analyze and report on test flakiness by tracking test reliability over time.

Be careful: even though these strategies can be quite effective, no method is 100% reliable. Flaky tests are inconsistent, they depend on external factors such as timing and network conditions, and their exact failure conditions are hard to reproduce.

❌ How To Avoid Flaky Tests?

While it's not always possible to completely avoid flaky tests due to the complex and unpredictable nature of software testing, you can certainly minimize their occurrence by adhering to good testing practices:

👉 Isolation

Tests should not depend on other tests or be affected by the order in which they are run. Each test should set up its own data and clean up after itself to ensure the environment is ready for the next test.
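As an illustration, here is what that can look like with Python's built-in `unittest`. The test class and the `users.txt` file are invented for the example. Each test gets its own temporary directory in `setUp` and removes it in `tearDown`, so the two tests pass in any order.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class UserFileTests(unittest.TestCase):
    def setUp(self) -> None:
        # Each test gets a fresh directory with its own copy of the data.
        self.workdir = Path(tempfile.mkdtemp())
        self.users_file = self.workdir / "users.txt"
        self.users_file.write_text("alice\n")

    def tearDown(self) -> None:
        # Clean up so nothing leaks into the next test.
        shutil.rmtree(self.workdir)

    def test_append_user(self) -> None:
        self.users_file.write_text(self.users_file.read_text() + "bob\n")
        self.assertEqual(self.users_file.read_text().splitlines(), ["alice", "bob"])

    def test_starts_with_one_user(self) -> None:
        # Would fail after test_append_user if both shared a single file.
        self.assertEqual(self.users_file.read_text().splitlines(), ["alice"])
```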

👉 Avoid Timing Issues

If a test relies on specific timing or delays, it can often become flaky. For example, a test might pass if a server responds in time but fail if the server takes too long. It's better to mock out dependencies when possible or allow a generous timeout period for operations that must take time.
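When an operation really does take time, one common pattern is to poll for the condition you need, with an upper bound, instead of sleeping for a fixed duration. The helper below is a minimal sketch of that pattern; `wait_until` is a name chosen for this example, not a function from a particular library.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05
) -> bool:
    """Poll `condition` until it is true or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then call something like `assert wait_until(lambda: page_is_loaded())`, where `page_is_loaded` is whatever readiness check your application offers. The test finishes as soon as the condition holds and only pays the full timeout when something is actually wrong.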

👉 Control External Dependencies

Tests should avoid depending on external services or systems when possible, as these can introduce unpredictability. Where necessary, these should be stubbed or mocked.
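For example, with Python's standard `unittest.mock`, an external service can be replaced by a stub that always answers the same way. The `shipping_label` function and the `rates_client` object with its `get_rate` method are hypothetical, invented only to have something to test.

```python
from unittest.mock import Mock


def shipping_label(order_id: int, rates_client) -> str:
    """Code under test: formats a label using a rate from an external service."""
    rate = rates_client.get_rate(order_id)
    return f"Order {order_id}: {rate:.2f} EUR"


def test_shipping_label_uses_quoted_rate() -> None:
    # The stub replaces the real service, so the test never leaves the process.
    rates_client = Mock()
    rates_client.get_rate.return_value = 4.5

    assert shipping_label(42, rates_client) == "Order 42: 4.50 EUR"
    rates_client.get_rate.assert_called_once_with(42)
```

Passing the client in as a parameter is what makes the swap easy. Code that creates its own client internally is harder to isolate.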

👉 Use Appropriate Test Data

Use fixed data that the test controls completely, rather than shared or random data. This can avoid failures caused by changes to the data.
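A short Python sketch of both ideas, using made-up functions: pass dates in explicitly instead of reading today's date inside the code, and if a test genuinely needs random-looking data, draw it from a private, seeded generator so every run sees the same values.

```python
import random
from datetime import date
from typing import List


def is_expired(expiry: date, today: date) -> bool:
    """Code under test: `today` is passed in rather than read from the clock."""
    return today > expiry


def test_is_expired_with_fixed_dates() -> None:
    # Fixed inputs: the result is the same on any day the suite runs.
    assert is_expired(date(2023, 8, 30), today=date(2023, 8, 31))
    assert not is_expired(date(2023, 8, 31), today=date(2023, 8, 31))


def sample_scores(seed: int, count: int = 5) -> List[int]:
    """Pseudo-random test data that is identical for a given seed."""
    rng = random.Random(seed)  # private generator, unaffected by other tests
    return [rng.randint(0, 100) for _ in range(count)]
```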

👉 Handle Network Issues

Network issues can cause tests to fail intermittently. Mocking network requests or using a local server for testing can help avoid this.
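Here is a sketch of the local-server option using only Python's standard library. The `/health` endpoint, the JSON payload, and the `fetch_status` function are assumptions made for the example. The server binds to port 0 so the operating system picks a free port, which avoids collisions between test runs.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed JSON body."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep test output quiet


# Ignore any proxy configured in the environment: the server is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(base_url: str) -> str:
    """Code under test: reads the status field from the health endpoint."""
    with _opener.open(f"{base_url}/health", timeout=5) as response:
        return json.load(response)["status"]


def test_fetch_status_against_local_server() -> None:
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        assert fetch_status(f"http://{host}:{port}") == "ok"
    finally:
        server.shutdown()
        server.server_close()
```

The test still exercises real HTTP code, but the only network involved is the loopback interface on the machine running the suite.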

👉 Use Automated Tools

Tools can help detect flaky tests, prevent bad practices that often lead to flakiness, and encourage practices that lead to more reliable tests.

👉 Retries and Fallbacks

Implementing retries and fallbacks in tests can help to mitigate the impact of flaky tests, though this doesn't address the root cause of flakiness.

Even with these measures, some flaky tests might still emerge, especially in complex systems. When that happens, the key is to identify and address them as soon as possible to prevent further complications.

Recap ⬇️

In short, flaky tests are unpredictable, and they can cause confusion for developers, making them a real challenge in the software development process. Despite that, understanding flaky tests gives you the power to identify them, manage them, and, to some extent, prevent them.

Now you know what flaky tests are and how to identify and mitigate them. Remember, the unpredictability of flaky tests doesn't make them invincible — through good testing practices and careful management, you can reduce their impact on your development workflow.

Mergify FlakyGuard Product

FlakyGuard uses artificial intelligence to identify flaky tests in your software and help you fix them, and it is compatible with any framework. Our team has been working on it for months, and it's about to change how you work as a developer.

To all developers out there who will face or are currently facing flaky tests in their journey: Good luck! May your code be clean, your tests be reliable, and your debugging sessions be short. Remember, every challenge is an opportunity to learn and grow. Keep going! ❤️