How to Get Rid of Flaky Tests: 3 Tools You Should Know

Mathieu Poissard

While the best practices we've shared with you in the articles "Flaky Tests: How to Fix Them?" and "How to Get Rid of Flaky Tests? Best Practices" may largely solve your flaky test problems, they can be difficult to implement and maintain.

In fact, these practices require brainpower and time, two resources that software development teams often lack.

To make up for any shortcomings, you should know that there are also tools available to help you detect and diagnose flakiness in your test suite.

Old School Tools

When it comes to flaky tests, there are already a few solutions on the shelves. After all, if there's no solution, there's no problem.

That's only to be expected, given how many software teams face this problem. However, these solutions are rather limited, both in number and in technical capability.

TeamCity by JetBrains

TeamCity spots flaky tests and centralizes them in a dedicated tab called "Flaky Tests". Quite explicit, isn't it?

Flaky test detection by TeamCity is based on the following heuristics:

  1. A high flip rate. In TeamCity, a flip means that a test status changes from "OK" to "Failure", or vice versa. If the flip rate is too high, TeamCity will consider the test flaky.
  2. If a test status flips in a new build that contains no changes, the test is considered flaky by TeamCity.
  3. If the same test is invoked several times in the same build and the test status flips, the test is marked as flaky.
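
To make the first heuristic concrete, here is a minimal Python sketch of a flip-rate calculation. It is not TeamCity's implementation: the function names and the 0.3 threshold are assumptions chosen purely for illustration.

```python
from typing import Sequence


def flip_rate(statuses: Sequence[bool]) -> float:
    """Fraction of consecutive runs whose status changed (True = OK, False = Failure)."""
    if len(statuses) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)
    return flips / (len(statuses) - 1)


def looks_flaky(statuses: Sequence[bool], threshold: float = 0.3) -> bool:
    """Flag a test whose flip rate exceeds a hypothetical threshold."""
    return flip_rate(statuses) > threshold


# Six recent runs of one test: the status changes four times out of five transitions.
history = [True, False, True, True, False, True]
print(flip_rate(history))    # 0.8
print(looks_flaky(history))  # True
```

A test that breaks once and stays broken flips only a single time, so it would score low here, which matches the intuition that a genuine regression is not flakiness.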

Pros & Cons

This TeamCity feature is ideal for scanning your test suite for flaky tests. The various heuristics leave little to chance, and most flaky tests will be correctly identified.

What's more, it doesn't seem necessary to configure TeamCity for each test framework.

Unfortunately, TeamCity stops at detection, leaving you with flaky tests on your hands and no concrete solution.

It's up to you to figure out how to manage these tests or how to get rid of them. For example, TeamCity does not automatically re-run your flaky tests.

Test Insights by CircleCI

Test Insights gives you good visibility into your test suite. As its name suggests, it provides insights into the most frequently failing, slowest, and flakiest tests.

Recently, Test Insights has included a whole new layer of detection and analysis dedicated to flaky tests.

These flaky tests are now labeled as "FLAKY" on the Insights dashboard.

To design and develop their solution, the CircleCI teams started from the following observation:

Historically, when a testing job in a workflow had flaky tests, the only option to get to a successful workflow was to rerun your workflow from failed.

Test Insights detects flaky tests by identifying tests that both failed and passed (that is, flipped, as TeamCity calls it) on the same commit within a 14-day window.
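
The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not CircleCI's code or API: the `RunRecord` shape and the `flaky_tests` function are hypothetical names introduced for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class RunRecord(NamedTuple):
    """Hypothetical record of one test execution."""
    test_name: str
    commit: str
    passed: bool
    finished_at: datetime


def flaky_tests(runs: Iterable[RunRecord], now: datetime, window_days: int = 14) -> set[str]:
    """Return tests that both passed and failed on the same commit within the window."""
    cutoff = now - timedelta(days=window_days)
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        if run.finished_at >= cutoff:
            outcomes[(run.test_name, run.commit)].add(run.passed)
    return {test for (test, _commit), seen in outcomes.items() if seen == {True, False}}


now = datetime(2024, 1, 15)
runs = [
    RunRecord("test_login", "abc123", False, datetime(2024, 1, 10)),
    RunRecord("test_login", "abc123", True, datetime(2024, 1, 10)),
    RunRecord("test_checkout", "abc123", False, datetime(2024, 1, 10)),
    # Outside the 14-day window, so these two runs are ignored.
    RunRecord("test_search", "def456", False, datetime(2023, 12, 1)),
    RunRecord("test_search", "def456", True, datetime(2023, 12, 1)),
]
print(flaky_tests(runs, now))  # {'test_login'}
```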

Once a test is labeled as flaky, Test Insights will automatically re-run it until it passes, saving a lot of time for your team.
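
As a rough illustration of what an automatic re-run involves, the sketch below retries a test callable until it passes. The cap on attempts is an assumption added here so that a genuinely broken test cannot loop forever; `run_with_retries` is a hypothetical helper, not a CircleCI function.

```python
from typing import Callable


def run_with_retries(test: Callable[[], bool], max_attempts: int = 3) -> tuple[bool, int]:
    """Run `test` until it passes or the attempt budget is spent.

    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if test():
            return True, attempt
    return False, max_attempts


# A stand-in for a flaky test: it fails twice, then passes.
outcomes = iter([False, False, True])
result = run_with_retries(lambda: next(outcomes))
print(result)  # (True, 3)
```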

Pros & Cons

In addition to accurately detecting and spotlighting your flaky tests, CircleCI's Test Insights automates the re-run of your flaky tests.

And since one of the only ways to deal with flaky tests is to re-run them again and again until they pass, this sounds like a really good idea.

Unfortunately, configuring Test Insights is quite time-consuming. If you want to go further and not only spot your flaky tests but also re-run all the guilty ones, you have to configure specific jobs for each test framework.

Huge Limitations

Overall, these two tools have a few shortcomings and share some real limitations.

First of all, neither TeamCity nor Test Insights is able to detect all the flakiness in your test suite. Some of your flaky tests will fly under the radar.

What's more, these two technologies are not really general-purpose, with CircleCI even requiring specific configurations and integrations for each test framework present in your suite.

This complexity doesn't make onboarding any easier, and getting to grips with the features can prove arduous, delaying your first wins.

Finally, the two tools described above take all your logs into account. On the face of it, completeness is a good thing, isn't it? Well, not really, since it means more data to analyze and therefore a longer processing time.

Introducing CI Monitoring: a Fine Blend of Human and Artificial Intelligence

CI Monitoring is an all-in-one solution for getting rid of your flaky tests or at least getting rid of their nuisance.

Whatever framework you use, CI Monitoring embeds an AI capable of analyzing your test suite and detecting flaky tests.

Once the flaky tests have been detected, CI Monitoring doesn't stop there: it adds a new touch of automation. Combined with Mergify's merge queue, CI Monitoring can automatically rerun failed flaky tests until they pass. Apart from the CI cost, your flaky tests become totally frictionless.

Finally, once detected, flaky tests are classified, and CI Monitoring recommends how to repair them.

Flaky Test Detection: How Does It Work?

CI Monitoring uses heuristics and statistics to classify tests and detect the flaky ones. It maps four possible outcomes to a status:

  • Test failed → FAILURE
  • Test failed, has been retried, and failed again → FAILURE
  • Test failed, has been retried, and worked! → FLAKY
  • Test success → SUCCESS
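
The four cases above translate directly into a small decision function. The following is a minimal sketch, not CI Monitoring's actual code; `Status` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    FLAKY = "FLAKY"


def classify(first_passed: bool, retry_passed: Optional[bool] = None) -> Status:
    """Map a first attempt and an optional retry result to a status.

    `retry_passed` is None when the test was not retried.
    """
    if first_passed:
        return Status.SUCCESS
    if retry_passed:
        # Failed first, then passed on retry with no code change.
        return Status.FLAKY
    # Failed without a retry, or failed again on retry.
    return Status.FAILURE


print(classify(False, retry_passed=True).value)  # FLAKY
```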

Automatic Test Retrier: How Does It Work?

CI Monitoring works step by step to analyze your test suite. Rather than requiring a custom integration with every possible framework, it analyzes the logs to understand what the problem is.

Go Further and Smarter with CI Monitoring and the Merge Queue

CI Monitoring can also infer that if the same failure XYZ occurs on 3 tests in 3 different PRs opened within the last hour, the problem does not come from the PRs themselves but is more likely a flaky test or a CI failure.
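
One way to picture this kind of cross-PR correlation is to group recent failures by a normalized signature and count the distinct PRs affected. The sketch below is an assumption about how such a check could look, not a description of CI Monitoring's internals: `FailureEvent`, `shared_failures`, and the use of the failure timestamp (rather than the PR opening time) are all illustrative choices.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class FailureEvent(NamedTuple):
    """Hypothetical record of one failure observed in CI."""
    signature: str   # normalized failure message, e.g. "XYZ"
    pr_number: int
    seen_at: datetime


def shared_failures(
    events: Iterable[FailureEvent],
    now: datetime,
    min_prs: int = 3,
    window: timedelta = timedelta(hours=1),
) -> set[str]:
    """Return failure signatures seen on at least `min_prs` distinct PRs within the window."""
    cutoff = now - window
    prs_by_signature: dict[str, set[int]] = defaultdict(set)
    for event in events:
        if event.seen_at >= cutoff:
            prs_by_signature[event.signature].add(event.pr_number)
    return {sig for sig, prs in prs_by_signature.items() if len(prs) >= min_prs}


now = datetime(2024, 1, 15, 12, 0)
events = [
    FailureEvent("XYZ", 101, now - timedelta(minutes=50)),
    FailureEvent("XYZ", 102, now - timedelta(minutes=30)),
    FailureEvent("XYZ", 103, now - timedelta(minutes=5)),
    FailureEvent("ABC", 101, now - timedelta(minutes=20)),
]
print(shared_failures(events, now))  # {'XYZ'}
```

A failure that shows up on unrelated PRs at the same time is unlikely to be caused by any one of them, which is why the count is over distinct PRs rather than over raw failures.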

Moreover, when CI Monitoring detects a CI failure (e.g. a new linter has been merged) and PRs fail because of it, CI Monitoring and our Merge Queue can rebase or update the failing PRs so that they pass again.

Finally, as we have already said, CI Monitoring is smart. It can detect which language or technology produced a log by parsing and interpreting the output.
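
To give a feel for what detecting a technology from raw output can mean, here is a deliberately simple Python sketch that matches a log against the summary lines printed by a few common test runners. It is an illustrative heuristic under our own assumptions, not how CI Monitoring works; the `SIGNATURES` table and `detect_framework` function are hypothetical.

```python
import re
from typing import Optional

# Characteristic lines printed by some common test runners (illustrative, not exhaustive).
SIGNATURES = {
    "pytest": re.compile(r"^=+ .*\b(?:passed|failed)\b.* in [\d.]+s", re.MULTILINE),
    "go test": re.compile(r"^--- FAIL: \S+ \([\d.]+s\)", re.MULTILINE),
    "jest": re.compile(r"^Tests:\s+.*\d+ total", re.MULTILINE),
}


def detect_framework(log: str) -> Optional[str]:
    """Return the first framework whose signature appears in the log, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(log):
            return name
    return None


sample_log = "collected 3 items\n============ 1 failed, 2 passed in 0.12s ============\n"
print(detect_framework(sample_log))  # pytest
```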


While there are a few tools that can address the recurring problem of flaky testing, most of them are limited by their technical capabilities and the complexity of configuration and use.

However, one tool stands out from the rest, filling the gaps left by the alternatives. It's CI Monitoring.

This framework-agnostic tool adapts to any test framework you may be using. It intelligently detects the technologies in use so that it can analyze your test suite better. As a result, it detects your flaky tests accurately and reliably.

What's more, it delivers immediate value by automating the re-run of your flaky tests while giving you hints on how to remove any flakiness from your suite.

So, if you want to get rid of your flaky tests, you should give CI Monitoring a try.

To all developers out there who will face, or are currently facing, flaky tests in their journey: Good luck! May your code be clean, your tests be reliable, and your debug sessions be short. Remember, every challenge is an opportunity to learn and grow. Keep going! ❤️