Quick Hits: The Hidden Cost of Test Flakes

A persistent challenge we face at work is the unreliability of our integration tests. These end-to-end tests exercise mostly real components, which leaves them susceptible to timeouts and similar issues that trigger failures, and those failures rarely signify an actual problem.

This creates a situation where test “flakes” become mere noise, seldom requiring action. Over time, they can even become so commonplace that they’re simply ignored, whether they’re eventually fixed or not.

I’ve been meaning to write about test flakes for a while now. A deeper dive is definitely on the horizon, providing the necessary context. But first, I want to highlight something I believe will truly drive home the importance of tackling this issue.

Let’s say your team has a suite of 100 tests integrated into your CI pipeline, and each test, even if meticulously maintained, has a mere 1% chance of flaking on any given run. The probability of a completely flake-free run is then 0.99^100, or only about 36.6%. That means there’s a 63.4% chance of encountering at least one test failure on every run, even if nothing is fundamentally wrong.

> sum(dbinom(1:100, 100, 0.01))
[1] 0.6339677
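Equivalently, the probability of at least one flake is just the complement of a flake-free run, 1 − (1 − p)^n. A minimal Python sketch of the same calculation (function name is my own, for illustration):

```python
def p_any_flake(n: int, p: float) -> float:
    """Probability that at least one of n independent tests flakes,
    given each has per-run flake probability p."""
    return 1 - (1 - p) ** n

print(p_any_flake(100, 0.01))  # ~0.634, matching the R result above
```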

Even a seemingly low per-test flake rate compounds into a significant drag on your CI pipeline. It’s a problem that can’t be ignored, and in upcoming posts I’ll dig deeper into the causes, consequences, and potential solutions for test flakiness.

  1. This sample R code shows how to calculate the probability of at least one test failure during a test run in our example scenario. ↩︎
