FlakeRepro: Automated and Efficient Reproduction of Concurrency-Related Flaky Tests

Flaky tests, which can non-deterministically pass or fail on the same code, impose significant burden on developers by providing misleading signals during regression testing. Microsoft developers consider flaky tests as one of the top two reasons for slowing down software development. In order to debug the root-cause of a flaky behavior, a developer often needs to first reliably reproduce a failed execution. Unfortunately, this is non-trivial. For example, most of the flakiness in unit tests are caused by concurrency, and reproducing their failures requires specific thread interleaving. To address this challenge, we introduce FlakeRepro that helps developers reproduce a failed execution of a flaky test caused by concurrency. FlakeRepro combines static and dynamic analysis to quickly identify an interleaving that makes a flaky test fail with the same original error message. FlakeRepro is efficient: it can reproduce a failed execution after exploring few tens of interleavings. FlakeRepro integrates well with existing systems: it automatically instruments test binaries that can run on existing and unmodified test pipelines.

We have implemented FlakeRepro for .NET and used it in Microsoft. In an experiment with 22 Microsoft projects, FlakeRepro could reproduce 26 of total 31 concurrent-related flaky tests, after exploring only <7 interleavings and within 6 minutes on average.