Group similar test failures in test dashboard? #6778

gjoseph92 · 2022-07-21T23:07:59Z

In #6774, I manually went through the CI dashboard and identified taxonomies of test failures that were similar/the same across many tests. I have a feeling it would be valuable (and reasonable effort) to create a view that does this automatically, helping identify high-impact issues affecting CI.

Generally, there are probably 2 reasons for flaky tests:

1. An individual test is written in a way that's unreliable (too reliant on timing, actually causes a deadlock sometimes, etc.).
2. A bug in dask is causing something unrelated to the test to fail (timeout connecting to the cluster, asyncio error during cluster teardown, etc.). These tend to pop up in many unrelated tests. Because they can happen anywhere, they tend to blow up CI and the flaky test dashboard, and are probably responsible for the majority of failing tests.

I think we could identify the second category in a more automated way by adding another view to the test dashboard that groups failures by failure message (for example, `OSError: Timed out trying to connect to tcp://127.0.0.1:8786 after 5 s` shows up in 13 different tests). The grouping would need some fuzziness, since an exact string match wouldn't work. But that visibility might help us identify, prioritize, and fix these problems faster. It's also possible that these systematic problems are more likely to affect users.
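To make the "fuzzy grouping" idea concrete, here is a minimal sketch of one possible approach: mask the parts of a failure message that vary between runs (addresses, ports, durations) and then group tests by the masked message. Everything here is an assumption for illustration. The `(test_name, message)` input shape, the masking rules, and the names `normalize` and `group_failures` are hypothetical and do not reflect how the existing dashboard stores or processes its data.

```python
import re
from collections import defaultdict

# Hypothetical masking rules: replace run-specific details with placeholders
# so that messages differing only in those details compare equal.
_MASKS = [
    (re.compile(r"\w+://[\w.\-]+:\d+"), "<ADDR>"),  # e.g. tcp://127.0.0.1:8786
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),       # object ids / pointers
    (re.compile(r"\d+(?:\.\d+)?"), "<N>"),          # any remaining numbers
]


def normalize(message: str) -> str:
    """Return the first line of a failure message with variable parts masked."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    for pattern, token in _MASKS:
        first_line = pattern.sub(token, first_line)
    return first_line


def group_failures(failures):
    """Group (test_name, message) pairs by normalized message.

    Returns a list of (normalized_message, set_of_test_names) pairs,
    with the message that affects the most distinct tests first.
    """
    groups = defaultdict(set)
    for test_name, message in failures:
        groups[normalize(message)].add(test_name)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    sample = [
        ("test_a", "OSError: Timed out trying to connect to tcp://127.0.0.1:8786 after 5 s"),
        ("test_b", "OSError: Timed out trying to connect to tcp://127.0.0.1:45123 after 30 s"),
        ("test_c", "AssertionError: assert 3 == 4"),
    ]
    for message, tests in group_failures(sample):
        print(len(tests), message, sorted(tests))
```

Masking with regular expressions is deliberately crude. If it groups too aggressively or not enough, a similarity measure such as the standard library's `difflib.SequenceMatcher` could be tried instead, at the cost of pairwise comparisons.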

cc @ian-r-rose

gjoseph92 added the tests Unit tests and/or continuous integration label Jul 21, 2022
