Improve awareness of frequently flaking tests #6500
Labels
discussion
Discussing a topic with no specific actions yet
flaky test
Intermittent failures on CI.
stability
Issue or feature related to cluster stability (e.g. deadlock)
Problem:
We have tests that are failing rather frequently. The test report is a good resource to get an overview, but you have to actively take a look at it to identify those frequently failing tests. At the same time, #6452 has demonstrated that these flakes can point toward general issues that affect users. Had we taken a look earlier, we would have caught #6494 earlier as well.
Question:
How can we improve our awareness of and response time to tests that start flaking frequently?
Possible Solution:
One possible solution would be to implement a bot that files a new ticket for every test that fails at least x times in the last y CI runs on `main`. At the extreme, this would mean a ticket for any failing test on `main`. There also exist (paid) tools that allow tracking flaking tests in more detail.
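To make the "x failures in the last y runs" rule concrete, here is a minimal sketch of the detection step only. It assumes the bot already has, for each CI run on `main`, the set of test names that failed, ordered oldest to newest. The function name `frequently_failing` and the test names are hypothetical. Collecting the results from CI and filing the tickets are left out.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def frequently_failing(runs: Sequence[Iterable[str]], x: int, y: int) -> list[str]:
    """Return the tests that failed in at least x of the last y runs.

    `runs` is assumed to be ordered oldest to newest; each entry holds
    the names of the tests that failed in that run (hypothetical input format).
    """
    if x < 1 or y < 1:
        raise ValueError("x and y must be positive")
    counts: Counter[str] = Counter()
    for failed in runs[-y:]:
        # A test counts at most once per run, even if it is reported twice.
        counts.update(set(failed))
    return sorted(name for name, n in counts.items() if n >= x)


# Hypothetical history of four CI runs on main.
history = [
    {"test_a"},
    {"test_a", "test_b"},
    set(),
    {"test_a"},
]

# test_a failed in 2 of the last 3 runs, test_b only in 1.
candidates = frequently_failing(history, x=2, y=3)
```

Setting `x=1, y=1` corresponds to the extreme case above: any test that failed in the most recent run on `main` would get a ticket.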
cc @fjetter