
Flaky tests #248

Closed
jbergstroem opened this issue Nov 11, 2015 · 13 comments

@jbergstroem
Member

Before the nodejs merge we only had pass/fail. We had a discussion about this during the merge and the conclusion was to introduce the flaky test structure already found in nodejs, so we could always have a passing [although flaky] test suite for releases.
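
For context, nodejs tracks these in V8-style `.status` files that the test runner consults. A minimal sketch of what an entry looks like (the file header and test name here are illustrative, not taken from this thread):

```
# test/parallel/parallel.status
prefix parallel

[$system==win32]
test-fs-watch : PASS,FLAKY
```

A test marked `PASS,FLAKY` can fail without turning the run red, which is exactly why flaky entries tend to rot out of sight.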

We [the build group] just brought it up again since the tests have a tendency to rot. Also, reporting test results to GitHub (see #236) will make it even less visible, since a flaky pass is still a pass.

What can we do to improve this? Do we add a policy for moving flaky tests to fail after a while? Removing them altogether? I know @Trott has been doing a lot of work to improve the status of flaky tests (well, tests in general) -- perhaps you want to chip in?

@rmg

rmg commented Nov 11, 2015

Could the flaky tests be parsed out and posted as individual statuses? Might make them visible enough.

@jbergstroem
Member Author

@rmg you mean as fails? If they're flagged as pass, GitHub will still fold them into "success".

@rmg

rmg commented Nov 11, 2015

As passes. The "folding" is only a UI thing, the status list could still be expanded to show them, which is a lot more visible than having to go through to the build logs.

@jbergstroem
Member Author

Then it could just as well be part of the current status, which would show: All tests passed! (3 flaky, 14 skips, 790 total)
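
Both variants (a status per flaky test, as @rmg suggests, or one status whose description carries the counts) would go through the same GitHub commit status API. A minimal sketch in Python, with the repo, sha, token, and counts as placeholders:

```python
import json
import urllib.request

API = "https://api.github.com/repos/nodejs/node/statuses/{sha}"

def post_status(sha, token, flaky, skips, total):
    # state stays "success": a flaky pass is still a pass, so the merge
    # button stays green; the gain is in what the status list shows.
    body = {
        "state": "success",
        "context": "ci/tests",
        "description": "All tests passed! (%d flaky, %d skips, %d total)"
                       % (flaky, skips, total),
    }
    req = urllib.request.Request(
        API.format(sha=sha),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "token " + token,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```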

@Trott
Member

Trott commented Nov 11, 2015

This may be unrealistic, but I would like to see a small and committed group of people aggressively work on the flaky tests and make them Not Flaky.

Speaking of which, nodejs/node#3636 needs an LGTM. Just saying.

@Trott
Member

Trott commented Nov 11, 2015

(Gratuitous self-promotion, but I am doing my part to try to rally people to fix flaky tests.)

@jbergstroem
Member Author

@Trott what are your thoughts on making it a realistic goal? How can we put more pressure on either flagging tests as flaky or making sure they are followed up?

@Trott
Member

Trott commented Nov 11, 2015

Here's what I've been thinking:

  • Form a WG. Once the flaky tests are all fixed, the WG can disband. One big benefit is that it will be a small-ish group of people who talk to each other and pay attention to what the others are working on. So things like "Hey, can someone go to Fix flakiness of pipeflood test node#3636 and look at not just the code changes, but the CI runs, and the history from the previous pull request it came off of, and the issue filed, and..." ... yeah, all of that will hopefully happen more easily. "Oh, you finally solved that thing we've been having a back-and-forth about! Brilliant! Oooh, that is a clever fix. Bravo! LGTM!"
  • This is harder to solve, but for me, the biggest demotivator to fixing the flaky tests is that the buildbots seem to fail with alarming frequency. ("Seem" is the operative word. I don't know that they actually do, but it sure seems like it.) So fixing that would indirectly be a huge help. Otherwise, why bother getting the tests to run reliably if they're still not going to run reliably because the buildbots fail on CI? Maybe that can be the same WG. Maybe that's this WG? Maybe this WG is where the flaky-tests work belongs, rather than a separate one?

(Wherever the flaky tests firefighters assemble, obviously, I want in.)

@thefourtheye

Maybe we can raise alerts if the buildbots go down.
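
A minimal sketch of such a watchdog, assuming the CI master is Jenkins and exposes the standard `/computer/api/json` endpoint (the URL and the plain print-as-alert are placeholders):

```python
import json
import urllib.request

CI_NODES = "https://ci.nodejs.org/computer/api/json"  # assumed endpoint

def offline_buildbots():
    # Jenkins lists every build agent along with an "offline" flag.
    with urllib.request.urlopen(CI_NODES) as resp:
        data = json.load(resp)
    return [c["displayName"] for c in data["computer"] if c.get("offline")]

if __name__ == "__main__":
    down = offline_buildbots()
    if down:
        print("ALERT: buildbots offline:", ", ".join(down))
```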

@orangemocha

The current documented policy for flaky tests (https://github.com/nodejs/node/wiki/Flaky-tests#what-to-do-when-you-encounter-a-new-flaky-test) calls for opening an issue to track them when you mark the test as flaky, and assigning the issue to the next release milestone. I am not sure we have followed it in 100% of cases. During the node-accept-pull-request experiment we were aggressively marking tests as flaky, but there were so many different ones failing randomly that we had to give up. Reliability of the build bots is definitely something that we need to address, and I certainly haven't given up on that. Btw, @joaocgreis is working on a CI job that stresses a single test in CI to determine whether it's flaky, which should be a useful tool in this context.
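
A local approximation of that kind of stress run, as a minimal sketch (this is not @joaocgreis's actual CI job; the test name and run count are placeholders):

```python
import subprocess
import sys

TEST = sys.argv[1] if len(sys.argv) > 1 else "parallel/test-fs-watch"
RUNS = 1000

failures = 0
for _ in range(RUNS):
    # tools/test.py is the test runner in the nodejs/node tree;
    # run the single test repeatedly and count non-zero exits.
    proc = subprocess.run(
        [sys.executable, "tools/test.py", TEST],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode != 0:
        failures += 1

print("%d/%d runs failed" % (failures, RUNS))
```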

Assuming we do finally improve the reliability of the build bots, the part that I think could use some improvement is clarifying who takes responsibility for fixing a flaky test, and how to motivate people to do it. The person who marks a test as flaky is usually the collaborator who determined that the test is not failing because of the current pull request's changes. They are not motivated to fix the test, and not necessarily the most qualified for the particular test that is failing.

In a dev team working for one company, you could probably just assign the issue to the test author/owner. I am not sure that this would work in an open source project.

So how do we motivate collaborators to investigate and fix these failures? Here are some options we could consider:

  1. We stick to the current policy of assigning them to the next release milestone. The main problem with this is that we might see the list too late in the release cycle, and decide to punt. To counter that, we could try to increase awareness of issues that are flagged for a milestone, throughout the milestone (for example by sending periodic reports). I think this might be useful even beyond flaky tests.
  2. We somehow set a timeout for how long a test can be marked as flaky, or a limit to the number of tests marked as flaky, or both. When the limit is reached, we block merging all pull requests until the situation is brought back under control. This makes the whole team accountable for fixing flaky tests. I recall @domenic saying that this is what the v8 team does.
  3. We try to motivate collaborators by some sort of reward mechanism, like keeping a scoreboard with how many issues they have resolved in a given period (and assign extra points for flaky tests).
  4. We come up with an algorithm for assigning the flaky test issue to a specific collaborator, in addition to assigning a milestone. For example, the assignee could be the last active collaborator to have modified the test logic. If the search doesn't yield a suitable candidate, we could fall back to round-robin selection from a pool of collaborators, ideally choosing the pool based on the area of the test (see the sketch after this list).
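
A sketch of how option 4's lookup could work (a hypothetical helper, not an agreed policy; the pool names are placeholders):

```python
import itertools
import subprocess

# Fallback pool for round-robin assignment (placeholder handles).
POOL = itertools.cycle(["collaborator-a", "collaborator-b", "collaborator-c"])

def last_author(test_path):
    # The last person to modify the test file, per git history.
    out = subprocess.run(
        ["git", "log", "-1", "--format=%an <%ae>", "--", test_path],
        capture_output=True,
        text=True,
    )
    return out.stdout.strip() or None

def assignee_for(test_path):
    return last_author(test_path) or next(POOL)

print(assignee_for("test/parallel/test-fs-watch.js"))
```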

@rvagg
Member

rvagg commented Jan 27, 2016

Possibly something to completely hand off to the new testing group; @Trott has basically been owning this whole area in the time since this issue was posted. Shall we close? Is there anything in here that @nodejs/testing wants to capture first?

@jbergstroem
Member Author

Sounds good to me. For an 'outsider' it might be good to define the scope of the build and testing working groups somewhere (i.e. where do I post which problem).

@Trott
Member

Trott commented Jan 27, 2016

@jbergstroem wrote:

For an 'outsider' it might be good to define the scope of the build and testing workgroup somewhere (where do I post what problem).

The Testing WG is putting together docs so we can be chartered by the TSC. Because some of our charter is likely to overlap with the existing Build WG charter, the Build WG will probably have to ratify our charter as well.

It would be great if as many build folks as are willing would read the very short draft docs for the Testing WG and comment. You can find them here: https://github.com/nodejs/testing/pull/2/files
