
Flaky tests #248

Closed
jbergstroem opened this issue Nov 11, 2015 · 13 comments

@jbergstroem
Member

Before the nodejs merge we only had pass/fail. We had a discussion about this during the merge and the conclusion was to introduce the flaky test structure already found in nodejs, so we could always have a passing [although flaky] test suite for releases.
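
For context, nodejs tracks these in V8-style `.status` files that the test runner consults. A minimal sketch of what an entry looks like (the file header and test name here are illustrative, not taken from this thread):

```
# test/parallel/parallel.status
prefix parallel

[$system==win32]
test-fs-watch : PASS,FLAKY
```

A test marked `PASS,FLAKY` can fail without turning the run red, which is exactly why flaky entries tend to rot out of sight.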

We [the build group] just brought it up again since the tests have a tendency to rot. Also, reporting test results to GitHub (see #236) will make it even less visible, since a flaky pass is still a pass.

What can we do to improve this? Do we add a policy for moving flaky tests to fail after a while? Removing them altogether? I know @Trott has been doing a lot of work to improve the status of flaky tests (well, tests in general) -- perhaps you want to chip in?

@rmg

rmg commented Nov 11, 2015

Could the flaky tests be parsed out and posted as individual statuses? Might make them visible enough.

@jbergstroem
Member Author

@rmg you mean as fails? If they're flagged as pass, GitHub will still fold them into "success".

@rmg

rmg commented Nov 11, 2015

As passes. The "folding" is only a UI thing, the status list could still be expanded to show them, which is a lot more visible than having to go through to the build logs.

@jbergstroem
Member Author

Then it could just as well be part of the current status, which would show: All tests passed! (3 flaky, 14 skips, 790 total)
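
Both variants (a status per flaky test, as @rmg suggests, or one status whose description carries the counts) would go through the same GitHub commit status API. A minimal sketch in Python, with the repo, sha, token, and counts as placeholders:

```python
import json
import urllib.request

API = "https://api.github.com/repos/nodejs/node/statuses/{sha}"

def post_status(sha, token, flaky, skips, total):
    # state stays "success": a flaky pass is still a pass, so the merge
    # button stays green; the gain is in what the status list shows.
    body = {
        "state": "success",
        "context": "ci/tests",
        "description": "All tests passed! (%d flaky, %d skips, %d total)"
                       % (flaky, skips, total),
    }
    req = urllib.request.Request(
        API.format(sha=sha),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "token " + token,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```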

@Trott
Member

Trott commented Nov 11, 2015

This may be unrealistic, but I would like to see a small and committed group of people aggressively work on the flaky tests and make them Not Flaky.

Speaking of which, nodejs/node#3636 needs an LGTM. Just saying.

@Trott
Member

Trott commented Nov 11, 2015

(Gratuitous self-promotion, but I am doing my part to try to rally people to fix flaky tests.)

@jbergstroem
Member Author

@Trott what are your thoughts on making it a realistic goal? How can we put more pressure on either flagging tests as flaky or making sure they are followed up?

@Trott
Member

Trott commented Nov 11, 2015

Here's what I've been thinking:

  • Form a WG. Once the flaky tests are all fixed, the WG can disband. One big benefit is that it will be a small-ish group of people who talk to each other and pay attention to what the others are working on. So things like "Hey, can someone go to Fix flakiness of pipeflood test node#3636 and look at not just the code changes, but the CI runs, and the history from the previous pull request it came off of, and the issue filed, and..." ... yeah, all of that will hopefully happen more easily. "Oh, you finally solved that thing we've been having a back-and-forth about! Brilliant! Oooh, that is a clever fix. Bravo! LGTM!"
  • This is harder to solve, but for me, the biggest demotivator to fixing the flaky tests is that the buildbots seem to fail with alarming frequency. ("Seem" is the operative word. I don't know that they actually do, but it sure seems like it.) So fixing that would indirectly be a huge help. Otherwise, why bother getting the tests to run reliably if they're still not going to run reliably because the buildbots fail on CI? Maybe that can be the same WG. Maybe that's this WG? Maybe this WG is where the flaky-tests work belongs, rather than a separate one?

(Wherever the flaky tests firefighters assemble, obviously, I want in.)

@thefourtheye

Maybe we can raise alerts if the buildbots go down.
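
A minimal sketch of such a watchdog, assuming the CI master is Jenkins and exposes the standard `/computer/api/json` endpoint (the URL and the plain print-as-alert are placeholders):

```python
import json
import urllib.request

CI_NODES = "https://ci.nodejs.org/computer/api/json"  # assumed endpoint

def offline_buildbots():
    # Jenkins lists every build agent along with an "offline" flag.
    with urllib.request.urlopen(CI_NODES) as resp:
        data = json.load(resp)
    return [c["displayName"] for c in data["computer"] if c.get("offline")]

if __name__ == "__main__":
    down = offline_buildbots()
    if down:
        print("ALERT: buildbots offline:", ", ".join(down))
```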

@orangemocha

The current documented policy for flaky tests (https://github.com/nodejs/node/wiki/Flaky-tests#what-to-do-when-you-encounter-a-new-flaky-test) calls for opening an issue to track them when you mark the test as flaky, and assigning the issue to the next release milestone. I am not sure we have followed it in 100% of cases. During the node-accept-pull-request experiment we were aggressively marking tests as flaky, but there were so many different ones failing randomly that we had to give up. Reliability of the build bots is definitely something that we need to address, and I certainly haven't given up on that. Btw, @joaocgreis is working on a CI job that stresses a single test in CI to determine whether it's flaky, which should be a useful tool in this context.
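
A local approximation of that kind of stress run, as a minimal sketch (this is not @joaocgreis's actual CI job; the test name and run count are placeholders):

```python
import subprocess
import sys

TEST = sys.argv[1] if len(sys.argv) > 1 else "parallel/test-fs-watch"
RUNS = 1000

failures = 0
for _ in range(RUNS):
    # tools/test.py is the test runner in the nodejs/node tree;
    # run the single test repeatedly and count non-zero exits.
    proc = subprocess.run(
        [sys.executable, "tools/test.py", TEST],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode != 0:
        failures += 1

print("%d/%d runs failed" % (failures, RUNS))
```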

Assuming we do finally improve the reliability of the build bots, the part that I think could use some improvement is clarifying who takes responsibility for fixing a flaky test, and how to motivate people to do it. The person who marks a test as flaky is usually the collaborator who determined that the test is not failing because of the current pull request's changes. They are not motivated to fix the test, and not necessarily the most qualified for the particular test that is failing.

In a dev team working for one company, you could probably just assign the issue to the test author/owner. I am not sure that this would work in an open source project.

So how do we motivate collaborators to investigate and fix these failures? Here are some options we could consider:

  1. We stick to the current policy of assigning them to the next release milestone. The main problem with this is that we might see the list too late in the release cycle, and decide to punt. To counter that, we could try to increase awareness of issues that are flagged for a milestone, throughout the milestone (for example by sending periodic reports). I think this might be useful even beyond flaky tests.
  2. We somehow set a timeout for how long a test can be marked as flaky, or a limit to the number of tests marked as flaky, or both. When the limit is reached, we block merging all pull requests until the situation is brought back under control. This makes the whole team accountable for fixing flaky tests. I recall @domenic saying that this is what the v8 team does.
  3. We try to motivate collaborators by some sort of reward mechanism, like keeping a scoreboard with how many issues they have resolved in a given period (and assign extra points for flaky tests).
  4. We come up with an algorithm for assigning the flaky test issue to a specific collaborator, in addition to assigning a milestone. For example, the assignee could be the last active collaborator to have modified the test logic. If the search doesn't yield a suitable candidate, we could fall back to round-robin selection from a pool of collaborators, ideally choosing the pool based on the area of the test (see the sketch after this list).
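
A sketch of how option 4's lookup could work (a hypothetical helper, not an agreed policy; the pool names are placeholders):

```python
import itertools
import subprocess

# Fallback pool for round-robin assignment (placeholder handles).
POOL = itertools.cycle(["collaborator-a", "collaborator-b", "collaborator-c"])

def last_author(test_path):
    # The last person to modify the test file, per git history.
    out = subprocess.run(
        ["git", "log", "-1", "--format=%an <%ae>", "--", test_path],
        capture_output=True,
        text=True,
    )
    return out.stdout.strip() or None

def assignee_for(test_path):
    return last_author(test_path) or next(POOL)

print(assignee_for("test/parallel/test-fs-watch.js"))
```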

@rvagg
Member

rvagg commented Jan 27, 2016

Possibly something to completely hand off to the new testing group; @Trott has basically been owning this whole area in the time since this issue was posted. Shall we close? Is there anything in here that @nodejs/testing wants to capture first?

@jbergstroem
Member Author

Sounds good to me. For an 'outsider' it might be good to define the scope of the build and testing working groups somewhere (i.e. where do I post which problem).

@Trott
Member

Trott commented Jan 27, 2016

@jbergstroem wrote:

For an 'outsider' it might be good to define the scope of the build and testing workgroup somewhere (where do I post what problem).

The Testing WG is putting together docs so we can be chartered by the TSC. Because some of our charter is likely to overlap with the existing Build WG charter, the Build WG will probably have to ratify our charter as well.

It would be great if as many build folks as are willing would read the very short draft docs for the Testing WG and comment. You can find them here: https://github.com/nodejs/testing/pull/2/files
