Automating Flaky Test Detection #3575
Comments
We track flaky tests in https://github.com/nodejs/reliability/
FWIW, I believe that detecting flaky tests hasn't been a big issue. "Setting them as flaky to make our CI green again", as the referenced tweet put it, is the easy part. Finding people who are willing to fix them, on the other hand, has been rather difficult. Also, this was merged only recently: #3056
IMHO, GitHub issues are not a good way of tracking problems that persist across different days. For example, I'd be super interested in learning how long it takes for a flaky test to appear, or to be fixed, after it is introduced into the codebase. I don't know whether the proposed service supports that, but I believe there is no harm in trying a new solution. Maybe a new product might create the dopamine effect we need to promote fixing the issues.
I would not replace the current infrastructure with a third-party system straight away, but I'm not opposed to adding it in a non-destructive way, evaluating it, and seeing how it performs. I'm not sure having that data available would give us anything more useful than our current system, but I'd be happy to be proven wrong. Something important to consider: sending data to the third party must be disabled during security releases while CI is locked down.
I think for Jenkins this does not need additional work; at the end of every test run we already run
I think that can be done in parallel to the reliability report generation, which is a daily job that crawls data from Jenkins. I agree with @tniessen that the difficult part is actually fixing the flakes. We have more than enough open issues about flaky tests that contain whatever data a service like BuildPulse can collect, but from what I've observed that doesn't help much in terms of actually getting them fixed. Some of the flaky test issues in the core repo are years old and still reproduce from time to time, whether or not the tests are marked flaky, and we never know if the cause is a real bug or a platform issue, because we just don't have enough volunteers to actually look into them. When marking tests as flaky, we need to spend some brain energy on figuring out why the test is flaky, why an immediate solution is difficult to come up with, and whether the impact is small enough to justify putting on this band-aid for now. We also need to check that it is not actually a regression (e.g. nodejs/node#49852); IMO the best response to a potential regression is to revert ASAP and only reland once the flake is addressed. That part probably can't be accomplished by automation.
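As a rough illustration of what a daily crawl of Jenkins data can look like, here is a minimal sketch that pulls a build's test report over the Jenkins JSON API and extracts the failed cases. The base URL and job name are assumptions, and the real reliability tooling may work quite differently.

```python
import json
from urllib.request import urlopen

# Assumed base URL for illustration; the actual daily job may use other endpoints.
JENKINS = "https://ci.nodejs.org"

def failed_cases(report):
    """Collect the names of failed cases from a Jenkins testReport JSON payload."""
    return [case["name"]
            for suite in report.get("suites", [])
            for case in suite.get("cases", [])
            if case.get("status") in ("FAILED", "REGRESSION")]

def fetch_report(job, build="lastCompletedBuild"):
    """Download the test report for one build of a job."""
    url = f"{JENKINS}/job/{job}/{build}/testReport/api/json"
    with urlopen(url) as resp:
        return json.load(resp)
```

A crawler along these lines only sees per-build failures; turning that into "this test is flaky" still requires correlating results across many runs, which is the part a dedicated service automates.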
Also, I think what would be really useful is automation to bisect regressions for newly appeared flakes (ideally on affected platforms only) to find the commit that introduced a flake. (Maybe old ones too; they can be bisected just fine.) Even better if this can somehow associate flakes with infra updates, in case a flake is caused by one. This would be very helpful in addressing flakes like nodejs/node#49852 (there it took some human intelligence to find the suspicious commit, and even then we did not verify it via bisect and just got a lucky guess). I worked with similar automation in V8 before and found it amazing. I am not sure whether BuildPulse supports that, though. If it can only collect information based on past test failures but cannot do actual bisects, that's still not much of an improvement over what we already have.
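Bisecting a flake differs from a normal bisect in that a commit only counts as "good" if the test passes many times in a row. A minimal driver for `git bisect run` could look like the sketch below (the script name and repetition count are illustrative, not part of any existing tooling):

```python
import subprocess
import sys

def passes_repeatedly(cmd, runs=30):
    """Return True only if `cmd` succeeds on every attempt.

    A single failure is enough to mark the commit as exhibiting the flake,
    so `runs` controls how confident we are in a "good" verdict."""
    for _ in range(runs):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True

if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit 0 tells `git bisect run` this commit is good, 1 that it is bad.
    sys.exit(0 if passes_repeatedly(sys.argv[1:]) else 1)
```

Usage would be along the lines of `git bisect start HEAD <last-known-good>` followed by `git bisect run python bisect_flake.py <test command>`. The catch is that a flake which reproduces rarely needs a large `runs` value, which is why doing this per-platform and automatically, rather than by hand, matters.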
Hi everyone - I've read through the comments so far and would like to address some concerns. I agree that detection on its own is not sufficient to solve the underlying problem - your GitHub issues already track a subset of known flaky tests. So what additional data can we provide? We can show (among other things; these directly address the identified concerns):
These metrics, alerts, and PR comments make sure flaky tests are tracked, prioritized correctly, and fixed. BuildPulse flags a test as flaky when we see both a passing and a failing result for that test with no code change in between. We determine whether the code has changed by comparing git tree SHAs. Because of this continuous monitoring, we can identify flaky tests as they come up.
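The detection rule described above can be sketched in a few lines; this is my own simplified reading of it, not BuildPulse's actual implementation. Results are keyed by git tree SHA (obtainable per commit with `git rev-parse HEAD^{tree}`), and a test is flagged when the same tree has both outcomes on record:

```python
from collections import defaultdict

def classify_flaky(results):
    """results: iterable of (tree_sha, test_name, passed) records.

    A test is flagged as flaky when, for the same tree SHA (i.e. identical
    code), it has both a passing and a failing run on record."""
    outcomes = defaultdict(set)
    for tree_sha, test_name, passed in results:
        outcomes[(tree_sha, test_name)].add(passed)
    return sorted({test for (_sha, test), seen in outcomes.items()
                   if seen == {True, False}})
```

Note that comparing tree SHAs rather than commit SHAs means rebases and merge commits that don't touch the code still count as "no change", which is what makes the pass/fail contradiction meaningful.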
+1 from me on that approach. I think the main question is whether we have somebody from build, or somebody with Jenkins access, who will volunteer to help with the effort, as my guess is that we'll need to update build jobs to send the data.
So far I think no one objects to trying it out in addition to what we already have. So we just need volunteers to integrate it with Jenkins and/or GitHub Actions and see how it goes.
hey all, I'd like to improve the Windows builds of the Node CI and fix flaky tests where possible, and I'm really curious how BuildPulse can help here. So I was just wondering: what's the current status of this?
@siddhantdange what would we need to have some kind of MVP of this? Anything I can help you with here?
@shnooshnoo 3 steps!
Additionally, our onboarding docs can be found here. Happy to help here as well, if you can point me to the correct workflow! |
FYI @nodejs/build I will meet with @siddhantdange on Tuesday, February 6 at 16:00 CET. Anyone who wants is welcome to join. I'll post a meeting summary here afterward.
A quick summary after the meeting:
This commit adds initial BuildPulse support by optionally uploading test results there. If a machine is not configured to connect to BuildPulse, or if any precondition is not met, the data will not be sent. If the upload fails, the test job will just continue. For now this is a PoC, and only Windows machines can upload results. If we decide to move forward with it, other platforms will be added. Refs: nodejs#3575
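The guard-and-continue behaviour this commit describes ("skip if not configured, never fail the job") can be sketched as follows. The environment variable name and uploader path here are assumptions for illustration, not the PR's actual implementation:

```python
import os
import subprocess

def maybe_upload(results_dir, uploader="./buildpulse-upload"):
    """Best-effort upload of test results.

    Skip entirely when the machine is not configured for BuildPulse, and
    never let an upload failure break the test job. The env var name and
    uploader binary are hypothetical placeholders."""
    if not os.environ.get("BUILDPULSE_ACCESS_KEY_ID"):
        return "skipped"        # machine not configured: send nothing
    try:
        subprocess.run([uploader, results_dir], check=True, timeout=300)
        return "uploaded"
    except (subprocess.SubprocessError, OSError):
        return "upload-failed"  # swallow the error; the test job continues
```

The important property is that every exit path returns normally, so a BuildPulse outage or misconfiguration can never turn a green test run red.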
@nodejs/build a PR enabling the BuildPulse integration on a subset of the Windows machines, for PRs and daily builds, is open here.
Flaky tests unnecessarily waste developer and CI time, impact product stability, and reduce confidence in tests. It seems that the https://github.com/nodejs/node repository suffers from flaky tests, as shown by the context below.
Context:
- test-structuredclone-jstransferable is flaky (node#50260)
- test-esm-loader-hooks (node#50336)

BuildPulse is a platform for detecting flaky tests, reporting on their impact, mitigating that impact, and ultimately fixing flaky tests. We rely on open source ourselves, so in an effort to give back, we offer BuildPulse to open-source projects for free (up to a certain volume of test results per day).
I'd like to propose using BuildPulse to replace the current flaky test workflow, which seems to run through https://github.com/nodejs/reliability. We can replace this repository by making it easy to search for known flaky tests and their context (links to flakiness occurrences, error logs, commits, etc.).
We can also enhance your current workflow through ncu-ci integrations, GitHub Issues integrations, pull request comments, reports, notifications, badges, and more.
I'm willing to do the integration myself, which largely consists of 2 steps:
If the integration is helpful, BuildPulse would appreciate a testimonial down the road :)
Happy to answer any questions and interested in what next steps might look like!