add resume-ci label #43929
I'm very reluctant about adding this because I think it's a mistake to resume CI without looking at the failures, and if you are looking at the failures you can just click on the Jenkins button. |
Unless you are not a collaborator (e.g. a triager), in which case you cannot resume using Jenkins CI. I agree with the sentiment though. |
I agree, CI should only be resumed if a human has made sure the specific flakiness existed before that PR. This is a tool that can help triagers do something they currently cannot do. This was already discussed in the original issue #40817 (comment) |
Also, as far as I understand, the TSC has addressed (part of) this discussion as well? #42125 (comment) |
Was it considered to allow triagers to access the Jenkins feature? |
IIRC that is the intention here: not to make resuming CIs simpler in general, but as a workaround to give this particular permission to triagers. |
I mean, do we really need a workaround, rather than giving them the permission inside Jenkins? |
If that is possible that sounds like a better solution to me |
wow, amazing
If I didn't need to resume CI repeatedly for every PR, I would sympathize with this. But CI is much too flaky for looking at errors to be worth my time until after I've resumed at least three, maybe five times. It should resume by default and stop after 3-5 attempts. |
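The "resume by default and stop after 3-5 attempts" idea could be sketched roughly as below. This is only an illustration of the proposal, not something the PR implements: `run_ci` is a hypothetical callable standing in for whatever starts or resumes one CI run, and `resume_until_green` is a made-up name.

```python
from typing import Callable, List


def resume_until_green(run_ci: Callable[[], bool], max_attempts: int = 3) -> List[bool]:
    """Call `run_ci` until it passes or `max_attempts` is reached.

    `run_ci` is a hypothetical callable that starts (or resumes) one CI run
    and returns True when the run is green. The outcome of every attempt is
    returned so that earlier failures stay visible instead of being dropped.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    outcomes: List[bool] = []
    for _ in range(max_attempts):
        passed = run_ci()
        outcomes.append(passed)
        if passed:
            break  # stop as soon as one attempt is green
    return outcomes
```

Returning the full list of outcomes rather than a single pass/fail flag is a deliberate choice in this sketch: it keeps the failed attempts available for someone to inspect afterwards.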
Not ideal that we have to do this but I do think this is the correct thing to do given the circumstances.
That is exactly how I assume #43522 was merged, and it has made CI much, much worse for all collaborators, to the point where it was virtually unusable for days.
Resuming CI without looking at errors makes it more likely to miss related failures and thus to introduce new flaky tests, which makes the situation worse for everyone, beyond that PR. Let's say a PR introduces a test that flakes 50 % of the time. Running one CI and checking for errors gives you a 50 % chance of catching it. Resuming CI without checking what errors occurred reduces the chances of catching the flaky test exponentially! I can't find it right now, but there was a PR a while ago that implemented something like this (i.e., automated resuming or something similar), which I was against for the same reason. There are other approaches that might be worth investigating. For example, the test runner could, when a test fails, re-run it |
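The arithmetic behind the 50 % example can be sketched as follows. This is a simplified model and not a description of how Jenkins resumes jobs: it assumes every CI attempt fails independently with the same probability, and that someone who resumes blindly only notices the flaky test if every single attempt fails. The function name and parameters are illustrative.

```python
def chance_of_catching(flake_rate: float, attempts: int) -> float:
    """Probability that a blindly resumed CI never goes green.

    Assumption: each attempt fails independently with probability
    `flake_rate`, and nobody reads the failures between attempts, so the
    flaky test is only noticed if all `attempts` runs fail.
    """
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return flake_rate ** attempts


if __name__ == "__main__":
    for attempts in (1, 2, 3, 5):
        print(f"{attempts} attempt(s): {chance_of_catching(0.5, attempts):.1%}")
```

Under these assumptions, a test that flakes 50 % of the time is caught with probability 0.5 after one inspected run, but only 0.125 if CI is resumed up to three times without anyone reading the failures.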
The problem is more profound: it's virtually impossible to get a green CI without resuming. I currently have 7 PRs on which I'm restarting CI every day. Something that would be extremely helpful is to get the list of failed tests as a PR comment. We could have a different approach:
|
I completely agree @mcollina, we are in a tough spot right now. All I am saying is that resuming without properly checking for errors only makes things worse.
Big +1 as long as we treat it as an urgent TODO list. (Essentially what I wrote in #43754 (comment).) |
Refs: #43929 (comment) PR-URL: #43954 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Feng Yu <F3n67u@outlook.com>
I think that this PR would be a great change. But I also think that (signed, an idiot who didn't realize request and retry were different, and so requested too many builds) |
@kvakil that sounds like a great improvement! But:
|
In my mind resume wouldn't happen automatically, the author would still
need to go back and retag the PR with `request-ci` in order to resume
it. There just wouldn't be a separate `resume-ci` label: ideally the
tooling can detect if the CI needs to be rebuilt entirely (if there were
additional commits since the last CI) or if it can just be resumed (if
there have been no additional commits).
& to be clear this is just a wish, I definitely don't think it should
stop us from using this PR. I am not sure how hard the implementation
would be. I just think the user experience would be better.
|
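The detection described in the comment above (rebuild entirely if there are new commits since the last CI, otherwise resume) might look something like this. It is a minimal sketch of the wish, not existing tooling: `choose_ci_action`, `head_sha`, and `last_ci_sha` are hypothetical names, and how the tooling would learn which commit the last CI run tested is left open.

```python
from typing import Optional


def choose_ci_action(head_sha: str, last_ci_sha: Optional[str]) -> str:
    """Return "full" or "resume" for a PR that was just labeled `request-ci`.

    Hypothetical helper: `head_sha` is the PR's current head commit and
    `last_ci_sha` is the commit the previous CI run tested (None if CI has
    never run). Neither name comes from the actual tooling.
    """
    if last_ci_sha is None:
        return "full"    # no previous run exists, so there is nothing to resume
    if head_sha != last_ci_sha:
        return "full"    # additional commits since the last CI: rebuild entirely
    return "resume"      # no additional commits: the previous run can be resumed
```

In this sketch the author still has to re-apply `request-ci` by hand each time; the helper only decides which kind of run that label triggers.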
@kvakil adding a |
Currently the user experience is to go to the Jenkins web UI to check what the failures are. If it turns out the failures are indeed unrelated to the PR under test, the "Resume build" button is right there in the Jenkins UI, so there's no reason to go back to GitHub to add a label. What I'm trying to say is that this label is not meant to improve the UX (it won't); it's meant to enable triagers to resume CIs, which they currently can't.
That's exactly what's controversial about this PR: some are concerned that adding this label would make collaborators/triagers less likely to think about the CI failures and instead re-apply the label without checking the failures until they get a passing CI. In particular, it would enable the landing of PRs that introduce flaky tests, making the CI even less reliable than it currently is.
That'd be a very nice feature indeed! I would go further: only accept 1 |
yes, I will probably wait for my nomination to complete so my jenkins token will actually work :) |
The particular UX I dislike here is having |
Refs: nodejs#43929 (comment) PR-URL: nodejs#43954 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Feng Yu <F3n67u@outlook.com>
Refs: nodejs/node#43929 (comment) PR-URL: nodejs/node#43954 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Feng Yu <F3n67u@outlook.com>
Refs: #43929 (comment) PR-URL: #43954 Backport-PR-URL: #45126 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Feng Yu <F3n67u@outlook.com>
fixes #40817
awaiting merge of nodejs/node-core-utils#642