-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34053][INFRA][FOLLOW-UP] Disables canceling push/schedule workflows #31153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
|
ok to test |
|
Testing it here as well: https://github.com/potiuk/spark/actions/runs/480315914 |
|
cc @dongjoon-hyun and @mik-laj FYI. |
|
add to whitelist |
Changes the configuration of the cancel workflow action to skip schedule/push events from canceling. This has the effect that duplicates of all direct pushes (master merges or direct pushes to the spark repository are not cancelled)
dd54694 to
8cdd4b1
Compare
|
Actually I also removed the "if pull_request". This is really useful because your "master" pushes will also trigger the cancel workflow that will keep on cancelling the PR duplicates. This helps in case the queue is full |
|
You can see it in operation here: https://github.com/potiuk/spark/actions And especially here: https://github.com/potiuk/spark/runs/1688506527?check_suite_focus=true#step:2:46 I had two builds running from master at the time when I pushed it and you will see that the action found them running but did not add them as candidates for cancelling because they were a 'push' type. |
|
👌 sounds good! |
|
I'm gonna merge and see how it goes. Other tests here are very unlikely related anyway. |
|
Merged to master. |
|
Let's monitor the builds and see how it goes .. |
Let me know if you see any problems. The action has pretty comprehensive logs and tells exactly what it does, but if you see any problems I am happy to jump-in and help. |
|
But it seems working fine in general though .. hm .. there seems a hole somewhere. let me monitor a bit more. cc @dongjoon-hyun, @maropu, @viirya, @Ngone51, @ulysses-you, @LuciferYang FYI. |
|
I think it works as expected. Unfortunately GA have very poor filtering in the "actions" list but there is one trick - you can filter by "branch" and for PRs it works pretty well: You will see here an example : SPARK-34092 branch was pushed twice and one of the two workflows managed to complete from the first push before it was re-pushed. The other workflow was still running when re-push happened and it was cancelled (as expected). The second push failed however (but it was failing test not cancelling). |
|
Also it's not easy to mtch the "canceling" workflow (unfortunately :( ) but you can scroll through the list of workflow and the cancelling workflow will be usually adjacent to the triggering workflow: If you look at the logs of this worfkflow you will see what happened (hoever due the way cancel action works it might be another cancel job that was run slightly earlier from the queue). In this particular case it is the previous 'cancel duplicates" job that triggered the cancel: https://github.com/apache/spark/runs/1692933716?check_suite_focus=true And you can see the logs telling exactly what happened: Hope it helps :) |











Changes the configuration of the cancel workflow action to skip schedule/push events from canceling. This has the effect that duplicates of all direct pushes (master merges or direct pushes to the spark repository are not cancelled)
What changes were proposed in this pull request?
Update to CI cancel policy to skip direct pushes. Duplicates will only be cancelled for Pull Requests.
Why are the changes needed?
Apparenlty the aggressive behavior of the cancel action which also cancels duplicate master builds is too agressive for spark community. This change spares merges to master and scheduled builds from duplicate checking (as a result all merges to master will be always build to completion).
The previous behavior of the action was that in case of subsequent merges to master, only the latest one was guaranteed to complete. Other changes could be cancelled before they complete to save job queue.
Does this PR introduce any user-facing change?
No, except if the master builds were somehow facing the users (but it's unlikely taking into account the ASF release policy).
There was a potential that some changes that could be detected by specific master merge failing could be detected later (in one of the subsequent builds) which could make investigation of the root cause of failure a bit more difficult, because it could have been introduced in one of the commits between two completed builds merges. But this is at most impacting the timeline of release close to release itself, not the release itself.
How was this patch tested?
This configuration parameter has been tested earlier in Airflow. We used it initially, but since our master builds are heavy and we have extensive tests in the PRs and investigation for failed builds is not as difficult we found that limiting the strain on Github Action and cancelling master builds was more important for the health of the whole ASF community and we removed that configuration.
Tested in https://github.com/potiuk/spark/runs/1688506527?check_suite_focus=true#step:2:46 where the action found other master builds in progress but did not add them as candidates to cancel.