chore: rerun workflow from failed #28143
CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.
 * @returns {Promise<any[]>} A promise that resolves to an array of workflow items.
 * @throws Will throw an error if the CircleCI token is not defined or if the HTTP request fails.
 */
async function getCircleCiWorkflowsByBranch(branch: string): Promise<any[]> {
Run
rerun-from-failed:
  when:
    condition:
      equal: ["<< pipeline.schedule.name >>", "rerun-from-failed"]
Builds ready [6e99343]
Page Load Metrics (2148 ± 129 ms)
Bundle size diffs
Builds ready [894483d]
Page Load Metrics (2025 ± 227 ms)
Bundle size diffs
Builds ready [88c0e66]
Page Load Metrics (2067 ± 87 ms)
Bundle size diffs
 * Note: the API returns the first 20 workflows by default.
 * If we wanted to get older workflows, we would need to use the 'page-token' we would get in the first response
 * and perform a subsequent request with the 'page-token' parameter.
 * This seems unnecessary as of today, as the amount of daily PRs merged to develop is not that high.
Good decision. Easier to run this multiple times throughout the day, rather than support paging through more runs.
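For reference, if paging were ever needed, following the 'page-token' could look roughly like the sketch below. It is not part of this PR. `collectPages` and the injected `fetchPage` callback are hypothetical names; the only assumption about the API is that each response carries `items` and a `next_page_token` that is null on the last page.

```typescript
// One page of a paginated list response (shape assumed for this sketch).
interface Page<T> {
  items: T[];
  next_page_token: string | null;
}

// Hypothetical callback: performs one request, passing the token as 'page-token' when present.
type FetchPage<T> = (pageToken?: string) => Promise<Page<T>>;

// Collects items page by page, stopping at the last page or after maxPages requests.
async function collectPages<T>(
  fetchPage: FetchPage<T>,
  maxPages: number,
): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(token);
    all.push(...page.items);
    if (!page.next_page_token) {
      break; // no more pages
    }
    token = page.next_page_token;
  }
  return all;
}
```

The `maxPages` cap keeps the number of requests bounded, which matters for staying within the API limits.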
Great work! Just a couple of minor points of feedback, but overall this looks fantastic. Change request is just for the npx tsx step
Co-authored-by: Mark Stacey <markjstacey@gmail.com>
LGTM!
Builds ready [e01f7ee]
Page Load Metrics (2092 ± 98 ms)
Bundle size diffs
Description
This PR adds a new workflow to our CircleCI config, called rerun-from-failed, which reruns from_failed the workflows that meet all of the following conditions:
a. The workflow has only been run once (not retried or run multiple times, whatever its final status). This spares some credits and avoids re-running the same jobs multiple times. We could change this and allow 2 runs instead of 1, for example, if we see that it's needed.
b. The workflow is completed and has the status failed.
c. The workflow runs on the develop branch.
d. The workflow was triggered by the merge queue bot. This means that we won't rerun scheduled workflows (like the nightly ones). It didn't seem necessary to rerun those, but we can remove the filter if we want.
Note: the CircleCI API does not support the rerun_failed_tests feature.
This new workflow can be scheduled from the CircleCI UI panel, where we can choose how often it runs. Possibly once every hour (Monday to Friday only), but that's fully customizable from the UI.
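The rerun conditions above can be sketched as a small predicate. This is an illustration only, not the script in this PR: `WorkflowSummary`, `RerunRules`, `shouldRerun` and `selectWorkflowsToRerun` are hypothetical names, the flattened workflow shape is assumed, and the bot login `github-merge-queue[bot]` is an assumption.

```typescript
// Hypothetical, flattened view of a workflow; the real API payload has more fields.
interface WorkflowSummary {
  id: string;
  status: string; // e.g. "success", "failed", "running"
  branch: string;
  triggeredBy: string; // login of the actor that triggered the pipeline
  runCount: number; // how many times this workflow has run so far
}

interface RerunRules {
  branch: string;
  triggerActor: string;
  maxRuns: number; // 1 today; could be raised to 2 if needed
}

const DEFAULT_RULES: RerunRules = {
  branch: "develop",
  triggerActor: "github-merge-queue[bot]", // assumed login of the merge queue bot
  maxRuns: 1,
};

// True only when all four conditions (a-d) hold.
function shouldRerun(
  workflow: WorkflowSummary,
  rules: RerunRules = DEFAULT_RULES,
): boolean {
  return (
    workflow.runCount <= rules.maxRuns &&
    workflow.status === "failed" &&
    workflow.branch === rules.branch &&
    workflow.triggeredBy === rules.triggerActor
  );
}

function selectWorkflowsToRerun(
  workflows: WorkflowSummary[],
  rules: RerunRules = DEFAULT_RULES,
): WorkflowSummary[] {
  return workflows.filter((workflow) => shouldRerun(workflow, rules));
}
```

Keeping the rules in one object makes it cheap to relax a condition later, for example allowing 2 runs or dropping the trigger filter.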
Our usage falls within the API limits, which are 51 requests per second per endpoint. In our case we will be doing:
every time we run the rerun workflow.
Implementation
A few words about the implementation of this setup:
- process.env.API_V2_TOKEN is used for authenticating the CircleCI requests.
- The workflow only runs when the pipeline schedule is named rerun-from-failed.
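As a rough illustration of the token-based authentication, the sketch below builds (but does not send) a rerun request. `buildRerunFromFailedRequest` is a hypothetical helper, not the code in this PR. It assumes the CircleCI v2 rerun endpoint with a `from_failed` body flag and the `Circle-Token` header.

```typescript
interface RerunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request for rerunning a workflow from its failed jobs.
// Throws when the token is missing, so a misconfigured job fails early.
function buildRerunFromFailedRequest(
  workflowId: string,
  token: string | undefined,
): RerunRequest {
  if (!token) {
    throw new Error("CircleCI API token is not defined");
  }
  return {
    url: `https://circleci.com/api/v2/workflow/${encodeURIComponent(workflowId)}/rerun`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Circle-Token": token,
      },
      body: JSON.stringify({ from_failed: true }),
    },
  };
}

// Possible usage (not executed here):
//   const { url, init } = buildRerunFromFailedRequest(id, process.env.API_V2_TOKEN);
//   const response = await fetch(url, init);
```

Separating request construction from the network call keeps the token check easy to test without hitting the API.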
The initial idea was to embed the rerun logic inside test_and_release and rerun right after it. That poses some challenges, which a decoupled workflow automated by scheduling seems to solve better.
One issue is how to make sure that we don't rerun from failed forever. That might need additional logic and complexity in the current workflow for tracking the reruns of that specific workflow (possibly creating more artifacts and reading them).
Another issue is how to ensure that the workflow has finished (whether failed or successful) before applying the rerun if needed:
- If we use the requires keyword to make the rerun job the last one, that wouldn't serve us, as it would only run if all jobs were successful (which doesn't solve our task).
- If we use on_fail to then trigger the rerun logic, this would cancel ongoing parallel jobs, which is not desired, as we discussed.
I found that decoupling the rerun and relying on the CircleCI API helps with both challenges. It also doesn't pollute the current CI config, making this a totally independent workflow that can be customized from the UI.
It also allows us to use more customizable rules, by accessing the state and number of runs of each workflow in a straightforward manner.
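To illustrate reading the number of runs, here is a minimal sketch that counts runs per workflow name within one pipeline. It assumes that each rerun shows up as another workflow entry with the same name under the same pipeline. `WorkflowItem` and `countRunsByName` are hypothetical names, not the code in this PR.

```typescript
// Minimal subset of a workflow entry returned for a pipeline (assumed shape).
interface WorkflowItem {
  id: string;
  name: string;
  status: string;
}

// Counts how many times each workflow name appears in a pipeline's workflow list.
// Under the assumption above, a count of 1 means the workflow was never rerun.
function countRunsByName(items: WorkflowItem[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    counts.set(item.name, (counts.get(item.name) ?? 0) + 1);
  }
  return counts;
}
```

A workflow with a count above the allowed number of runs would then be skipped, which is what stops the rerun from looping forever.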
Happy to discuss further though :)
Related issues
Fixes: #25955
Manual testing steps
Screenshots/Recordings
See pipeline here: https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/110689/workflows/9ac7aaee-2610-4985-952d-6bd4f747c071/jobs/4141314
It fetched the last 20 workflows from develop, checked the status of each, and reran only the workflow that met all requirements (not rerun before, and with status failed).