chore: rerun workflow from failed #28143

seaona · 2024-10-29T10:25:56Z

Description

This PR adds a new workflow to our CircleCI config, called rerun-from-failed, which does the following:

1. It gets the last 20 CircleCI workflows from the develop branch.
2. It assesses whether any of those workflows needs to be rerun. The conditions for a rerun are:
   a. The workflow has only been run once (not retried or run multiple times, regardless of its final status). This spares credits and avoids rerunning the same jobs multiple times; we could change this and allow 2 runs instead of 1, for example, if we see that it's needed.
   b. The workflow is completed and has a status of failed.
   c. The workflow runs on the develop branch.
   d. The workflow was triggered by the merge queue bot. This means we won't rerun scheduled workflows (like the nightly ones). It didn't seem necessary to rerun those, but we can remove the filter if we want.
3. It reruns, with from_failed, the workflows that meet the conditions above. Note: the CircleCI API does not support the rerun_failed_tests feature.
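
As a rough illustration of the selection rules above, here is a minimal sketch of the filter as a pure function. The `WorkflowRun` shape, its field names, and the `MERGE_QUEUE_ACTOR` value are assumptions made for the example; they are not the CircleCI response schema and not necessarily what the script in this PR uses.

```typescript
// Hypothetical shape: field names are assumptions for illustration only.
interface WorkflowRun {
  id: string;
  branch: string;
  status: string; // e.g. 'success', 'failed', 'running'
  runCount: number; // how many times the workflow has run, reruns included
  triggeredBy: string; // login of the actor that started the pipeline
}

// Assumed login of the merge queue bot; replace with the real value.
const MERGE_QUEUE_ACTOR = 'github-merge-queue[bot]';
// Rule (a): only workflows that ran once are eligible. Raise to allow more.
const MAX_RUNS = 1;

function needsRerun(run: WorkflowRun): boolean {
  return (
    run.runCount <= MAX_RUNS && // (a) not already retried
    run.status === 'failed' && // (b) completed with status failed
    run.branch === 'develop' && // (c) ran on develop
    run.triggeredBy === MERGE_QUEUE_ACTOR // (d) started by the merge queue bot
  );
}

function selectWorkflowsToRerun(runs: WorkflowRun[]): WorkflowRun[] {
  return runs.filter(needsRerun);
}
```

Keeping the rules in one predicate would make it easy to relax a single condition later, for example raising `MAX_RUNS` to 2.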

This new workflow can be scheduled from the CircleCI UI panel, where we can choose how often it runs. Possibly once every hour (Monday to Friday only), but that's fully customizable from the UI.
Our usage falls within the API limits, which are 51 requests per second per endpoint. Every time we run the rerun-from-failed workflow, we will be making:

1 GET to get all workflows
20 GETs to get each workflow's status
X POSTs (a max of 20) to rerun the corresponding failed jobs
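
To make the request budget concrete, the sketch below works out the arithmetic from the counts above. The function and field names are hypothetical; the limit of 51 requests per second per endpoint is the figure quoted in this description.

```typescript
// Limit quoted in the PR description (requests per second per endpoint).
const REQUESTS_PER_SECOND_LIMIT = 51;

interface RequestBudget {
  list: number; // GET that lists the workflows
  status: number; // one GET per workflow to read its status
  rerunMax: number; // at most one POST per workflow
  total: number; // worst-case total for a single run
  busiestEndpoint: number; // most requests sent to any one endpoint
  withinLimit: boolean; // true even if every request landed in the same second
}

// Hypothetical helper: worst-case request count for one scheduled run.
function maxRequestsPerRun(workflowCount: number): RequestBudget {
  const list = 1;
  const status = workflowCount;
  const rerunMax = workflowCount;
  const busiestEndpoint = Math.max(list, status, rerunMax);
  return {
    list,
    status,
    rerunMax,
    total: list + status + rerunMax,
    busiestEndpoint,
    withinLimit: busiestEndpoint <= REQUESTS_PER_SECOND_LIMIT,
  };
}
```

For 20 workflows this gives 1 + 20 + 20 = 41 requests at most per run, with no more than 20 against any single endpoint.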

Implementation

A few words around the implementation of this setup:

This setup uses the API token set in process.env.API_V2_TOKEN to authenticate the CircleCI requests.
The new workflow can be scheduled to run once a day, twice a day, etc., depending on our needs, also from the CircleCI UI, under the name rerun-from-failed.
The new workflow can be enabled and disabled from the CircleCI UI; disabling it only requires removing the scheduled job.
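
For reference, a rerun-from-failed call against the CircleCI API v2 could be assembled roughly as follows. This is a sketch under assumptions, not the code in this PR: `buildRerunFromFailedRequest` and `RerunRequest` are hypothetical names, and the token is passed in as a parameter (in CI it would come from process.env.API_V2_TOKEN).

```typescript
const CIRCLE_API_BASE = 'https://circleci.com/api/v2';

// Hypothetical container for everything needed to send the request.
interface RerunRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the request that reruns a workflow from failed.
function buildRerunFromFailedRequest(
  workflowId: string,
  token: string | undefined,
): RerunRequest {
  if (!token) {
    throw new Error('CircleCI API token is not defined');
  }
  return {
    url: `${CIRCLE_API_BASE}/workflow/${encodeURIComponent(workflowId)}/rerun`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Circle-Token': token, // personal API token header used by API v2
    },
    body: JSON.stringify({ from_failed: true }),
  };
}
```

Separating request construction from sending keeps the token check and the payload easy to test without network access; the result can then be passed to `fetch(url, { method, headers, body })`.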

The initial idea was to embed the rerun logic inside test_and_release and rerun right after it finishes. That poses some challenges, which is why a decoupled workflow, automated by scheduling, seems to solve them better.
One issue is how to make sure that we are not rerunning from failed forever. That might need additional logic and complexity in the current workflow to track the reruns for that specific workflow (possibly creating more artifacts and reading them).

Another issue is how to ensure that the workflow has finished (whether failed or successful) before applying the rerun if needed:

If we used the requires keyword to make the rerun job the last one, that wouldn't serve us, as it would only run if all jobs were successful (which doesn't solve our task).
We could run a job with a timer of ~30 mins, which would make sure that the workflow has finished (whether failed or not), and then rerun from failed by calling the API. That would consume additional CircleCI resources, though.
We could add an on_fail trigger to start the rerun logic when a job fails, but this would cancel ongoing parallel jobs, which is not desired, as we discussed.
We could make each job write its result into an artifact, but the challenge again is when to trigger the read of that file.

I found that decoupling the rerun and relying on the CircleCI API helps with both challenges and doesn't pollute the current CI config, making it a totally independent workflow that can be customized from the UI.
It also allows us to use more customizable rules, by accessing the state and number of runs of each workflow in a straightforward manner.

Happy to discuss further though :)

Related issues

Fixes: #25955

Manual testing steps

Check the successful CI run for this new job (in this example, it successfully reran 1 workflow from failed): https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/110689/workflows/9ac7aaee-2610-4985-952d-6bd4f747c071/jobs/4141314
Create a branch off this branch and remove the filters in the config.yml file, so that the new workflow runs. You can then check the result in CircleCI.

Screenshots/Recordings

See pipeline here: https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/110689/workflows/9ac7aaee-2610-4985-952d-6bd4f747c071/jobs/4141314
It fetched the last 20 workflows from develop, got the status of each, and reran only the one workflow that met all the requirements (not rerun before, and with status failed).

Pre-merge author checklist

I've followed MetaMask Contributor Docs and MetaMask Extension Coding Standards.
I've completed the PR template to the best of my ability
I’ve included tests if applicable
I’ve documented my code using JSDoc format if applicable
I’ve applied the right labels on the PR (see labeling guidelines). Not required for external contributors.

Pre-merge reviewer checklist

I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

github-actions · 2024-10-29T10:26:08Z

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

dabossdana · 2024-11-10T20:45:56Z

.circleci/scripts/rerun-ci-workflow-from-failed.ts

+ * @returns {Promise<any[]>} A promise that resolves to an array of workflow items.
+ * @throws Will throw an error if the CircleCI token is not defined or if the HTTP request fails.
+ */
+async function getCircleCiWorkflowsByBranch(branch: string): Promise<any[]> {


seaona · 2024-11-11T15:40:32Z

.circleci/config.yml

+  rerun-from-failed:
+    when:
+      condition:
+        equal: ["<< pipeline.schedule.name >>", "rerun-from-failed"]


this workflow will only run in develop, and if it's triggered with this exact name

metamaskbot · 2024-11-11T16:37:43Z

Builds ready [6e99343]

builds: chrome, firefox
builds (beta): chrome
builds (flask): chrome, firefox
builds (MMI): chrome, firefox
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Background Module Init Stats
mv3: UI Init Stats
mv3: Module Load Stats
mv3: Bundle Size Stats
mv2: E2e Actions Stats
code coverage: Report
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9
- content-script: 0
- offscreen: 0
- ui: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9

Page Load Metrics (2148 ± 129 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	352	2574	1951	568	273
		domContentLoaded	1766	2516	2117	259	124
		load	1784	2579	2148	269	129
		domInteractive	28	181	63	37	18
		backgroundConnect	8	85	33	25	12
		firstReactRender	50	387	146	70	34
		getState	4	80	23	23	11
		initialActions	0	1	0	0	0
		loadScripts	1255	1901	1549	198	95
		setupStore	6	60	16	16	8
		uiStartup	1982	2999	2453	327	157

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

metamaskbot · 2024-11-12T07:54:38Z

Builds ready [894483d]

builds: chrome, firefox
builds (beta): chrome
builds (flask): chrome, firefox
builds (MMI): chrome, firefox
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Background Module Init Stats
mv3: UI Init Stats
mv3: Module Load Stats
mv3: Bundle Size Stats
mv2: E2e Actions Stats
code coverage: Report
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9
- content-script: 0
- offscreen: 0
- ui: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9

Page Load Metrics (2025 ± 227 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	379	4042	1964	642	308
		domContentLoaded	1698	3727	1990	476	228
		load	1724	3735	2025	472	227
		domInteractive	19	84	45	17	8
		backgroundConnect	10	161	38	40	19
		firstReactRender	55	280	108	48	23
		getState	5	54	14	15	7
		initialActions	0	1	0	0	0
		loadScripts	1209	2717	1461	354	170
		setupStore	6	63	25	21	10
		uiStartup	1901	4094	2285	518	249

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

metamaskbot · 2024-11-19T10:20:51Z

Builds ready [88c0e66]

builds: chrome, firefox
builds (beta): chrome
builds (flask): chrome, firefox
builds (MMI): chrome, firefox
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Background Module Init Stats
mv3: UI Init Stats
mv3: Module Load Stats
mv3: Bundle Size Stats
mv2: E2e Actions Stats
code coverage: Report
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9
- content-script: 0
- offscreen: 0
- ui: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9

Page Load Metrics (2067 ± 87 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	1848	2512	2066	177	85
		domContentLoaded	1812	2465	2030	167	80
		load	1846	2582	2067	182	87
		domInteractive	29	264	53	50	24
		backgroundConnect	12	118	42	28	14
		firstReactRender	51	291	111	88	42
		getState	45	205	112	45	22
		initialActions	0	1	0	0	0
		loadScripts	1344	1887	1517	142	68
		setupStore	6	47	11	9	4
		uiStartup	2073	2978	2447	286	137

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

Gudahtt · 2024-11-20T15:17:40Z

.circleci/scripts/rerun-ci-workflow-from-failed.ts

+ * Note: the API returns the first 20 workflows by default.
+ * If we wanted to get older workflows, we would need to use the 'page-token' we would get in the first response
+ * and perform a subsequent request with the 'page-token' parameter.
+ * This seems unnecessary as of today, as the amount of daily PRs merged to develop is not that high.


Good decision. Easier to run this multiple times throughout the day, rather than support paging through more runs.
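
For completeness, if paging were ever needed, following the 'page-token' could look roughly like the sketch below. It is not part of this PR. The `Page` shape mirrors the `items` / `next_page_token` fields of CircleCI API v2 list responses, while `collectItems` and `FetchPage` are hypothetical names; the fetcher is injected so the loop can be exercised without network access.

```typescript
// One page of a paginated list response.
interface Page<T> {
  items: T[];
  next_page_token: string | null;
}

// Hypothetical fetcher: receives the page token (undefined for the first page)
// and is expected to pass it to the API as the 'page-token' query parameter.
type FetchPage<T> = (pageToken?: string) => Promise<Page<T>>;

// Collects items across pages, stopping when there is no next page
// or when maxPages has been reached (to cap the number of requests).
async function collectItems<T>(
  fetchPage: FetchPage<T>,
  maxPages: number,
): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const { items, next_page_token } = await fetchPage(token);
    all.push(...items);
    if (!next_page_token) {
      break;
    }
    token = next_page_token;
  }
  return all;
}
```

The `maxPages` cap keeps the request count bounded, which matters given the per-endpoint rate limit mentioned in the description.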

Gudahtt

Great work! Just a couple of minor points of feedback, but overall this looks fantastic. Change request is just for the npx tsx step

Gudahtt

LGTM!

metamaskbot · 2024-11-20T19:51:50Z

Builds ready [e01f7ee]

builds: chrome, firefox
builds (beta): chrome
builds (flask): chrome, firefox
builds (MMI): chrome, firefox
builds (test): chrome, firefox
builds (test-flask): chrome, firefox
build viz: Build System
mv3: Background Module Init Stats
mv3: UI Init Stats
mv3: Module Load Stats
mv3: Bundle Size Stats
mv2: E2e Actions Stats
code coverage: Report
storybook: Storybook
typescript migration: Dashboard
all artifacts
bundle viz:
- background: 0, 1, 2, 3, 4, 5, 6, 7
- common: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9
- content-script: 0
- offscreen: 0
- ui: 0, 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9

Page Load Metrics (2092 ± 98 ms)

Platform	Page	Metric	Min (ms)	Max (ms)	Average (ms)	StandardDeviation (ms)	MarginOfError (ms)
Chrome	Home	firstPaint	1834	2711	2096	208	100
		domContentLoaded	1816	2691	2057	198	95
		load	1832	2704	2092	205	98
		domInteractive	29	67	47	13	6
		backgroundConnect	9	104	39	26	12
		firstReactRender	61	119	80	15	7
		getState	63	114	92	13	6
		initialActions	0	1	0	0	0
		loadScripts	1339	2174	1541	187	90
		setupStore	6	20	9	3	1
		uiStartup	2124	3034	2389	235	113

Bundle size diffs

background: 0 Bytes (0.00%)
ui: 0 Bytes (0.00%)
common: 0 Bytes (0.00%)

rerun from failed

936457d

github-actions bot added the team-extension-platform label Oct 29, 2024

seaona added 9 commits October 29, 2024 11:26

failure on purpose to test

789ea52

remove extra lines

8ac2c9a

rerun script always

78de128

required completed or canceled

7068048

run always

f911c7f

rerun using trigger

8098eea

typo

26da987

add description

02cde93

description

29a895e

dabossdana approved these changes Nov 10, 2024

seaona added 4 commits November 11, 2024 16:32

rerun from failed

f801957

remov extre deps

2cc00ae

config fix

4c0ef28

edit name

08d53b1

seaona commented Nov 11, 2024

Merge branch 'develop' into rerun-workflow-failed

6e99343

testing

894483d

seaona added 7 commits November 12, 2024 09:00

add logs

6aad247

parse response

1fac883

add more logs

65a2bc2

api-v2

81baf57

add logging on each step

bc5d452

final logs

6f1a1d0

finished testing, adding filter back

8206ed0

seaona marked this pull request as ready for review November 12, 2024 08:28

DDDDDanica reviewed Nov 13, 2024

.circleci/scripts/rerun-ci-workflow-from-failed.ts (outdated, resolved)

seaona and others added 3 commits November 15, 2024 17:16

remove token for get requests, and remove filter for tesst

48529ba

add filter back

39017df

Merge branch 'develop' into rerun-workflow-failed

88c0e66

DDDDDanica previously approved these changes Nov 19, 2024

hjetpoluru self-requested a review November 19, 2024 20:50

hjetpoluru previously approved these changes Nov 19, 2024

seaona enabled auto-merge November 20, 2024 09:36

seaona disabled auto-merge November 20, 2024 09:38

Gudahtt reviewed Nov 20, 2024

.circleci/config.yml (outdated, resolved)

Gudahtt reviewed Nov 20, 2024

.circleci/scripts/rerun-ci-workflow-from-failed.ts (outdated, resolved)

Gudahtt requested changes Nov 20, 2024

addressed comment: uncaught promise error

e95eec3

Co-authored-by: Mark Stacey <markjstacey@gmail.com>

seaona dismissed stale reviews from hjetpoluru and DDDDDanica via e95eec3 November 20, 2024 17:59

address comment for tsx use

a4dd676

seaona requested a review from a team as a code owner November 20, 2024 18:37

seaona added 2 commits November 20, 2024 19:38

remove filters to test last changes

eb7ef3a

tested successfully - add filters back

e01f7ee

Gudahtt approved these changes Nov 20, 2024

DDDDDanica approved these changes Nov 21, 2024

seaona added this pull request to the merge queue Nov 22, 2024

Merged via the queue into develop with commit b6613df Nov 22, 2024
77 checks passed

seaona deleted the rerun-workflow-failed branch November 22, 2024 07:05

github-actions bot locked and limited conversation to collaborators Nov 22, 2024

metamaskbot added the release-12.9.0 Issue or pull request that will be included in release 12.9.0 label Nov 22, 2024
