Commit 122f221
[CI] Limit AMD E2E tests to 1 thread (#17422)
There are recurring instabilities on the AMD pre-commit runs. Every time
they fail, two things happen:
* One or more tests fail with a memory access fault
* One or more tests hang and eventually time out
This seemingly only happens when running the pre-built E2E tests in
parallel.
The problem is difficult to debug and may be caused by the AMD
drivers.
As a workaround until we can figure out what's going on, this patch
switches the AMD pre-built E2E tests to run in a single thread.
This is slower than running the tests in parallel. However, because
the instability causes hangs that hit the 10-minute timeout, a
single-thread run is faster than a failing multi-thread run. The result
is consistent runs that are individually slower but may move through the
job queue faster overall, since they no longer hit timeouts as often.
On a local setup using the same AMD GPU as the CI:
* Successful multi-thread run: ~73s
* Successful single-thread run: ~255s
* Failed multi-thread run: 600s+
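The trade-off described above can be made concrete with the three local timings. The sketch below is a back-of-the-envelope model, not something from the commit. It assumes each multi-thread run hangs independently with a hypothetical probability `p`, and that a hung run costs exactly the 600 s timeout. This is a lower bound, since the measurement is reported as 600s+. Under those assumptions it computes the expected runner time per multi-thread run and the failure rate at which the single-thread run becomes cheaper on average.

```python
# Back-of-the-envelope model of the single-thread vs multi-thread trade-off.
# Timings are the local measurements quoted in the commit message.
# Assumptions (not from the commit): each multi-thread run hangs independently
# with probability p, and a hung run costs exactly the 600 s timeout.

SINGLE_THREAD_S = 255.0  # successful single-thread run
MULTI_OK_S = 73.0        # successful multi-thread run
MULTI_FAIL_S = 600.0     # failed multi-thread run (timeout, lower bound)


def expected_multi_thread_seconds(p: float) -> float:
    """Expected runner time per multi-thread run when a fraction p of runs hang."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return (1.0 - p) * MULTI_OK_S + p * MULTI_FAIL_S


def break_even_failure_rate() -> float:
    """Failure rate above which a single-thread run is cheaper on average.

    Solves (1 - p) * MULTI_OK_S + p * MULTI_FAIL_S == SINGLE_THREAD_S for p.
    """
    return (SINGLE_THREAD_S - MULTI_OK_S) / (MULTI_FAIL_S - MULTI_OK_S)


if __name__ == "__main__":
    print(f"break-even failure rate: {break_even_failure_rate():.1%}")
    for p in (0.0, 0.2, 0.35, 0.5):
        print(
            f"p={p:.2f}: multi-thread ~{expected_multi_thread_seconds(p):.0f}s "
            f"vs single-thread {SINGLE_THREAD_S:.0f}s"
        )
```

With these numbers the break-even point is 182/527, roughly 35%. If more than about a third of parallel runs hang, the single-thread configuration uses less runner time on average. Counting the re-run that a failed job needs would push the break-even rate lower still.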
Parent: 21ccf55
1 file changed: +1 -0 (one line added, at line 76 of the new file)