ci: run integration tests serially #1143
Conversation
Integration tests are incredibly flaky at the moment. Perhaps this is related to running the tests in parallel. Certainly the fact that the tests are randomly distributed across workers doesn't help with figuring out what works and what doesn't.
Thinking about this, I suspect the integration tests will hit the 150-minute limit and time out, since the …
Woah, it actually completed -- in 73 minutes! There sure is a lot of parallelism overhead if 4 workers only get us down to around 75% of that time (55 minutes). Three failed tests:
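As a back-of-the-envelope check of that overhead claim (the 73- and 55-minute figures are from the CI runs above; the speedup/efficiency formulas are standard, not from this thread):

```python
# Rough parallel-efficiency estimate for the integration suite.
serial_minutes = 73    # this PR's serial run
parallel_minutes = 55  # earlier run with -n auto (4 workers on GitHub)
workers = 4

speedup = serial_minutes / parallel_minutes  # ~1.33x
efficiency = speedup / workers               # fraction of each worker doing useful work

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# -> speedup: 1.33x, efficiency: 33%
```

So with 4 workers the suite only runs about 1.33x faster, i.e. roughly 33% parallel efficiency -- consistent with the "a lot of overhead" reading above.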
Also of note is that there's a third test that would fail 100% of the time, but didn't fail here --
This is clearly an improvement!
Having run the integration tests twice on this PR shows that while this greatly reduces flakiness, we aren't quite at deterministic integration testing yet -- the two runs have two failed tests in common, but each has one additional failing test that passes in the other. I'm going to run the tests again to collect additional data, and maybe create a second draft PR to parallelise exploring this.
The third run on this PR got stuck and failed due to timeout, so we've still got that to contend with.
We have two very consistent failing tests, but weirdly each of the 3 completed runs also has an extra test failure unique to it.

Common failures:

Unique failures:

The second run on #1144 runs long and has multiple failures.
From running the tests in my fork too, we have a first run
Featuring the two consistent failures and two of the unique ones. And a second run
Which looks significantly worse: the two consistent failures, two of the unique ones, and five more. Note also the significantly longer running time. https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31213287003
https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31212363093
https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31219977035
https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31219987411
Here are tables summarising the (so far) 15 runs of the serialised tests. I'll probably continue to edit the previous comment with output, and this comment to update the tables. commit=501cc36b7a1da0bfc329894e71e478dba900dc28
How many failing tests does each job have?
How many tests fail once, twice, etc?
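The two tallies above (failures per job, and how many tests fail once, twice, etc.) can be produced with a small sketch like the following; the per-run failure lists here are placeholders, not the actual CI data:

```python
from collections import Counter

# Hypothetical failed-test lists, one per completed run.
# Substitute the real results from the job logs.
runs = [
    ["test_a", "test_b", "test_c"],
    ["test_a", "test_b", "test_d"],
    ["test_a", "test_b", "test_e"],
]

# How many failing tests does each job have?
per_job = [len(failed) for failed in runs]

# How many tests fail once, twice, etc.?
failure_counts = Counter(test for failed in runs for test in failed)
by_frequency = Counter(failure_counts.values())

print(per_job)             # -> [3, 3, 3]
print(dict(by_frequency))  # -> {3: 2, 1: 3}: 2 tests fail in all 3 runs, 3 fail once
```

This matches the pattern described above: two consistent failures plus one failure unique to each run.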
In addition to tests still failing apparently at random, and tests sometimes failing to terminate, the integration test suite can also sometimes fail due to external causes: https://github.com/james-garner-canonical/python-libjuju/actions/runs/11226791048/job/31260866663
https://github.com/juju/python-libjuju/actions/runs/11226829336/job/31260861822
https://github.com/juju/python-libjuju/actions/runs/11224778599/job/31260863660
Closing in favour of #1149
…ine-and-serialise #1149

Tests in `integration/test_model.py` seem to be flaky even when run serially. All tests in `integration/test_crossmodel.py` are currently skipped, except one which used to be skipped, and is currently flaky even when run serially. This PR:

* Serialises all integration tests, following #1143
* Skips two tests from `test_model.py` that seem to always fail currently, whether run serially or in parallel, following #1145
* Moves the flaky tests noted above into a separate job, so that the job running the remaining integration tests will hopefully have a shot at succeeding

As a bonus feature, this split of the tests into two runners with `-n 1` seems to be faster than the original method of running all the integration tests in a single runner with `-n auto` (which worked out to be 4 processes on GitHub).
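The two-runner split described above could look something like the following GitHub Actions fragment. This is an illustrative sketch only: the job names, paths, and the `test_flaky_example` identifier are made up, and the real workflow may invoke the suite differently (e.g. via tox). `-n 1`/`-n auto` are pytest-xdist worker counts, and `--deselect` is the standard pytest flag for excluding a specific test.

```yaml
# Illustrative sketch, not the actual workflow file.
jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run everything serially, excluding the known-flaky test
      # (test name is hypothetical).
      - run: >
          python -m pytest tests/integration -n 1
          --deselect tests/integration/test_model.py::test_flaky_example
  integration-flaky:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Quarantine the flaky test in its own job so its failures
      # don't mask the rest of the suite.
      - run: >
          python -m pytest
          tests/integration/test_model.py::test_flaky_example -n 1
```

Because the two jobs run on separate runners concurrently, the wall-clock time is bounded by the slower job, which is how the split can beat a single `-n auto` runner despite each job using `-n 1`.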
Description
Integration tests are incredibly flaky at the moment. Perhaps this is related to running the tests in parallel. Certainly the fact that the tests are randomly distributed across workers doesn't help with figuring out what works and what doesn't.
QA Steps
Run integration tests and see.