-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: fix CI to run consistent tests 100% of the time #33089
Conversation
Thanks for contributing to openpilot! In order for us to review your PR as quickly as possible, check the following:
|
wasn't the issue I initially thought it was. |
@@ -34,6 +34,7 @@ def test_missing_translation_files(self): | |||
assert os.path.exists(os.path.join(TRANSLATIONS_DIR, f"{self.file}.ts")), \ | |||
f"{self.name} has no XML translation file, run selfdrive/ui/update_translations.py" | |||
|
|||
@pytest.mark.serial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't translations_dir
work to solve this? Is TemporaryDirectory
somehow returning the same directory for two workers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed since I used -n0 for the test_translation.py that make it serial. Serial marked tests are configured to run with single processor.
Current attempt to serialize tests that seems to fail tests running in parallel with pytest-xdist. pytest-dev/pytest#3216 (comment) Two runs with 50 unit_tests runs each (100% pass) https://github.com/commaai/openpilot/actions/runs/10120588057 |
findings:
|
baca33a
to
c97df97
Compare
Are the errors now specific to a particular test? If so, that's worth including here.
It still seems like we don't know what's going on here. What if it's this #27512? |
Included in comments.
A lot of people experience these random worker crashes using pytest-xdist, and there haven't been to have good repro or clear solution (comment)
Did a quick run |
I looked into this briefly, and seems like there's multiple things going on and the issues are interacting with each other. When that happens, it's important to carefully fix each one.
This seems quite reproducible, so we should be able to make progress on it. I pushed the shm bump here: 76fd5b0 encoderd crash from https://github.com/commaai/openpilot/actions/runs/10174071612/job/28139196053:
|
Recently, I only see that mostly unit_test section is failing, and from looking at the failing logs, it seems that there is this.Fatal Python error: Bus error
that causes crashes in pytest-xdist workersThis error is most likly from workers trying to access a unavailable memory region, so I just increase the shm-size of docker from 1GB -> 2GB.From running 2 runs of this new size (50 runs of unit_test each), I don't see that error happening anymore. `run 1: https://github.com/commaai/openpilot/actions/runs/10115487281/job/27976375147run 2: https://github.com/commaai/openpilot/actions/runs/10115625540/job/27976846602The failing tests of these two runs are from a known flaky test in,test_loggerd.py
and I guess it has some weird association with matrix that causing it to fail (didn't investigate further)but should be fine if we have one run ofunit_test
I did another run to make sure that if the shm-size is 1GB then we would see the bus error in unittest againhttps://github.com/commaai/openpilot/actions/runs/10115830453/job/27977440198If the issues still continues after this is merged, I'll happy to finish it.This is probably something due to pytest-xdist running tests that mess around with threading.
resolves #32939