kola testiso tests on f41+ sometimes time out #1796

Open
jlebon opened this issue Sep 16, 2024 · 3 comments
Labels
pipeline failure This issue or pull request is derived from CI failures

Comments

jlebon commented Sep 16, 2024

We've seen this a couple of times in Bodhi testing, and now it has also shown up in next-devel:

[2024-09-16T08:12:53.904Z] FAIL: pxe-offline-install.bios (10m0.007s)
[2024-09-16T08:12:53.904Z]     timed out after 10m0s

But the system logs don't show anything unusual. It's almost as if either QEMU was killed or kola simply lost contact with it. The pod has 9Gi of RAM and the testiso tests run serially, so memory limits shouldn't be a concern here.

pxe-offline-install.bios.zip

@dustymabe

FTR, I looked at the code the other day to confirm that if the QEMU process is killed, kola prints a message to the console. I tested that today and it appears to hold:

Detected development build; disabling signature verification
Running test: pxe-offline-install.bios
FAIL: pxe-offline-install.bios (49.882s)
    QEMU unexpectedly exited while awaiting completion: process killed
Error: harness: test suite failed
2024-09-16T14:04:43Z cli: harness: test suite failed
failed to execute cmd-kola: exit status 1
+ rc=1
+ set +x

For this test I killed the QEMU process with `kill -9`.

dustymabe added a commit to dustymabe/fedora-coreos-pipeline that referenced this issue Sep 20, 2024
We're trying to get more information on the root cause for
coreos/fedora-coreos-tracker#1796
and maybe this will help us find a clue.

dustymabe commented Sep 20, 2024

Saw this again today in CI for coreos/fedora-coreos-config#3171

Opened coreos/fedora-coreos-pipeline#1039 to see if we can get more information about the problem.

jlebon pushed a commit to coreos/fedora-coreos-pipeline that referenced this issue Sep 20, 2024
We're trying to get more information on the root cause for
coreos/fedora-coreos-tracker#1796
and maybe this will help us find a clue.
@marmijo marmijo added the pipeline failure This issue or pull request is derived from CI failures label Oct 7, 2024
@dustymabe

Saw this again today in bodhi tests for https://bodhi.fedoraproject.org/updates/FEDORA-2024-5a61a2fa45

Unfortunately, coreos/fedora-coreos-pipeline#1039 doesn't help us here because it isn't used in those CI tests.

jlebon added a commit to jlebon/coreos-ci that referenced this issue Oct 29, 2024
This matches what we do in the pipeline.

Motivated by wanting to get to the bottom of
coreos/fedora-coreos-tracker#1796, which
happens often in the Bodhi tests.
dustymabe pushed a commit to coreos/coreos-ci that referenced this issue Oct 29, 2024
This matches what we do in the pipeline.

Motivated by wanting to get to the bottom of
coreos/fedora-coreos-tracker#1796, which
happens often in the Bodhi tests.
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Nov 1, 2024
Otherwise only `info` level messages will go to the console
of the machine.

We were digging into a new instance of [1] and found that we
weren't really getting debug messages on the console of the
machine (which means no new clues as to why the tests are
timing out). Turns out this is why we weren't getting those
new debug messages we thought we would get.

[1] coreos/fedora-coreos-tracker#1796
dustymabe added a commit to coreos/coreos-assembler that referenced this issue Nov 1, 2024
Otherwise only `info` level messages will go to the console
of the machine.

We were digging into a new instance of [1] and found that we
weren't really getting debug messages on the console of the
machine (which means no new clues as to why the tests are
timing out). Turns out this is why we weren't getting those
new debug messages we thought we would get.

[1] coreos/fedora-coreos-tracker#1796