-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CI: misc parallel flakes #23600
base: main
Are you sure you want to change the base?
Fix CI: misc parallel flakes #23600
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Honny1 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I think you accidentally included #23275 :-) |
02d43e8
to
3f1ad80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edsantiago PTAL and test these changes in your PR.
test/system/035-logs.bats
Outdated
@@ -104,6 +104,8 @@ function _log_test_multi() { | |||
doit $cname1 "echo a1; echo a2" | |||
doit $cname2 "echo b1; echo b2" | |||
|
|||
run_podman wait $cname1 $cname2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-f already wait for the containers to exit and is supposed to read until the end
I think #21914 is the much more likely problem with logs -f. The extra wait delay might fix that silently but I don't see this one here as proper fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that -f
was waiting for containers to exit. I will remove this change.
test/system/035-logs.bats
Outdated
@@ -328,12 +330,12 @@ function _log_test_follow_since() { | |||
|
|||
# sleep is required to make sure the podman event backend no longer sees the start event in the log | |||
# This value must be greater or equal than the value given in --since below | |||
sleep 0.2 | |||
sleep 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a proper fix, the issue can go in both directions. If we are slow here and need 1.1 s in total the log line printed in the ctr will be missed and the next line is only written after 5s so I think at the very least we should make the container print every 1s or something
4b019f7
to
0dd9179
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, seems to work and I'm OK giving this a try on main. One change request please.
test/system/035-logs.bats
Outdated
assert "$output" =~ "^$content | ||
timeout: sending signal TERM to command.*" "logs --since -f on running container works" | ||
assert "$output" =~ "$content | ||
timeout: sending signal TERM to command.*$" "logs --since -f on running container works" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding $
here is a NOP which may confuse future maintenance programmers
Also, I changed the |
0dd9179
to
7157f3b
Compare
test/system/035-logs.bats
Outdated
assert "$output" =~ "^$content | ||
timeout: sending signal TERM to command.*" "logs --since -f on running container works" | ||
assert "$output" =~ "${content} | ||
timeout: sending signal TERM to command.*\$" "logs --since -f on running container works" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wasn't clear.
You added a dollar sign. It is a NOP, it does not do anything. So when the next programmer comes to look at this code some day, she or he will be confused by it, wondering why it exists. This will cost them time.
Please remove it, making it back like it was. Even better, remove the .*
which is also useless (that one is my fault). Make it just ...signal TERM to command"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I get it now.
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
7157f3b
to
b086ec5
Compare
@mheon I've seen this flake in my parallel testing and, just now, in regular plain old usual bats:
Is this obvious to you, or shall I file an issue and try to find a reproducer? |
Not obvious. We pulled the image successfully 3 previous times in the test by my count. Is it always that last call? |
I don't have enough data to answer an "always" question. I'll look into it. |
A friendly reminder that this PR had no activity for 30 days. |
This PR fixes parallel flakes.
|035| podman logs - --until --follow journald
The bash$SECONDS
are not accurate because the time is rounded to whole seconds. So if the t0 is 1856ms, but the $SECONDS is still 1, this inaccuracy causes the command to be at most only about 150ms late which is less variation than I observed between test runs (the time was around 3150-3650ms). At higher workloads, this delay can be larger. Measuring time inms
should solve the problem.[035] podman logs - multi k8s-file test
: issue: Long k8s-file logs are truncated on 'podman logs --follow' output of stopped container #21914[035] podman logs - --since --follow journald
I would say that when running in parallel the journald is used by multiple containers, so it will be necessary to increase the timeout time to give the container more time to write to the journald, and also perform a check, end of journald content.Fixes: #23682
Does this PR introduce a user-facing change?