Fix CI: misc parallel flakes #23600

Honny1 · 2024-08-13T11:44:49Z

This PR fixes parallel flakes.

|035| podman logs - --until --follow journald The bash $SECONDS are not accurate because the time is rounded to whole seconds. So if the t0 is 1856ms, but the $SECONDS is still 1, this inaccuracy causes the command to be at most only about 150ms late which is less variation than I observed between test runs (the time was around 3150-3650ms). At higher workloads, this delay can be larger. Measuring time in ms should solve the problem.
[035] podman logs - multi k8s-file test: issue: Long k8s-file logs are truncated on 'podman logs --follow' output of stopped container #21914
[035] podman logs - --since --follow journald I would say that when running in parallel the journald is used by multiple containers, so it will be necessary to increase the timeout time to give the container more time to write to the journald, and also perform a check, end of journald content.

Does this PR introduce a user-facing change?

None

openshift-ci · 2024-08-13T11:45:12Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Honny1
Once this PR has been reviewed and has the lgtm label, please assign giuseppe for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

edsantiago · 2024-08-13T12:09:36Z

I think you accidentally included #23275 :-)

Luap99

@edsantiago PTAL and test these changes in your PR.

Luap99 · 2024-08-16T15:27:10Z

test/system/035-logs.bats

@@ -104,6 +104,8 @@ function _log_test_multi() {
 doit $cname1 "echo a1; echo a2"
 doit $cname2 "echo b1; echo b2"

+ run_podman wait $cname1 $cname2


-f already wait for the containers to exit and is supposed to read until the end
I think #21914 is the much more likely problem with logs -f. The extra wait delay might fix that silently but I don't see this one here as proper fix.

I didn't know that -f was waiting for containers to exit. I will remove this change.

Luap99 · 2024-08-16T15:29:26Z

test/system/035-logs.bats

@@ -328,12 +330,12 @@ function _log_test_follow_since() {

 # sleep is required to make sure the podman event backend no longer sees the start event in the log
 # This value must be greater or equal than the value given in --since below
- sleep 0.2
+ sleep 0.5


I don't think this is a proper fix, the issue can go in both directions. If we are slow here and need 1.1 s in total the log line printed in the ctr will be missed and the next line is only written after 5s so I think at the very least we should make the container print every 1s or something

edsantiago

LGTM, seems to work and I'm OK giving this a try on main. One change request please.

edsantiago · 2024-08-20T11:44:52Z

test/system/035-logs.bats

- assert "$output" =~ "^$content
-timeout: sending signal TERM to command.*" "logs --since -f on running container works"
+ assert "$output" =~ "$content
+timeout: sending signal TERM to command.*$" "logs --since -f on running container works"


Adding $ here is a NOP which may confuse future maintenance programmers

edsantiago · 2024-08-20T13:59:29Z

Also, I changed the Fixes line.

edsantiago · 2024-08-21T10:49:40Z

test/system/035-logs.bats

- assert "$output" =~ "^$content
-timeout: sending signal TERM to command.*" "logs --since -f on running container works"
+ assert "$output" =~ "${content}
+timeout: sending signal TERM to command.*\$" "logs --since -f on running container works"


Sorry, I wasn't clear.

You added a dollar sign. It is a NOP, it does not do anything. So when the next programmer comes to look at this code some day, she or he will be confused by it, wondering why it exists. This will cost them time.

Please remove it, making it back like it was. Even better, remove the .* which is also useless (that one is my fault). Make it just ...signal TERM to command"

Thanks, I get it now.

Signed-off-by: Jan Rodák <hony.com@seznam.cz>

edsantiago · 2024-08-21T11:57:20Z

@mheon I've seen this flake in my parallel testing and, just now, in regular plain old usual bats:

[+0089s] not ok 60 [030] podman run --rmi in 2526ms
...
<+009ms> # # podman-remote --url unix:///tmp/CI_0fS1/podman_tmp_nSJR run --rmi quay.io/libpod/testimage:00000004 /bin/true
<+498ms> # time="2024-08-21T11:33:50Z" level=warning msg="quay.io/libpod/testimage:00000004: image not known"
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: Command succeeded, but issued unexpected warnings
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Is this obvious to you, or shall I file an issue and try to find a reproducer?

mheon · 2024-08-21T12:53:53Z

Not obvious. We pulled the image successfully 3 previous times in the test by my count. Is it always that last call?

edsantiago · 2024-08-21T12:59:02Z

I don't have enough data to answer an "always" question. I'll look into it.

github-actions · 2024-09-21T00:08:14Z

A friendly reminder that this PR had no activity for 30 days.

openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Aug 13, 2024

Honny1 force-pushed the misc-parallel-flakes branch 6 times, most recently from 02d43e8 to 3f1ad80 Compare August 15, 2024 11:46

openshift-ci bot added release-note-none and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Aug 15, 2024

Honny1 changed the title ~~Flake hunting of CI: misc parallel flakes~~ Fix CI: misc parallel flakes Aug 15, 2024

Honny1 marked this pull request as ready for review August 15, 2024 15:30

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 15, 2024

Luap99 reviewed Aug 16, 2024

View reviewed changes

Honny1 force-pushed the misc-parallel-flakes branch 2 times, most recently from 4b019f7 to 0dd9179 Compare August 19, 2024 08:41

edsantiago requested changes Aug 20, 2024

View reviewed changes

Honny1 force-pushed the misc-parallel-flakes branch from 0dd9179 to 7157f3b Compare August 21, 2024 09:52

edsantiago reviewed Aug 21, 2024

View reviewed changes

Honny1 added 2 commits August 21, 2024 13:18

Measure time execution ms insted bash $SECONDS

8fbb955

Signed-off-by: Jan Rodák <hony.com@seznam.cz>

Extend time and check log output from the end

b086ec5

Signed-off-by: Jan Rodák <hony.com@seznam.cz>

Honny1 force-pushed the misc-parallel-flakes branch from 7157f3b to b086ec5 Compare August 21, 2024 11:18

edsantiago mentioned this pull request Aug 22, 2024

run --rmi : image not known #23719

Closed

github-actions bot added the stale-pr label Sep 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CI: misc parallel flakes #23600

Fix CI: misc parallel flakes #23600

Honny1 commented Aug 13, 2024 •

edited by edsantiago

Loading

openshift-ci bot commented Aug 13, 2024

edsantiago commented Aug 13, 2024

Luap99 left a comment

Luap99 Aug 16, 2024

Honny1 Aug 19, 2024

Luap99 Aug 16, 2024

edsantiago left a comment

edsantiago Aug 20, 2024

edsantiago commented Aug 20, 2024

edsantiago Aug 21, 2024

Honny1 Aug 21, 2024

edsantiago commented Aug 21, 2024

mheon commented Aug 21, 2024

edsantiago commented Aug 21, 2024

github-actions bot commented Sep 21, 2024

Fix CI: misc parallel flakes #23600

Are you sure you want to change the base?

Fix CI: misc parallel flakes #23600

Conversation

Honny1 commented Aug 13, 2024 • edited by edsantiago Loading

Does this PR introduce a user-facing change?

openshift-ci bot commented Aug 13, 2024

edsantiago commented Aug 13, 2024

Luap99 left a comment

Choose a reason for hiding this comment

Luap99 Aug 16, 2024

Choose a reason for hiding this comment

Honny1 Aug 19, 2024

Choose a reason for hiding this comment

Luap99 Aug 16, 2024

Choose a reason for hiding this comment

edsantiago left a comment

Choose a reason for hiding this comment

edsantiago Aug 20, 2024

Choose a reason for hiding this comment

edsantiago commented Aug 20, 2024

edsantiago Aug 21, 2024

Choose a reason for hiding this comment

Honny1 Aug 21, 2024

Choose a reason for hiding this comment

edsantiago commented Aug 21, 2024

mheon commented Aug 21, 2024

edsantiago commented Aug 21, 2024

github-actions bot commented Sep 21, 2024

Honny1 commented Aug 13, 2024 •

edited by edsantiago

Loading