-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
podman logs -f: misses final container output #10675
Comments
Here's a reproducer (copy-paste into #!/bin/bash
while :;do
cid=$(./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE')
./bin/podman logs -f $cid >| /tmp/foo.log &
./bin/podman exec $cid touch /stop
./bin/podman wait $cid
grep DONE /tmp/foo.log || exit 1
./bin/podman rm -a
done What it looks like to me is a resurrection of an old bug: Note that when this script fails, Connected to @vrothberg's recent work on events and logging? [EDITED: simplified the reproducer. Failure has nothing to do with |
UPDATE: I edited the reproducer, simplifying it because signals and $ sh -x ./foo.sh - (2)
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ ./bin/podman exec 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb touch /stop
+ ./bin/podman logs -f 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ ./bin/podman wait 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
ERRO[0000] <nil>
0
+ grep DONE /tmp/foo.log
DONE
+ ./bin/podman rm -a
082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
+ ./bin/podman exec 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4 touch /stop
+ ./bin/podman logs -f 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
ERRO[0000] <nil>
+ ./bin/podman wait 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
0
+ grep DONE /tmp/foo.log
DONE
+ ./bin/podman rm -a
9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
+ ./bin/podman exec 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37 touch /stop
+ ./bin/podman logs -f 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
^C+ ./bin/podman wait 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
0
+ grep DONE /tmp/foo.log
+ exit 1 |
Yeah, there is something broken in - ./bin/podman logs -f $cid >| /tmp/foo.log &
+ ./bin/podman logs -f $cid >| /tmp/foo.log 2>&1 & ...makes the ERRO message go away. Instead, it shows up in foo.log as:
And, bizarrely, it takes many more iterations for the failure to happen (about 10-20, instead of 2-5). |
@edsantiago Just to confirm you are using the file log driver? |
I'm just using whatever is default, I haven't touched anything. I assume the CI systems (which are failing, which is the motivation for filing this issue) also use default. On my system: $ podman info --format json | gron | grep -i log
json.host.eventLogger = "journald"; Is |
Yeah the default is |
$ ./bin/podman inspect -l --format '{{.HostConfig.LogConfig.Type}}'
k8s-file |
I cannot reproduce this if I switch the -cid=$(./bin/podman run -d ... sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE')
+cid=$(./bin/podman run -d ... sh -c 'sleep 5;echo DONE') (even if I keep the |
I cannot reproduce if I change |
Followup from this morning's conversation: every so often, the reproducer will hang. I just got one of those; it was 30 seconds, and my CPU was pegged during that period: $ ./foo.sh | ts
Jun 15 12:49:23 0
Jun 15 12:49:23 DONE
Jun 15 12:49:23 e51322e2b563e2154b77a90d2d6300b15a07e7978dc22a9193d6efd96b74132a
Jun 15 12:49:23 4b0c79103a429bb544df1a70a7a9c32fe674438555504ac0425a3a56d542098d
Jun 15 12:49:24 0
Jun 15 12:49:24 DONE
Jun 15 12:49:24 6424c6c49f9f61af84695d557978ab69af4a5f7c7eb3314ca0cebcb36056c2aa
ERRO[0000] <nil>
[ this is where it hung ]
Jun 15 12:49:52 0
Jun 15 12:49:52 DONE I don't have the granularity to know exactly where it hung: it could've been |
The hang is in |
Filed #10701 for the |
I'll take a look. I recently touch the code and may have caused this regression. |
Opened #10742 to clean up the ugly error log and fix the race condition. |
Commit 84b55ee attempted to fix a race waiting for the container died event. Previously, Podman slept for duration of the polling frequence which I considerred to be a mistake. As it turns out, I was mistaken since the file logger will, in fact, NOT read until EOF and then stop logging but stop logging immediately _after_ it woke up. [NO TESTS NEEDED] as the race condition cannot be hit reliably. Fixes: containers#10675 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Commit 84b55ee attempted to fix a race waiting for the container died event. Previously, Podman slept for duration of the polling frequence which I considerred to be a mistake. As it turns out, I was mistaken since the file logger will, in fact, NOT read until EOF and then stop logging but stop logging immediately _after_ it woke up. [NO TESTS NEEDED] as the race condition cannot be hit reliably. Fixes: containers#10675 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Commit 84b55ee attempted to fix a race waiting for the container died event. Previously, Podman slept for duration of the polling frequence which I considerred to be a mistake. As it turns out, I was mistaken since the file logger will, in fact, NOT read until EOF and then stop logging but stop logging immediately _after_ it woke up. [NO TESTS NEEDED] as the race condition cannot be hit reliably. Fixes: containers#10675 Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
This one smells a little like #10003 but it's podman-local, not -remote.
(The above is not a real reproducer, just an overview of the problem)
sys: podman kill - test signal handling in containers
The text was updated successfully, but these errors were encountered: