race condition in logs -f with journald driver
#10323
@rhatdan FYI
As pointed out in https://github.com/containers/podman/pull/10222/files#diff-20cc30e1cdf302ef7404e5923eada3912c68c8b8943c0a7a0a834b29236eba69R92, using the Follow API is racy. To get it right, *we* have to implement our own follow function that forwards everything from stdout and stderr UNTIL we read on the journal that the container died (i.e., get the died event). I looked at the journal, and the died event is always printed *after* the logs are written. The problem at the moment is that we have too many things running concurrently. Having one goroutine read the log and filter out what's necessary until it reads the died event seems like one way to avoid that race.
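A minimal sketch of the single-goroutine idea, assuming the journal can be read as one ordered stream that contains both log lines and lifecycle events. The `journalEntry` type and `followUntilDied` function are hypothetical illustrations, not Podman code, and a buffered channel stands in for the real journal reader. Because the died event is written after all logs, a reader that consumes this one ordered stream cannot skip trailing log lines.

```go
package main

import "fmt"

// journalEntry is a hypothetical, simplified view of one journal record:
// either a log line written by the container or a lifecycle event.
type journalEntry struct {
	IsEvent bool   // true for lifecycle events, false for log lines
	Event   string // event name, e.g. "died" (only set when IsEvent)
	Line    string // log payload (only set when !IsEvent)
}

// followUntilDied drains a single, ordered stream of journal entries in one
// goroutine. It forwards every log line to out and returns as soon as it
// reads the "died" event; other events are skipped.
func followUntilDied(entries <-chan journalEntry, out func(string)) {
	for e := range entries {
		if e.IsEvent {
			if e.Event == "died" {
				return
			}
			continue // filter out events we are not interested in
		}
		out(e.Line)
	}
}

func main() {
	entries := make(chan journalEntry, 4)
	entries <- journalEntry{Line: "hello"}
	entries <- journalEntry{IsEvent: true, Event: "start"}
	entries <- journalEntry{Line: "world"}
	entries <- journalEntry{IsEvent: true, Event: "died"}
	close(entries)

	var got []string
	followUntilDied(entries, func(s string) { got = append(got, s) })
	fmt.Println(got) // [hello world]
}
```

Because there is only one reader and one source of ordering, there is no second goroutine whose "container exited" signal can overtake the log lines.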
We should refactor the Wait code to use the logic where we wait for the
Died event - it’s already implemented for the Compat API, and should not be
that bad to pry into Libpod. We should then be able to use that in the
Follow function, as well as turning off polling for every consumer of the
existing Wait code.
(In reply to Valentin Rothberg, Wed, May 12, 2021 at 12:07.)
Unfortunately, that is not enough to resolve the race (I tried that). The race is in reading the journal: Follow() may sleep while streaming, which can cause the died event to be read before the last logs.
(In reply to Matthew Heon, Wed 12 May 2021 at 20:29.)
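A contrived sketch of the failure mode described above, assuming logs and the exit signal come from two independent sources. `followRacy` is a hypothetical function, not Podman's code. To make the bad interleaving happen every time, the exit signal is delivered while the log lines are still queued (as if the journal reader were sleeping); in the real code the outcome depends on scheduling, and when several `select` cases are ready Go picks one pseudo-randomly.

```go
package main

import "fmt"

// followRacy mimics the problematic shape: logs and the "container exited"
// signal arrive on two independent channels, and the follower stops as soon
// as it notices the exit signal. Whatever is still queued in logs at that
// moment is silently dropped.
func followRacy(logs <-chan string, exited <-chan struct{}) []string {
	var out []string
	for {
		// Check for exit first; once the signal is visible we stop, even
		// if more log lines are pending.
		select {
		case <-exited:
			return out
		default:
		}
		select {
		case l := <-logs:
			out = append(out, l)
		case <-exited:
			return out
		}
	}
}

func main() {
	logs := make(chan string, 3)
	exited := make(chan struct{})

	// The log reader is lagging behind, so the last lines are still queued
	// when the exit signal is delivered.
	logs <- "line 1"
	logs <- "line 2"
	logs <- "line 3"
	close(exited)

	fmt.Println(len(followRacy(logs, exited))) // 0: all three lines were lost
}
```

Waiting on the died event in a separate goroutine keeps this shape, which is why it does not remove the race on its own.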
Fix a race in journald driver

Following the logs implies streaming until the container is dead. Streaming happened in one goroutine, while waiting for the container to exit/die and signaling that this event happened was done in another goroutine. Having two goroutines running simultaneously is the core of the race condition: when the streaming goroutine received the signal that the container had exited, it may not yet have read and written all of the container's logs.

Fix this race by reading both the logs and the events of the container, and stop streaming once the died/exited event has been read. The died event is guaranteed to come after all logs in the journal, which guarantees not only consistency but also deterministic behavior.

Note that the journald log driver now requires the journald event backend to be set.

Fixes: containers#10323
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Reproducer from #10222 (comment):
I know where the error is and will drop a `// FIXME` in libpod/container_logs_linux.go (and a link to the code once #10222 is merged).