podman image tree: restore previous behavior #10222
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: vrothberg |
@containers/podman-maintainers PTAL @edsantiago this restores the previous order of layers. |
LGTM |
Looks like the journald change broke e2e tests, having a look now. |
Currently failing. Requires containers/common#540. |
40f3a24 to 9821f32
This turned out to be more work. Fixed bugs in the journald driver and updated tests. @edsantiago, please take a look at the changes. |
b276ec1 to dbd57f2
@rhatdan, given the changes in this PR to get journald to pass CI, we should make sure to highlight that in the changelog. |
@containers/podman-maintainers PTAL |
Commit 98ff7e1 changed the default logging driver from k8s-file to journald. The only consumer of the log driver is Podman, and I think the journald driver still needs some more time to stabilize. Vendoring containers/common into Podman has revealed quite a few warts (see containers/podman/pull/10222), which reduced my confidence level. To resolve the chicken-and-egg problem of maturing the journald driver, I want to only partially revert commit 98ff7e1: the built-in default remains k8s-file while containers.conf sets it to journald. The intention is to make sure that running systems are not impacted while we can still change Fedora to journald to increase coverage. Once the confidence level is back to normal, we can change the built-in default to journald, at the latest before RHEL9.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
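To make the split between the built-in default and the configuration file concrete, here is a minimal sketch of a containers.conf fragment that selects a log driver. The helper name `log_driver_conf` is hypothetical, and restricting it to the two drivers discussed in this thread is an assumption made for illustration; the `log_driver` key under `[containers]` is the setting the commit message refers to.

```shell
# Sketch only: print a containers.conf fragment that selects a log driver.
# log_driver_conf is a hypothetical helper, not part of Podman.
log_driver_conf() {
    case "$1" in
        k8s-file|journald) ;;   # only the two drivers discussed here
        *)
            echo "unsupported log driver: $1" >&2
            return 1
            ;;
    esac
    printf '[containers]\nlog_driver = "%s"\n' "$1"
}
```

For example, `log_driver_conf k8s-file` prints a fragment that could be merged into a containers.conf by hand; a single run can still override the configured value with `--log-driver`.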
This is really bad. The fact that this PR is mucking with logging makes me suspect that there's something broken in logging. |
And no, bumping the timeout to 15 seconds is not a solution |
Still trying to get a feeling for it. Increasing the timeout worked for most cases but not rootless. I don't yet see a relation to logging in this specific case of exec. |
I'd appreciate help. I wanted to fix image-tree and ended up fixing things that are very unrelated but merged untested in c/common 🙈 |
Something is broken in logging. The timeout has nothing to do with anything; this should trigger in less than a second. If it doesn't, you can bump the timeout to an hour, it will not change anything, the test will still fail. Maybe it's the change to journald logging? Maybe journald logging doesn't work? Maybe you can try:
--- a/test/system/130-kill.bats
+++ b/test/system/130-kill.bats
@@ -8,7 +8,7 @@ load helpers
@test "podman kill - test signal handling in containers" {
# Start a container that will handle all signals by emitting 'got: N'
local -a signals=(1 2 3 4 5 6 8 10 12 13 14 15 16 20 21 22 23 24 25 26 64)
- run_podman run -d $IMAGE sh -c \
+ run_podman run --log-driver=k8s-file -d $IMAGE sh -c \
"for i in ${signals[*]}; do trap \"echo got: \$i\" \$i; done;
echo READY;
while ! test -e /stop; do sleep 0.05; done;
If that fixes it, then the problem is with the journald driver, and I kind of think it is super important to find and fix that bug. |
Thanks, Ed!
Yes. I did that in the last push.
Agreed. I find it curious though that it started flaking today. The only thing that changed since yesterday is the diff you proposed. |
Basically, what seems to be happening is that
UPDATE: I have a reproducer.
$ while :;do ./bin/podman run --log-driver=journald -d --name foo quay.io/libpod/testimage:20210427 sh -c 'echo hi;sleep 2;echo bye';./bin/podman logs -f foo;./bin/podman rm foo;done
caf4dce7ce3ac5d08da339a8c0ecd4ec97df498cd472fbff61182352e71cf831
hi <---- there is no subsequent bye!
caf4dce7ce3ac5d08da339a8c0ecd4ec97df498cd472fbff61182352e71cf831
08f58203f738027300d651fed861f1673eb8a1a953b869c788d905664b709938
hi
bye |
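The reproducer above loops forever and relies on reading the output by eye. A bounded variant that counts the truncated runs might look like the sketch below. The function names and the `PODMAN` override variable are hypothetical; the podman invocations are the ones from the reproducer.

```shell
# Sketch only: bounded variant of the reproducer above.
# PODMAN is a hypothetical override for the podman binary to use.

# Succeeds if the last line of the captured output ($2) equals $1.
logs_complete() {
    [ "$(printf '%s\n' "$2" | tail -n 1)" = "$1" ]
}

# Run the reproducer $1 times and print how many runs lost the final "bye".
count_truncated_logs() {
    local runs podman_bin image truncated i out
    runs="$1"
    podman_bin="${PODMAN:-podman}"
    image="quay.io/libpod/testimage:20210427"
    truncated=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$podman_bin" run --log-driver=journald -d --name foo "$image" \
            sh -c 'echo hi;sleep 2;echo bye' >/dev/null
        # Follow the logs until the container exits, then inspect the last line.
        out=$("$podman_bin" logs -f foo)
        logs_complete bye "$out" || truncated=$((truncated + 1))
        "$podman_bin" rm foo >/dev/null
        i=$((i + 1))
    done
    echo "$truncated"
}
```

Something like `PODMAN=./bin/podman count_truncated_logs 20` would then print the number of runs in which `podman logs -f` returned before the final line arrived; any non-zero count points at the race.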
Thank you so much, @edsantiago! That gives me something to chew on. |
The initial version of libimage changed the order of layers, which has now been restored to remain backwards compatible.
Further changes:
* Fix a bug in the journald logging driver: trailing newlines must be stripped from the message. The system tests did not pass due to empty lines. Triggered by changing the default logger to journald in containers/common.
* Fix another bug in the journald logging driver, which embedded the container ID inside the message rather than in the specific field. That surfaced as a leading whitespace on each log line, which broke the system tests.
* Alter the system tests to make sure that both the k8s-file and the journald logging drivers are exercised.
* A number of e2e tests have been changed to force the k8s-file driver to make them pass when running inside a root container.
* Increase the timeout in a kill test which seems to take longer now. The reasons are unknown. The tests passed earlier and no signal-related changes happened. It may be a CI VM flake since some system tests passed but others flaked.
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
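The two journald bugs described in this commit message showed up as a leading space and as extra empty lines in the `podman logs` output. A check that runs the same container under both drivers and compares the output byte for byte could be sketched as follows. The function name and the `PODMAN` override are hypothetical, and this is not the actual system test from the PR.

```shell
# Sketch only: compare `podman logs` output across both log drivers.
# check_log_drivers and PODMAN are hypothetical names.
check_log_drivers() {
    local podman_bin image expected driver cid out failed
    podman_bin="${PODMAN:-podman}"
    image="quay.io/libpod/testimage:20210427"
    # Exactly one line, no leading whitespace, no trailing empty line.
    expected="hi
"
    failed=0
    for driver in k8s-file journald; do
        cid=$("$podman_bin" run --log-driver="$driver" -d "$image" echo hi) || return 1
        "$podman_bin" wait "$cid" >/dev/null
        # Append a sentinel so command substitution keeps trailing newlines.
        out=$("$podman_bin" logs "$cid"; printf x)
        out="${out%x}"
        if [ "$out" != "$expected" ]; then
            echo "$driver"      # report the driver whose output differs
            failed=1
        fi
        "$podman_bin" rm "$cid" >/dev/null
    done
    return "$failed"
}
```

The function prints the name of every driver whose output differs from the expected text and returns non-zero if any did. The sentinel matters because a plain command substitution would silently drop exactly the trailing empty lines the first bug produced.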
@edsantiago, I had a look and analyzed it. The race condition is nasty and not easy to fix. I did what I could in the given time but opened #10323 to track it and dropped a FIXME in the code (see https://github.com/containers/podman/pull/10222/files#diff-20cc30e1cdf302ef7404e5923eada3912c68c8b8943c0a7a0a834b29236eba69R92). FWIW, I opened containers/common#546 earlier today to revert the journald defaulting in c/common. |
@rhatdan FYI |
Reluctant LGTM. This journald thing is making me reeeeeeeeally uncomfortable. |
Thanks! Yes, I am uncomfortable as well. It’ll get reverted in common soon.
|
Tests green. @containers/podman-maintainers PTAL |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: rhatdan, vrothberg |