-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warn if podman stop timeout expires that sigkill was sent #11880
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
docs/source/markdown/podman.1.md
Outdated
@@ -82,7 +82,7 @@ Remote connections use local containers.conf for default. | |||
|
|||
#### **--log-level**=*level* | |||
|
|||
Log messages above specified level: debug, info, warn, error (default), fatal or panic (default: "error") | |||
Log messages above specified level: debug, info, warn, error (default), fatal or panic (default: "warn") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is pre-existing, but should it be this?
Log messages above specified level: debug, info, warn, error (default), fatal or panic (default: "warn") | |
Log messages at the specified level: debug, info, warn, error (default), fatal, or panic (default: "warn") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default is no longer error, it's Warn
couple small nits in a message, otherwise LGTM |
libpod/oci_conmon_linux.go
Outdated
@@ -441,7 +441,7 @@ func (r *ConmonOCIRuntime) StopContainer(ctr *Container, timeout uint, all bool) | |||
} | |||
|
|||
if err := waitContainerStop(ctr, time.Duration(timeout)*time.Second); err != nil { | |||
logrus.Infof("Timed out stopping container %s, resorting to SIGKILL: %v", ctr.ID(), err) | |||
logrus.Warnf("Timed out stopping container %s, resorting to SIGKILL: %v", ctr.ID(), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be printing the error in the WARN-level logs. Make that a debug-level IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? The hole discussion was to raise this to a warn level, not a Info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need two logrus lines here: One to print "Timed out stopping container..." (warn level) and one to print "Error stopping container" (debug level).
Sorry I missed your ping this afternoon; I was clerking at our local election. I don't see a bats test, so I don't know what went wrong with your tests, but I'll look at it tomorrow. |
Here is the test I was trying to write.
|
Ah - that explains it. Using
|
Yikes, I must have accidently cut & Pasted that from where I stole the code. /dope slap |
test/system/050-stop.bats
Outdated
|
||
run_podman run --rm --name stopme -d $IMAGE sleep 100 | ||
cid=$output | ||
run_podman stop -t 1 stopme >/dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The >/dev/null
is unnecessary (and misleading, because it actually gets ignored); but not worth repushing for. LGTM.
test/system/050-stop.bats
Outdated
@test "podman stop -t 1 Generate warning" { | ||
|
||
run_podman run --rm --name stopme -d $IMAGE sleep 100 | ||
cid=$output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too is unnecessary (you never use $cid
) but again, not worth a re-push
test/system/050-stop.bats
Outdated
@@ -166,4 +166,12 @@ load helpers | |||
is "$output" "137" "Exit code of killed container" | |||
} | |||
|
|||
@test "podman stop -t 1 Generate warning" { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you need skip_if_remote
; when you repush, can you take care of the two issues below? Thanks
libpod/oci_conmon_linux.go
Outdated
@@ -441,7 +441,8 @@ func (r *ConmonOCIRuntime) StopContainer(ctr *Container, timeout uint, all bool) | |||
} | |||
|
|||
if err := waitContainerStop(ctr, time.Duration(timeout)*time.Second); err != nil { | |||
logrus.Infof("Timed out stopping container %s, resorting to SIGKILL: %v", ctr.ID(), err) | |||
logrus.Debugf("Timed out stopping container %s with %s, resorting to SIGKILL: %v", ctr.ID(), unix.SignalName(syscall.Signal(stopSignal)), err) | |||
logrus.Warnf("Failed to stopping container %s with %s, resorting to SIGKILL: %v", ctr.ID(), unix.SignalName(syscall.Signal(stopSignal)), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Failed to stopping"??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see this is in response to @mheon's comment:
We need two logrus lines here: One to print "Timed out stopping container..." (warn level) and one to print "Error stopping container" (debug level).
...but I don't understand the request. @mheon could you clarify, possibly with a code sample to reduce ambiguity?
test/e2e/stop_test.go
Outdated
session.WaitWithDefaultTimeout() | ||
Expect(session).Should(Exit(0)) | ||
warning := session.ErrorToString() | ||
Expect(warning).To(ContainSubstring("Timed out stopping container")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean StopSignal TERM failed to stop container
here (the Timed out
one is now just debug).
But I still don't understand @mheon's reason for requesting two messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to print the error message every time, it's largely irrelevant to the situation and will reduce clarity (I expect it's going to be rather long - "failed to stop PID %d after %d seconds" or somesuch.
What we actually want is to print the "Timed out stopping container %s" message first as a logrus.Warnf, then print "Error stopping container %s using stop signal: %v" including the error message as a Debugf so the error isn't entirely lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR now prints.
./bin/podman stop -t 2 -l
WARN[0002] StopSignal SIGTERM failed to stop container hungry_bassi in 2 seconds, resorting to SIGKILL
729d71716d7c024e1b274a6ebf7d41bf65faa43b4324d0017a72cf1375059df0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug message actually includes the error.
test/e2e/stop_test.go
Outdated
session.WaitWithDefaultTimeout() | ||
Expect(session).Should(Exit(0)) | ||
warning := session.ErrorToString() | ||
Expect(warning).To(ContainSubstring("StopSignal SIGTERM failed to stop container test4 in 1 seconds, resorting to SIGKILL")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/test4/test5/
Note: the Warning message will not come to podman-remote. It would be difficult to plumb, and not really worth the effort. Fixes: containers#11854 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
LGTM |
/lgtm Quite a slog for something that seemed so simple! Thanks, @rhatdan! |
/hold cancel |
Note: the Warning message will not come to podman-remote.
It would be difficult to plumb, and not really worth the effort.
Fixes: #11854
Signed-off-by: Daniel J Walsh dwalsh@redhat.com
What this PR does / why we need it:
How to verify it
Which issue(s) this PR fixes:
Special notes for your reviewer: