Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container sigkill after stop timeout not logged as error #11854

Closed
pboguslawski opened this issue Oct 4, 2021 · 11 comments · Fixed by #11880
Closed

Container sigkill after stop timeout not logged as error #11854

pboguslawski opened this issue Oct 4, 2021 · 11 comments · Fixed by #11880
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@pboguslawski
Copy link

/kind bug

Description

If podman stop kills container after timeout, no info about timeout is logged in journal/stderr; killing container after timeout should be treated as dangerous because of data loss possibility so message

logrus.Infof("Timed out stopping container %s, resorting to SIGKILL: %v", ctr.ID(), err)

priority should be higher than info (at least warn) and systemd unit files generated with podman generate should produce such warnings to journal/syslog in case of killing container by podman with sigkill.

Using podman stop --log-level=info is not an option because it produces too much noise.

Steps to reproduce the issue:

  1. Run any container that need >1s to stop.

  2. Stop it using podman stop --time=1

Describe the results you received:

No traces in journal/syslog about container killing.

Describe the results you expected:

Journal/syslog should contain message about stop timeout and container killing with level warn (or maybe even error).

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 4, 2021
@rhatdan
Copy link
Member

rhatdan commented Oct 4, 2021

Interested in opening a PR to fix?

@edsantiago
Copy link
Member

I agree that falling back to SIGKILL is noteworthy, but I'm not convinced that podman stop is the right place for this. The careful developer makes sure to check container exit status:

$ podman run --name foo -d alpine sleep 100
7e6c79489feac65fa34353146047214e05014c30bf4efa048b09192c29c5513c
$ podman stop --time=1 foo
foo
$ podman inspect --format '{{.State.ExitCode}}' foo
137

In an automated environment, a warning would be of limited use: nobody would see it. In an automated environment, it really is kind of important for the developer to check container exit status.

When playing on the command line, I can see a "timed out" warning being mildly helpful -- but in this environment the user will (presumably) also notice the timeout delay itself.

@pboguslawski in what scenario do you think this kind of warning would actually help someone?

@pboguslawski
Copy link
Author

pboguslawski commented Oct 5, 2021

in what scenario do you think this kind of warning would actually help someone?

The issue is about using systemd unit generated with podman generate as system service without any other automated environment. In this scenario no problem traces in journal/syslog. Also some careful developers still like to use syslog watchers... :)

@rhatdan
Copy link
Member

rhatdan commented Oct 5, 2021

You could change the --log-level info command on the podman stop command to see this output.

@mheon
Copy link
Member

mheon commented Oct 5, 2021

I don't know if this is worthy of printing by default - podman stop on a container running a binary that ignores signals (sleep is a good example) will always trigger it, so my feeling is that we'll be printing this warning a lot, there's not much the user can do to resolve it (maybe some binaries are like systemd and require a specific stop signal but that seems to be the minority), and it will increase logspam making it more difficult to locate serious problems.

@pboguslawski
Copy link
Author

pboguslawski commented Oct 5, 2021

You could change the --log-level info command on the podman stop command to see this output.

Adding --log-level info to systemd service unit file fills journal with too much noise; this particular message level should be warn or error (if warn then podman should generate unit files with podman stop --log-level warn to feed journal with it in case of timeout).

I don't know if this is worthy of printing by default

IHMO: its definitely more important message than podmans messages with

container start
container stop
container cleanup
container died
container attach

that are sent to syslog by default.

Would be also nice to see similar message in syslog with for example

container killed (stop timeout)

@edsantiago
Copy link
Member

@pboguslawski thanks, that's a good point (your use case in systemd). I think your proposal is promising: the timeout really should IMHO be more important than info (although absolutely not error-level); and by automatically setting log-level in the unit file, the warning will be visible to log scrapers but not simple schmoes playing with sleep containers on the command line.

@rhatdan
Copy link
Member

rhatdan commented Oct 5, 2021

Podman defaults to log-level Warn, the question would be do we raise it to Warn, so it is in everyones face when it happens. I am ok with it, but if users start to complain, then we might want to revisit it.
@pboguslawski Please open a PR.

@mheon
Copy link
Member

mheon commented Oct 5, 2021

If we actually do this, be prepared for the integration tests to break in a lot of places - which might not be the worst thing, considering we usually shouldn't be waiting for that timeout anyways.

@edsantiago
Copy link
Member

Podman defaults to log-level Warn

Man page says "error", which is why I was OK with raising the severity of this message.

  • If the man page is wrong, can someone please fix it? (I would offer, but I'm OOTO tomorrow with no net access)
    • Also if the man page is wrong, then I withdraw my comments above. I don't think this would be a useful warning for regular users. And int tests will be unaffected, because they don't really check any output anywhere, but sys tests may need a bit of massaging.

@mheon
Copy link
Member

mheon commented Oct 6, 2021

Manpages are wrong, then - we swapped to Warn around 9 months ago, AFAIK.

rhatdan added a commit to rhatdan/podman that referenced this issue Oct 8, 2021
Note: the Warning message will not come to podman-remote.
It would be difficult to plumb, and not really worth the effort.

Fixes: containers#11854

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants