Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of healthcheck from image #3539

Merged
merged 4 commits into from
Jul 16, 2019

Conversation

stefanb2
Copy link
Contributor

@stefanb2 stefanb2 commented Jul 9, 2019

  1. proposed fix for Does not schedule healthcheck for images with HEALTHCHECK #3525 cmd/podman/shared/create.go:
    If the image doesn't provide any options, e.g. interval, timeout, etc.,
    then apply the standard defaults when creating the container. Now podman
    correctly schedules the healtcheck service & timer for the container.

  2. proposed fix for podman incompatiblity with healthcheck from docker image #3507 libpod/healthcheck.go

    • remove duplicate check, already called in HealthCheck()
    • reject zero-length command list
    • check for Docker command list, starting with NONE, CMD or CMD-SHELL
    • otherwise pass command list as-is to container

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 9, 2019
@openshift-ci-robot
Copy link
Collaborator

Hi @stefanb2. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 9, 2019
@haircommander
Copy link
Collaborator

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 9, 2019
@rh-atomic-bot
Copy link
Collaborator

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

libpod/healthcheck.go Outdated Show resolved Hide resolved
cmd/podman/shared/create.go Outdated Show resolved Hide resolved
@stefanb2
Copy link
Contributor Author

/test special_testing_rootless

@stefanb2 stefanb2 force-pushed the topic-pr-3507-3525 branch from f30eb45 to 6f0fdc6 Compare July 10, 2019 06:55
@mheon
Copy link
Member

mheon commented Jul 10, 2019

That one's a known flake. For reference, the /retest commands don't work on most tests - you need to go into the page for the failing test, and click the restart link

@stefanb2
Copy link
Contributor Author

stefanb2 commented Jul 10, 2019

Tests with Docker V1 images, generated on Fedora 30 with default docker installation. F30 podman package rebuilt with WIP patches included.

  1. Dockerfile HEALTHCHECK line
  2. output from podman image inspect <IMAGE> | fgrep HEALTHCHECK
  3. output from podman container inspect --format {{json .Config.HealthCheck}} test
  4. output from systemctl --user show <CONTAINER ID>.timer | fgrep TimersMonotonic=
  5. output from podman --log-level=debug healthcheck run test 2>&1 | fgrep "executing health check command"

Healthcheck disabled

1 - HEALTHCHECK NONE
2 - HEALTHCHECK \u0026{[\"NONE\"] \"0s\" \"0s\" '\\x00'}
3 - null
4 - timer not found (i.e. none scheduled)
5 - ERRO[0000] container <CONTAINER ID> has no defined healthcheck

Healthcheck with one command

1 - HEALTHCHECK CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with 1-element command list

1 - HEALTHCHECK CMD ["/bin/true"]
2 - HEALTHCHECK \u0026{[\"CMD\" \"/bin/true\"] \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD","/bin/true"],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/true for <CONTAINER ID>

Healthcheck with 2-element command list

1 - HEALTHCHECK CMD ["/bin/echo", "test"]
2 - HEALTHCHECK \u0026{[\"CMD\" \"/bin/echo\" \"test\"] \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD","/bin/echo","test"],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/echo test for <CONTAINER ID>

Healthcheck with one command and non-default interval

1 - HEALTHCHECK --interval=40s CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"40s\" \"0s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":40000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=40s
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with one command and non-default timeout

1 - HEALTHCHECK --timeout=50s CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"50s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":30000000000,"Timeout":50000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with one command and non-default retries

1 - HEALTHCHECK --retries=10 CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"0s\" '\\n'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":30000000000,"Timeout":30000000000,"Retries":10}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

libpod/healthcheck.go Outdated Show resolved Hide resolved
libpod/healthcheck.go Outdated Show resolved Hide resolved
@stefanb2
Copy link
Contributor Author

Tests with Docker V2 images, generated on Fedora 30 with Docker CE 19.03 installation. F30 podman package rebuilt with WIP patches included.

  1. Dockerfile HEALTHCHECK line
  2. output from podman image inspect <IMAGE> | fgrep HEALTHCHECK
  3. output from podman container inspect --format {{json .Config.HealthCheck}} test
  4. output from systemctl --user show <CONTAINER ID>.timer | fgrep TimersMonotonic=
  5. output from podman --log-level=debug healthcheck run test 2>&1 | fgrep "executing health check command"

Healthcheck disabled

1 - HEALTHCHECK NONE
2 - HEALTHCHECK \u0026{[\"NONE\"] \"0s\" \"0s\" \"0s\" '\\x00'}
3 - null
4 - timer not found (i.e. none scheduled)
5 - ERRO[0000] container <CONTAINER ID> has no defined healthcheck

Healthcheck with one command

1 - HEALTHCHECK CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with 1-element command list

1 - HEALTHCHECK CMD ["/bin/true"]
2 - HEALTHCHECK \u0026{[\"CMD\" \"/bin/true\"] \"0s\" \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD","/bin/true"],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/true for <CONTAINER ID>

Healthcheck with 2-element command list

1 - HEALTHCHECK CMD ["/bin/echo", "test"]
2 - HEALTHCHECK \u0026{[\"CMD\" \"/bin/echo\" \"test\"] \"0s\" \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD","/bin/echo","test"],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/echo test for <CONTAINER ID>

Healthcheck with one command and non-default interval

1 - HEALTHCHECK --interval=40s CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"40s\" \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":40000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=40s
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with one command and non-default timeout

1 - HEALTHCHECK --timeout=50s CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"50s\" \"0s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":30000000000,"Timeout":50000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with one command and non-default start period

1 - HEALTHCHECK --start-period=60s CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"0s\" \"1m0s\" '\\x00'}
3 - {"Test":["CMD-SHELL","/bin/true"],"StartPeriod":60000000000,"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

Healthcheck with one command and non-default retries

1 - HEALTHCHECK --retries=10 CMD /bin/true
2 - HEALTHCHECK \u0026{[\"CMD-SHELL\" \"/bin/true\"] \"0s\" \"0s\" \"0s\" '\\n'
3 - {"Test":["CMD-SHELL","/bin/true"],"Interval":30000000000,"Timeout":30000000000,"Retries":10}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - executing health check command /bin/sh -c /bin/true for <CONTAINER ID>

@stefanb2 stefanb2 force-pushed the topic-pr-3507-3525 branch from 6f0fdc6 to 5f700e1 Compare July 11, 2019 08:09
@stefanb2 stefanb2 changed the title WIP fix handling of healthcheck from image Fix handling of healthcheck from image Jul 11, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2019
@stefanb2
Copy link
Contributor Author

From my PoV this PR is ready for final review & approval.

cmd/podman/shared/create.go Show resolved Hide resolved
cmd/podman/shared/create.go Show resolved Hide resolved
libpod/healthcheck.go Show resolved Hide resolved
@mheon
Copy link
Member

mheon commented Jul 11, 2019

/approve

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon, stefanb2

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2019
@stefanb2 stefanb2 force-pushed the topic-pr-3507-3525 branch from 5f700e1 to f0c24e5 Compare July 11, 2019 17:40
@stefanb2
Copy link
Contributor Author

Healthcheck with empty command

1 - HEALTHCHECK CMD [""]
2 - HEALTHCHECK \u0026{[\"CMD\" \"\"] \"0s\" \"0s\" \"0s\" '\\x00'}
3 - {"Test":["CMD",""],"Interval":30000000000,"Timeout":30000000000,"Retries":3}
4 - TimersMonotonic={ OnUnitInactiveUSec=30s ...
5 - ERRO[0000] container <CONTAINER ID> has no defined healthcheck 

i.e. podman does create the unit & timer, but any attempt to start the healthcheck is rejected by podman. Hence there are no attempts logged in the container and its health state stays at Starting.

stefanb2 added 4 commits July 16, 2019 07:01
- remove duplicate check, already called in HealthCheck()
- reject zero-length command list and empty command string as errorneous
- support all Docker command list keywords: NONE, CMD or CMD-SHELL
- use Docker default "/bin/sh -c" for CMD-SHELL

Fixes containers#3507

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
If the image doesn't provide any options, e.g. interval, timeout, etc.,
then apply the Docker defaults when creating the container. Otherwise
the defaults will be left 0 and podman doesn't schedule the healtcheck
service & timer for the container or incorrectly reports unhealthy state
when the check is executed.

Fixes containers#3525

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
If the image was built with "HEALTHCHECK NONE" then we should create a
container without healthcheck configuration. Otherwise executing the
healthcheck on the container will return "unhealthy" instead of the
correct error message that the container doesn't have a healthcheck.

We also ignore the healthcheck configuration if the command list is
empty or the command string is empty.

Fixes containers#3525

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
An image with "HEALTHCHECK CMD ['']" is valid but as there is no command
defined the healthcheck will fail. Reject such a configuration.

Fixes containers#3507

Signed-off-by: Stefan Becker <chemobejk@gmail.com>
@stefanb2 stefanb2 force-pushed the topic-pr-3507-3525 branch from f0c24e5 to 5ed2de1 Compare July 16, 2019 04:01
@stefanb2
Copy link
Contributor Author

Can I have a LGTM from the reviewers?

@mheon
Copy link
Member

mheon commented Jul 16, 2019

LGTM

@mheon
Copy link
Member

mheon commented Jul 16, 2019

@rhatdan @giuseppe @vrothberg PTAL

@rhatdan
Copy link
Member

rhatdan commented Jul 16, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 16, 2019
@openshift-merge-robot openshift-merge-robot merged commit 386ffd2 into containers:master Jul 16, 2019
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. ok-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants