Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make image listing more resilient #18980

Merged
merged 1 commit into from
Jun 26, 2023

Conversation

vrothberg
Copy link
Member

Handle more TOCTOUs operating on listed images. Also pull in containers/common/pull/1520 which does the same on the internal layer tree.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2216700

Does this PR introduce a user-facing change?

Make listing images more resilient towards concurrently running image removals.

@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 23, 2023
@vrothberg
Copy link
Member Author

Draft until containers/common#1520 is merged.

@vrothberg vrothberg marked this pull request as ready for review June 23, 2023 12:09
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 23, 2023
@vrothberg
Copy link
Member Author

@Luap99 @rhatdan PTAL

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love for @edsantiago to have a look at the test.
code LGTM

test/system/010-images.bats Show resolved Hide resolved
test/system/010-images.bats Outdated Show resolved Hide resolved
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is such a freaky weird failure, I'd like to see a super-clear comment immediately next to the podman images.

Suggested retry code is untested. Low on time today, heading out.

test/system/010-images.bats Outdated Show resolved Hide resolved
test/system/010-images.bats Show resolved Hide resolved
@TomSweeneyRedHat
Copy link
Member

LGTM once the tests are hip and @edsantiago 's comments are addressed.

@vrothberg
Copy link
Member Author

Need another PR in c/common. I am pretty happy about the test as it's quite helpful in finding (more) issues.

vrothberg added a commit to vrothberg/common that referenced this pull request Jun 26, 2023
Commit 97961a6 took a first stab at making operating on the layer
tree more resilient by detecting unknown-image errors.  Testing in
containers/podman/pull/18980 revealed that we also need to detect
unknown-layer and -size errors as well.

Move the errors checks into a convenience function and update the
relevant call sites to facilitate future changes.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg
Copy link
Member Author

Now vendoring containers/common#1522 which is needed to address more errors that popped up. I've been running the test for the past 20 mins on my machine but let's see what CI says.

vrothberg added a commit to vrothberg/common that referenced this pull request Jun 26, 2023
Commit 97961a6 took a first stab at making operating on the layer
tree more resilient by detecting unknown-image errors.  Testing in
containers/podman/pull/18980 revealed that we also need to detect
unknown-layer and -size errors as well.

Move the errors checks into a convenience function and update the
relevant call sites to facilitate future changes.  Export the
function since Podman needs the very same checks when operating
on images, for instance, when looking up image labels.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg vrothberg force-pushed the bz-2216700 branch 2 times, most recently from a080bf2 to 93ea7a8 Compare June 26, 2023 08:35
vrothberg added a commit to vrothberg/common that referenced this pull request Jun 26, 2023
Commit 97961a6 took a first stab at making operating on the layer
tree more resilient by detecting unknown-image errors.  Testing in
containers/podman/pull/18980 revealed that we also need to detect
unknown-layer and -size errors as well.

Move the errors checks into a convenience function and update the
relevant call sites to facilitate future changes.  Export the
function since Podman needs the very same checks when operating
on images, for instance, when looking up image labels.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
vrothberg added a commit to vrothberg/common that referenced this pull request Jun 26, 2023
Commit 97961a6 took a first stab at making operating on the layer
tree more resilient by detecting unknown-image errors.  Testing in
containers/podman/pull/18980 revealed that we also need to detect
unknown-layer and -size errors as well.

Move the errors checks into a convenience function and update the
relevant call sites to facilitate future changes.  Export the
function since Podman needs the very same checks when operating
on images, for instance, when looking up image labels.

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@vrothberg vrothberg force-pushed the bz-2216700 branch 2 times, most recently from c1e1074 to 2741635 Compare June 26, 2023 12:08
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vrothberg
Copy link
Member Author

Almost green and ready to merge

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2023
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation.

test/system/010-images.bats Outdated Show resolved Hide resolved
Handle more TOCTOUs operating on listed images.  Also pull in
containers/common/pull/1520 and containers/common/pull/1522 which do the
same on the internal layer tree.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2216700
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@edsantiago
Copy link
Member

@vrothberg this is a new (to me) error:

<+009ms> # # podman-remote --url unix:/tmp/podman_tmp_sgM5 pod rm cf317c49437acdecad132d64cbc436369ab08bef4531c4c0fe1cbacd9dd9299d
<+273ms> # Error: not all containers could be removed from pod cf317c49437acdecad132d64cbc436369ab08bef4531c4c0fe1cbacd9dd9299d: removing pod containers

Is there any chance your code could've caused this?

@vrothberg
Copy link
Member Author

Is there any chance your code could've caused this?

It is not obvious to me but "never say never". Let's keep an eye on it.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, giuseppe, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [edsantiago,giuseppe,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

Flake turned out to be the unmount/EINVAL one (#18831), which is well known. This PR LGTM.

@rhatdan
Copy link
Member

rhatdan commented Jun 26, 2023

/lgtm
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2023
@openshift-merge-robot openshift-merge-robot merged commit c2d46ac into containers:main Jun 26, 2023
@vrothberg vrothberg deleted the bz-2216700 branch June 27, 2023 05:56
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants