Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd-cloud-prune: Prune images for builds #3867

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

gursewak1997
Copy link
Member

Extend the garbage collection to the images for the builds. We will prune all the images apart from what is specified in the images_keep list for each stream in gc-policy.yaml

@gursewak1997
Copy link
Member Author

One question pertaining to the images pruning is whether we should update the meta.json for each build specifying/deleting the pruned images from the images key?

@dustymabe
Copy link
Member

One question pertaining to the images pruning is whether we should update the meta.json for each build specifying/deleting the pruned images from the images key?

I think we should keep them for historical reference. It's also safer to not have to touch other files in s3 (I don't want to get it wrong).

@dustymabe
Copy link
Member

Can you open a corresponding PR to the pipeline that proposes updates to the gc-policy.yaml based on these changes? I think it will help to illustrate and drive the review process.

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial round. I've got just a little left to do but will have to pick it up later.

src/cmd-cloud-prune Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
@gursewak1997
Copy link
Member Author

I think we should keep them for historical reference. It's also safer to not have to touch other files in s3 (I don't want to get it wrong).

I thought the same. Ideally, we shouldn't modify meta.json to ensure it's safe and to minimize updates to S3.

@gursewak1997
Copy link
Member Author

Can you open a corresponding PR to the pipeline that proposes updates to the gc-policy.yaml based on these changes? I think it will help to illustrate and drive the review process.

coreos/fedora-coreos-pipeline#1038

@gursewak1997 gursewak1997 force-pushed the add-images-gc branch 2 times, most recently from 6875cfe to 33575de Compare September 20, 2024 08:27
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Show resolved Hide resolved
src/cmd-cloud-prune Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
@dustymabe
Copy link
Member

NOTE: I don't want to make the following suggested change in this PR. If we did decide to do it I'd propose it for a separate PR after this one merges.

Ultimately I think it may be better here to iterate over builds at the highest level for loop rather than the action (policy). i.e.

for build in builds:
    for action in ['cloud-uploads', 'images', 'build']:

rather than how it is now. A few reasons why:

  1. right now we download a build's meta.json (for each arch) 3 times (for each action in the policy) from s3. I think that means we download a meta.json 12 times, rather than 4 for each build in the worst case. It's a small file so doesn't matter a ton, but worth noting.
  2. This change might allow us to update the build.json file less times with the results of the cleanup.

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. One change needed.

# i.e. there may be additional images we need prune
previous_images_kept = previous_cleanup.get("images-kept", [])
if set(images_to_keep) == set(previous_images_kept):
print(f"Build {build_id} has already had {action} pruning completed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing the key piece:

Suggested change
print(f"Build {build_id} has already had {action} pruning completed")
print(f"Build {build_id} has already had {action} pruning completed")
continue

src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
src/cmd-cloud-prune Show resolved Hide resolved
src/cmd-cloud-prune Outdated Show resolved Hide resolved
Extend the garbage collection to the images and whole builds. We
will prune all the images apart from what is specified in the
images_keep list for each stream in gc-policy.yaml. For pruning
the whole builds, we will delete all the resources in s3 for that
build and add those builds under tombstone-builds in the
respective builds.json
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's merge this and then discuss if we think it's worth a followup for the proposal in #3867 (comment) or not.

@gursewak1997 gursewak1997 merged commit ea78d55 into coreos:main Sep 23, 2024
5 checks passed
@gursewak1997 gursewak1997 deleted the add-images-gc branch September 23, 2024 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants