Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less obnoxious OCI pulling on sync #6804

Merged
merged 4 commits into from
Sep 19, 2023
Merged

Conversation

absoludity
Copy link
Contributor

Description of the change

This PR update the way we pull OCI repos, so that we:

  • Don't pull all manifests of all tags to verify that they are helm charts before syncing
  • Don't do an all-or-nothing sync (previous code synced all tags from the repo if the checksum differed), instead only syncing those tags that don't exist in the cache.
  • Reduces the number of workers to 5 for now, but I want to put this back to 10 in a subsequent PR with other changes.

Benefits

We can pull large OCI catalogs from public repos (in particular, can pull the Bitnami catalog from Dockerhub) without hitting public pull limits.

Possible drawbacks

The initial time until the catalog can be used is slightly longer. But I have two things I want to improve in a subsequent PR:

  • Update the cache after each app/repo is synced so they can be viewed in the catalog more quickly (more responsive). Currently it doesn't update the cache until the sync completes.
  • Only sync a maximum of 5 missing tags per app per sync. This means that we can run the initial sync with 10 workers, and will get the latest 5 versions for each app, then subsequent syncs will add more each time. We should make this adjustable too, perhaps.

This combination of changes will also remove another issue we have where a subsequent sync, running within 10m of the original (by default) will do the same work currently, rather than just doing the work that hasn't been done yet.

Applicable issues

Additional information

Signed-off-by: Michael Nelson <minelson@vmware.com>
…ssarily.

Signed-off-by: Michael Nelson <minelson@vmware.com>
Signed-off-by: Michael Nelson <minelson@vmware.com>
@netlify
Copy link

netlify bot commented Sep 13, 2023

Deploy Preview for kubeapps-dev canceled.

Name Link
🔨 Latest commit 5a761b1
🔍 Latest deploy log https://app.netlify.com/sites/kubeapps-dev/deploys/6501522da7b4f600082dcf6d

Signed-off-by: Michael Nelson <minelson@vmware.com>
Copy link
Collaborator

@ppbaena ppbaena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Michael!

@absoludity
Copy link
Contributor Author

Thanks Michael!

Thanks for the review. Just note, I'm holding of landing this until I have the follow-up PRs also done and can check it works as expected locally. First one is at #6825 but there will be one more following that also. I'll land them once tested locally.

@absoludity absoludity merged commit cf4430e into main Sep 19, 2023
@absoludity absoludity deleted the 6706-sync-public-oci-repo-3 branch September 19, 2023 04:33
absoludity added a commit that referenced this pull request Sep 19, 2023
### Description of the change

Follows on from #6804, this PR updates so that the actual Syncing
happens per chart+versions, rather than at the end of the process. Much
of the change is refactoring functions that worked with slices of charts
to work with a chart instead. The complexity is the actual refactoring
of the sync, requiring channels to iterate on the results etc.

### Benefits

We'll be able no have results in the UX much more quickly (important
with OCI imports, which take longer without a big index.yaml to simply
parse).

### Possible drawbacks

Currently this will not work as expected, as two more changes are
required:

1. Need to *update* the existing chart versions for a chart, rather than
re-writing them (with only the chart versions that were synced, as it
will do now)
2. Need to collect the charts to be deleted and pass those explicitly,
rather than deleting all charts that are not in the current set being
synced (which it currently does - as this worked when syncing
everything).

### Applicable issues

- ref #6706 

### Additional information

Last PR to follow.

---------

Signed-off-by: Michael Nelson <minelson@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants