Refactor local storage to avoid duplicating blobs #390
f2c5895 to 722adf8
@@ -14,7 +14,7 @@ COPY . .
 RUN \
   CGO_ENABLED=0 go build \
   -o kit \
-  -ldflags="-s -w -X kitops/pkg/cmd/version.Version=${version} -X kitops/pkg/cmd/version.GitCommit=$gitCommit -X kitops/pkg/cmd/version.BuildTime=$(date -u +'%Y-%m-%dT%H:%M:%SZ')"
+  -ldflags="-s -w -X kitops/pkg/lib/constants.Version=${version} -X kitops/pkg/lib/constants.GitCommit=$gitCommit -X kitops/pkg/lib/constants.BuildTime=$(date -u +'%Y-%m-%dT%H:%M:%SZ')"
Not really relevant to this PR, but we probably need to add the go generate ./... step.
The migration can fail partway through for reasons such as a full disk. When that happens, the kit command becomes unusable, and anyone who wants to use it to clean up has no option but to run an older version of kit. I am not sure this needs to be addressed, but it should at least be noted.
pkg/lib/repo/local/repository.go
Outdated
}
if curTag != "" {
	li.modelTags.tagToDigest[curTag] = manifestDesc
	if err := li.modelTags.save(); err != nil {
err does not really need to be checked explicitly here.
- if err := li.modelTags.save(); err != nil {
+ return li.modelTags.save()
Yeah, makes sense. I think I had it like this because it's annoying to have to refactor the lines to add some context to the error, but it's cleaner your way.
pkg/lib/repo/local/registry.go
Outdated
canDelete, err := canSafelyDeleteManifest(ctx, l.storagePath, target)
if err != nil {
	return err
Do we need to add detail to the error here, and to the following two errors?
- return err
+ return fmt.Errorf("failed to check if manifest can be safely deleted: %w", err)
Added context to this error, but I'm not sure we need more on the other ones -- they'll generally have os/file writing errors. We want to avoid logging too much in this handler code because the user shouldn't care about the idiosyncrasies of how we store files and why we can't actually remove a manifest yet.
}

func MigrateStorage(ctx context.Context, baseStoragePath string) error {
	localStores, err := GetAllLocalStores(baseStoragePath)
This can be a long-running operation. Can we have a progress bar, or something else that indicates progress?
Great point, I've added a basic progress bar (and some reusable code if we ever need a simple bar in the future).
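As an illustration only, a basic text progress bar could look roughly like the sketch below. This is not the code added in the PR; `renderBar` and its parameters are hypothetical names, and the example assumes a fixed-width bar redrawn in place with a carriage return.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar returns a fixed-width text bar such as "[=====     ] 2/4".
// Out-of-range inputs are clamped so the bar never overflows its width.
func renderBar(done, total, width int) string {
	if width < 0 {
		width = 0
	}
	if total <= 0 {
		total = 1
	}
	if done < 0 {
		done = 0
	}
	if done > total {
		done = total
	}
	filled := done * width / total
	return fmt.Sprintf("[%s%s] %d/%d",
		strings.Repeat("=", filled),
		strings.Repeat(" ", width-filled),
		done, total)
}

func main() {
	const total = 4
	for i := 0; i <= total; i++ {
		// "\r" moves the cursor back to the start of the line,
		// so each update overwrites the previous bar.
		fmt.Printf("\r%s", renderBar(i, total, 20))
	}
	fmt.Println()
}
```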
Two more observations. All my pull commands fail with
Also, it keeps printing the
8f1cd0e to fbefac3
Hm, strange -- I can't reproduce the pull issue. Could you share what your
7f94a85 to f1505fa
pkg/lib/repo/local/repository.go
Outdated
	return nil
}

func (li *localIndex) exists(_ context.Context, target ocispec.Descriptor) bool {
You can remove the context parameter, since it is unused.
2ebdea0 to 8d6038a
To avoid library packages importing from cmd, move the variables that store version info into the constants package
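A minimal sketch of the pattern, assuming the version info is held in plain package-level string variables (the linker's `-X` flag can only set string variables, not constants). The variable names mirror the ones in the `-ldflags` line, but the "unknown" defaults and the `versionString` helper are hypothetical, and the example uses package `main` so it runs on its own.

```go
package main

import "fmt"

// Stand-ins for the variables targeted by the -X linker flags.
// A build such as
//   go build -ldflags "-X main.Version=1.2.3"
// overwrites the default below at link time; without the flag,
// the default value is kept.
var (
	Version   = "unknown"
	GitCommit = "unknown"
	BuildTime = "unknown"
)

// versionString formats the build metadata for display.
func versionString() string {
	return fmt.Sprintf("version=%s commit=%s built=%s", Version, GitCommit, BuildTime)
}

func main() {
	fmt.Println(versionString())
}
```

Keeping these variables in a leaf package means both cmd and library packages can read them without creating an import from lib back into cmd.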
Split functions formerly mixed between the kitfile and repo packages more cleanly to avoid almost-import-cycles. Separate concerns between
* Processing kitfiles + modelkits on-disk --> pkg/lib/kitfile
* Handling local storage --> pkg/lib/repo/local
* Handling remote repositories --> pkg/lib/repo/remote
Also, run goimports with the --local flag to split imports between 1) standard library, 2) in-project, and 3) dependency imports.
Rework how modelkits are stored locally on disk. With this change, all blobs are stored within a single ORAS oci.Store, with additional indexes overlaid to restrict blobs to specific repos (tags). Additional indexes are stored alongside the main index.json and are named <name>-index.json, where <name> is the base64-encoded repository name (e.g. example.com/my-org/my-repo). This avoids duplicating blobs between multiple OCI stores to simulate multiple different 'repositories' (i.e. 'my-org/my-repo' and 'my-org/my-repo-2' now share blob storage without showing each other's manifests).
Swap implementations for these commands since they're largely unchanged interface-wise.
With this change, tags no longer require copying blobs between directories on disk
Since the OCI spec is incredibly unergonomic for managing tags, store local per-repo indexes with no tag annotations and instead store tags as a separate map file. This makes many things a lot easier.
Since all local repos share storage, we cannot delete a manifest from the main OCI store if that manifest is used in other repositories. Otherwise, we break those repositories (index contains a manifest, but blob storage does not).
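The check can be sketched as below, assuming each repository's index is available as a set of manifest digests. The in-memory `repoIndexes` type and the `canSafelyDelete` name are hypothetical; the real `canSafelyDeleteManifest` in this PR takes a context, a storage path, and a descriptor, and its internals are not shown here.

```go
package main

import "fmt"

// repoIndexes maps a repository name to the set of manifest digests
// listed in that repository's index.
type repoIndexes map[string]map[string]bool

// canSafelyDelete reports whether the manifest can be removed from the
// shared store when it is deleted from repo: it must not be referenced
// by any other repository's index.
func canSafelyDelete(indexes repoIndexes, repo, digest string) bool {
	for name, manifests := range indexes {
		if name == repo {
			continue
		}
		if manifests[digest] {
			return false
		}
	}
	return true
}

func main() {
	indexes := repoIndexes{
		"my-org/my-repo":   {"sha256:aaa": true, "sha256:bbb": true},
		"my-org/my-repo-2": {"sha256:aaa": true},
	}
	// sha256:aaa is shared, so it must stay in the store.
	fmt.Println(canSafelyDelete(indexes, "my-org/my-repo", "sha256:aaa"))
	// sha256:bbb is only in my-org/my-repo, so it can go.
	fmt.Println(canSafelyDelete(indexes, "my-org/my-repo", "sha256:bbb"))
}
```

When the check returns false, the manifest is only dropped from the repository's own index and the shared blobs are left alone.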
Add automatic migration from the previous storage layout (multiple OCI stores nested within storage) to the new layout (blob storage shared between repos).
Once all modelkits in a repository are migrated to the new format, remove the migrated modelkits. This hopefully avoids storage issues when migrating a large number of large modelkits, as previously we briefly required up to double the size of the local store.
8d6038a to 9ad4d76
Description
This PR is a large-scale refactor of how local storage is implemented in order to avoid duplicating blobs to multiple on-disk OCI stores.
The first three commits are a reorganization of existing packages (since we almost had an import cycle) without functional changes. The remaining commits implement local storage as follows:
* The main OCI store lives in $KITOPS_HOME/storage. This is the usual implementation, except we don't store tags there. Garbage collection of blobs is deferred there, and all known manifests are stored inside its index.
* Each repository has its own <name>-index.json and <name>-tags.json, where <name> is the base64-encoded registry/repo pair.
* Tags are stored in the *-tags.json file. It maps tags to descriptors.

I've also added code to automatically migrate existing local stores to the new format and clean up old files. This could likely be tested more thoroughly (and would be if we were in 1.0) but in my testing it's fast and safe.
Linked issues
Preparation to make #311 and #387 easier to implement.
Fixes #75