Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor local storage to avoid duplicating blobs #390

Merged
merged 16 commits into from
Jul 9, 2024

Conversation

amisevsk
Copy link
Contributor

@amisevsk amisevsk commented Jul 3, 2024

Description

This PR is a large-scale refactor of how local storage is implemented in order to avoid duplicating blobs to multiple on-disk OCI stores.

The first three commits are a reorganization of existing packages (since we almost had an import cycle) without functional changes. The remaining commits implement local storage as follows:

  • There is one OCI on-disk store, located at $KITOPS_HOME/storage. This is the usual implementation, except we don't store tags there. Garbage collection of blobs is deferred there, and all known manifests are stored inside its index.
  • For each repo name (i.e. registry/repository), a separate index.json and tag table is stored, named <name>-index.json and <name>-tags.json, where <name> is the base64-encoded registry/repo pair.
  • The local storage implementation intercepts read/write actions on manifests and serves them from the namespaced index.json, while blobs are stored in the general OCI store, allowing blobs to be shared between repositories.
  • Tags are stored as a basic json map in the *-tags.json file. It maps tags to descriptors.

I've also added code to automatically migrate existing local stores to the new format and clean up old files. This could likely be tested more thoroughly (and would be if we were in 1.0) but in my testing it's fast and safe.

Linked issues

Preparation to make #311 and #387 easier to implement.
Fixes #75

@amisevsk amisevsk requested a review from gorkem July 3, 2024 23:19
@amisevsk amisevsk force-pushed the local-storage-refactor branch from f2c5895 to 722adf8 Compare July 5, 2024 15:03
@@ -14,7 +14,7 @@ COPY . .
RUN \
CGO_ENABLED=0 go build \
-o kit \
-ldflags="-s -w -X kitops/pkg/cmd/version.Version=${version} -X kitops/pkg/cmd/version.GitCommit=$gitCommit -X kitops/pkg/cmd/version.BuildTime=$(date -u +'%Y-%m-%dT%H:%M:%SZ')"
-ldflags="-s -w -X kitops/pkg/lib/constants.Version=${version} -X kitops/pkg/lib/constants.GitCommit=$gitCommit -X kitops/pkg/lib/constants.BuildTime=$(date -u +'%Y-%m-%dT%H:%M:%SZ')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really relevant to this PR but we probably need to add the go generate ./... step.

Copy link
Contributor

@gorkem gorkem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that during migration it can fail for some reason like full disk. When that happens kit command becomes unusable and if somebody wants to use it for cleaning up the only option is to use an older version of kit. I am not sure if this is something that needs to be addressed but at least noted.

}
if curTag != "" {
li.modelTags.tagToDigest[curTag] = manifestDesc
if err := li.modelTags.save(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err does not really need to be checked explicitly here.

Suggested change
if err := li.modelTags.save(); err != nil {
return li.modelTags.save()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. I think I had it like this because it's annoying to have to refactor the lines to add some context to the error, but it's cleaner your way.


canDelete, err := canSafelyDeleteManifest(ctx, l.storagePath, target)
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to detail the errors here and the following 2 errors

Suggested change
return err
return fmt.Errorf("failed to check if manifest can be safely deleted: %w", err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added context to this error, but I'm not sure we need more on the other ones -- they'll generally have os/file writing errors. We want to avoid logging too much in this handler code because the user shouldn't care about the idiosyncrasies of how we store files and why we can't actually remove a manifest yet.

}

func MigrateStorage(ctx context.Context, baseStoragePath string) error {
localStores, err := GetAllLocalStores(baseStoragePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a long running operation. Can we have progress bars? or something that indicates progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I've added a basic progress bar (and some reusable code if we ever need a simple bar in the future).

@gorkem
Copy link
Contributor

gorkem commented Jul 8, 2024

Two more observations. All my pull command fails with

[ERROR] Failed to push: failed to copy to remote: failed to resolve tuned: not found

Also It keeps printing the Migrating local storage to new format with every command. Although the migration should have completed.

@amisevsk amisevsk force-pushed the local-storage-refactor branch 2 times, most recently from 8f1cd0e to fbefac3 Compare July 8, 2024 18:40
@amisevsk
Copy link
Contributor Author

amisevsk commented Jul 8, 2024

Two more observations. All my pull command fails with

[ERROR] Failed to push: failed to copy to remote: failed to resolve tuned: not found

Also It keeps printing the Migrating local storage to new format with every command. Although the migration should have completed.

Hm, strange -- I can't reproduce the pull issue. Could you share what your $KITOPS_HOME/storage looks like? Sounds like Kit is finding an unmigrated localStore and is failing to resolve a modelkit within it?

@amisevsk amisevsk force-pushed the local-storage-refactor branch from 7f94a85 to f1505fa Compare July 8, 2024 22:56
return nil
}

func (li *localIndex) exists(_ context.Context, target ocispec.Descriptor) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the context.

@amisevsk amisevsk force-pushed the local-storage-refactor branch 2 times, most recently from 2ebdea0 to 8d6038a Compare July 9, 2024 16:12
amisevsk added 16 commits July 9, 2024 12:17
To avoid library packages importing from cmd, move the variables that
store version info into the constants package
Split functions formerly mixed between the kitfile and repo packages
more cleanly to avoid almost-import-cycles. Separate concerns between

* Processing kitfiles + modelkits on-disk --> pkg/lib/kitfile
* Handling local storage --> pkg/lib/repo/local
* Handling remote repositories --> pkg/lib/repo/remote

Also, run goimports with the --local flag to split imports between 1)
standard library, 2) in-project, and 3) dependency imports.
Rework how modelkits are stored locally on disk. With this change, all
blobs are stored within a single ORAS oci.Store, with additional indexes
overlaid to restrict blobs to specific repos (tags). Additional indexes
are stored alongside the main index.json and are named
<name>-index.json, where <name> is the base64-encoded repository name
(e.g. example.com/my-org/my-repo).

This avoids duplicating blobs between multiple OCI stores to simulate
multiple different 'repositories' (i.e. 'my-org/my-repo' and
'my-org/my-repo-2' now share blob storage without showing each other's
manifests.
Swap implementatations for these commands since they're largely
unchanged interface-wise.
With this change, tags no longer require copying blobs between
directories on disk
Since the OCI spec is incredibly unergonomic for managing tags, store
local per-repo indexes with no tag annotations and instead store tags as
a separate map file. This makes many things a lot easier.
Since each local repo share storage, we cannot delete manifests from the
main OCI store if that manifest is used in other repositories.
Otherwise, we break those repositories (index contains a manifest, but
blob storage does not).
Add automatic migration from the previous storage layout (multiple OCI
stores nested within storage) to the new layout (shared storage between
blobs)
Once all modelkits in a repository are migrated to the new format,
remove the migrated modelkits. This hopefully avoids storage issues when
migrating a large number of large modelkits, as previously we briefly
required up to double to size of the local store.
@amisevsk amisevsk force-pushed the local-storage-refactor branch from 8d6038a to 9ad4d76 Compare July 9, 2024 18:30
@amisevsk amisevsk merged commit 84c842b into jozu-ai:main Jul 9, 2024
3 checks passed
@amisevsk amisevsk deleted the local-storage-refactor branch July 9, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve how artifacts are stored locally to avoid duplicating data
2 participants