-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support storing Ollama [non-]OCI image layers #2075
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: yeahdongcn The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@baude is working on storing and producing artifacts, Currently looking at storing these at a higher level then containers/storage, but under the container storage directory tree. I will point him to this issue. BTW Have you looked at ramalama, an alternative to ollama. |
yes, we are working on something called libartifact that will mimic some of the behaviors of c/s but will be singularly purposed for oci artifacts. that work has started but is in its infancy. |
Is there a tracking/design issue for this? I have Opinions on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
@@ -790,6 +790,23 @@ func supportsOverlay(home string, homeMagic graphdriver.FsMagic, rootUID, rootGI | |||
return supportsDType, fmt.Errorf("'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded.: %w", graphdriver.ErrNotSupported) | |||
} | |||
|
|||
func cp(r io.Reader, dest string, filename string) error { | |||
if seeker, ok := r.(io.Seeker); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is something that should have generally been set up by the caller.
} | ||
} | ||
|
||
f, err := os.Create(filepath.Join(dest, filename)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use ioutils.NewAtomicFileWriter
to ensure we have more transactional/idempotent semantics (see its various uses in this repo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layer creation already has transaction semantics via incompleteFlag
; doing that for individual files inside a layer is unnecessary.
InUserNS: unshare.IsRootless(), | ||
}); err != nil { | ||
return 0, err | ||
if options.LayerFilename != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a real expert in c/storage but it is a fact that the codebase predates the creation/concept of artifacts, and is very much designed around storing layers.
I suspect that this options.LayerFilename
conditional thing could use a bit more design bikeshedding. I haven't looked...but for example, I think it may make more sense to actually have a separate datastore path entirely for artifacts that just happens to share code with c/storage, instead of trying to co-locate artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
We need to have a proper design discussion about storing non-image artifacts, and what does that mean.
Do not merge until that happens.
(Also ☠️ privilege escalation vulnerability.)
The “image volume” feature explicitly talks about image volumes. I don’t see that arbitrarily extending it to also accept other content is obviously desirable.
And even if it were desirable at the Kubernetes level, that doesn’t at all imply that the data should be stored in “images” with “layers” and OverlayFS and that it should be possible to podman run
the thing.
… and even if we wanted to present this data as storage.Image
/storage.Layer
objects, I don’t know why we would want to use overlay this way; that just adds overhead.
Maybe we need that, maybe we don’t; but that requires actually thinking about that.
All of this is at best a sketch of an implementation. It corrupts data. It silently does the wrong thing on other code paths.
It doesn’t allow pushing the consumed data afterwards, and it’s not obvious how the design for that would work within the existing .Diff
API.
@@ -72,6 +72,7 @@ type ApplyDiffOpts struct { | |||
MountLabel string | |||
IgnoreChownErrors bool | |||
ForceMask *os.FileMode | |||
LayerFilename *string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in all of the other storage drivers which weren’t modified?
layer.CompressedSize = size | ||
layer.UncompressedSize = size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this just doesn’t work; the data was already interpreted as a tar file, and uncompressed. This is at best pretending that didn’t happen.
return 0, err | ||
if options.LayerFilename != nil { | ||
logrus.Debugf("Applying file in %s", applyDir) | ||
err := cp(options.Diff, applyDir, *options.LayerFilename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☠️
This is an unrestricted path traversal vulnerability, run as root.
return 0, err | ||
if options.LayerFilename != nil { | ||
logrus.Debugf("Applying file in %s", applyDir) | ||
err := cp(options.Diff, applyDir, *options.LayerFilename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
options.Diff
is an uncompressed stream (that the caller is interpreting as a tar). This just doesn’t work for arbitrary blobs.
Compare also containers/skopeo#2395 (and, I think, ollama/ollama#6510 ): “Ollama” itself is not even remotely an OCI-compliant registry right now. What is the ecosystem and community situation for this data format? Is this a multi-faceted interoperability effort, where all of these things are going to happily fall into place, or is this setting us up on an adversarial interoperability path? Cc: @baude , another thing to have a very clear position on. |
Right, good call out. While we should probably debate all of this in a proper design issue, my strawman starting point would be to support storing OCI artifacts in a possibly tweaked version of the standard OCI image layout in a standard way in a subdirectory of a containers-storage, i.e. |
I think a useful general intuition is “remote OCI artifact is no more specific than an URL; storing OCI artifacts is no more specific than storing files”. It is obviously possible to work at that level of abstraction (to pull URLs, to store files), and it happens all the time; but it’s extremely rarely the most convenient abstraction for applications to work with, with no other support. Applications don’t inherently want to traverse OCI structures. So maybe we should hard-code some specific artifact MIME types, and provide “native” storage / filesystem formats? (This pair of PR is already very strongly going in that direction, hard-coding aspects of one specific structure, which is not even OCI.) To me, the primary point is that container runtimes don’t have to be involved in consuming artifacts. For any proposed artifact use:
And we should only be considering use cases where the two alternatives above are not suitable, i.e. where a container-runtime native feature adds a clearly-identified value. What are we doing here?
|
We absolutely expect people to dynamically link (i.e. runtime) containers with things like ConfigMaps, and that also makes sense for "data" like AI models and other things.
I think it's more that there are some common needs between "artifacts" and runnable container images. For example, signing support, runtime integrity, an API to name and version them, support for concurrent reads/writes, garbage collection, etc. Whether those needs are met by literally having them in the same codebase as "the container runtime" or whether they're better met by the two things using shared libraries is more of an implementation debate - but the need for some common functionality is IMO clear. On the topic of runtime integrity for examlpe I am strongly of the opinion that the underlying technical heart for us to use to store "files" (container images and artifact content that can be mounted) is composefs - it allows robust online verification, deduplication, etc. And there is some composefs-related code in this repository (though, I definitely want to change some of how it works...more on that later). |
I'm sending out these two PRs (my initial work) to bring your attention to using OCI images to deliver artifacts like GGUF models and integrate them into the Kubernetes ecosystem. Considering the original purpose of |
Ramalama looks cool! But I'm looking for a way to avoid reinventing the CLI and instead leverage existing infrastructure. |
That’s a very fair point.
Fair, and that sort of implies the same on-registry storage, but it doesn’t imply storing them in the same overlay-based format, or even under the same namespace, as images.
We need to define local storage mechanism first. Maybe there will be nothing in common.
(I wouldn’t say that what we have with container images, basically two plain text strings, is the best that can be done. It isn’t even clearly defined what is the “application” that is being versioned and what is the version number, see OpenShift shipping dozens of functionally-different images all in one repo. And that, in turn, is one of the blockers to a meaningful concept of “an application update”.)
That’s ~not a native Kubernetes feature, IIRC (“use NFS”), not shared with containers, and ~incompatible with the “runtime integrity” desire.
I think that’s possible but at this point it’s not obvious to me the details will be shared — for images we want runtime integrity not just at a layer level, but for the whole layer sequence, and that might not apply to other data. |
c/storage and c/image does target OCI images, not just the ~frozen schema[12] formats. And I think prototyping / figuring out how to distribute large data (of any kind) in the OCI ecosystem is clearly in scope of the project. But starting that conversation with directly using a ~completely OCI-foreign format is a very surprising place to start. This PR, as proposed, has ~nothing in common with, as you say, “using OCI images to deliver…”. E.g. I could well imagine that somebody somewhere defines an OCI artifact MIME type + format for storing this data. Maybe that format should have a specific implementation in c/image + c/storage, separate from a general OCI artifact feature. In such a hypothetical ecosystem, it might well make sense to have a utility that converts the Ollama data into the OCI artifact format, as a compatiblity/interoperability mechanism. Or, to say this another way, if the Ollama format should become the way to “use OCI images”, that needs to be proposed to the OCI image-spec maintainers. (And considering that the OCI artifact format, with its full generality, exists, I don’t know why they would be inclined to support that proposal, but it’s also not up to me.) |
On second thought — name/version the AI data on the build system side, not on the consumer side. Build the AI data into the container image as a (bit-for-bit reproducible) layer. Net effect: Most of the features listed features are already there.
The one thing that I see missing from the above is the efficiency of avoiding overlay (does that matter??). For that, I could plausibly see an OCI image format extension, where any layer can:
|
I more meant support for pulling multiple artifacts at a time and garbage collection, not for mutating individual files in artifacts. We're not talking about log files or databases as artifacts, we're talking about configmaps, AI models, dpkg/rpm/language-package-manager-etc content (this is a big one). |
Make the data layers in the application image, that inherits existing concurrent pulling. |
(There's probably a better place to discuss a design for OCI artifacts, not totally sure where; I did create https://github.com/cgwalters/composefs-oci recently which touches on this and starts with composefs as the heart of things, for general interest)
By "application image" here you mean an OCI image (not an artifact)? If so your suggestion seems tantamount to "have a build process that transforms OCI artifacts into a tarball which can become a layer" which is basically saying "don't use OCI artifacts natively" as far as I can tell. The use cases I have in mind very much want to maintain the identify of an artifact end-to-end from the registry to the client system. |
Close enough. OCI artifacts are by nature immutable, they are not mutable volumes. The difference between a pair of (immutable OCI image used as a root FS, immutable OCI artifact mounted into the root FS), and (immutable OCI image also containing the contents of the artifact) is… just in whether the metadata is combined at build time or at runtime. To combine the two, we are talking about conceptually cheap metadata operations either way, although aligning all inputs to allow that cheap metadata operation to happen can be pretty complex. If we have to implement something complex, I’d rather do it in the concentrated build pipeline than at runtime in every single cluster node. That removes complexity from the runtimes; removes complexity from whole application deployment pipeline; removes complexity from the application tracking — everything has a single “image reference” that uniquely references the whole thing, instead of having to add a new “and also mount these artifacts” feature all over the ecosystem. A hypothetical fancy AI model management product can build a single clearly-identified application image as an output, and that application image can then be deployed to ~existing clusters. (I do acknowledge that ConfigMaps are a thing, but K8s imposes severe size and count limitations on them (because they are implemented by storing them in the etcd consensus-maintained database); and they naturally follow a split between “application vendor” and “sysadmin deploying the application”, that might be entirely different organizations. What is the role split between application author and model data creator that would warrant a similarly separate OCI artifact? |
This discussion also relates to e.g. opencontainers/image-spec#1197 (comment) and the linked to opencontainers/image-spec#1190 And perhaps what would make the most sense is for "us" to form a consensus that we take to the spec?
I don't agree. I think tooling that uses OCI artifacts "natively" would include maintaining their independent identity all the way to the client end - for example, key metadata such as an
There's always a long-running tension between "split into lots of individually updatable bits" and "stitch things together with higher level snapshots". The way OCP does the "release image" is in this vein, as is "big bag of free-floating RPMs" and a "compose". Recently in bootc we also added logically bound images which are very much in this space, and note it's actually an OCI image that has references to child images, but tied together as you say into one super "image reference". (Except it has the downside today that bootc doesn't know how much data it will need to download to succeed at an update, xref containers/bootc#128 (comment) ) I agree with you that the problem domain of "mount these artifacts" is a bit open ended and would require some standardization work that could be trivially bypassed by just materializing them as tar layers in a container image context.
When one is talking about Large Data like AI models (but not exclusively), I think in many operational contexts it will be very much desired for them to be decoupled from the application and "dynamically linked" in effect.
Yes but that's just because OCI artifacts didn't exist when ConfigMaps were invented! I think it would make total sense to store them as artifacts in a registry - do you agree? On the bootc side I also want to support configmaps attached to the host that can be rev'd independently from the base OS image and dynamically updated without restart (if desired) for basically exactly the same reason Kubernetes does. |
Also I wanted to say another downside IMO of flatting artifacts to tar layers is that because OCI images are architecture dependent, if we're talking about Large artifacts, one needs to be sure those tar layers are generated "reproducibly" and not include random junk data with floating timestamps and the like, so you don't duplicate your artifact data across N architectures (and also lacking reproducibility would mean it just gets duplicated each rebuild, even if the data didn't change). |
OK just to play out your suggestion a bit more...if "we" wanted to encourage that approach I think there'd need to be some recommended tooling at least integrated with a I think a strawman for that would be a lot like COPY --link except also with support for mapping artifact (non-tar) layers to tar...which still has all the open questions about how exactly artifact layers appear as files in a generic context but maybe a simple strawman would be |
“Dynamically linking” by uploading a new manifest to a registry… is different but not obviously inferior, of course as long as that does not trigger a transfer of gigabytes of the raw data. (Compare your proposal to replace config maps with on-registry data, that’s the same thing!) I can see an argument that it’s a “worse is better” solution, giving up on explicitly modeling the versioning of the data, to get the benefit of much easier integration into existing platforms.
That’s very unclear to me; it would force giving every user a writeable registry space, and something would need to manage credentials to that registry space, both for users and for nodes deploying the configuration; vs. using native Kubernetes RBAC. And there is some overhead to talking to the registry, even assuming the data is stored directly inside the manifest and to avoid extra per-item roundtrips. I suppose most, or all, of that, could be hidden by tooling. |
Yes, definitely. Well, users might want to use some higher-level tools instead, but the underlying capability needs to be there.
Yes, vaguely.
This is still assuming that artifacts play a role at all. Maybe?
That sounds technically correct but inconvenient to use in applications, forcing them to include an OCI parser. And in a naive implementation in would result in two on-registry representations, one as an artifact with raw data, and one as a tar containing the My strawman would be
which:
Where the [1] step finding the right layer would probably rely on the annotation created in [2]. (We’d also need a “bootstrapping” version that creates the annotation on ordinary COPY.) This allows
|
The app wouldn't need to be OCI aware exactly, assuming the artifact has just one layer it could just call
I don't understand how your proposal avoids the double representation (once as tar, once not) either (mine doesn't), but fundamentally OCI images need to be tarballs today. I guess bigger picture it probably does make sense to just try to standardize the "ship ai models as uncompressed tar" plus we probably need standard for "this is architecture independent data". It's just a bit ugly. |
Hi 👋 We're looking into the usage of OCI Artifacts as a first-class citizen for a complementary storage solution for Model Registry. I would like to receive guidance please if in the short term is best for me to:
Some example use-cases summarized in this website and some demos. |
The way I’m thinking about it, the registry upload would only be as tar. The data might exist as non-tar in other places, sure. |
I think a useful framing is that “publishing OCI artifacts” is pretty close to like “publishing directories of files”; just the fact that it is an OCI artifact makes no promise of interoperability. Interoperability comes from consumers and producers agreeing on these other details. Very short-term, the only thing I’m aware that is in(?) some kind of production is https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/ , which requires an ordinary image, I think an artifact would not work at all. Beyond that… I think this is one of the many places where the design is being hashed out. |
Background:
Kubernetes 1.31 introduced a new feature: Read-Only Volumes Based on OCI Artifacts. I believe this feature could be very useful for deploying a dedicated model alongside Ollama in Kubernetes.
Ollama has introduced several new media types (e.g. application/vnd.ollama.image.model) for storing GGUF models, system prompts, and more. Each layer is essentially a file and does not need to be untarred.
A PR for
containers/image
has addedlayerFilename
toaddedLayerInfo
, and this PR handles the layer creation through overlay driver.Please see the following logs for instructions on how to mount the Ollama image as a volume: