Skip to content

Commit

Permalink
image: cas: implement CAS engine
Browse files Browse the repository at this point in the history
Signed-off-by: Aleksa Sarai <asarai@suse.com>
  • Loading branch information
cyphar committed Nov 5, 2016
1 parent cf1277a commit 667271b
Show file tree
Hide file tree
Showing 6 changed files with 635 additions and 1 deletion.
6 changes: 5 additions & 1 deletion glide.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 10 additions & 0 deletions image/cas/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
### `umoci/image/cas` ###

This is a reimplemented version of the currently in-flight [`image-tools` CAS
PR][cas-pr], which combines the `cas` and `refs` interfaces into a single
`Engine` that represents the image. In addition, I've implemented more

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

Any particular reason to combine refs and CAs? They seem orthogonal to me, but having them separate does mean two engines to open, pass around, and close.

This comment has been minimized.

Copy link
@cyphar

cyphar Nov 6, 2016

Author Member

To a user, they are not orthogonal. They are both part of the same image. This isn't a generic CAS, it's an interface for OCI images -- IMO your PR is way over-engineered.

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

To a user, they are not orthogonal.

Your user still has to decide whether to call GetReference or GetBlob, so I don't see any higher-level abstractions resulting from putting both in a single interface. I have earlier thoughts on separate refs/CAS backends in docker-archive/docker-registry#704, and with an approach like that you could be using different URIs for the refs and CAS (although you don't need different URIs if you use the same image-layout for both).

But if the doubled opens, passing, and closings are a problem, I can certainly give my PR a:

type Engine interface {
    CAS *cas.Engine
    Refs *refs.Engine
}

add methods for generating one with a single image-layout open, and use the dual engine where I currently need both CAS and refs. Would that make my engine interfaces more convenient?

This comment has been minimized.

Copy link
@cyphar

cyphar Nov 6, 2016

Author Member

An OCI image will always have refs, and will always have blobs. There is no point in splitting the two sets of semantics (internally it can be implemented differently). Your proposed interface could work, but then you have two different list interfaces for blobs and refs -- it's just a matter of taste.

auto-detection and creature comforts.

When the PR is merged, these changes will probably go upstream as well.

[cas-pr]: https://github.com/opencontainers/image-tools/pull/5
116 changes: 116 additions & 0 deletions image/cas/blob.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
/*
* umoci: Umoci Modifies Open Containers' Images
* Copyright (C) 2016 SUSE LLC.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package cas

import (
"encoding/json"
"io"

"github.com/opencontainers/image-spec/specs-go/v1"
"golang.org/x/net/context"
)

// Blob represents a "parsed" blob in an OCI image's blob store. MediaType
// offers a type-safe way of checking what the type of Data is.
type Blob struct {
// MediaType is the OCI media type of Data.
MediaType string

// Digest is the digest of the parsed image. Note that this does not update
// if Data is changed (it is the digest that this blob was parsed *from*).
Digest string

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

You have MediaType and Digest. Maybe just use a Descriptor so you can hold Size too. That would let you validate the size (if you knew it) when fetching from CAS.


// Data is the "parsed" blob taken from the OCI image's blob store, and is
// typed according to the media type. The mapping from MIME => type is as
// follows.
//
// v1.MediaTypeDescriptor => *v1.Descriptor
// v1.MediaTypeImageManifest => *v1.Manifest
// v1.MediaTypeImageManifestList => *v1.ManifestList
// v1.MediaTypeImageLayer => io.ReadCloser
// v1.MediaTypeImageLayerNonDistributable => io.ReadCloser
// v1.MediaTypeImageConfig => *v1.Image
Data interface{}
}

func (b *Blob) load(ctx context.Context, engine Engine) error {
reader, err := engine.GetBlob(ctx, b.Digest)
if err != nil {
return err
}

// The layer media types are special, we don't want to do any parsing (or
// close the blob reference).
switch b.MediaType {
// v1.MediaTypeImageLayer => io.ReadCloser
// v1.MediaTypeImageLayerNonDistributable => io.ReadCloser
case v1.MediaTypeImageLayer, v1.MediaTypeImageLayerNonDistributable:
// There isn't anything else we can practically do here.
b.Data = reader
return nil
}

defer reader.Close()

var parsed interface{}
switch b.MediaType {
// v1.MediaTypeDescriptor => *v1.Descriptor
case v1.MediaTypeDescriptor:
parsed = &v1.Descriptor{}
// v1.MediaTypeImageManifest => *v1.Manifest
case v1.MediaTypeImageManifest:
parsed = &v1.Manifest{}
// v1.MediaTypeImageManifestList => *v1.ManifestList
case v1.MediaTypeImageManifestList:
parsed = &v1.ManifestList{}
// v1.MediaTypeImageConfig => *v1.ImageConfig
case v1.MediaTypeImageConfig:
parsed = &v1.Image{}
}

if err := json.NewDecoder(reader).Decode(parsed); err != nil {
return err
}

b.Data = parsed
return nil
}

func (b *Blob) Close() {
switch b.MediaType {
case v1.MediaTypeImageLayer, v1.MediaTypeImageLayerNonDistributable:
if b.Data != nil {
b.Data.(io.Closer).Close()
}
}
}

// FromDescriptor parses the blob referenced by the given descriptor.
func FromDescriptor(ctx context.Context, engine Engine, descriptor *v1.Descriptor) (*Blob, error) {
blob := &Blob{
MediaType: descriptor.MediaType,
Digest: descriptor.Digest,
Data: nil,
}

if err := blob.load(ctx, engine); err != nil {
return nil, err
}

return blob, nil
}
151 changes: 151 additions & 0 deletions image/cas/cas.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
/*
* umoci: Umoci Modifies Open Containers' Images
* Copyright (C) 2016 SUSE LLC.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package cas

import (
"errors"
"fmt"
"io"
"os"
"path/filepath"
"strings"

"github.com/opencontainers/image-spec/specs-go/v1"
"golang.org/x/net/context"
)

const (
// BlobAlgorithm is the name of the only supported digest algorithm for blobs.
// FIXME: We can make this a list.
BlobAlgorithm = "sha256"

// refDirectory is the directory inside an OCI image that contains references.
refDirectory = "refs"

// blobDirectory is the directory inside an OCI image that contains blobs.
blobDirectory = "blobs"

// layoutFile is the file in side an OCI image the indicates what version
// of the OCI spec the image is.
layoutFile = "oci-layout"
)

// Exposed errors.
var (
// ErrInvalid is returned when an image was detected as being invalid.
ErrInvalid = errors.New("invalid image detected")

// ErrNotImplemented is returned when a requested operation has not been
// implementing the backing image store.
ErrNotImplemented = errors.New("operation not implemented")

// ErrClobber is returned when a requested operation would require clobbering a
// reference or blob which already exists.
ErrClobber = errors.New("operation would clobber existing object")
)

// Engine is an interface that provides methods for accessing and modifying an
// OCI image, namely allowing access to reference descriptors and blobs.
type Engine interface {
// PutBlob adds a new blob to the image. This is idempotent; a nil error
// means that "the content is stored at DIGEST" without implying "because
// of this PutBlob() call".
PutBlob(ctx context.Context, reader io.Reader) (digest string, size int64, err error)

// PutBlobJSON adds a new JSON blob to the image (marshalled from the given
// interface). This is equivalent to calling PutBlob() with a JSON payload
// as the reader. Note that due to intricacies in the Go JSON
// implementation, we cannot guarantee that two calls to PutBlobJSON() will
// return the same digest.
PutBlobJSON(ctx context.Context, data interface{}) (digest string, size int64, err error)

// PutReference adds a new reference descriptor blob to the image. This is
// idempotent; a nil error means that "the descriptor is stored at NAME"
// without implying "because of this PutReference() call". ErrClobber is
// returned if there is already a descriptor stored at NAME, but does not
// match the descriptor requested to be stored.
PutReference(ctx context.Context, name string, descriptor *v1.Descriptor) (err error)

// GetBlob returns a reader for retrieving a blob from the image, which the
// caller must Close(). Returns os.ErrNotExist if the digest is not found.
GetBlob(ctx context.Context, digest string) (reader io.ReadCloser, err error)

// GetReference returns a reference from the image. Returns os.ErrNotExist
// if the name was not found.
GetReference(ctx context.Context, name string) (descriptor *v1.Descriptor, err error)

// DeleteBlob removes a blob from the image. This is idempotent; a nil
// error means "the content is not in the store" without implying "because
// of this DeleteBlob() call".
DeleteBlob(ctx context.Context, digest string) (err error)

// DeleteReference removes a reference from the image. This is idempotent;
// a nil error means "the content is not in the store" without implying
// "because of this DeleteReference() call".
DeleteReference(ctx context.Context, name string) (err error)

// ListBlobs returns the set of blob digests stored in the image.
ListBlobs(ctx context.Context) (digests []string, err error)

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

I expect this will only be useful for the sweep GC pass. My future plan for GC in image-tools was to keep this internal to the GC implementation and expose a GCEngine in the image/cas package which added a trigger method but did not require a public blob-listing method. But you can always adjust this API if/when image-tools gets GC that looks good to you.

This comment has been minimized.

Copy link
@cyphar

cyphar Nov 6, 2016

Author Member

Doing this means that GC(...) can be agnostic of Engine.

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

Agreed, and there is some benefit to that. However, I don't think it's worth breaking a scalable CAS API to get generic GC. I am fine in having another layer, so on top:

  • Scalable CAS API (with an optional generic GC trigger).
    • Generic GC implementation.
      • Few-blob CAS API exposing this ListBlobs or similar.
    • Custom GC implementation more tightly bound to a particular CAS engine.

Callers that don't care about the GC implemenation would use the scalable, high-level interface. CAS engines that didn't care about scalability would target the few-blob interface and benefit from the generic GC implementation.

So I think exposing ListBlobs to consumers who don't care about the GC implementation is a mistake. But the mitigating factor is that consumers who don't care about the GC implementation are unlikely to use ListBlobs, so migrating if/when you eventually drop it shouldn't be too painful.

This comment has been minimized.

Copy link
@cyphar

cyphar Nov 6, 2016

Author Member

I will change this at my own discretion, currently I'm working on getting everything to work. The road to hell is paved with over-engineered interfaces.


// ListReferences returns the set of reference names stored in the image.
ListReferences(ctx context.Context) (names []string, err error)

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

The non-callback, list-all-refs approach works fine for small ref stores, but doesn't scale as well as the list API floated in opencontainers/image-tools#5. For umoci, that probably doesn't matter unless the image-tools PR lands and you want to use it as your future ref-engine interface/implementation. I'm not sure what the odds are for either of those conditions ;).

This comment has been minimized.

Copy link
@cyphar

cyphar Nov 6, 2016

Author Member

Let's be honest, if you need to have a built-in filter of the references for the iterator then the actual filepath.Walk will also be inefficient anyway. Besides, opencontainers/image-spec#449 is a much larger scaling issue that hasn't been addressed.

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

readdir(2) will still work well (at least, you won't have to hold the whole dir in memory). And as I point out in opencontainers/image-spec#449, it's nice when your public API allows folks to change to a more performant backend without API consumers having to update their calls. That makes “there are fundamental flaws in this backend strategy” a purely-backend problem which can be fixed without bothering API consumers, so you don't have to be as proactive about identifying those flaws.

This comment has been minimized.

Copy link
@cyphar

cyphar Nov 6, 2016

Author Member

I will change this at my own discretion, currently I'm working on getting everything to work. The road to hell is paved with over-engineered interfaces.


// Close releases all references held by the engine. Subsequent operations
// may fail.
Close() (err error)
}

// Open will create an Engine reference to the OCI image at the provided
// path. If the image format is not supported, ErrNotImplemented will be
// returned. If the path does not exist, os.ErrNotExist will be returned.
func Open(path string) (Engine, error) {
fi, err := os.Stat(path)
if err != nil {
return nil, err
}

if fi.IsDir() {
return newDirEngine(path)
}

return nil, ErrNotImplemented
}

// blobPath returns the path to a blob given its digest, relative to the root
// of the OCI image. The digest must be of the form algorithm:hex.
func blobPath(digest string) (string, error) {
fields := strings.SplitN(digest, ":", 2)
if len(fields) != 2 {
return "", fmt.Errorf("invalid digest: %q", digest)
}

algo := fields[0]
hash := fields[1]

if algo != BlobAlgorithm {
return "", fmt.Errorf("unsupported algorithm: %q", algo)
}

return filepath.Join(blobDirectory, algo, hash), nil
}

// refPath returns the path to a reference given its name, relative to the r
// oot of the OCI image.

This comment has been minimized.

Copy link
@wking

wking Nov 6, 2016

Contributor

You've split "root" over two lines.

func refPath(name string) (string, error) {
return filepath.Join(refDirectory, name), nil
}
Loading

0 comments on commit 667271b

Please sign in to comment.