[remotecache] Add Azure Blob Storage support #3010

Merged · 1 commit · Sep 2, 2022
23 changes: 23 additions & 0 deletions .github/workflows/build.yml
@@ -154,6 +154,29 @@ jobs:
        run: |
          hack/s3_test/run_test.sh

  test-azblob:
    runs-on: ubuntu-20.04
    needs:
      - base
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Expose GitHub Runtime
        uses: crazy-max/ghaction-github-runtime@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          version: ${{ env.BUILDX_VERSION }}
          driver-opts: image=${{ env.REPO_SLUG_ORIGIN }}
          buildkitd-flags: --debug
      -
        name: Test
        run: |
          hack/azblob_test/run_test.sh

  test-os:
    runs-on: ${{ matrix.os }}
    strategy:
45 changes: 45 additions & 0 deletions README.md
@@ -64,6 +64,7 @@ Join `#buildkit` channel on [Docker Community Slack](http://dockr.ly/slack)
- [Local directory](#local-directory-1)
- [GitHub Actions cache (experimental)](#github-actions-cache-experimental)
- [S3 cache (experimental)](#s3-cache-experimental)
- [Azure Blob Storage cache (experimental)](#azure-blob-storage-cache-experimental)
- [Consistent hashing](#consistent-hashing)
- [Metadata](#metadata)
- [Systemd socket activation](#systemd-socket-activation)
@@ -493,6 +494,50 @@ Other options are:
* `manifests_prefix=<prefix>`: set global prefix to store / read manifests on s3 (default: `manifests/`)
* `name=<manifest>`: name of the manifest to use (default: `buildkit`)

#### Azure Blob Storage cache (experimental)

```bash
buildctl build ... \
--output type=image,name=docker.io/username/image,push=true \
--export-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=my_image \
--import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=my_image
```

The following attribute is required:
* `account_url`: the Azure Blob Storage account URL (default: `$BUILDKIT_AZURE_STORAGE_ACCOUNT_URL` if set)

Storage locations:
* blobs: `<account_url>/<container>/<prefix><blobs_prefix>/<sha256>`, default: `<account_url>/<container>/blobs/<sha256>`
* manifests: `<account_url>/<container>/<prefix><manifests_prefix>/<name>`, default: `<account_url>/<container>/manifests/<name>`
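
For illustration, with the defaults above (container `buildkit-cache`, no extra prefix) and the account URL from the earlier example, a layer blob and the default `buildkit` manifest would resolve to:

```text
https://myaccount.blob.core.windows.net/buildkit-cache/blobs/<sha256>
https://myaccount.blob.core.windows.net/buildkit-cache/manifests/buildkit
```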

Azure Blob Storage configuration:
* `container`: the Azure Blob Storage container name (default: `$BUILDKIT_AZURE_STORAGE_CONTAINER` if set, otherwise `buildkit-cache`)
* `blobs_prefix`: global prefix to store / read blobs on the Azure Blob Storage container (`<container>`) (default: `blobs/`)
* `manifests_prefix`: global prefix to store / read manifests on the Azure Blob Storage container (`<container>`) (default: `manifests/`)

Azure Blob Storage authentication:

Two options are supported for Azure Blob Storage authentication:

* Any mechanism supported through environment variables by the [Azure SDK for Go](https://docs.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication). The configuration must be available to the BuildKit daemon, not to the client.
* Secret access key: the `secret_access_key` attribute specifies the primary or secondary [account key](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage) for your Azure Blob Storage account.
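
A minimal sketch of the two options (the environment variables shown are the standard Azure SDK service-principal credentials; all values are placeholders):

```bash
# Option 1: environment credentials read by the Azure SDK for Go.
# These must be visible to the buildkitd daemon process, not to buildctl.
export AZURE_TENANT_ID=<tenant-id>
export AZURE_CLIENT_ID=<client-id>
export AZURE_CLIENT_SECRET=<client-secret>

# Option 2: pass an account key inline as a cache attribute.
buildctl build ... \
  --export-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,secret_access_key=<account-key>,name=my_image
```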

`--export-cache` options:
* `type=azblob`
* `mode=<min|max>`: specify cache layers to export (default: `min`)
* `min`: only export layers for the resulting image
* `max`: export all the layers of all intermediate steps
* `prefix=<prefix>`: set global prefix to store / read files on the Azure Blob Storage container (`<container>`) (default: empty)
* `name=<manifest>`: specify name of the manifest to use (default: `buildkit`)
* Multiple manifest names can be specified at the same time, separated by `;`. A common pattern is to use the git SHA-1 as one name and the branch name as a second, then load both with two `--import-cache` options (see the sketch after the `--import-cache` list below).

`--import-cache` options:
* `type=azblob`
* `prefix=<prefix>`: set global prefix to store / read files on the Azure Blob Storage container (`<container>`) (default: empty)
* `blobs_prefix=<prefix>`: set global prefix to store / read blobs on the Azure Blob Storage container (`<container>`) (default: `blobs/`)
* `manifests_prefix=<prefix>`: set global prefix to store / read manifests on the Azure Blob Storage container (`<container>`) (default: `manifests/`)
* `name=<manifest>`: name of the manifest to use (default: `buildkit`)
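
For example, a sketch of the multi-name pattern (assuming a commit `abc1234` built on branch `main`; the quotes keep the shell from treating `;` as a command separator):

```bash
buildctl build ... \
  --export-cache 'type=azblob,account_url=https://myaccount.blob.core.windows.net,name=abc1234;main' \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=abc1234 \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=main
```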

### Consistent hashing

If you have multiple BuildKit daemon instances but you don't want to use registry for sharing cache across the cluster,
Expand Down
209 changes: 209 additions & 0 deletions cache/remotecache/azblob/exporter.go
@@ -0,0 +1,209 @@
package azblob

import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"time"

"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
"github.com/containerd/containerd/content"
"github.com/moby/buildkit/cache/remotecache"
v1 "github.com/moby/buildkit/cache/remotecache/v1"
"github.com/moby/buildkit/session"
"github.com/moby/buildkit/solver"
"github.com/moby/buildkit/util/compression"
"github.com/moby/buildkit/util/progress"
digest "github.com/opencontainers/go-digest"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
)

// ResolveCacheExporterFunc for "azblob" cache exporter.
func ResolveCacheExporterFunc() remotecache.ResolveCacheExporterFunc {
return func(ctx context.Context, g session.Group, attrs map[string]string) (remotecache.Exporter, error) {
config, err := getConfig(attrs)
if err != nil {
return nil, fmt.Errorf("failed to create azblob config: %v", err)
}

containerClient, err := createContainerClient(ctx, config)
if err != nil {
return nil, fmt.Errorf("failed to create container client: %v", err)
}

cc := v1.NewCacheChains()
return &exporter{
CacheExporterTarget: cc,
chains: cc,
containerClient: containerClient,
config: config,
}, nil
}
}

var _ remotecache.Exporter = &exporter{}

type exporter struct {
solver.CacheExporterTarget
chains *v1.CacheChains
containerClient *azblob.ContainerClient
config *Config
}

func (ce *exporter) Finalize(ctx context.Context) (map[string]string, error) {
config, descs, err := ce.chains.Marshal(ctx)
if err != nil {
return nil, err
}

	// Upload each layer blob that is not already in the container, then record
	// its annotations (diffID, size, media type, created-at) in the cache config.
	for i, l := range config.Layers {
dgstPair, ok := descs[l.Blob]
if !ok {
return nil, errors.Errorf("missing blob %s", l.Blob)
}
if dgstPair.Descriptor.Annotations == nil {
return nil, errors.Errorf("invalid descriptor without annotations")
}
var diffID digest.Digest
v, ok := dgstPair.Descriptor.Annotations["containerd.io/uncompressed"]
if !ok {
return nil, errors.Errorf("invalid descriptor without uncompressed annotation")
}
dgst, err := digest.Parse(v)
if err != nil {
return nil, errors.Wrapf(err, "failed to parse uncompressed annotation")
}
diffID = dgst

key := blobKey(ce.config, dgstPair.Descriptor.Digest.String())

exists, err := blobExists(ctx, ce.containerClient, key)
if err != nil {
return nil, err
}

	logrus.Debugf("layer %s exists = %t", key, exists)

if !exists {
layerDone := progress.OneOff(ctx, fmt.Sprintf("writing layer %s", l.Blob))
ra, err := dgstPair.Provider.ReaderAt(ctx, dgstPair.Descriptor)
if err != nil {
return nil, layerDone(fmt.Errorf("failed to get reader for %s: %v", dgstPair.Descriptor.Digest, err))
}
if err := ce.uploadBlobIfNotExists(ctx, key, content.NewReader(ra)); err != nil {
return nil, layerDone(err)
}
layerDone(nil)
}

la := &v1.LayerAnnotations{
DiffID: diffID,
Size: dgstPair.Descriptor.Size,
MediaType: dgstPair.Descriptor.MediaType,
}
if v, ok := dgstPair.Descriptor.Annotations["buildkit/createdat"]; ok {
var t time.Time
if err := (&t).UnmarshalText([]byte(v)); err != nil {
return nil, err
}
la.CreatedAt = t.UTC()
}
config.Layers[i].Annotations = la
}

dt, err := json.Marshal(config)
if err != nil {
return nil, fmt.Errorf("failed to marshal config: %v", err)
}

for _, name := range ce.config.Names {
if innerError := ce.uploadManifest(ctx, manifestKey(ce.config, name), bytesToReadSeekCloser(dt)); innerError != nil {
return nil, errors.Errorf("error writing manifest %s: %v", name, innerError)
}
}

return nil, nil
}

func (ce *exporter) Config() remotecache.Config {
return remotecache.Config{
Compression: compression.New(compression.Default),
}
}

// For uploading manifests, use the Upload API, which follows "last writer wins" semantics.
// This is slightly slower than UploadStream but is safe to call concurrently from multiple threads. Refer to:
// https://github.com/Azure/azure-sdk-for-go/issues/18490#issuecomment-1170806877
func (ce *exporter) uploadManifest(ctx context.Context, manifestKey string, reader io.ReadSeekCloser) error {
defer reader.Close()
blobClient, err := ce.containerClient.NewBlockBlobClient(manifestKey)
if err != nil {
		return errors.Wrap(err, "error creating block blob client")
}

ctx, cnclFn := context.WithTimeout(ctx, time.Minute*5)
defer cnclFn()

_, err = blobClient.Upload(ctx, reader, &azblob.BlockBlobUploadOptions{})
if err != nil {
		return errors.Wrapf(err, "failed to upload manifest %s", manifestKey)
}

return nil
}

// For uploading blobs, use UploadStream with an access condition so the upload happens only if the blob
// does not already exist. Since blobs are content addressable, skipping existing blobs is safe, and this
// gives a performance improvement over the Upload API used for manifests.
func (ce *exporter) uploadBlobIfNotExists(ctx context.Context, blobKey string, reader io.Reader) error {
blobClient, err := ce.containerClient.NewBlockBlobClient(blobKey)
if err != nil {
		return errors.Wrap(err, "error creating block blob client")
}

uploadCtx, cnclFn := context.WithTimeout(ctx, time.Minute*5)
defer cnclFn()

// Only upload if the blob doesn't exist
eTagAny := azblob.ETagAny
_, err = blobClient.UploadStream(uploadCtx, reader, azblob.UploadStreamOptions{
BufferSize: IOChunkSize,
MaxBuffers: IOConcurrency,
BlobAccessConditions: &azblob.BlobAccessConditions{
ModifiedAccessConditions: &azblob.ModifiedAccessConditions{
IfNoneMatch: &eTagAny,
},
},
})

if err == nil {
return nil
}

var se *azblob.StorageError
if errors.As(err, &se) && se.ErrorCode == azblob.StorageErrorCodeBlobAlreadyExists {
return nil
}

	return errors.Wrapf(err, "failed to upload blob %s", blobKey)
}

var _ io.ReadSeekCloser = &readSeekCloser{}

type readSeekCloser struct {
io.Reader
io.Seeker
io.Closer
}

func bytesToReadSeekCloser(dt []byte) io.ReadSeekCloser {
bytesReader := bytes.NewReader(dt)
return &readSeekCloser{
Reader: bytesReader,
Seeker: bytesReader,
Closer: io.NopCloser(bytesReader),
}
}
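
For context, a minimal sketch of how a daemon could register this resolver, assuming the map-based wiring buildkitd uses for its cache backends (the package and function names here are illustrative, not part of this diff):

```go
package wiring

import (
	"github.com/moby/buildkit/cache/remotecache"
	"github.com/moby/buildkit/cache/remotecache/azblob"
)

// cacheExporters maps --export-cache "type" values to resolver functions;
// the daemon looks up the requested type here when a build exports cache.
func cacheExporters() map[string]remotecache.ResolveCacheExporterFunc {
	return map[string]remotecache.ResolveCacheExporterFunc{
		"azblob": azblob.ResolveCacheExporterFunc(),
	}
}
```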