Fix reuse of existing layers twice in the same image
- When we extract a layer, allow reusing it only by the DiffID, not
  by the compressed digest; we don't have the compressed data, and
  reusing by compressed digest would result (via PutLayer LayerOptions.OriginalDigest)
  in a layer with a compressed CompressedDigest value but an uncompressed
  CompressedSize value.

  Reuse by DiffID is quite a bit less likely to lead to a match
  in TryReusingBlob, probably causing us not to find the reused layer and having to
  extract it again.

  We could improve on this by recording more data; for now, let's just assume
  that images which reuse the same compressed layer twice are pretty rare, and
  prefer simpler code.

- On the positive side, record the item in fileSizes, so that we actually do
  find the layer in TryReusingBlob, rather than happening to reuse the file
  purely by accident.
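
The invariant the two bullets describe (filenames, fileSizes and blobDiffIDs recorded together, keyed only by the uncompressed DiffID, so lookups by the compressed digest deliberately miss) can be sketched with a toy model. The `layerCache` type and its methods below are hypothetical illustrations, not the real c/image or c/storage API:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// layerCache is a hypothetical, simplified model of the bookkeeping
// described above: after extracting a layer we only hold the
// uncompressed bytes, so the only self-consistent key for the three
// maps is the uncompressed DiffID.
type layerCache struct {
	blobDiffIDs map[string]string // digest -> corresponding DiffID
	fileSizes   map[string]int64  // digest -> size of the file we hold
	filenames   map[string]string // digest -> file holding the bytes
}

func digestOf(b []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(b))
}

// recordExtracted records an extracted (uncompressed) layer keyed by
// its DiffID; mirroring the fix, all three maps are set together.
func (c *layerCache) recordExtracted(uncompressed []byte, filename string) string {
	diffID := digestOf(uncompressed)
	c.blobDiffIDs[diffID] = diffID
	c.filenames[diffID] = filename
	c.fileSizes[diffID] = int64(len(uncompressed))
	return diffID
}

// tryReusing reports whether a blob digest matches cached data,
// like the fileSizes lookup in tryReusingBlobAsPending.
func (c *layerCache) tryReusing(d string) (int64, bool) {
	size, ok := c.fileSizes[d]
	return size, ok
}

func main() {
	c := &layerCache{
		blobDiffIDs: map[string]string{},
		fileSizes:   map[string]int64{},
		filenames:   map[string]string{},
	}
	diffID := c.recordExtracted([]byte("layer contents"), "/tmp/layer1")

	// A second occurrence of the same layer in the image is found by DiffID...
	if size, ok := c.tryReusing(diffID); ok {
		fmt.Println("reused by DiffID, size", size)
	}
	// ...but not by a compressed digest, which we never recorded.
	if _, ok := c.tryReusing(digestOf([]byte("gzip bytes"))); !ok {
		fmt.Println("no reuse by compressed digest")
	}
}
```

Recording the size alongside the filename is what makes the second occurrence an actual hit in the size-keyed lookup, rather than an accident.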

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
mtrmac committed Feb 13, 2024
1 parent 1f3b20c commit 5567453
Showing 1 changed file with 19 additions and 9 deletions.
28 changes: 19 additions & 9 deletions storage/storage_dest.go
@@ -87,10 +87,12 @@ type storageImageDestinationLockProtected struct {
 	// When creating a layer, the c/storage layer metadata and image IDs must _only_ be based on trusted values
 	// we have computed ourselves. (Layer reuse can then look up against such trusted values, but it might not
 	// recompute those values for incoming layers — the point of the reuse is that we don’t need to consume the incoming layer.)
-	blobDiffIDs      map[digest.Digest]digest.Digest // Mapping from layer blobsums to their corresponding DiffIDs
-	indexToTOCDigest map[int]digest.Digest           // Mapping from layer index to a TOC Digest, IFF the layer was created/found/reused by TOC digest
-	fileSizes        map[digest.Digest]int64         // Mapping from layer blobsums to their sizes
-	filenames        map[digest.Digest]string        // Mapping from layer blobsums to names of files we used to hold them
+	blobDiffIDs      map[digest.Digest]digest.Digest // Mapping from layer blobsums to their corresponding DiffIDs
+	indexToTOCDigest map[int]digest.Digest           // Mapping from layer index to a TOC Digest, IFF the layer was created/found/reused by TOC digest
+	// Mapping from layer blobsums to names of files we used to hold them. If set, fileSizes and blobDiffIDs must also be set.
+	filenames map[digest.Digest]string
+	// Mapping from layer blobsums to their sizes. If set, filenames and blobDiffIDs must also be set.
+	fileSizes map[digest.Digest]int64
 	blobAdditionalLayer map[digest.Digest]storage.AdditionalLayer   // Mapping from layer blobsums to their corresponding additional layer
 	diffOutputs         map[int]*graphdriver.DriverWithDifferOutput // Mapping from layer index to a partially-pulled layer intermediate data
 }
@@ -138,8 +140,8 @@ func newImageDestination(sys *types.SystemContext, imageRef storageReference) (*
 		indexToAddedLayerInfo: make(map[int]addedLayerInfo),
 		blobDiffIDs:           make(map[digest.Digest]digest.Digest),
 		indexToTOCDigest:      make(map[int]digest.Digest),
-		fileSizes:             make(map[digest.Digest]int64),
 		filenames:             make(map[digest.Digest]string),
+		fileSizes:             make(map[digest.Digest]int64),
 		blobAdditionalLayer:   make(map[digest.Digest]storage.AdditionalLayer),
 		diffOutputs:           make(map[int]*graphdriver.DriverWithDifferOutput),
 	},
@@ -387,6 +389,8 @@ func (s *storageImageDestination) tryReusingBlobAsPending(blobDigest digest.Dige
 
 	// Check if we've already cached it in a file.
 	if size, ok := s.lockProtected.fileSizes[blobDigest]; ok {
+		// s.lockProtected.blobDiffIDs is set either by putBlobToPendingFile or in createNewLayer when creating the
+		// filenames/fileSizes entry.
 		return true, private.ReusedBlob{
 			Digest: blobDigest,
 			Size:   size,
@@ -843,16 +847,22 @@ func (s *storageImageDestination) createNewLayer(index int, layerDigest digest.D
 		// Copy the data to the file.
 		// TODO: This can take quite some time, and should ideally be cancellable using
 		// ctx.Done().
-		_, err = io.Copy(file, diff)
+		fileSize, err := io.Copy(file, diff)
 		diff.Close()
 		file.Close()
 		if err != nil {
 			return nil, fmt.Errorf("storing blob to file %q: %w", filename, err)
 		}
-		// Make sure that we can find this file later, should we need the layer's
-		// contents again.
+		// Allow using the already-collected layer contents without extracting the layer again.
+		//
+		// This only matches against the uncompressed digest.
+		// We don’t have the original compressed data here to trivially set filenames[layerDigest].
+		// In particular we can’t achieve the correct Layer.CompressedSize value with the current c/storage API.
+		// Within-image layer reuse is probably very rare, for now we prefer to avoid that complexity.
 		s.lock.Lock()
-		s.lockProtected.filenames[layerDigest] = filename
+		s.lockProtected.blobDiffIDs[diffID] = diffID
+		s.lockProtected.filenames[diffID] = filename
+		s.lockProtected.fileSizes[diffID] = fileSize
 		s.lock.Unlock()
 	}
 	// Read the cached blob and use it as a diff.
