-
Notifications
You must be signed in to change notification settings - Fork 691
For gzip operations, use a new Go helper by default instead of which(gzip) #1613
Conversation
…tool. This is mostly for consistency across platforms, and to remove an external dependency.
/assign @smukherj1 |
container/go/cmd/zipper/BUILD
Outdated
load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library") | ||
|
||
# gazelle:prefix github.com/bazelbuild/rules_docker | ||
gazelle(name = "gazelle") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be unnecessary given we already have it here https://github.com/bazelbuild/rules_docker/blob/master/BUILD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
skylib/zip.bzl
Outdated
) | ||
gzip_path = _gzip_path(toolchain_info) | ||
|
||
if gzip_path == "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this if/else condition has a few subtleties, I suggest refactoring it into a function and re-using it for both gzip & gunzip and pass in parameters that change (I think there should only be 2-3 such parameters?)
Regarding the condition itself, IIUC what we want to do is the following in order of precedence:
- If neither gzip_path nor gzip_target is specified, use internal tool
- else if gzip_path is specified, use the gzip tool at the specified absolute path
- else if gzip_target is specified, build the specified target and use it assuming it accepts the same flags as gzip
There's a possibility if gzip_path == ""
at the beginning might break this because, if gzip_target is specified, gzip_path might be blank (or maybe it's None?). If gzip_path defaults to None instead of "" when gzip_target is set, your logic still works but given this is very subtle, I suggest commenting exactly which scenario we're hitting above the if
and else
conditions and the values we expect for the gzip_* variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Note that the gzip_path == "" was done after resolving gzip_path from the gzip_path/gzip_target attributes.
I replaced it with if not gzip_path
to avoid the None issue.
Also added some comments.
@@ -238,13 +238,13 @@ container_push( | |||
# BEGIN_DO_NOT_IMPORT | |||
file_test( | |||
name = "test_digest_output1", | |||
content = "sha256:18e27a73e3faabc819f903fc8e603de2bfd49813e231c2e8ce98b533577450e2", | |||
content = "sha256:4ec78686f1edd4f30cca6222f17ace5a2b9f8db9708727c3a140c954af0a534d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm curious why the digest changed given we're still using gzip in the Go tool. Are we specifying a different compression level than before? In any case, I expect you'll need to update a bunch of digests here (https://github.com/bazelbuild/rules_docker/blob/master/container/image_test.py) as well in case we can't get the digests to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried digging into this a little. None of the compression levels in compress/gzip produce a digest that matches the gzip tool. There may be subtle differences in how gzip and compress/gzip package the header or something?
//container/image_test.py passes; I suspect it's because we never verify the digest of gzipped files, only manifests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried digging into this a little. None of the compression levels in compress/gzip produce a digest that matches the gzip tool. There may be subtle differences in how gzip and compress/gzip package the header or something?
SGTM. Thanks for looking into it.
//container/image_test.py passes; I suspect it's because we never verify the digest of gzipped files, only manifests.
This is weird. I believe manifest transitively includes the digests the compressed layers. I don't remember if the manifest contents directly list the layer digests or the manifest which includes the digest of the image config which in turn includes the digests of the compressed layers. Note that some images only include uncompressed layers. However, I think most of our images include compressed layers. I'm a little worried that the current implementation isn't actually calling the Go tool to compress the layers for the images used by this test which may be undesirable for your use case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comment below: #1613 (comment)
container/go/cmd/zipper/zipper.go
Outdated
if err != nil { | ||
log.Fatalf("Unable to read input file: %v", err) | ||
} | ||
var buf bytes.Buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest using io.Copy (https://pkg.go.dev/io?tab=doc#Copy) to directly copy from a buffer pointing to the source file to a buffer pointing to the destination file. This avoids loading the entire src into memory.
When decompressing, the src would be a gzip.NewReader(dat), dst would be os.Create (https://pkg.go.dev/os?tab=doc#Create). When compressing, src would be dat and dst would be gzip.NewWriterLevel wrapping a os.Create.
Finally use io.Copy (https://pkg.go.dev/io?tab=doc#Copy) to copy from the source to destination buffer. If you declare io.Reader (https://pkg.go.dev/io?tab=doc#Reader) and io.Writer interface variables outside the if conditions, you can have one io.Copy at the end that handles both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit and one suggested exploration to ensure your change does what you wanted it to do. LGTM otherwise. Thanks for working on this!
skylib/zip.bzl
Outdated
mnemonic = "GZIP", | ||
tools = tools, | ||
) | ||
_gzip(ctx, artifact, out, False, options, "GZIP") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here and below use named parameters to improve readability. Example:
_gzip(ctx=ctx,
artifact=artifact,
...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -238,13 +238,13 @@ container_push( | |||
# BEGIN_DO_NOT_IMPORT | |||
file_test( | |||
name = "test_digest_output1", | |||
content = "sha256:18e27a73e3faabc819f903fc8e603de2bfd49813e231c2e8ce98b533577450e2", | |||
content = "sha256:4ec78686f1edd4f30cca6222f17ace5a2b9f8db9708727c3a140c954af0a534d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried digging into this a little. None of the compression levels in compress/gzip produce a digest that matches the gzip tool. There may be subtle differences in how gzip and compress/gzip package the header or something?
SGTM. Thanks for looking into it.
//container/image_test.py passes; I suspect it's because we never verify the digest of gzipped files, only manifests.
This is weird. I believe manifest transitively includes the digests the compressed layers. I don't remember if the manifest contents directly list the layer digests or the manifest which includes the digest of the image config which in turn includes the digests of the compressed layers. Note that some images only include uncompressed layers. However, I think most of our images include compressed layers. I'm a little worried that the current implementation isn't actually calling the Go tool to compress the layers for the images used by this test which may be undesirable for your use case :)
1613.zip |
Similarly, if I 'docker save' a Docker image to a tarball, none of the internal layers are gz. |
TL; DR I'll approve this to make progress but I suggest keeping this point in mind if you notice the Go compressor isn't being called. The tarball writer is special (See here). The tarball always writes out uncompressed layers even if the layer media type indicated the layer is compressed. Thus, given that you noticed the compression by the Go tool results in a different digest, I would expect the digest of compressed layers to have changed as well resulting in image manifests being updated. You would need to see the image manifest (not the one in the tarball, the tarball manifest has a different format. |
/gcbrun |
@faube there's a buildifier formatting issue that's making buildkite fail:
|
/gcbrun |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: faube, smukherj1 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This is mostly for consistency across platforms, and to remove an external dependency.