Skip to content
This repository has been archived by the owner on Oct 2, 2023. It is now read-only.

For gzip operations, use a new Go helper by default instead of which(gzip) #1613

Merged
merged 11 commits into from
Sep 8, 2020

Conversation

faube
Copy link
Contributor

@faube faube commented Sep 2, 2020

This is mostly for consistency across platforms, and to remove an external dependency.

…tool.

This is mostly for consistency across platforms, and to remove an external dependency.
@faube
Copy link
Contributor Author

faube commented Sep 2, 2020

/assign @smukherj1

load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")

# gazelle:prefix github.com/bazelbuild/rules_docker
gazelle(name = "gazelle")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be unnecessary given we already have it here https://github.com/bazelbuild/rules_docker/blob/master/BUILD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

skylib/zip.bzl Outdated
)
gzip_path = _gzip_path(toolchain_info)

if gzip_path == "":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this if/else condition has a few subtleties, I suggest refactoring it into a function and re-using it for both gzip & gunzip and pass in parameters that change (I think there should only be 2-3 such parameters?)

Regarding the condition itself, IIUC what we want to do is the following in order of precedence:

  1. If neither gzip_path nor gzip_target is specified, use internal tool
  2. else if gzip_path is specified, use the gzip tool at the specified absolute path
  3. else if gzip_target is specified, build the specified target and use it assuming it accepts the same flags as gzip

There's a possibility if gzip_path == "" at the beginning might break this because, if gzip_target is specified, gzip_path might be blank (or maybe it's None?). If gzip_path defaults to None instead of "" when gzip_target is set, your logic still works but given this is very subtle, I suggest commenting exactly which scenario we're hitting above the if and else conditions and the values we expect for the gzip_* variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Note that the gzip_path == "" was done after resolving gzip_path from the gzip_path/gzip_target attributes.
I replaced it with if not gzip_path to avoid the None issue.
Also added some comments.

@@ -238,13 +238,13 @@ container_push(
# BEGIN_DO_NOT_IMPORT
file_test(
name = "test_digest_output1",
content = "sha256:18e27a73e3faabc819f903fc8e603de2bfd49813e231c2e8ce98b533577450e2",
content = "sha256:4ec78686f1edd4f30cca6222f17ace5a2b9f8db9708727c3a140c954af0a534d",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm curious why the digest changed given we're still using gzip in the Go tool. Are we specifying a different compression level than before? In any case, I expect you'll need to update a bunch of digests here (https://github.com/bazelbuild/rules_docker/blob/master/container/image_test.py) as well in case we can't get the digests to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried digging into this a little. None of the compression levels in compress/gzip produce a digest that matches the gzip tool. There may be subtle differences in how gzip and compress/gzip package the header or something?

//container/image_test.py passes; I suspect it's because we never verify the digest of gzipped files, only manifests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried digging into this a little. None of the compression levels in compress/gzip produce a digest that matches the gzip tool. There may be subtle differences in how gzip and compress/gzip package the header or something?

SGTM. Thanks for looking into it.

//container/image_test.py passes; I suspect it's because we never verify the digest of gzipped files, only manifests.

This is weird. I believe manifest transitively includes the digests the compressed layers. I don't remember if the manifest contents directly list the layer digests or the manifest which includes the digest of the image config which in turn includes the digests of the compressed layers. Note that some images only include uncompressed layers. However, I think most of our images include compressed layers. I'm a little worried that the current implementation isn't actually calling the Go tool to compress the layers for the images used by this test which may be undesirable for your use case :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comment below: #1613 (comment)

if err != nil {
log.Fatalf("Unable to read input file: %v", err)
}
var buf bytes.Buffer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using io.Copy (https://pkg.go.dev/io?tab=doc#Copy) to directly copy from a buffer pointing to the source file to a buffer pointing to the destination file. This avoids loading the entire src into memory.

When decompressing, the src would be a gzip.NewReader(dat), dst would be os.Create (https://pkg.go.dev/os?tab=doc#Create). When compressing, src would be dat and dst would be gzip.NewWriterLevel wrapping a os.Create.

Finally use io.Copy (https://pkg.go.dev/io?tab=doc#Copy) to copy from the source to destination buffer. If you declare io.Reader (https://pkg.go.dev/io?tab=doc#Reader) and io.Writer interface variables outside the if conditions, you can have one io.Copy at the end that handles both cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@smukherj1 smukherj1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit and one suggested exploration to ensure your change does what you wanted it to do. LGTM otherwise. Thanks for working on this!

skylib/zip.bzl Outdated
mnemonic = "GZIP",
tools = tools,
)
_gzip(ctx, artifact, out, False, options, "GZIP")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and below use named parameters to improve readability. Example:

_gzip(ctx=ctx,
         artifact=artifact,
         ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -238,13 +238,13 @@ container_push(
# BEGIN_DO_NOT_IMPORT
file_test(
name = "test_digest_output1",
content = "sha256:18e27a73e3faabc819f903fc8e603de2bfd49813e231c2e8ce98b533577450e2",
content = "sha256:4ec78686f1edd4f30cca6222f17ace5a2b9f8db9708727c3a140c954af0a534d",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried digging into this a little. None of the compression levels in compress/gzip produce a digest that matches the gzip tool. There may be subtle differences in how gzip and compress/gzip package the header or something?

SGTM. Thanks for looking into it.

//container/image_test.py passes; I suspect it's because we never verify the digest of gzipped files, only manifests.

This is weird. I believe manifest transitively includes the digests the compressed layers. I don't remember if the manifest contents directly list the layer digests or the manifest which includes the digest of the image config which in turn includes the digests of the compressed layers. Note that some images only include uncompressed layers. However, I think most of our images include compressed layers. I'm a little worried that the current implementation isn't actually calling the Go tool to compress the layers for the images used by this test which may be undesirable for your use case :)

@faube
Copy link
Contributor Author

faube commented Sep 3, 2020

1613.zip
I looked at test_files_base and I am attaching the resulting files_base.tar
None of the layers in the file are .gz
In my experience, the intermediate layers that appear on disk (e.g. files_base-layer.tar.gz) are gz but the final image file that gets built in a :imagename.tar rule (as inspected in image_test) has no compression, so I'm under the impression that this is all WAI.

@faube
Copy link
Contributor Author

faube commented Sep 3, 2020

Similarly, if I 'docker save' a Docker image to a tarball, none of the internal layers are gz.

@smukherj1
Copy link
Collaborator

1613.zip
I looked at test_files_base and I am attaching the resulting files_base.tar
None of the layers in the file are .gz
In my experience, the intermediate layers that appear on disk (e.g. files_base-layer.tar.gz) are gz but the final image file that gets built in a :imagename.tar rule (as inspected in image_test) has no compression, so I'm under the impression that this is all WAI.

TL; DR I'll approve this to make progress but I suggest keeping this point in mind if you notice the Go compressor isn't being called.

The tarball writer is special (See here). The tarball always writes out uncompressed layers even if the layer media type indicated the layer is compressed. Thus, given that you noticed the compression by the Go tool results in a different digest, I would expect the digest of compressed layers to have changed as well resulting in image manifests being updated. You would need to see the image manifest (not the one in the tarball, the tarball manifest has a different format.

@smukherj1
Copy link
Collaborator

/gcbrun

@smukherj1
Copy link
Collaborator

@faube there's a buildifier formatting issue that's making buildkite fail:

buildifier: found 1 lint issue in your WORKSPACE, BUILD and *.bzl files
skylib/zip.bzl:31: function-docstring: Argument "mnemonic" is not documented.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 8, 2020
@smukherj1
Copy link
Collaborator

/gcbrun

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: faube, smukherj1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rbe-toolchains-pr-bot rbe-toolchains-pr-bot merged commit 5237def into bazelbuild:master Sep 8, 2020
cgdolan pushed a commit to cgdolan/rules_docker that referenced this pull request Sep 13, 2020
jamiees2 pushed a commit to jamiees2/rules_docker that referenced this pull request Sep 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants