Document reproducible builds #123

Merged
merged 5 commits into from
Apr 1, 2020
41 changes: 41 additions & 0 deletions content/docs/reference/reproducibility.md
@@ -0,0 +1,41 @@
+++
title="Reproducible Builds"
weight=9
+++

## Summary
The Cloud Native Buildpacks project aims to create "Reproducible Builds" of container images. For image creation commands (`create-builder`, `package-buildpack`, `build`), `pack` aims to create images in a reproducible fashion. "Reproducible" is hard to define, so we define it by example:

---
Running `pack build` produces a container image with the same image ID (*local* case)

**Given**:
- A workspace directory containing the same source code
- A builder image with a *given*
- One or more buildpacks that produce identical layers given their input*

---
Running `pack build --publish` produces a container image with the same image digest (*remote* case)

**Given**:
- A workspace directory containing the same source code
- A builder image with a *given*
- One or more buildpacks that produce identical layers given their input

### Consequences and Caveats

We achieve reproducible builds by "zeroing" various timestamps of the layers that `pack` creates. When images are inspected (via something like `docker inspect`) they may have confusing creation times:
Member

"layers that pack creates" -- is all of the zeroing done by pack? I thought the exporter was responsible for some of this. I think it's worth being precise about this.

Contributor Author

I agree that being precise here is useful, but the fact that pack creates these layers via the lifecycle is getting a bit into the weeds for a first pass. I was aiming for a higher-level overview of the behavior that users will observe. That said, the goal in getting this out is to point folks who have questions about reproducibility here and then further improve the documentation. I could very well imagine having a more technical/architectural explanation of this concept on this page at some later point, but I don't know if we need that right now.

All that said, I'm open to being wrong on that!

Member

Sounds good, we can always add more detail later :)


```bash
REPOSITORY TAG IMAGE ID CREATED SIZE
cnbs/sample-builder <none> def52b23918d 40 years ago 234MB
sample-kotlin-app alpine 45dc2d2681a1 40 years ago 18.9MB
```
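
To illustrate why fixed timestamps matter, here is a minimal Python sketch (not how `pack` or the lifecycle actually builds layers). It tars the same file twice with different modification times and then with a fixed one, and compares SHA-256 digests. The helper name `layer_digest`, the sample file, and the fixed value `0` are assumptions made for this example only.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime):
    """Build an in-memory tar "layer" and return its SHA-256 hex digest.

    `files` maps path -> bytes; every entry is written with the same `mtime`.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed ordering keeps the archive stable
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return hashlib.sha256(buf.getvalue()).hexdigest()


files = {"app/main.txt": b"hello\n"}

# Same content, two different "build time" timestamps: the digests differ.
first_build = layer_digest(files, mtime=1585699200)
second_build = layer_digest(files, mtime=1585785600)

# Same content, one fixed timestamp (arbitrary value for this sketch): the digests match.
normalized_a = layer_digest(files, mtime=0)
normalized_b = layer_digest(files, mtime=0)

print(first_build == second_build)   # False
print(normalized_a == normalized_b)  # True
```

The trade-off is the one shown in the listing above: once the timestamp no longer records when the build ran, tools that display it report a creation time far in the past.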

All that said, the CNB lifecycle cannot fix non-reproducible buildpack layer file contents. This means that the underlying buildpack and language ecosystem have to implement reproducible output (for example `go` binaries are reproducible by default).
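
As a rough illustration of that limit, the following Python sketch normalizes all archive metadata and still gets different digests when the file contents themselves vary between builds. `hypothetical_buildpack` is an invented stand-in for a buildpack, and `normalized_layer_digest` is a simplified helper written for this example; neither is part of `pack` or the lifecycle.

```python
import hashlib
import io
import tarfile


def normalized_layer_digest(files):
    """Tar `files` (path -> bytes) with fixed metadata and hash the result."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0  # arbitrary fixed timestamp for this sketch
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


def hypothetical_buildpack(source, build_stamp=None):
    """Stand-in for a buildpack: turns source bytes into layer files."""
    output = source.upper()  # pretend "compilation"
    if build_stamp is not None:
        # Embedding build-time data makes the file contents vary per build.
        output += b"\nbuilt-at: " + build_stamp.encode()
    return {"bin/app": output}


source = b"print hello"

# Deterministic output: identical digests across builds.
clean_a = normalized_layer_digest(hypothetical_buildpack(source))
clean_b = normalized_layer_digest(hypothetical_buildpack(source))

# Output that embeds a build stamp: digests differ despite normalized metadata.
stamped_a = normalized_layer_digest(hypothetical_buildpack(source, "2020-04-01T10:00:00Z"))
stamped_b = normalized_layer_digest(hypothetical_buildpack(source, "2020-04-01T10:05:00Z"))

print(clean_a == clean_b)      # True
print(stamped_a == stamped_b)  # False
```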

A local and remote build will not produce the same image digest because:
Member

This is very interesting. If true, I think we should dig into it deeper. Why does the local image need to be an image ID instead of a reference? References can exist in the daemon. See the `docker tag` command.

Contributor Author

I think the key here is that local daemon stored images won't have a digest until pushed (this is because of docker):

○ → docker inspect cnbs/sample-builder:alpine | jq --raw-output '.[0].RepoDigests[]'
cnbs/sample-builder@sha256:9e3cfea3f90fb4fbbe855a2cc9ce505087ae10d6805cfcb44bd67a4b72628641

  |2.6.3| NY-Floater-15565 in ~/workspace
○ → docker inspect cnb-test | jq --raw-output '.[0].RepoDigests[]'
Error: No such object: cnb-test
jq: error (at <stdin>:1): Cannot iterate over null (null)

Member

I did confirm that `runImage.reference` changes depending on whether you run `pack build` or `pack build ... --publish`. If we could calculate the image digest of a local image, I'm curious why we don't. This is now totally irrelevant to this issue. Just something that was brought to light, so thank you.

- The remote image will have an image digest reference in the `runImage.reference` field in the `io.buildpacks.lifecycle.metadata` label
- The local image will have an image ID in the `runImage.reference` field in the `io.buildpacks.lifecycle.metadata` label

This occurs because, in the daemon case, the run-image may not have a repository digest reference (if it was created locally).
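
A small Python sketch of the effect, under stated assumptions: the label is treated as a JSON string holding only `runImage.reference` (the real label carries more fields), both reference values are made-up placeholders, and `config_digest` is a simplified stand-in for hashing an image configuration that includes its labels. It shows how to tell the two kinds of reference apart and why a differing label alone is enough to change a content-derived digest.

```python
import hashlib
import json

LABEL = "io.buildpacks.lifecycle.metadata"


def labels_with_reference(reference):
    """Build a minimal label set; the real label holds more than this field."""
    metadata = {"runImage": {"reference": reference}}
    return {LABEL: json.dumps(metadata, sort_keys=True)}


def run_image_reference(labels):
    """Extract runImage.reference from the lifecycle metadata label."""
    return json.loads(labels[LABEL])["runImage"]["reference"]


def is_digest_reference(reference):
    """True for a repository digest reference of the form name@sha256:<hex>."""
    return "@sha256:" in reference


def config_digest(labels):
    """Simplified stand-in for hashing an image config that embeds its labels."""
    config = json.dumps({"Labels": labels}, sort_keys=True).encode()
    return hashlib.sha256(config).hexdigest()


# Placeholder values, not taken from a real image.
remote_labels = labels_with_reference("example/run@sha256:" + "a" * 64)
local_labels = labels_with_reference("b" * 12)

print(is_digest_reference(run_image_reference(remote_labels)))       # True
print(is_digest_reference(run_image_reference(local_labels)))        # False
print(config_digest(remote_labels) == config_digest(local_labels))   # False
```

Against a real image, the label value could be obtained with something like `docker inspect` and passed to `run_image_reference` in the same way.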