-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explode flattened buildpacks when pack build
#1787
Conversation
215b882
to
e610d3f
Compare
When `pack build --buildpack <flattened buildpack>` is invoked, `pack` doesn't know the buildpack is flattened, so it treats it like `pack build --buildpack bp/1 --buildpack bp2`, resulting in unnecessary tar files being created and duplicate layers being added to the ephemeral builder, which can exhaust disk space depending on the environment. This optimization avoids writing the tar file more than once by returning a no-op read closer when the diffID has already been queued for downloading. Signed-off-by: Natalie Arellano <narellano@vmware.com> Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
3cb9f5a
to
6f76a88
Compare
97e1fda
to
c8041e7
Compare
285b0dc
to
da011d3
Compare
56fc8bf
to
d4441ca
Compare
calculate which buildpacks must be added into the ephemeral builder Skipping pack build -b <flattened-buildpack> in windows container because acceptance tests doesn't support it yet Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
d4441ca
to
03808b7
Compare
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Natalie Arellano <narellano@vmware.com>
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
// write the rest of the package | ||
var header *tar.Header | ||
for { | ||
header, err = tr.Next() |
Check failure
Code scanning / CodeQL
Arbitrary file write during zip extraction ("zip slip")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled at line 435
if origVersion == "" { | ||
// the first header only contains the id - e.g., /cnb/buildpacks/<buildpack-id>, | ||
// read the next header to get the version | ||
secondHeader, err := tr.Next() |
Check failure
Code scanning / CodeQL
Arbitrary file write during zip extraction ("zip slip")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled at line 397
) | ||
|
||
for { | ||
header, err = tr.Next() |
Check failure
Code scanning / CodeQL
Arbitrary file write during zip extraction ("zip slip")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled at line 353
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we don't need to do it as part of this PR (because we use tar readers all over) but it would be cleaner to wrap tar.NewReader
with our own reader that sanitizes the path as part of calling tr.Next()
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
Signed-off-by: Juan Bustamante <jbustamante@vmware.com>
@natalieparellano @joe-kimmel-vmw @jkutner I will love if you can review the PR again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me :) Thanks for all of your hard work in getting this over the finish line @jjbustamante
IMO this is in good shape. I did make one tiny suggestion in this PR (set to merge into this branch): https://github.com/buildpacks/pack/pull/1829/files |
Summary
After the pack
0.30.0-pre2
was released some users reported some troubles when trying to use the newflatten
flag added to thepack builder create
andpack buildpack package
commands. See slack threadpack build app --buildpack
was used with a flattened buildpack on a GHA runner it was causing the runner to run out of disk spacepack build app --buildpack
was used with a flattened buildpack that is also present into the builder we want to use, pack was not able to deduplicate those buildpacks correctly.This PR is trying to fix both issues.
Output
Before
Issue 1, when a buildpack is flattened we are saving in the same layer N number of buildpacks, during
pack build
an ephemeral builder is created, in this process we read each buildpack layer tar to disk in a temp folder and then create this builder (that is deleted at the end of the command execution) adding those tars to it. Because each flattened layer contains more than 1 buildpack, the iteration over the layers creates a file on disk with all the flattened buildpacks. that is the reason for the process to consume more hard drive space.Issue 2, when pack is creating the ephemeral builder and some buildpacks are pass through
-b
flag, pack calculates the hash of the tar file inside the provided buildpack and then compares this value against the hash of the same buildpack in the builder, because the provided buildpack is flattened (there are more than 1 buildpack in the layer) we need to split those buildpacks into individual ones.After
To fix issue 1: This PR adds some logic to keep track of the layers DiffIds when a buildpack is fetched, if the same layer DiffID is found in more than 1 buildpack then only 1 of them returns the content of the layer tar (with all the other buildpacks inside of it) and the other returns an empty tar. In this way we only use the same hard drive space when the layer tar is written into disk.
To fix issue 2: This PR adds some logic when a flattened buildpack is fetched, Pack will try to write different tar files for each buildpack flattened inside a layer and after that the calculation of the hash to execute the deduplication logic remains the same.
Documentation
Related
Resolves #1798