Investigate FIFO asset cache cleaner with Pack creation time behavior #302

Open
1 of 3 tasks
schneems opened this issue Jul 15, 2024 · 4 comments


schneems commented Jul 15, 2024

Noted in heroku/buildpacks#22, pack mucks with creation time.

The Ruby CNB supports caching assets from rake assets:precompile, but we need a way to expire them. Beyond asset performance, the cache is also used for features such as Sprockets' "last 3 versions" of an asset being made available. (In the asset pipeline, when you modify a file, it generates a new output with a new SHA in the filename. But things like emails might reference older asset SHAs, so you want to keep the old ones around for a little while. This is accomplished by storing generated assets in the cache between builds.)

Likewise, we need some way to invalidate cache entries, or the cache size would grow indefinitely. The classic buildpack caps the size of the asset directory and, when it hits that limit, deletes the oldest files first, as newer files are more likely to still be used or referenced.
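
To make the "delete the oldest files first" rule concrete, here is a minimal sketch. It is not the classic buildpack's actual code: the function name evict_oldest_until_under and the byte cap are hypothetical, and it assumes GNU find (for -printf) and file names without newlines.

```shell
# Hypothetical helper: delete the oldest files (by mtime) under a directory
# until the total size of its regular files is at or below max_bytes.
# Assumes GNU find (-printf) and file names without newlines.
evict_oldest_until_under() {
  local dir="$1" max_bytes="$2" total
  # Sum the sizes (in bytes) of all regular files under the directory.
  total=$(find "$dir" -type f -printf '%s\n' | awk '{ s += $1 } END { print s + 0 }')

  # Emit "<mtime> <size> <path>" per file and sort so the oldest comes first.
  find "$dir" -type f -printf '%T@ %s %p\n' | sort -n |
    while read -r _mtime size path; do
      # Only delete while the directory is still over the cap.
      if [ "$total" -gt "$max_bytes" ]; then
        rm -f -- "$path"
        total=$((total - size))
      fi
    done
}
```

If every file ends up with roughly the same mtime, the sort order here stops reflecting the real age of the assets, which is the failure mode this issue is worried about.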

The problem (as I'm guessing) is that if pack is messing with creation time, then it's effectively giving all files a very similar creation time even if they were created months or years apart. This could mean that the most recently generated assets are cleared from the cache instead of the oldest ones, which would result in random asset lookup failures at runtime that are difficult to diagnose and debug.

TODO

  • Validate this is actually a problem.
  • Investigate if a workaround is possible.
  • Work with upstream to investigate long-term fixes.

schneems commented Oct 10, 2024

Comment is a work in progress. I'm trying to untangle all the information about build reproducibility.


mtime

Application use of mtime

Libraries and applications use the mtime of a file to determine when it was last modified. This is used in logic such as cache eviction of older files. While application owners and library authors might know that there are caveats around mtime information, it's generally considered to be a stable API to rely on.

CNB project mtime implications

The Cloud Native Buildpacks (CNB) project generates OCI images. One desired property of an OCI image is deterministic builds. That is, given the same inputs, the project produces the same image byte-for-byte. Timestamps complicate determinism.

Another important feature of CNBs is rebasing (https://buildpacks.io/docs/for-platform-operators/how-to/integrate-ci/pack/cli/pack_rebase/). Rebasing allows for updating system dependencies (like a version of curl, for example) without the need to completely rebuild the entire image.

CNB Features

For image creation commands (builder create, buildpack package, build) pack creates container images in a reproducible fashion.

[...]

We achieve reproducible builds by “zeroing” various timestamps of the layers of the output image. When images are inspected they may have confusing creation times (eg. “40 years ago”).

  • SOURCE_DATE_EPOCH environment variable allows the platform running pack to change the creation date of the resulting image. The RFC notes that setting this can affect reproducibility. I think if you used the same value you would get 100% build reproducibility.
  • pack build --creation-time <value>. Needs citation: I'm guessing this sets SOURCE_DATE_EPOCH under the hood.
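
As an illustration of why fixed timestamps give byte-for-byte reproducibility, here is a small sketch using a tar archive instead of an OCI layer. This is not how pack or the lifecycle implement it. The function name normalized_tar is hypothetical, the timestamp value @0 is an arbitrary choice, and the flags assume GNU tar.

```shell
# Hypothetical helper: archive a directory with normalized metadata so the
# output depends only on file names, modes, and contents.
# Assumes GNU tar (--sort, --mtime, --owner, --group, --numeric-owner).
normalized_tar() {
  local src_dir="$1" out="$2"
  # --sort=name   : stable entry order regardless of directory listing order
  # --mtime='@0'  : every entry gets the same fixed modification time
  # --owner/group : do not leak the building user's identity into the archive
  tar --create --file "$out" \
    --format=gnu \
    --sort=name \
    --mtime='@0' \
    --owner=0 --group=0 --numeric-owner \
    --directory "$src_dir" .
}
```

Two directories with identical contents but different file mtimes should produce identical archives this way. The cost is the one discussed in this issue: the original mtimes are gone from the output.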

History

This decreases the amount of storage required for builders. For example, if a new version of a builder is created containing some of the same buildpacks as a previous builder image, those buildpack layers should be reusable. Currently, immaterial timestamp differences may prevent this from happening.


schneems commented Oct 15, 2024

Reproducing cache mtime behavior

Short: I was wrong. The mtime of files from the cache is preserved; it's only the runtime (launch) mtime that's zeroed.

Long:


Make a buildpack

I used a buildpack that prints the mtime of the files in the app dir and copies them into a cached layer. Here's the buildpack:

$ exa /Users/rschneeman/Documents/projects/tmp/cache-mtime-buildpack/ --tree
/Users/rschneeman/Documents/projects/tmp/cache-mtime-buildpack
├── bin
│  ├── build
│  └── detect
└── buildpack.toml
$ cat /Users/rschneeman/Documents/projects/tmp/cache-mtime-buildpack/bin/build
#!/usr/bin/env bash

set -euo pipefail

layers_dir="$1"
env_dir="$2/env"
plan_path="$3"

echo -e '[types]\nbuild= true\nlaunch = true\ncache = true' > "${CNB_LAYERS_DIR}/muh_layer.toml"

muh_layer="${CNB_LAYERS_DIR}"/muh_layer
mkdir -p "${muh_layer}"

echo "App dir"
stat *

echo "Cache dir BEFORE"
stat "${muh_layer}/"* || echo "Empty"

# Copy files
cp -rf --preserve=timestamps . "${muh_layer}"

echo "Cache dir AFTER"
stat "${muh_layer}/"*

exit 0

Make an "app" with some files with an mtime

Make some files. Note that they have different mtimes, about 1 minute apart:

$ mkdir -p /tmp/0fe01170bffe428074e4b8405c4ea8bc
# ...
$ ls
a.txt	b.txt
$ cat a.txt
hi
$ cat b.txt
there
$ stat *
16777230 81149285 -rw-r--r-- 1 rschneeman wheel 0 3 "Oct 15 15:13:35 2024" "Oct 15 15:13:35 2024" "Oct 15 15:13:35 2024" "Oct 15 15:13:35 2024" 4096 8 0 a.txt
16777230 81149917 -rw-r--r-- 1 rschneeman wheel 0 6 "Oct 15 15:14:40 2024" "Oct 15 15:14:40 2024" "Oct 15 15:14:40 2024" "Oct 15 15:14:40 2024" 4096 8 0 b.txt

This was run on a Mac.

First build of the buildpack

$ pack build test-mtime-app --path  /tmp/0fe01170bffe428074e4b8405c4ea8bc --buildpack  /Users/rschneeman/Documents/projects/tmp/cache-mtime-buildpack --builder cnbs/sample-builder:jammy
jammy: Pulling from cnbs/sample-builder
Digest: sha256:9db376e26252b6cbc75190c88ea24571da36ec0d373136be233b32fd174962fc
Status: Image is up to date for cnbs/sample-builder:jammy
jammy: Pulling from cnbs/sample-base-run
Digest: sha256:164610cae6066b77383cb0b67ac40e429745a5fdf3aa0018dd16d43bf40bc92d
Status: Image is up to date for cnbs/sample-base-run:jammy
0.20.0: Pulling from buildpacksio/lifecycle
Digest: sha256:ba1d771ec095df94eb75a667a2fe4178cf8d6f05cde6430c89c7168cd04fcfd3
Status: Image is up to date for buildpacksio/lifecycle:0.20.0
===> ANALYZING
[analyzer] Image with name "test-mtime-app" not found
===> DETECTING
[detector] examples/node-js 0.0.1
===> RESTORING
===> BUILDING
[builder] App dir
[builder]   File: a.txt
[builder]   Size: 3         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896615     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:13:35.000000000 +0000
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000
[builder] Change: 2024-10-15 20:46:29.414769009 +0000
[builder]  Birth: 2024-10-15 20:46:29.414769009 +0000
[builder]   File: b.txt
[builder]   Size: 6         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896616     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:14:41.000000000 +0000
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000
[builder] Change: 2024-10-15 20:46:29.414769009 +0000
[builder]  Birth: 2024-10-15 20:46:29.414769009 +0000
[builder] Cache dir BEFORE
[builder] stat: cannot statx '/layers/examples_node-js/muh_layer/*': No such file or directory
[builder] Empty
[builder] Cache dir AFTER
[builder]   File: /layers/examples_node-js/muh_layer/a.txt
[builder]   Size: 3         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896641     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:13:35.000000000 +0000
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000
[builder] Change: 2024-10-15 20:46:30.375769010 +0000
[builder]  Birth: 2024-10-15 20:46:30.375769010 +0000
[builder]   File: /layers/examples_node-js/muh_layer/b.txt
[builder]   Size: 6         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896642     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:14:41.000000000 +0000
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000
[builder] Change: 2024-10-15 20:46:30.375769010 +0000
[builder]  Birth: 2024-10-15 20:46:30.375769010 +0000
===> EXPORTING
[exporter] Adding layer 'examples/node-js:muh_layer'
[exporter] Adding layer 'buildpacksio/lifecycle:launch.sbom'
[exporter] Adding 1/1 app layer(s)
[exporter] Adding layer 'buildpacksio/lifecycle:launcher'
[exporter] Adding layer 'buildpacksio/lifecycle:config'
[exporter] Adding label 'io.buildpacks.lifecycle.metadata'
[exporter] Adding label 'io.buildpacks.build.metadata'
[exporter] Adding label 'io.buildpacks.project.metadata'
[exporter] no default process type
[exporter] Saving test-mtime-app...
[exporter] *** Images (5422a5113791):
[exporter]       test-mtime-app
[exporter] Reusing cache layer 'examples/node-js:muh_layer'
[exporter] Adding cache layer 'examples/node-js:muh_layer'
Successfully built image test-mtime-app

We can see that the mtime of the incoming files is correct. They were modified about 1 minute apart from each other:

[builder] App dir
[builder]   File: a.txt
# ...
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000

[builder]   File: b.txt
# ...
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000

We can see that the cache dir afterwards uses the correct timestamps. The files were modified about 1 minute apart:

[builder] Cache dir AFTER
[builder]   File: /layers/examples_node-js/muh_layer/a.txt
# ...
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000
[builder]   File: /layers/examples_node-js/muh_layer/b.txt
# ...
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000
[builder] Change: 2024-10-15 20:46:30.375769010 +0000

Build with cache

$ pack build test-mtime-app --path  /tmp/0fe01170bffe428074e4b8405c4ea8bc --buildpack  /Users/rschneeman/Documents/projects/tmp/cache-mtime-buildpack --builder cnbs/sample-builder:jammy
jammy: Pulling from cnbs/sample-builder
Digest: sha256:9db376e26252b6cbc75190c88ea24571da36ec0d373136be233b32fd174962fc
Status: Image is up to date for cnbs/sample-builder:jammy
jammy: Pulling from cnbs/sample-base-run
Digest: sha256:164610cae6066b77383cb0b67ac40e429745a5fdf3aa0018dd16d43bf40bc92d
Status: Image is up to date for cnbs/sample-base-run:jammy
0.20.0: Pulling from buildpacksio/lifecycle
Digest: sha256:ba1d771ec095df94eb75a667a2fe4178cf8d6f05cde6430c89c7168cd04fcfd3
Status: Image is up to date for buildpacksio/lifecycle:0.20.0
===> ANALYZING
[analyzer] Restoring data for SBOM from previous image
===> DETECTING
[detector] examples/node-js 0.0.1
===> RESTORING
[restorer] Restoring metadata for "examples/node-js:muh_layer" from app image
[restorer] Restoring data for "examples/node-js:muh_layer" from cache
===> BUILDING
[builder] App dir
[builder]   File: a.txt
[builder]   Size: 3         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896609     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:13:35.000000000 +0000
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000
[builder] Change: 2024-10-15 20:48:33.402491011 +0000
[builder]  Birth: 2024-10-15 20:48:33.402491011 +0000
[builder]   File: b.txt
[builder]   Size: 6         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896611     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:14:41.000000000 +0000
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000
[builder] Change: 2024-10-15 20:48:33.402491011 +0000
[builder]  Birth: 2024-10-15 20:48:33.402491011 +0000
[builder] Cache dir BEFORE
[builder]   File: /layers/examples_node-js/muh_layer/a.txt
[builder]   Size: 3         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896643     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:48:33.929491011 +0000
[builder] Modify: 2024-10-15 20:48:33.929491011 +0000
[builder] Change: 2024-10-15 20:48:33.929491011 +0000
[builder]  Birth: 2024-10-15 20:48:33.929491011 +0000
[builder]   File: /layers/examples_node-js/muh_layer/b.txt
[builder]   Size: 6         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896616     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:48:33.929491011 +0000
[builder] Modify: 2024-10-15 20:48:33.929491011 +0000
[builder] Change: 2024-10-15 20:48:33.929491011 +0000
[builder]  Birth: 2024-10-15 20:48:33.929491011 +0000
[builder] Cache dir AFTER
[builder]   File: /layers/examples_node-js/muh_layer/a.txt
[builder]   Size: 3         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896643     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:13:35.000000000 +0000
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000
[builder] Change: 2024-10-15 20:48:34.319491011 +0000
[builder]  Birth: 2024-10-15 20:48:33.929491011 +0000
[builder]   File: /layers/examples_node-js/muh_layer/b.txt
[builder]   Size: 6         	Blocks: 8          IO Block: 4096   regular file
[builder] Device: fe01h/65025d	Inode: 4896616     Links: 1
[builder] Access: (0644/-rw-r--r--)  Uid: ( 1000/     cnb)   Gid: ( 1000/     cnb)
[builder] Access: 2024-10-15 20:14:41.000000000 +0000
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000
[builder] Change: 2024-10-15 20:48:34.319491011 +0000
[builder]  Birth: 2024-10-15 20:48:33.929491011 +0000
===> EXPORTING
[exporter] Reusing layer 'examples/node-js:muh_layer'
[exporter] Reusing layer 'buildpacksio/lifecycle:launch.sbom'
[exporter] Reusing 1/1 app layer(s)
[exporter] Reusing layer 'buildpacksio/lifecycle:launcher'
[exporter] Reusing layer 'buildpacksio/lifecycle:config'
[exporter] Adding label 'io.buildpacks.lifecycle.metadata'
[exporter] Adding label 'io.buildpacks.build.metadata'
[exporter] Adding label 'io.buildpacks.project.metadata'
[exporter] no default process type
[exporter] Saving test-mtime-app...
[exporter] *** Images (5422a5113791):
[exporter]       test-mtime-app
[exporter] Reusing cache layer 'examples/node-js:muh_layer'
[exporter] Adding cache layer 'examples/node-js:muh_layer'
Successfully built image test-mtime-app

We see the mtime is preserved in the cache dir after the copy:

[builder] Cache dir AFTER
[builder]   File: /layers/examples_node-js/muh_layer/a.txt
# ...
[builder] Modify: 2024-10-15 20:13:35.000000000 +0000
[builder]   File: /layers/examples_node-js/muh_layer/b.txt
# ...
[builder] Modify: 2024-10-15 20:14:41.000000000 +0000

@schneems

Therefore: I think this is maybe not an issue. The mtime of the files in the cache seems to be preserved. The launch mtime will be zeroed, but that doesn't affect us. I was only worried about cache cleaning.

One caveat is that we want to make sure any files copied have a preserved mtime, i.e. cp -r --preserve=timestamps . "${muh_layer}"

I need to validate that we're copying and preserving mtime in the buildpack.
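
A rough way to run that validation from a shell is sketched below. The function name mtimes_match is hypothetical, it compares whole-second mtimes only, and it assumes GNU stat (-c %Y) and file names without newlines. It is a sanity check, not a test of the buildpack's own copy code.

```shell
# Hypothetical check: succeed only if every regular file under src has a
# counterpart under dst with the same mtime (whole seconds).
# Assumes GNU stat (-c %Y) and file names without newlines.
mtimes_match() {
  local src="$1" dst="$2"
  # The explicit subshell lets `exit 1` stop the loop and set the
  # function's return status without exiting the calling script.
  find "$src" -type f | (
    while IFS= read -r f; do
      rel=${f#"$src"/}
      [ -e "$dst/$rel" ] || exit 1
      [ "$(stat -c %Y "$f")" = "$(stat -c %Y "$dst/$rel")" ] || exit 1
    done
  )
}
```

For example, `mtimes_match . "${muh_layer}"` run right after the copy step should succeed when the copy used --preserve=timestamps and fail when it did not (assuming the source files are more than a second old).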

@schneems

fs_extra does not explicitly copy mtime; there is a PR implementing this behavior: webdesus/fs_extra#53. The cp_r library (https://crates.io/crates/cp_r) does copy mtime, but it lacks required features (there is no option to overwrite an existing file). It also does not raise an error when there's a problem copying an mtime (https://github.com/sourcefrog/cp_r/blob/095ea5d0c10bbdc487a0e3671752df6be8d17ef9/src/lib.rs#L475), and I don't think we want that to fail silently.

One path forward is to copy files and then, in a second operation, copy the mtimes.
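
The two-pass idea can be sketched in shell as follows. This only illustrates the approach and is not the buildpack's implementation (which would live in Rust). The function name copy_then_sync_mtimes is hypothetical, and it assumes regular files and directories only, with no newlines in file names.

```shell
# Hypothetical two-pass copy: first copy the contents, then copy each mtime
# from the source entry onto its counterpart in the destination.
# Assumes regular files and directories only, and no newlines in names.
copy_then_sync_mtimes() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  # Pass 1: copy contents; existing files in dst are overwritten.
  cp -R "$src/." "$dst/"
  # Pass 2: set each destination mtime from the matching source entry.
  # A failing touch aborts the loop, so errors are not silently ignored.
  find "$src" | (
    while IFS= read -r f; do
      rel=${f#"$src"}
      touch -m -r "$f" "$dst$rel" || exit 1
    done
  )
}
```

Keeping the mtime step separate means a failure to set an mtime surfaces as a non-zero return status instead of being swallowed by the copy.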
