Support reproducible builds #1250

Open
salrashid123 opened this issue Aug 7, 2023 · 14 comments

Comments

@salrashid123

Cog currently uses docker to build the images

however, docker based builds are not reproducible: you'll get different image hashes even with the identical config

this long-term feature request is to refactor the build system from docker to something like


some references building using kaniko and bazel

@technillogue
Contributor

technillogue commented Aug 10, 2023

Hi, we've investigated this - SOURCE_DATE_EPOCH is a promising direction, and we tried approaches that reset the mtime of everything. Unfortunately, pip install is fundamentally irreproducible, because it generates pyc files that embed the source file's modification time. Unzipping wheels without using pip might make this possible, or I think there are some PEPs in the works that might help with this. See pypa/pip#5648
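
To make the pyc point concrete, here is a minimal sketch (not Cog's or pip's code) that compiles the same source twice with different source mtimes and reads the timestamp back out of the .pyc header. The helper names `compile_with_source_mtime` and `embedded_mtime` are hypothetical; the header layout is the one described in PEP 552.

```python
import os
import py_compile
import struct
import tempfile


def compile_with_source_mtime(src, mtime, out):
    """Set the source file's mtime, compile it to `out`, return the .pyc bytes."""
    os.utime(src, (mtime, mtime))
    py_compile.compile(
        src,
        cfile=out,
        doraise=True,
        # Force the classic timestamp mode regardless of SOURCE_DATE_EPOCH.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(out, "rb") as f:
        return f.read()


def embedded_mtime(pyc_bytes):
    """Read the source mtime from a timestamp-based .pyc header.

    Header layout (PEP 552): magic, flags, source mtime, source size,
    each a 4-byte field; the last three are little-endian integers.
    """
    flags, mtime, _size = struct.unpack("<III", pyc_bytes[4:16])
    if flags != 0:
        raise ValueError("not a timestamp-based .pyc")
    return mtime


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    first = compile_with_source_mtime(src, 1_000_000_000, os.path.join(tmp, "a.pyc"))
    second = compile_with_source_mtime(src, 1_700_000_000, os.path.join(tmp, "b.pyc"))

print("embedded mtimes:", embedded_mtime(first), embedded_mtime(second))
print("identical bytes:", first == second)
```

The source text is the same in both runs, yet the two .pyc files differ because the header carries the mtime that the source file had at compile time.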

@salrashid123
Author

got it; i think esp with python it'd be difficult to do with its own toolchains.

maybe generating the docker file per #1241 (comment)

and then chaining it to off-the-shelf kaniko would be a sufficient workaround ( ref

docker run \
  -v `pwd`:/workspace \
  -v $HOME/.docker/config_docker.json:/kaniko/.docker/config.json:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gcr.io/kaniko-project/executor@sha256:034f15e6fe235490e64a4173d02d0a41f61382450c314fffed9b8ca96dff66b2 \
  --dockerfile=Dockerfile \
  --reproducible \
  --destination "docker.io/salrashid123/tpmds:server" \
  --context dir:///workspace/

i realize now we're involving kaniko as well, but it may be easier to delegate it like this for now

@technillogue
Contributor

Would that address the pyc timestamps?

@salrashid123
Author

i think so: as part of kaniko's reproducible builds, it takes snapshots that reset all the file times.
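
As a rough illustration of why resetting file times matters for layer hashes, the sketch below tars a directory twice, once with real metadata and once with normalized metadata, and hashes the result. This is not kaniko's implementation; `tar_digest` is a hypothetical helper, and it only archives regular files for brevity.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def tar_digest(root, fixed_mtime=None):
    """Tar the files under `root` in sorted order and return the sha256 hex digest.

    If `fixed_mtime` is given, mtime and ownership are normalized first,
    so the archive bytes depend only on paths, modes, and contents.
    """
    def normalize(info):
        if fixed_mtime is not None:
            info.mtime = fixed_mtime
            info.uid = info.gid = 0
            info.uname = info.gname = ""
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # in-place sort makes the walk order deterministic
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                tar.add(full, arcname=os.path.relpath(full, root),
                        recursive=False, filter=normalize)
    return hashlib.sha256(buf.getvalue()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    os.utime(path, (1_000_000_000, 1_000_000_000))
    raw_1, fixed_1 = tar_digest(tmp), tar_digest(tmp, fixed_mtime=0)
    os.utime(path, (1_700_000_000, 1_700_000_000))  # simulate a later rebuild
    raw_2, fixed_2 = tar_digest(tmp), tar_digest(tmp, fixed_mtime=0)

print("raw digests match:       ", raw_1 == raw_2)
print("normalized digests match:", fixed_1 == fixed_2)
```

Normalizing metadata only fixes what the archive format records about a file. Anything stored inside the file's contents is untouched.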

tried it with the getting started guide; the generated Dockerfile always seems to reference a file like

COPY .cog/tmp/build1866459875/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl

which doesn't exist and causes the kaniko build to fail

cog build
cog debug > Dockerfile

docker run \
  -v `pwd`:/workspace \
  -v $HOME/.docker/config_docker.json:/kaniko/.docker/config.json:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gcr.io/kaniko-project/executor@sha256:034f15e6fe235490e64a4173d02d0a41f61382450c314fffed9b8ca96dff66b2 \
  --dockerfile=Dockerfile \
  --reproducible \
  --destination "docker.io/salrashid123/cogdemo:server" \
  --context dir:///workspace/

@salrashid123
Author

oh, so python embeds the timestamp inside the file...then kaniko isn't gonna help out.

...and i can't sincerely recommend going all out and investing in python-bazel builds

@technillogue
Contributor

as a stopgap for the debug issue, run cog build once and then interrupt, it will place a cog wheel in .cog/tmp/whatever, and then you can edit the cog debug output

does python-bazel address pyc timestamps somehow? does it just strip pyc files?
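
For reference, stripping bytecode after installation could look like the sketch below. Whether rules_python does anything like this is not established in this thread; `strip_bytecode` is a hypothetical helper, and the trade-off is that Python has to recompile modules on first import.

```python
import os
import shutil
import tempfile


def strip_bytecode(root):
    """Delete __pycache__ directories and stray .pyc files under `root`.

    Returns the number of directories and files removed.
    """
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        if "__pycache__" in dirnames:
            shutil.rmtree(os.path.join(dirpath, "__pycache__"))
            dirnames.remove("__pycache__")  # do not descend into the deleted dir
            removed += 1
        for name in filenames:
            if name.endswith(".pyc"):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "pkg", "__pycache__")
    os.makedirs(cache)
    for path in (os.path.join(tmp, "pkg", "mod.py"),
                 os.path.join(cache, "mod.cpython-311.pyc"),
                 os.path.join(tmp, "pkg", "legacy.pyc")):
        with open(path, "wb") as f:
            f.write(b"")
    print("removed:", strip_bytecode(tmp))
```

Bytecode can also be avoided up front: pip has a `--no-compile` option, and the `PYTHONDONTWRITEBYTECODE` environment variable stops the interpreter from writing .pyc files on import.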

it would be incredibly helpful for us to get reproducible builds for deduplication

@salrashid123
Author

yeah, i tried the interrupt trick suggested, but each cog+kaniko build has a different hash (which is expected, i think)

i'm unsure exactly how bazel rules_python handles pyc files but i can say you need to precisely define everything upfront and bazel uses its own sandbox to canonicalize everything.

some examples with rules_python which may help answer the question though....once it works with rules_python, stitching it with rules_docker and containers would be easy

https://github.com/bazelbuild/rules_python/tree/main/examples

@charles-dyfis-net

charles-dyfis-net commented Aug 11, 2023

Y'all might also investigate Nix (which provides dockerTools, an alternate build tool for Docker images) towards this end.

Nix converts all timestamps to one second past epoch, btw.

@technillogue
Contributor

Does rules_python generate pyc at all? bazelbuild/rules_python#1761

Again, there's no issue with mtimes, the problem is the timestamps embedded in pyc files

@charles-dyfis-net

charles-dyfis-net commented Aug 12, 2023

> Does rules_python generate pyc at all? bazelbuild/rules_python#1761
>
> Again, there's no issue with mtimes, the problem is the timestamps embedded in pyc files

The NixOS install CD is fully binary reproducible. I can't imagine it not including Python, so clearly they've got that licked somehow.

Indeed, quoting:

   # Determinism: The interpreter is patched to write null timestamps when compiling Python files
   #   so Python doesn't try to update the bytecode when seeing frozen timestamps in Nix's store.
   export DETERMINISTIC_BUILD=1;

@technillogue
Contributor

then we would have to ship nix's patched interpreter, right? DETERMINISTIC_BUILD is not present in stock python
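
Stock CPython 3.7+ does have a related mechanism that needs no patched interpreter: the hash-based .pyc files from PEP 552, which store a hash of the source in the header instead of its mtime. The sketch below assumes that mechanism is acceptable for the use case; nothing in this thread confirms that Cog, pip, or Nix uses it, and `compile_hash_based` is a hypothetical helper.

```python
import os
import py_compile
import struct
import tempfile


def compile_hash_based(src, out):
    """Compile `src` to an unchecked hash-based .pyc and return its bytes."""
    py_compile.compile(
        src,
        cfile=out,
        doraise=True,
        # The header stores a hash of the source instead of its mtime;
        # UNCHECKED means the import system never revalidates it.
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )
    with open(out, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    out = os.path.join(tmp, "mod.pyc")
    with open(src, "w") as f:
        f.write("x = 1\n")
    os.utime(src, (1_000_000_000, 1_000_000_000))
    early = compile_hash_based(src, out)
    os.utime(src, (1_700_000_000, 1_700_000_000))  # same source, later mtime
    late = compile_hash_based(src, out)

# Flags field of the header: bit 0 set means hash-based, bit 1 means checked.
flags = struct.unpack("<I", early[4:8])[0]
print("flags:", flags)
print("identical bytes:", early == late)
```

According to the py_compile documentation, setting SOURCE_DATE_EPOCH changes the default invalidation mode to the checked-hash variant. This removes the embedded mtime only; other sources of variation in the compiled bytecode, such as hash randomization affecting the order of set constants, may still need to be pinned separately.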

@salrashid123
Author

here's an end-to-end example covering building an image with bazel and serving it with cog.

if the precise build steps are followed, you should end up with

  • sha256:3db6542dc746aeabaa39d902570430e1d50c416e7fc20b875c10578aa5e62875

(i verified it on two different clean vms)

as mentioned, using bazel is really tedious, though toolchains like gazelle may help with python.
(imo as-is in current state, the developer friction all this introduces negates the primary ease-of-use benefits of using/building w/ cog in the first place)

[tbh, i've never used or needed cog and try to not use bazel for deterministic builds (in go there are easier ways)...this issue with cog was something i noticed and then ratholed academically.]

@RyzeNGrind

I would like to add my +1 for supporting reproducible builds via Nix and NixOS as well.

@technillogue
Contributor

https://github.com/datakami/cognix is a project that exists and kind of works but unfortunately isn't a priority for us at this time


4 participants