
Going further #34

Open
fenollp opened this issue May 1, 2021 · 4 comments

fenollp commented May 1, 2021

Efficiently cache crate dependencies.

I'd like to discuss the future of cargo-wharf and, to that end, some ideas I'd like to collaborate on.

Cache integration and the case for a community-backed global cache

Recent versions of docker build support --output=PATH, which copies files out of an image. This makes it possible to write the compilation results of each dependency to the filesystem of the local machine or of a CI cache.
cargo has a way of specifying where to look for build artifacts other than the sometimes-empty ./target/ dir: CARGO_TARGET_DIR.
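Concretely, the two pieces look like this (the stage name, output dir, and shared path are hypothetical; the docker line assumes Docker 19.03+ with BuildKit enabled):

```shell
# Copy build artifacts out of an image stage onto the host
# (stage name "artifacts" and ./out are made up for illustration):
docker build --output=./out --target=artifacts . 2>/dev/null \
  || echo "docker not available, skipping"

# Point cargo at a shared artifact directory instead of ./target/:
export CARGO_TARGET_DIR="$HOME/.cache/shared-cargo-target"
mkdir -p "$CARGO_TARGET_DIR"
echo "build artifacts will land in $CARGO_TARGET_DIR"
```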

More on CARGO_TARGET_DIR

Per https://stackoverflow.com/a/37472558/1418165, it turns out that a shared CARGO_TARGET_DIR (or CARGO_BUILD_TARGET_DIR) can be reused across projects, so a dependency built once is available to later builds.

These settings (toolchain version, build flags, and so on) would have to be part of the hashed name of each dependency being built (the dependency path or the Docker tag).
To solve hermeticity issues, see cross below.

cross

cross already does a good job of building Rust projects (for various target triples) using Docker (docker run) and QEMU.
That work should be adapted (in the most maintainable way possible) to use BuildKit: its QEMU integration, its rootless capabilities, and its ability to run the compute graph with maximum parallelization.

Conclusion

So if cargo-wharf were to create hermetic BuildKit targets for each dependency, leveraging the work on cross, I think there'd be a seamless way to integrate both local and global caches for dependencies. This global cache (basically a Docker Registry) could then be paid for by the community and benefit the community.

To get there I see these development steps:

  • get the list of dependencies from cargo-wharf, hashed and hermetic
  • "generate" a Dockerfile with these, based on cross's.
    • Each dependency is a stage in this file. Stage name = hashed recipe.
    • When linking, dependencies are bind-mounted read-only with --mount=from=HASHEDDEP,source=...,target=...
  • docker build this Dockerfile as the cargo build equivalent. Same for cargo test.
  • In a local cache setting
    • each hashed dependency's build results would live in a centralized folder, ready for reuse by another project, thus lowering initial build times.
  • If using the global cache
    • each hashed dependency's build results would live as a single-layer Docker image, holding files, in the local Docker registry as well as the global networked one.
    • New builds would be received by the global registry and checked for hermeticity before being added to its cache.
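A generated Dockerfile along those lines might look roughly like this sketch (stage hashes, crate names, base image, and paths are all made up; bind mounts between stages are read-only by default):

```dockerfile
# syntax=docker/dockerfile:1

# One stage per dependency; the stage name is the hash of its recipe.
FROM rust:1.56-slim AS dep-a1b2c3d4
WORKDIR /build
# ...fetch and build only this crate's artifacts into /build/target...

# A crate depending on dep-a1b2c3d4 bind-mounts its artifacts while building:
FROM rust:1.56-slim AS dep-e5f6a7b8
WORKDIR /build
RUN --mount=type=bind,from=dep-a1b2c3d4,source=/build/target,target=/deps/a1b2c3d4 \
    cargo build --release

# The final stage links against every hashed dependency stage the same way.
FROM rust:1.56-slim AS final
RUN --mount=type=bind,from=dep-e5f6a7b8,source=/build/target,target=/deps/e5f6a7b8 \
    cargo build --release
```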

Note that this global Docker registry

  • can easily be switched to a private instance
  • could be used to directly build dependencies and/or benefit from cache locality by setting it as the Docker host, roughly like this: DOCKER_HOST=ssh://lotsa.oompf.machine.com cargo build; only the final build results would then be transferred over the network.

Ideas, thoughts, notes, criticism: please shoot.


fenollp commented May 1, 2021

In addition, dependencies should be built with an empty Docker context.

The Dockerfile generated on the fly should first look for hashed dependencies available over the network (i.e. does the image rust/cache:HASHED exist?), falling back to the local cache.

A build ARG can be used to switch between networked caches.

A thing I haven't mentioned yet: the hash of a dependency should be a cryptographic hash of the inputs of that dependency, à la Nix.
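A minimal sketch of such a recipe hash in shell (the set of inputs shown is illustrative, not exhaustive; a real implementation would also hash source contents, à la Nix):

```shell
# Digest every input that can affect a dependency's build output:
# crate name+version, enabled features, rustc version, target triple, ...
recipe_hash() {
  printf '%s\n' "$@" | sort | sha256sum | cut -c1-12
}

hash="$(recipe_hash 'serde@1.0.130' 'features=derive' \
                    'rustc=1.56.0' 'target=x86_64-unknown-linux-gnu')"
echo "would look up image rust/cache:$hash"
```

Sorting the inputs before hashing makes the digest independent of argument order; any change to toolchain, features, or target yields a different cache key.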

@tonistiigi

cross

fyi, I did some initial work for Dockerfile cross-compile support via --platform in tonistiigi/xx#27. There are no custom wrapper tools or prebuilt images with toolchains, only rustup or apk add rust. It seems to work quite well, but people more familiar with Rust might want to double-check. To me, it is cleaner than cross or xargo. I might look into integrating something similar into this project in the future as a Rust learning exercise, but in case I don't find time for that, I'm just putting it out here for visibility.


fenollp commented Nov 14, 2021

Ah, thanks for the heads up on your xx project.
My whole point here, however, is to achieve a global build cache for Rust crates by expressing the topological tree of crate dependencies as Dockerfile stages and bind mounts (--mount=type=bind). Setting DOCKER_HOST then turns that cache into a networked one.

I don't see any emphasis on build-result reuse in xx, but it is probably implicit. How does xx handle, or plan to handle, caching of intermediary build results?
Thanks

@tonistiigi

@fenollp xx is not meant to be a replacement for this project. It is a helper for adding native cross-compile support to Dockerfiles so that they work with any --platform configuration. This project could likely borrow some ideas from it if it wants similar capabilities. Rust builds in Dockerfiles can't do package-based caching; if you just want faster incremental builds, then cache mounts help.
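For reference, a cache-mount sketch (the project layout and binary name are hypothetical; /usr/local/cargo is CARGO_HOME in the official rust images):

```dockerfile
# syntax=docker/dockerfile:1
FROM rust:1.56-slim
WORKDIR /src
COPY . .
# Persist the crate registry and build artifacts across builds of this Dockerfile:
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/src/target \
    cargo build --release \
    # the cache mount is not part of the image, so copy the binary out of it:
    && cp target/release/app /usr/local/bin/app
```

This speeds up incremental rebuilds on one machine but, unlike the per-dependency stages discussed above, the cache is opaque and not shareable through a registry.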
