Feature Request: Bring nix builds to buildkit #1650

Open · blaggacao opened this issue Aug 21, 2020 · 27 comments

@blaggacao

I'd like to propose bringing the capability of doing nix builds into buildkit.

nix builds offer a convincingly saner alternative for building container images; see this talk: https://www.youtube.com/watch?v=pfIDYQ36X0k

I'd be interested in preliminary feedback on whether/how that could be done.

@AkihiroSuda (Member)

You can build a Dockerfile with FROM nixos/nix. Does that work for you?
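
For reference, a minimal sketch of that approach, assuming a ./default.nix whose output contains the binary (the image tag, paths, and CMD are illustrative):

# Sketch: plain Dockerfile build on top of nixos/nix (all names illustrative)
export DOCKER_BUILDKIT=1
docker build -t redis-nix -f- . <<'EOF'
FROM nixos/nix
WORKDIR /src
COPY . .
# builds ./default.nix and symlinks the output to ./result
RUN nix-build
CMD ["/src/result/bin/redis-server"]
EOF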

@blaggacao (Author)

Ah, sorry! I wasn't clear.

What I'm interested in is having buildkit build something like the following, using the nix builder:

{
  redis = buildImage {
    name = "redis";
    tag = "latest";

    # for example's sake, we can layer redis on top of bash or debian
    # fromImage = bash;
    # fromImage = debian;

    contents = pkgs.redis;
    runAsRoot = ''
      mkdir -p /data
    '';

    config = {
      Cmd = [ "/bin/redis-server" ];
      WorkingDir = "/data";
      Volumes = {
        "/data" = {};
      };
    };
  };
}

The buildImage directive builds a single-layered redis container with all runtime dependencies, and nothing more. It's basically a "single binary" container (+ runtime dependencies) in the root filesystem.

It leverages nix's ecosystem knowledge of how to build and run redis and puts that knowledge into a container.

It eliminates the need for builder containers or apt-get cleanups or even for alpine itself: it efficiently produces single-layered, single-binary containers without the need to know how to actually package them (thanks to the nix ecosystem's huge trove of packaging knowledge).

I would imagine something loosely like:

Nixfile

# syntax=nix/nixfile:master

{ pkgs, buildImage }: {
  # `buildkit` - the magic attribute for buildkit to consume, since there can
  # be various outputs, but buildkit presumes one `*file` - one artifact.
  buildkit = buildImage {
    name = "redis";
    tag = "latest";

    contents = pkgs.redis;
    runAsRoot = ''
      mkdir -p /data
    '';

    config = {
      Cmd = [ "/bin/redis-server" ];
      WorkingDir = "/data";
      Volumes = {
        "/data" = {};
      };
    };
  };
}

@tonistiigi (Member)

Yes, it should be possible to do this as a frontend. The efficiency depends on how much work is put into it. The simplified version is similar to the Dockerfile approach @AkihiroSuda suggested, except the user does not need to write a Dockerfile.

You might also want to take a look at https://github.com/talos-systems/bldr/ and https://github.com/sipsma/bincastle for a take on natively defining build dependencies/rules in a similar fashion.

@tonistiigi (Member)

the magic attribute for buildkit to consume, since there can be various outputs, but buildkit presumes one *file - one artifact.

Actually, we have --target for that, where you can specify the name.
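
For example, with a hypothetical nix frontend that maps --target to attribute names (syntax illustrative):

docker build -f Nixfile --target redis .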

@blaggacao (Author)

blaggacao commented Aug 27, 2020

I was having a closer look at the current nix docker-image tooling: it uses a VM for the construction of the file system. I'd imagine this to cause trouble when running from nixos/nix. What's the best in-depth argument against using a VM, if there is one?

@nlewo

nlewo commented Aug 28, 2020

@blaggacao It only uses a VM when you need to run commands as the root user. But in practice you rarely need this, which means we generally don't rely on a VM to build Docker images.
I didn't read the whole thread, but building Docker images with Nix from a container has an important drawback: you can't (easily) use the Nix store (/nix/store), where everything is cached.

@blaggacao (Author)

blaggacao commented Aug 28, 2020

Docker images with Nix from a container has an important drawback: you can't (easily) use the Nix store (/nix/store) where everything is cached.

A show stopper. @tazjin has built some interesting caching strategies into nixery, which also caches the completed image layers (not only the nix store results) based on the requested packages. Maybe his codebase has clues about how to map a nix derivation (after going through the layered-image "popularity" algorithm) to content-addressable container layers.

Code entrypoint for layer caching: https://github.com/google/nixery/blob/ba1d80323175b61c4f6348827daa8254e3aa13a5/config/pkgsource.go#L35-L41

@tonistiigi (Member)

You can probably use cache mounts if you want to keep things in the nix store over multiple builds. A more complicated version likely doesn't even need that and could connect straight to instruction caching. E.g. the examples I posted above don't use cache mounts afaik.
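
In Dockerfile syntax that's the RUN --mount=type=cache feature; a minimal hedged fragment (the target path is illustrative — caching the store itself needs the seeding trick @vito describes further down):

# persist nix's per-user download/eval cache across builds
RUN --mount=type=cache,target=/root/.cache/nix \
    nix-build default.nix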

@sipsma (Collaborator)

sipsma commented Aug 28, 2020

Eg. examples I posted above don't use cache mounts afaik.

Bincastle's current approach doesn't involve cache mounts and should hopefully be formalized upstream once buildkit has support for merge-op (right now it's sort of a hacked version of merge-op).

It's been a long time since I've looked at how nix works under the hood, so I don't have any concrete ideas. That being said, I quickly skimmed this blog post linked to in the nixery repo and, on the surface, it seems like a lot of the issues involved here might be positively impacted by the existence of an efficient merge-op.

Just to give an example, in bincastle the equivalents of packages are all single-layer images arranged in a DAG that get topologically sorted and efficiently merged together via merge-op to make a final rootfs. I'd be curious to hear if anyone with more knowledge of nix internals thinks a similar approach would work with nix and allow the nix store to be easily used. Not sure what the issues preventing use of the nix store actually are.

@blaggacao (Author)

Or maybe it's enough/better to implement nixc, a runner that scaffolds containers upon receiving a manifest from something like IPFS (see https://notes.burke.libbey.me/nix-squashfs/), with some crfs features. Just throwing in loose ideas.

@AkihiroSuda (Member)

My experimental Nix frontend for BuildKit: https://github.com/AkihiroSuda/buildkit-nix

cd examples/nginx
export DOCKER_BUILDKIT=1
docker build -t nginx-nix -f default.nix .

@blaggacao (Author)

Off topic: @tonistiigi, iirc you once implemented content addressability (CA) for docker. Looks like full CA is just about to land in nix as well (as opposed to "input addressability").

@vito (Contributor)

vito commented Mar 12, 2022

I spent some time experimenting with MergeOp/DiffOp for this, and it's a really cool feature, but I don't think it's a silver bullet here; Nix also keeps a SQLite database under /nix/var/nix/db/db.sqlite which won't benefit from merging (as far as I understand how the database is used).

Before realizing that, I tried using nix show-derivation to find all the input packages and then nix-env -i them in a tree with MergeOp/DiffOp tying them all together, but that quickly consumed all the memory on my machine. 😆 (nix-env -i is very greedy compared to nix-env -iA, and I don't know how to do the latter from a derivation.)

I also tried using nix-instantiate and then building individual derivations, but that was a bit of a non-starter because it would still bust the cache whenever the input to nix-instantiate changed.

My current approach is as follows (I'm using LLB directly so don't have Dockerfile syntax ready):

  1. Start from nixos/nix and get the image digest
  2. Create a cache mount with the image digest as part of its ID, say /nix-cache/sha256:...
  3. Mount the cache to /cache/ and cp -anT /nix/ /cache/ to seed it with the initial content
  4. Mount the cache to /nix/ and run nix-build (or whatever)
  5. Mount the cache to /nix/ and run cp -aL ./result ./foo to hoist the result out of the cache, just so it gets cached "properly" in LLB (this might be specific to my project)

Using the digest as part of the cache mount ID is important because if you point to a different image (say nixos/nix:master, or say they push a new version) that can cause the cp -anT command to fail with errors like "cannot overwrite non-directory." Having the cache rotate with the base image fixes this.
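
A hedged Dockerfile approximation of the steps above (the original uses raw LLB; the digest is elided and the digest-suffixed cache id is written out by hand, since build-arg expansion inside --mount IDs depends on the Dockerfile frontend version):

# syntax=docker/dockerfile:1
# 1. pin the base by digest (digest elided; fill in your own)
FROM nixos/nix@sha256:...
WORKDIR /src
COPY . .
# 2.+3. the cache id carries the base digest; seed the cache from the image's
#       own /nix (-a archive, -n no-clobber, -T treat target as a directory)
RUN --mount=type=cache,target=/cache,id=nix-cache-sha256-... \
    cp -anT /nix/ /cache/
# 4.+5. build with the cache mounted over /nix, then hoist the result out of
#       the cache so LLB caches it "properly"
RUN --mount=type=cache,target=/nix,id=nix-cache-sha256-... \
    nix-build && cp -aL ./result ./foo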

So far this is working well, but it assumes the full lifecycle (including all building) is happening in Buildkit. If someone wanted to e.g. build in an external system (like Hydra) and propagate the results to Buildkit without a costly transfer they might need #1512.

Hope this helps.

@sipsma (Collaborator)

sipsma commented Mar 12, 2022

@vito Thanks for the info! I am currently investigating adapting apk package DAGs to LLB using merge+diff, but doing the same investigation for nix is on my eventual TODO list, so I appreciate the data points you gathered so far. If you happen to have any code for your attempts that you can share, I'd appreciate it (no worries if not though, this is useful information on its own).

One thing I'm immediately curious about: when you saw all the memory on your machine being consumed, were you able to tell if that was Buildkit using the memory? Or was it nix-env? If it was Buildkit then there's probably some improvements we should make to the merge/diff implementation in terms of memory usage.

Also, in terms of the db.sqlite issue, that's something that will also be true for other package managers like apk, apt, etc., which all have local dbs of installed packages that are usually just a flat file (thus not mergeable with MergeOp, at least with the current featureset). I do wonder if it's actually necessary to include them at all in exported container images, but if it is, there may be some other options for getting them there that are a bit more convoluted... I need to look into it more.

@vito (Contributor)

vito commented Mar 13, 2022

@sipsma The memory usage was definitely nix-env - it's a known limitation with nix-env -i foo-1.2 (installing by package name + version) vs nix-env -f '<nixpkgs>' -iA foo (installing by named attribute). For example nix-env -qaP firefox which queries for attributes matching a package name takes up ~2GB of RAM. 😱 Apparently it's an older command that just hasn't aged well as the set of available packages has grown, and folks shouldn't normally be using it anyway, but I don't know a better way to go from a derivation to a set of things that can be installed context-free (i.e. without depending on the /nix/ store the derivation came from).

Happy to share code, but it's deep in my esoteric Lisp so it might be hard to find the signal in the noise. It's living in the merge-op branch - the last few commits have all the changes through a few iterations. The idea was to express merge/diff as a list of thunks (analogous to LLB states) where the first is the base, and the rest are diffed from it + merged on top of it. I didn't put a ton of thought into it, just needed some kind of notation.

re: db.sqlite, I think it's used by Nix to know which things have already been built, so if you're basing a nix-build run on one big merged image of cached dependencies, it might not be able to tell that they're cached. Someone who knows Nix better than me can hopefully confirm/correct this.

@tonistiigi (Member)

that will also be true for other package managers like apk, apt, etc. which all have local dbs of installed packages that are usually just a flat file (thus not mergeable with mergeop, at least with the current featureset)

One approach might be to add packages as separate layers with a merge, and then one layer on top with apk/world etc. for all the packages. Then at least when packages change or are reordered, all the package layers can be directly reused, and recreating the top layer could be relatively quick — especially if recreating the top layer can be done without accessing the separate package layers. It might get more complicated in some cases with setup scripts etc.

@nlewo

nlewo commented Mar 14, 2022

@vito I don't fully understand what you are trying to achieve and it would be nice if you could detail what you want to improve in the current buildkit-nix implementation. I would be glad to help you on the Nix side ;)

For instance, if you want to cache Nix storepaths (stuff in /nix/store) in BuildKit, there are some tools to export/import the Nix database.
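
A sketch of the kind of tooling meant here, assuming nix-store's DB dump/load commands are what's intended (cache paths illustrative):

# export the DB registration info alongside a copy of the store paths
nix-store --dump-db > /cache/nix-db-dump
cp -anT /nix/store /cache/store
# ...later, in a fresh environment with the cache mounted:
cp -anT /cache/store /nix/store
nix-store --load-db < /cache/nix-db-dump   # re-register the paths as valid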

@vito Note I'm pretty sure you should not have to use nix-env (nix-env is used to manipulate user environments and it's an anti-pattern in the Nix world). If you need to be more granular than nix-build, nix-instantiate and nix-store -r should be sufficient.
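
i.e., roughly (a sketch):

# evaluate only: produce the .drv file(s) without building anything
drv=$(nix-instantiate default.nix)
# realise: build (or substitute) the derivation and all its dependencies
out=$(nix-store -r "$drv")
echo "$out"   # the resulting store path(s)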

@nlewo

nlewo commented Mar 14, 2022

Regarding buildkit-nix improvements, I think the nix2container project could be pretty useful. Instead of building a tarball, it builds a JSON file describing a "Nix" image. This JSON file is then ingested by a Go library (currently used by a Skopeo fork) to actually create the image. The two main advantages are:

  • images are no longer written in the Nix store (less I/O, less disk space)
  • Nix doesn't need to repack all layers of an image (rebuilds can be much faster!)

I guess this Go library could be used by buildkit-nix to create layers, instead of unpacking them from the Nix build output. buildkit-nix could also skip existing layers without having to unpack the image tarball.
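
Roughly, the nix2container flow looks like this (the attribute and image names are illustrative; the nix: transport is the one provided by the Skopeo fork):

# build the image description: a small JSON file, not a tarball
nix build .#image
cat ./result        # JSON listing layers as sets of store paths + digests
# the Skopeo fork's nix: transport turns that JSON into a real image
skopeo --insecure-policy copy nix:./result docker-daemon:redis:latest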

@AkihiroSuda (Member)

@nlewo PR is welcome 👍

@sipsma (Collaborator)

sipsma commented Mar 14, 2022

@tonistiigi

One approach might be to add packages on separate layers with a merge and then one layer on top of it with apk/world etc for all the packages. Then at least when packages change or are reordered all the package layers can be directly reused and recreating the top layer could be relatively quick. Especially if the recreating of the top layer can be done without accessing the separate package layers. It might get more complicated in some cases with setup scripts etc.

Yeah that's exactly what I was thinking, though in the apk case it would only be necessary to do this if you want the exported result to actually have apk support in it, which is probably only useful for a small subset of cases like development environments.

@sipsma (Collaborator)

sipsma commented Mar 14, 2022

@nlewo Thanks for the info! I have some extra thoughts on how nix2container could integrate with buildkit, but please don't let this stop you from contributing to buildkit-nix right away as the idea could require some extra work on top of what's already possible.

Here's what I'm imagining:

  1. buildkit-nix uses nix2container to generate the JSON file (or just a struct with equivalent fields)
  2. buildkit-nix uses the layer information from the previous step to create 1 LLB state for each layer that should exist in the final image. Each LLB state would just consist of the contents of its layer directly under /
  3. Those LLB states are then combined together using MergeOp, allowing them to be exported as a container image (or various other export types)

The nice part of that approach is that it lets us use all of Buildkit's caching to optimize the image creation step as much as possible in addition to offering lots of extra features (i.e. exporting the image in estargz format).
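
e.g., something like this with buildctl (the frontend image is hypothetical; the exporter attributes are standard BuildKit ones):

buildctl build --frontend gateway.v0 \
  --opt source=example.com/nix-frontend \
  --output type=image,name=docker.io/example/redis,oci-mediatypes=true,compression=estargz,push=true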

Do you think that sounds feasible? I guess the main idea here is to re-use nix2container as much as possible but replace the skopeo image-building aspect with Buildkit.

@vito (Contributor)

vito commented Mar 15, 2022

@nlewo My goal was to have a .nix file that builds an OCI image, and to be able to add more packages to it repeatedly without having to fetch everything from scratch every time I changed it.

For some context, the project this is for is essentially a language that compiles commands to LLB and runs them, transparently setting up mounts for files/etc. that are passed around. I was hoping to do all of this using the language's existing primitives without having to couple it directly to Nix or other projects.

Given these limitations are self-imposed I hope not to distract y'all too much, but I'll be following this conversation closely to see if there's anything I could adjust if there ends up being a nice path forward. nix2container looks really cool. :) buildkit-nix was also extremely useful as a reference, but not really suitable for direct integration given the nature of my project.

Note I'm pretty sure you should not have to use nix-env (nix-env is used to manipulate user environments and it's an anti-pattern in the Nix world). If you need to be more granular than nix-build, nix-instantiate and nix-store -r should be sufficient.

nix-instantiate was my original approach but I couldn't figure out how to do that without busting the merge/diff caches every time the .nix changes, since the dependency chain becomes .nix -> nix-instantiate -> nix-store -r diffs+merge. Using nix-env was definitely not ideal, but it could at least run "from scratch" after parsing package names out of nix show-derivation, making its result reusable. (It just didn't work for other reasons.)

@nlewo

nlewo commented Mar 15, 2022

Do you think that sounds feasible?

@sipsma Yep, this is also what I have in mind!
Regarding your point 1., Buildkit should call Nix to generate the JSON file, which is the interface between the Nix and Buildkit worlds.

I guess the main idea here is to re-use nix2container as much as possible but replace the skopeo image-building aspect with Buildkit.

I think the Skopeo "nix" transport would still be valuable but it would be nice to be able to do it with buildkit-nix as well.

However, maybe Buildkit could go one step further. Currently, for each layer in the generated JSON file, nix2container generates the digest of that layer (a layer consists of several storepaths). This digest is then used by Skopeo to skip already-pushed layers. This, however, requires I/O (reads) at build time, since Nix needs to tar the layer to generate the digest (thanks to the Nix cache mechanism, we only need to do this once).
I'm wondering if Buildkit could maintain its cache by using the storepath hashes (which are not based on the content) instead of the layer digest: we could then avoid computing this layer digest twice (once by nix2container and once by buildkit). Is there a document explaining how the buildkit cache works (is it only "content based")?


@vito I actually have some questions, but I think it would be better to discuss them somewhere else. For instance, it would be useful if you could show an example of your build expression, to see how you are planning to use Nix. Maybe you could open an issue in your project and ping me on it!

@blaggacao (Author)

blaggacao commented Mar 16, 2022

There are also growing thoughts in my corner of the thought palace about modifying the OCI standard to add a new image type / layer type that more directly provides handles to nix store paths.

These could be assembled on the fly by a runtime into an image, but slightly differently than how it's done with the current layer media types.

The capabilities of the classic container overlay mechanism are only needed for the very last 'customization' layer, since nix store paths never clash and are read-only for all intents and purposes.

So amending the OCI standard makes a lot of sense: it would leverage these characteristics of nix, which don't require spending the max-overlays budget on most of an image's contents.

@adrian-gierakowski

@blaggacao I like your thinking 👍

@blaggacao (Author)

Maybe to clarify the role of the SQLite database in nix, it has two roles:

  • function eval cache (when using flakes, and currently only for the entire flake)
  • keeping a record of reified store paths (see the sketch below)
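
For a concrete sense of the second role, the database can be inspected directly (a sketch; the ValidPaths table layout reflects current Nix and may change):

# list a few registered ("reified") store paths and their metadata
sqlite3 /nix/var/nix/db/db.sqlite \
  'SELECT path, narSize FROM ValidPaths LIMIT 5;'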

Building something in buildkit without mounting the store doesn't really make sense, imo, from a nix point of view.

Now, cutting through all the elaborate words that have been said (of which I only understand half), in simple terms: what's conceptually holding us back from bind-mounting the store?

I mean, there is even a known implementation of a FUSE store that queries on demand from a central remote cache (e.g. IPFS), although that project is proprietary.

@blaggacao (Author)

blaggacao commented Jun 4, 2022

I hinted at this idea earlier in this thread. Any help and "political" support is welcome to help that proposal gain momentum and succeed: opencontainers/image-spec#922
