Support remote execution with rules_nixpkgs #180

Open
aherrmann opened this issue Jan 26, 2022 · 76 comments
Labels
P2 major: an upcoming release; type: feature request

Comments

@aherrmann
Member

Is your feature request related to a problem? Please describe.
Bazel supports remote execution through the remote execution protocol. This protocol manages inputs and outputs required by and generated by Bazel build actions, i.e. actions defined by regular Bazel rules.

However, rules_nixpkgs defines repository rules that invoke nix-build during Bazel's loading phase. The Nix package manager will then realize Nix store paths (typically under /nix/store/...) and generate symlinks into Bazel's execution root. These Nix store paths are outside of Bazel's control and the remote execution protocol does not ensure that these store paths are also realized on the remote execution nodes.

Remote execution actions that depend on Nix store paths will fail if the required Nix store paths are not realized on the remote execution nodes.
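One way to see this failure mode locally is to list the Nix store paths that the symlinks generated by rules_nixpkgs point at, and check which of them are not realized on a given machine. The sketch below is illustrative only. The function names are hypothetical, and it assumes absolute symlinks into a store directory (`/nix/store` by default).

```python
import os
from pathlib import Path


def referenced_store_paths(repo_dir, store_dir="/nix/store"):
    """Top-level store paths (/nix/store/<hash>-<name>) that symlinks under repo_dir point into."""
    store = Path(store_dir)
    found = set()
    for root, dirs, files in os.walk(repo_dir):
        # os.walk does not follow symlinked directories, but still lists them in `dirs`.
        for name in dirs + files:
            link = Path(root, name)
            if not link.is_symlink():
                continue
            target = Path(os.readlink(link))
            try:
                rel = target.relative_to(store)
            except ValueError:
                continue  # Relative link, or a link outside the Nix store.
            if rel.parts:
                found.add(str(store / rel.parts[0]))
    return found


def missing_store_paths(paths):
    """Subset of `paths` that is not realized on this machine."""
    return {p for p in paths if not os.path.exists(p)}
```

Running `missing_store_paths(referenced_store_paths(...))` on a remote executor would list exactly the paths whose absence makes the action fail.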

Describe the solution you'd like
We need some solution to ensure that Nix store paths that are required for Bazel build actions exist on the remote execution nodes that these actions may be run on.

Some possible approaches:

  • Run a script before each Bazel invocation (e.g. by defining a custom tools/bazel) that builds all required Nix store paths on the remote execution nodes.
  • Remote execution nodes could share a Nix store via network fs.
  • nixpkgs_package could set remotable = True to execute the nix-build command on the remote execution nodes. (feature announcement, commit, flag)
  • Remote execution could be configured to run actions in Docker images that contain the needed Nix store paths, perhaps using Nixery.
  • Place the Nix store inside Bazel's execroot (perhaps using managed directories) and add a sandbox mount pair (should support relative paths) to mount it under /nix/store inside the sandbox.
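The first option could look roughly like the following hypothetical `tools/bazel` wrapper. It is a sketch, not something rules_nixpkgs provides. The worker host names and the `NIX_STORE_PATHS` environment variable are made up for illustration, it assumes SSH access to each worker, and it assumes the launcher (e.g. Bazelisk) exports the real Bazel binary as `BAZEL_REAL`. `nix-copy-closure --to` copies the given paths together with their full closure.

```python
#!/usr/bin/env python3
"""Hypothetical tools/bazel wrapper: push Nix closures to RBE workers, then run Bazel."""
import os
import subprocess

# Hypothetical worker list; assumes passwordless SSH to each host.
WORKERS = ["worker-1.example.com", "worker-2.example.com"]


def push_commands(store_paths, workers):
    """One nix-copy-closure call per worker; it copies each path's full closure."""
    paths = sorted(set(store_paths))
    if not paths:
        return []
    return [["nix-copy-closure", "--to", host, *paths] for host in workers]


def main(argv):
    # Hypothetical: a whitespace-separated list of store paths the build needs.
    store_paths = os.environ.get("NIX_STORE_PATHS", "").split()
    for cmd in push_commands(store_paths, WORKERS):
        subprocess.run(cmd, check=True)
    # Bazelisk sets BAZEL_REAL when it delegates to tools/bazel.
    real_bazel = os.environ["BAZEL_REAL"]
    os.execv(real_bazel, [real_bazel, *argv])
```

A real wrapper would end with `main(sys.argv[1:])`. The obvious cost of this design is that it pushes everything up front, whether or not a given build needs it.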

cc @YorikSar @AleksanderGondek @r2r-dev

@Jonpez2

Jonpez2 commented Oct 14, 2022

If we could find a way to run every action, whether local or remote, under something equivalent to nix-shell --pure -p [the-nix-packages-you-told-me-you-depend-on], then that would be best, right? Then it wouldn't matter whether you were local or remote, and you'd never get non-hermetic builds.
I'm certain there's a flake approach here too - maybe before every action we emit a flake.nix with appropriate dependencies and nix run that?
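Concretely, the idea is to rewrite each action's command line so that it runs inside a pure Nix shell containing only the declared packages. A minimal sketch follows. `wrap_in_nix_shell` is a hypothetical helper. Also note that `-p` resolves package names against `<nixpkgs>` from `NIX_PATH`, so that channel would itself have to be pinned for the result to be reproducible.

```python
import shlex


def wrap_in_nix_shell(argv, packages):
    """Return a command that runs argv inside `nix-shell --pure -p <packages>`."""
    return [
        "nix-shell",
        "--pure",              # drop the caller's environment (mostly)
        "-p",
        *sorted(packages),     # stable order, so identical actions get identical command lines
        "--run",
        shlex.join(argv),      # quote the original argv into one shell command
    ]
```

For example, `wrap_in_nix_shell(["gcc", "-c", "a b.c"], ["gcc", "binutils"])` yields a command ending in `--run "gcc -c 'a b.c'"`.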

@Jonpez2

Jonpez2 commented Oct 18, 2022

Does this seem credible?

@aherrmann
Member Author

@Jonpez2 Perhaps yes, though this may still require additional metadata: to achieve fine granularity, the set [the-nix-packages-you-told-me-you-depend-on] would need to be minimal for each action. So it's not one global Nix environment for the entire build, but precise Nix dependencies for each build action. Also note that, the way rules_nixpkgs integrates Nix with Bazel, the Nix store paths are not provided through a nix-shell environment. Instead, rules_nixpkgs invokes nix-build and creates symlinks into the Nix store (with GC roots).

@Jonpez2

Jonpez2 commented Oct 18, 2022

Yes for sure, precise for each action. And then no symlinks and gc roots, but just calls within nix contexts. That would be cleaner, right?
So maybe we would have a nix toolchain base or something which provides the machinery to wrap all actions in a nix-shell/flake invocation, and then we find a way to inject that into all actions of all rules that use the toolchain? Or something... There's a core good idea here that would make bazel + nix super happy together, I can feel it...

@AleksanderGondek
Contributor

@aherrmann @Jonpez2
Apologies, I have been living on a planet quite distant from rules_nixpkgs for a while :D

In principle, every Bazel action should be hermetic and pure - therefore it stands to reason that running it within a very narrowly defined nix-shell achieves that goal and I would love to be able to do just that.

However, due to my experience of running rules_nixpkgs and Bazel in tandem (which also makes me biased and blind to the “newcomer” perspective), I see one major, troublesome aspect of the proposed approach:

Interoperability with existing Bazel ecosystem / rules.

I have not fully thought this through, but it seems to me that unless the change were placed in Bazel code itself (sic!), all existing rules would need to change to be able to act on inputs delivered from nixpkgs. For example, cc_library would need to recognize inputs provided by the Nix package manager and act on them differently than on the ones from Bazel itself. A great deal of composability would be sacrificed.

@Jonpez2

Jonpez2 commented Oct 18, 2022 via email

@aherrmann
Member Author

I have not fully thought this through, but it seems to be that unless the change would be placed in Bazel code itself (sic!), all of existing rules would need to change to be able to act on inputs delivered from nixpkgs

Correct, the proposed "If we could find a way to run every action, whether local or remote, under something equivalent to nix-shell [...]" would, I think, require a change to Bazel itself. An alternative that we've been thinking about, if we're to modify parts of Bazel anyway, is to tackle this at the remote execution protocol level. That protocol currently has no notion of a "system root" or other kinds of per-action system dependencies. If the protocol could express such constraints per-action in a generic way, then the remote side could ensure that the constraints are resolved before the action is executed. E.g. that nix-build is run to provide the needed Nix store paths.

Side note, this is, not directly but still somewhat tangentially, related to bazelbuild/bazel#6994.

@AleksanderGondek
Contributor

AleksanderGondek commented Oct 19, 2022

There is another way to move forward, which I feel is a bit more lean and less disruptive towards overall Bazel build model.

Bazel has the Remote Asset API, which could be extended to provision /nix/store-bound artifacts. Qualifiers could be employed to pass on any additional required metadata, and the big issue of ensuring /nix/store consistency across the hosts of the RBE execution platforms would be solved.
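As a rough illustration, a client could ask a Remote Asset API server to fetch a store path as a directory, with qualifiers carrying the Nix-specific metadata. The dictionary below mirrors the shape of the API's `FetchDirectoryRequest` (`instance_name`, `uris`, `qualifiers`). The qualifier name `nix.store_path` and the idea of pointing the URI at a binary cache's `.narinfo` file are assumptions; the server would have to understand such a qualifier.

```python
def nix_fetch_directory_request(store_path, binary_cache, instance_name=""):
    """JSON-shaped FetchDirectoryRequest for one Nix store path (sketch)."""
    base = store_path.rstrip("/").rsplit("/", 1)[-1]  # "<hash>-<name>"
    store_hash = base.split("-", 1)[0]
    return {
        "instance_name": instance_name,
        # Binary caches serve metadata for a store path at <cache>/<hash>.narinfo.
        "uris": [f"{binary_cache.rstrip('/')}/{store_hash}.narinfo"],
        "qualifiers": [
            # Hypothetical qualifier: the server must know how to interpret it.
            {"name": "nix.store_path", "value": store_path},
        ],
    }
```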

@Jonpez2

Jonpez2 commented Oct 19, 2022

FWIW, if we're thinking of something remote-execution-specific, BuildBarn has the following two interesting (and I think related) issues:
buildbarn/bb-remote-execution#40
buildbarn/bb-remote-execution#23

@Jonpez2

Jonpez2 commented Oct 19, 2022

@AleksanderGondek - re the Remote Assets API - would that mean something like doing a nix query to find the transitive closure of required /nix/store roots, and then having the remote worker unpack into a /nix/store that looks exactly like that? And then having starlark code which figures out resolved paths within /nix/store, and executing with that? That seems a bit entwined and fragile to me...
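The first step described here, computing the transitive closure of the required store roots, is what `nix-store --query --requisites` does. A pure-Python sketch over a `references` map (which could be filled from `nix-store --query --references`) shows the computation; it is illustrative only.

```python
from collections import deque


def closure(roots, references):
    """Transitive closure of store paths, like `nix-store --query --requisites`.

    `references` maps each store path to the store paths it directly refers to.
    """
    seen = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in seen:
            continue
        seen.add(path)
        queue.extend(references.get(path, ()))
    return seen
```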

@aherrmann
Member Author

There is another way to move forward, which I feel is a bit more lean and less disruptive towards overall Bazel build model.

Bazel has Remote Assets API that can be extended to provision /nix/store-bound artifacts.

What I remember of this approach from the last attempt was that it only worked when the remote execution system could still access the needed nix files on the host directly. That's usually not true, e.g. when a developer issues a build on their development machine and the remote execution service runs on a cloud platform or somewhere else on a remote machine. If that limitation could be fixed then this could indeed be a viable option.

@Jonpez2

Jonpez2 commented Oct 19, 2022

I really think the only safe option is a nix-shell or nix run flake style wrapper.
Do all actions happen within the context of some particular posix toolchain by any chance? e.g. do they all invoke via a bash selected from the toolchain, or something like that? Is there any way we could hook that? It would mean something like generating one toolchain per rule kind or something, but it would be at the bazel level and therefore equivalent between local and remote...

[Edit] I'll leave this comment here, but it's obviously off base, and couldn't possibly be true.

@uri-canva
Contributor

Note there's some extra considerations if the host machine is a different platform than the remote executor. The common way of using rules_nixpkgs right now is to let nix detect the host machine platform, instead of having bazel pass that platform information in (which it can't even do for repository rules anyway, it can only do it for actions).

@uri-canva
Contributor

Do all actions happen within the context of some particular posix toolchain by any chance? e.g. do they all invoke via a bash selected from the toolchain, or something like that?

Not from a toolchain, but the default shell used by actions can be set with the BAZEL_SH environment variable or the --shell_executable option. Note not all actions use the shell anyway, some execute the binaries directly.

It would mean something like generating one toolchain per rule kind or something, but it would be at the bazel level and therefore equivalent between local and remote...

No, that's definitely the approach that would be the most compatible with Bazel: defining toolchains using the Bazel APIs in such a way that Bazel doesn't need to know anything about Nix. For example, if we define toolchains with binaries that are nix-shell wrappers, then as long as the executors have Nix installed, running those wrappers will work as expected. Assuming the wrappers include absolute paths to the store paths, their contents will stay the same if the underlying derivation is the same and change if the derivation changes, which lets Bazel handle caching correctly even without any knowledge of the derivation.
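A minimal sketch of such a wrapper, in a variant that realizes a fixed store path on demand instead of evaluating a Nix expression. It assumes executors have Nix installed and can substitute the path from a binary cache; the generator and the example store paths are hypothetical. Because the wrapper embeds the absolute store path, its content, and therefore the digest Bazel uses for caching, changes exactly when the store path changes.

```python
import hashlib


def wrapper_script(store_path, binary):
    """Shell wrapper that realizes store_path on demand, then execs a binary from it."""
    return (
        "#!/bin/sh\n"
        "set -e\n"
        f"p={store_path}\n"
        # Fetch the path (from a substituter) only if this executor lacks it.
        '[ -e "$p" ] || nix-store --realise "$p" >/dev/null\n'
        f'exec "$p/bin/{binary}" "$@"\n'
    )


def content_digest(text):
    """SHA-256 of the wrapper, as a content-addressed cache would compute it."""
    return hashlib.sha256(text.encode()).hexdigest()
```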

@uri-canva
Contributor

Just as some extra context, we've been looking at this problem too. One approach that we've spiked is using Nix to build portable, self-contained binaries with buildFHSUserEnv. In practice it works, but it's a bit clunky, and it lacks the advantages you'd get from Nix-built binaries: you need to reimplement a lot of the builder scripts / derivations, and the resulting binaries still need to be compatible with the glibc version running on the executor. (This assumes you use glibc rather than musl; glibc is what you need to be able to use prebuilt dependencies from the respective ecosystems in your interpreters and compilers.)

@Jonpez2

Jonpez2 commented Oct 20, 2022

Another point that occurred to me this morning: I think we need to define a Provider which adds Nix-specific data to a rule, i.e. which Nix packages are required to build the rule, and which are required to run it.
Then maybe we figure out how to plug in a host-package-manager manager into bazel itself, which consumes data out of such a provider (or the transitive closure of the providers collected from the deps?) , and pre-sets-up the host env for the rule's action executions.
I say this because some rules may run on remote host a), resolve a /nix/store path, and then hand it over to another rule which proceeds to run on remote host b) which hasn't got that resolved. So we need to communicate requirements between rules I think.
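Modeled in plain Python, such a provider and its accumulation over `deps` might look like the following. The `NixInfo` name and the propagation rule (a target's build actions need the run-time packages of its deps, but not their build-time packages) are assumptions made for illustration. In Bazel this would be a Starlark provider, likely backed by depsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NixInfo:
    """Hypothetical provider: Nix packages needed to build and to run a target."""
    build: frozenset = frozenset()
    run: frozenset = frozenset()


def collect(own, deps):
    """Combine a target's own requirements with those inherited from its deps."""
    # Outputs of deps are consumed by this target's actions, so their run-time
    # packages must be present both when building and when running this target.
    dep_run = frozenset().union(*(d.run for d in deps))
    return NixInfo(build=own.build | dep_run, run=own.run | dep_run)
```

Passing the accumulated `NixInfo` along the graph is what would let a target built on remote host b) know what a dep built on host a) resolved.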

@Jonpez2

Jonpez2 commented Oct 30, 2022

Does all of this apply to Guix as well? I.e. could we make it a bit more palatable to Bazel by making it apply to at least two mainstream package managers?
Would you at Tweag be interested in a call to discuss possible ways forward on this? I would be excited to give some context on my use case.

@uri-canva
Contributor

Note that changes to the remote execution APIs are a bit more complex to get through, since there's several implementations of it. See https://github.com/bazelbuild/remote-apis#api-community.

@Jonpez2

Jonpez2 commented Oct 31, 2022

Yeah I really don’t think this should happen via the remote execution api. On nix, this can and therefore should work exactly the same across remote and local, no?

@aherrmann
Member Author

I'm currently a bit overloaded on other work and OOO due to an accident. I haven't managed to catch-up on the discussion above, yet. I haven't forgotten it and I intend to contribute further, just letting you know to manage expectations.

@Jonpez2

Jonpez2 commented Oct 31, 2022 via email

@Jonpez2

Jonpez2 commented Nov 9, 2022

Hello again! A no-op sort of update: I've been digging around in the Bazel codebase to try to figure out a way forward here, but I haven't come up with anything particularly useful. I am thinking that we would want to add a spawn strategy which wraps the DynamicSpawnStrategy and somehow picks out the transitive closure of the relevant NixToolProvider (i.e. the thing I was trying to describe in #180 (comment)). Then "all it would need to do" is, in the exec() function, before delegating to DynamicSpawnStrategy, prefix the spawn's command line with something like 'nix-shell -i [whatever] run --'?
There's a bit of handwaving in there :)

@Jonpez2

Jonpez2 commented Nov 25, 2022

Gentle ping on this one
FWIW, here is a Google Groups thread on the subject: https://groups.google.com/g/bazel-discuss/c/kqv-EHhApbY

@aherrmann
Member Author

Sorry for the long silence on this one. I've been quite busy lately.

Do all actions happen within the context of some particular posix toolchain by any chance?

@Jonpez2 That is one of the difficulties, at least for a generic solution at the level of rules_nixpkgs, i.e. from the library author's perspective: we don't know what kinds of targets users will import from Nix, and we don't know how users will use these targets. They could import toolchains, libraries, build tools, a Docker base image, etc. So actions that use Nix-provided files could really be any action. See also what @uri-canva points out in #180 (comment).

Note there's some extra considerations if the host machine is a different platform than the remote executor. The common way of using rules_nixpkgs right now is to let nix detect the host machine platform, instead of having bazel pass that platform information in (which it can't even do for repository rules anyway, it can only do it for actions).

@uri-canva That's correct. I think handling this correctly is possible. The nixpkgs_package import could explicitly set the system argument to nixpkgs, and the generated toolchain could set the correct exec constraints. The assumption that host equals exec may be hard-coded in some places that would need fixing, but I think there's no strong technical reason for that assumption, mostly just historical reasons. It's clearly related to this ticket, but it's a separate issue. @uri-canva, do you want to go ahead and open a feature request for it?

No that's definitely the approach that would be the most compatible with bazel: defining toolchains using the bazel APIs in a way such that bazel doesn't need to know anything about nix.

@uri-canva The trouble is that Nix provided dependencies can be used in actions that don't have a natural associated toolchain. E.g. a genrule or a custom action.

For example if we define toolchains with binaries that are nix-shell wrappers

@uri-canva Two things that make this problematic:

  1. Nix-shell wrappers defer the Nix-evaluation to the execution time. This means we have shifted the problem from having to ship the Nix store paths to the remote side to having to ship the Nix expressions to the remote side. I.e. the Nix sources that the Nix shell wrapper loads and evaluates have to exist on the remote executor under the correct path.
  2. Nix evaluation adds overhead, sometimes quite considerable. The nice thing about the way rules_nixpkgs does it right now is that this evaluation happens only once per import, when fetching the nixpkgs_package. With Nix shell wrappers it would happen every time. This can be problematic: I've worked on projects in the past where this overhead made Nix shell wrappers infeasible for certain tools, e.g. gcc.

One approach that we've spiked is using nix to build portable binaries that are self contained, using buildFHSUserEnv.

That's indeed a nice solution. But, as you point out, it has its costs. At the level of rules_nixpkgs I'd prefer a more generic solution, if possible. I think it's worth pointing out the difference between the user and the library author perspective here: as a user with a concrete use case in mind, one can craft a dedicated solution that fits well with the given infrastructure and codebase, and one can adjust one's own project around the constraints that the chosen solution implies. As the library authors of rules_nixpkgs, we should strive to find a solution that doesn't unduly restrict or dictate the user's setup. In practice we may have to be pragmatic here and there and impose some restrictions to arrive at a feasible solution, but we should try not to be overly restrictive. I feel that forcing users to turn every Nix-imported package into a self-contained, relocatable package is probably too restrictive. That said, if it works at a use site of rules_nixpkgs and fits well into a given project, sure, why not.

Another point that occurred to me this morning - I think we need to define a Provider which adds nix-specific data for a rule

@Jonpez2 Keep in mind that rules_nixpkgs implements repository rules, and these cannot return providers. Providers only exist at the level of regular rules. That said, if this is only about transmitting metadata to dedicated rules, aspects, or other tools through some additional mechanism, then yes, that can be a viable route: rules_nixpkgs could auto-generate such metadata targets. Take a look at the good work done by @AleksanderGondek and @r2r-dev on https://github.com/tweag/nix_gazelle_extension. It generates dedicated metadata targets to transmit Nix metadata into Gazelle. I think it's somewhat different from what you're pointing at here, but it may still be a good point of reference.

Note that changes to the remote execution APIs are a bit more complex to get through, since there's several implementations of it. See https://github.com/bazelbuild/remote-apis#api-community.

@uri-canva Absolutely, I understand. The observation that makes me consider this route is that rules_nixpkgs is not alone in having to transmit information about system dependencies or other kinds of ambient state. Projects that use distribution package managers could also benefit from the ability to collect system dependencies across targets and ship this metadata to the remote side. Indeed, an extension of the protocol (if needed) to better support rules_nixpkgs would have a much higher chance of success if it also benefited other, non-Nix use cases.

@aherrmann
Member Author

@Jonpez2 Thanks for the mailing list thread. I'll take a look. I saw Alex Eagle mentioned the community day session there. I've been meaning to share my notes here. I'll try to do so now:

BazelCon Community Day, which was held one day before BazelCon, included unconference style sessions, and we had one session on remote execution and system dependencies and Nix in particular.

Many voices suggested finding a way to track the Nix store paths with Bazel explicitly, i.e. somehow storing Nix store paths under Bazel's output base and marking them as regular dependencies, such that the remote execution system would automatically push them to the remote side. The problem here is that nixpkgs assumes stable absolute paths, and Bazel's action execution environment does not provide these. So this would require some form of path remapping to map Nix store paths fetched into Bazel’s output base to some stable absolute path, e.g. some form of file system abstraction.

A promising suggestion came from @illicitonion. Namely, mark Nix imported targets with special platform properties that define which Nix derivation is required to run the action. Then, add a feature to Bazel to accumulate platform properties across transitive dependencies such that the transitive closure of required Nix derivations is communicated to the remote side for each action. Finally, extend the remote executor to parse these platform properties and ensure that the required Nix derivations are instantiated before running the action.

The underlying observation is that there are really two problems here:

  1. What Nix derivations does a given action depend on.
  2. How to make sure that the corresponding Nix store paths are instantiated on the remote side.

The marking and transitive accumulation of the platform properties achieves 1. Conveniently, platform properties are already shipped to the remote side, so this doesn't require an extension of the RBE protocol.
Problem 2 then has to be addressed on the remote executor side by reading these platform properties and instantiating the corresponding store paths.
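The two halves of that proposal can be sketched as follows: accumulate per-target Nix requirements over transitive deps into one platform property (problem 1), and on the executor parse that property and realize whatever is missing (problem 2). Everything here is hypothetical, including the `nix-store-paths` property name and the comma-separated encoding. The accumulation step is precisely the Bazel feature the proposal asks for.

```python
import os

NIX_PROPERTY = "nix-store-paths"  # hypothetical platform property name


def transitive_paths(target, deps, own):
    """Union of own[t] over `target` and everything it transitively depends on."""
    seen, stack, paths = set(), [target], set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        paths |= own.get(t, set())
        stack.extend(deps.get(t, ()))
    return paths


def exec_properties(paths):
    """Client side: encode the requirements as a single platform property."""
    return {NIX_PROPERTY: ",".join(sorted(paths))}


def realise_command(props, exists=os.path.exists):
    """Executor side: command that realizes missing paths, or None if all exist."""
    wanted = [p for p in props.get(NIX_PROPERTY, "").split(",") if p]
    missing = [p for p in wanted if not exists(p)]
    return ["nix-store", "--realise", *missing] if missing else None
```

The concern raised about cutting Nix dependencies would show up here as the size of the property value, which grows with the closure.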

A concern that was brought up is that we may want to cut the Nix dependencies at some point, or distinguish runtime and build time deps.

@Jonpez2

Jonpez2 commented Dec 10, 2022

@aherrmann would you be open to having a call on this? Is there a way for me to contact you directly to set one up please?

Thank you!

@olebedev

For example if we define toolchains with binaries that are nix-shell wrappers

@uri-canva, please note that this won't work for every Nix package, only for packages with executables. Nix provides other things besides executables: for example, we build Docker images in Nix, ship them to Bazel as Nix packages, and form a final Docker image based on the Nix one.

Allow me to share a couple of thoughts on the usability of rules_nixpkgs in general. As far as I am aware, Bazel is primarily used for monorepos, to get incremental builds, rather than just for monolithic C++/Java builds. That is, for polyglot setups where a Nix codebase can also live (we have a large Nix codebase in our monorepo). In this case, using rules_nixpkgs becomes problematic because it's not meant to be used this way: it doesn't support building Nix packages as first-class citizens of the Bazel build graph (in fact, they are external repositories in Bazel). The design of the rules assumes that only a nixpkgs pin/commit is fetched via builtins.fetchTarball, and that a small set of files can be attached to a Nix package declared with the rules via the nix_file_deps attribute. It works OK, but we're missing Bazel's dependency checking here, and we need to list all these files manually, which brings code generators into the maintenance process and destroys granularity (this could be fixed by evaluating every Nix package and inferring only its dependencies on Nix files). This also overloads the analysis phase, because all the Nix packages end up declared in the WORKSPACE file.

That is, the larger the Nix codebase within a Bazel-maintained repository, the more problematic its maintenance becomes, because it is not part of Bazel's first-party dependency graph but rather part of the external Bazel repositories that rules_nixpkgs creates out of the Nix expressions. For example:

nixpkgs_package(
  name = "libXxf86vm",
  attribute_path = "xorg.libXxf86vm",
  nix_file = "//nix:nixpkgs.nix",
  nix_file_deps = [
    # a huge list of files that the Nix expression of the `xorg.libXxf86vm` package depends on
  ],
  repository = "@nixpkgs",
)

In light of the above, have you considered using Nix within the genrule for BRE instead of the rule set? For example, building a docker image in Nix:

genrule(
  name = "hello-world",
  srcs = ["default.nix", "//pkgs:default.nix"],
  outs = ["image"],
  cmd = """
nix-build $(location default.nix) -o "$@"
""",
  tags=["requires-network"],
)

This would work just fine with Bazel RE, with a single caveat: we need to make sure that the actions that depend on the output of the //:hello-world target are executed only on the nodes where the //:hello-world target has been built. Given the Nix binary cache, we can easily couple such target + action tuples together. I am not familiar with Bazel's internals, but AFAIK it may be possible to tell the Bazel build scheduler to handle tuples like this; we just need to make sure we express these tuples in Bazel's terms. What do you think about this approach?

Also, there is a great talk about making rules_nixpkgs work for BRE - https://skillsmatter.com/skillscasts/17673-remote-execution-with-rules-nixpkgs#video. From your perspective, is there something that looks more like a blocker to applying this approach? To me, it looks like quite a lot of infrastructure work needs to be done but the overall outcome would impress.

@aherrmann, @Jonpez2, @uri-canva, please let me know what you think.

@sluongng

sluongng commented Feb 20, 2023

@uri-canva I think your analysis is on point here.

Given a codebase using nixpkgs_package and nixpkgs_*_configure rules for host builds, there could be an automated way to produce the container image for remote execution, and all the toolchains required to use it.

Keep in mind that container image support does not come with Bazel by default.
It is a "hack" outside of Bazel, provided by various RBE implementations, by including a container image via exec_properties as part of the Execute RPC request.

This routes back to my original point earlier: perhaps, what it would take for Nix to work with RBE is simply another similar hack on RBE side to enable nix to work. Here is what I have in my head:

  1. Have a local process, pre-build, to upload needed Nix files to the remote cache under 1 directory(tree):

    shell.nix
    flake.nix
    flake.lock
    

    This could be implemented, sneakily, in workspace_status command 😈, or as part of the repository_rule()

  2. Include the digest of the tree above as part of exec_properties of Bazel action (example), telling RBE platform to select builders matching this requirement.

    exec_properties = {
         "OSFamily": "Linux",
         "nix-tree": "SHA256:aaaaaaaaaabbbbbbbbbbbccccccccccccc",
     },
    
  3. Special RBE builders fetch the tree digest and run nix shell to prepare working directory / environment variables before executing an action. These special builders could have their own custom way of integrating with a Nix Remote Cache or even a Nix Remote Build setup.
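Steps 1 and 2 boil down to computing a stable digest over the Nix files and attaching it to the action. The sketch below uses a simplified digest, which is not the REAPI Merkle-tree digest (that one hashes serialized `Directory` messages); it only demonstrates that the property changes whenever any of the files changes. The `nix-tree` key follows the example above and is not a standard property.

```python
import hashlib


def tree_digest(files):
    """Simplified, deterministic digest of a {name: bytes} tree (not the REAPI digest)."""
    h = hashlib.sha256()
    for name in sorted(files):
        # Name, separator, then fixed-length content hash: unambiguous framing.
        h.update(name.encode() + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def nix_exec_properties(files):
    """exec_properties selecting builders that have prepared this Nix environment."""
    return {"OSFamily": "Linux", "nix-tree": "SHA256:" + tree_digest(files)}
```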

Would something like this work, or are there use cases that I am missing here?

@uri-canva
Contributor

No, that sounds reasonable. I mentioned a container image because it's what we use, but the environment can be reproduced outside of Bazel in any way, be it a container image passed through exec_properties, a container image passed to the remote execution service out of band, a NixOS VM image for the remote execution service to use for the workers, etc.

It would still be helpful for rules_nixpkgs to provide as much as possible of this to users, for example by providing a nix derivation, then users can bake this nix derivation into the execution environment however they prefer.

Note that what you're describing is not related to rules_nixpkgs at all, and it is already very well supported by Bazel as is. The tricky part that rules_nixpkgs brings in is being able to define multiple nixpkgs_package repositories and depend on them from different targets. A global workspace shell environment is much easier to translate into something that you can load into the remote execution environment.

@aherrmann
Member Author

@uri-canva Thank you for sharing your progress on this issue and this classification. I agree with that classification. I'd like to add some details though:

[Download rules] [...] the inputs are explicitly specified, there are no implicit inputs from the system [...] These assumptions match the use case of language package managers such as maven and yarn.

Ideally that's the case. In practice this fails under certain circumstances. Language-specific package managers like pip or yarn often make it hard to properly separate package download from package build or installation. If package build or installation ends up occurring during repository rule evaluation, i.e. during fetch, then it is easy to incur a reference to system state. E.g. packages with native components often invoke whatever C/C++ compiler they find in PATH to build those components. Since repository rules run in an earlier phase and don't (yet) have access to Bazel toolchains, these builds are usually not hermetic.
I'm mentioning this to point out that this ideal is not so easy to achieve with standard tools, and it's one of the areas where Nix can help, either by providing Nix-built packages instead of invoking the package manager, or by providing a controlled toolchain through the environment or appropriate repository rule attributes.

[Nix / as a system package manager] When used as a system package manager, for example [NixOS, Nix shell ...] remote execution can be supported easily, as the assumptions match other system package managers. The remote execution environment can either be a NixOS environment configured to match the host, or a container image configured to match a host [...]

It's worth pointing out that when Nix is used in this way, one loses the granularity that rules_nixpkgs provides. With rules_nixpkgs-provided tools and toolchains, Bazel can track which targets are affected by a change to an individual Nix package and can invalidate and rebuild only those targets. When Nix is used to generate an OS image (or similar), then Bazel does not have that information (or at least it depends very much on how each tool and toolchain is defined) and, to be safe, any change requires a clean rebuild.
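The granularity argument can be made concrete with a reverse-dependency walk: given the build graph, only targets that transitively depend on a changed Nix repository need to be rebuilt. The target labels below are made up, and in practice Bazel does this bookkeeping itself; the sketch only shows what information a single OS image hides from it.

```python
from collections import defaultdict, deque


def affected_targets(changed, deps):
    """All targets that transitively depend on `changed`.

    `deps` maps each target label to the labels it depends on directly.
    """
    rdeps = defaultdict(list)
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].append(target)
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for parent in rdeps[node]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected
```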

Also, one of the benefits Nix provides is the ability to define the tools and their versions in code right in your repository. That means that PRs can contain changes to the Nix provided environment as well as to regular code. A remote execution setup needs to take this into account and must support multiple simultaneously existing configurations. In particular, one PR that updates a Nix package must not cause the entire remote execution cluster to be restarted with that new image, because then builds on other branches would run against the wrong versions. This is in contrast to some remote execution setups, where the remote executor image is centrally defined and updated in bulk. There are ways around this, but it's important to evaluate their costs carefully. For example, defining a Docker image to run each build action in can be a workaround. But, when evaluating that approach in practice we found that it can incur a significant overhead such that some types of build actions no longer benefit from remote execution.

[Nix / as a language package manager] [...] for example when using nixpkgs_package to provide self-contained outputs such as container image tarballs, remote execution is supported, as the files can be copied to remote executors and consumed there.

I agree with the general point, but I wouldn't call this a language package manager use case. Perhaps "Nix as an artifact provider" is a better name. Language packages provided by Nix often have runtime dependencies on the Nix store, e.g. dynamic libraries like glibc, or package collections like PYTHONPATH, and cannot be used on another machine without ensuring that their transitive closure is available.

In a sense there's nothing too special about nix in terms of integrating it into bazel: if one were to implement a rules_apt rule set that installs apt packages on demand via repository rules, it would be hitting all these issues as well. The special thing about nix is that using it on demand is so common that not being able to do so in a certain context is a big limitation.

Correct, perhaps with the added property that Nix stores packages in Nix store paths that depend on their inputs. A minor apt package version update will not change paths, so a discrepancy across machines may go unnoticed, whereas with Nix it would not. To be sure, this is an advantage of Nix in terms of correctness.

@sluongng Thanks for detailing this.

Include the digest of the tree above as part of exec_properties of Bazel action (example), telling RBE platform to select builders matching this requirement

IIUC this comes back to the granularity point. A global image loses out on the per package granularity provided by rules_nixpkgs.

In that respect @layus' work seems like a good direction as it maintains that granularity. The particular implementation so far uses Nix's remote execution protocol to communicate the required Nix packages to the remote side before the build. But, thinking more generally, that doesn't necessarily have to be the way it's done. We could also use some other scheme to register required packages with the remote side.

@olebedev

@sluongng Thanks for detailing this.

Include the digest of the tree above as part of exec_properties of Bazel action (example), telling RBE platform to select builders matching this requirement

IIUC this comes back to the granularity point. A global image loses out on the per package granularity provided by rules_nixpkgs.

@aherrmann, maybe we can add the attributes that are required to be present for a particular action to exec_properties in a similar way, and make sure we build only those, not the whole attrset? This assumes the Nix code base is small enough for frequent fetches to be cheap.

In that respect @layus' work seems like a good direction as it maintains that granularity. The particular implementation so far uses Nix's remote execution protocol to communicate the required Nix packages to the remote side before the build. But, thinking more generally, that doesn't necessarily have to be the way it's done. We could also use some other scheme to register required packages with the remote side.

I agree with that. This decouples the problem from Bazel and turns it into an infrastructure challenge, which seems solvable and can be optimised.

@aherrmann
Member Author

maybe we can add the attributes that are required to be present for a particular action to exec_properties in a similar way, and make sure we build only those, not the whole attrset? This assumes the Nix code base is small enough for frequent fetches to be cheap.

@olebedev Yes, that comes back to the suggestion in #180 (comment). As laid out there, the challenge is to capture transitive dependencies correctly. E.g. a cc_library built with a Nix-provided CC toolchain may depend on Nix store paths, such as a shared library like glibc. A downstream target, e.g. a cc_binary or a haskell_binary that depends on that cc_library, will transitively depend on that Nix store path as well. We probably don't want to require users to collect these transitive dependencies manually in exec_properties.

A promising suggestion came from @illicitonion. Namely, mark Nix imported targets with special platform properties that define which Nix derivation is required to run the action. Then, add a feature to Bazel to accumulate platform properties across transitive dependencies such that the transitive closure of required Nix derivations is communicated to the remote side for each action. Finally, extend the remote executor to parse these platform properties and ensure that the required Nix derivations are instantiated before running the action.
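To make the transitive-accumulation idea concrete, here is a minimal Python sketch (Python rather than Starlark so it runs standalone). It assumes a toy target graph, placeholder store paths (real Nix hashes are 32 characters), and a hypothetical `nix-store-paths` property; none of these are an existing Bazel or rules_nixpkgs API. Because exec_properties values are plain strings, the closure is joined into one comma-separated string, and the remote side splits it back before realizing the paths.

```python
"""Hypothetical sketch: accumulate required Nix store paths across transitive
dependencies and encode them as a single exec_properties string.

The target graph, the store paths, and the "nix-store-paths" property name
are illustrative assumptions, not an existing Bazel or rules_nixpkgs API.
"""
from typing import Dict, List, Set

# Direct dependencies of each target (toy graph).
DEPS: Dict[str, List[str]] = {
    "//app:bin": ["//lib:cc"],
    "//lib:cc": ["@nixpkgs_cc_toolchain//:cc"],
    "@nixpkgs_cc_toolchain//:cc": [],
}

# Nix store paths each target references directly (placeholder data).
DIRECT_STORE_PATHS: Dict[str, Set[str]] = {
    "//app:bin": set(),
    "//lib:cc": set(),
    "@nixpkgs_cc_toolchain//:cc": {
        "/nix/store/aaaa-glibc-2.37",
        "/nix/store/bbbb-gcc-12.3.0",
    },
}


def transitive_store_paths(target: str) -> Set[str]:
    """Union of the store paths required by target and all of its deps."""
    seen: Set[str] = set()
    paths: Set[str] = set()
    stack = [target]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        paths |= DIRECT_STORE_PATHS.get(current, set())
        stack.extend(DEPS.get(current, []))
    return paths


def exec_properties_for(target: str) -> Dict[str, str]:
    """Encode the closure as one string, since property values are strings."""
    return {"nix-store-paths": ",".join(sorted(transitive_store_paths(target)))}


def parse_required_paths(props: Dict[str, str]) -> List[str]:
    """Remote side: recover the store paths to realize before running."""
    value = props.get("nix-store-paths", "")
    return [p for p in value.split(",") if p]
```

This sketch always forwards every store path upward; deciding where propagation should stop (for example, for outputs that no longer reference the Nix store) is exactly the open design question.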


I agree with that. This decouples the problem from Bazel and turns it into an infrastructure challenge, which seems solvable and can be optimised.

Yes, exactly.

@Jonpez2

Jonpez2 commented Mar 14, 2023

Hello! Revisiting this after a long diversion into other, less important matters :)

If we did want to take the 'special platform properties' route, who would we need to engage, and how would we move it forward?

Thank you!

@aherrmann
Member Author

Hi @Jonpez2, thanks for checking in!

On the Bazel side, that would require a feature request for collecting transitive platform properties and a discussion with the Bazel team about its exact shape. As mentioned above, there are some open questions about how to control when properties are forwarded and when not. Some targets produce outputs that no longer need transitive Nix store paths at build time or runtime. More generally, distinguishing build and runtime dependencies is another challenge with this approach. On the remote executor side, this requires implementation work on the particular remote execution platform to parse these platform properties and act on them.

All that said, as mentioned above, after exploring and discussing different approaches, I think the approach that @layus worked on and presented at Bazel eXchange is the most promising concept. Some details may need fleshing out or revisiting, but conceptually it's a very elegant approach: if a build rule knows a Nix store path, then it has a legitimate direct or transitive dependency on it. So, we track which Nix store paths are requested during the loading phase (in the current implementation through Nix remote builds, but it doesn't have to be that), and then make sure that the build nodes can access them (in the current implementation through a shared network file system, but it doesn't have to be that). Neither Bazel nor the remote execution protocol has to be changed.
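A minimal sketch of that flow, assuming hypothetical names (`StorePathRegistry`, `ensure_available`) and an injected `realize` callback that stands in for whatever transport is used (Nix remote builds, a shared file system, a binary cache). It only illustrates the bookkeeping: record every store path requested during the loading phase, then make sure each build node has them before actions run.

```python
"""Hypothetical sketch of "track at loading phase, provide on build nodes".

StorePathRegistry and ensure_available are illustrative names only; the
implementation discussed above used Nix remote builds and a shared network
file system instead.
"""
from typing import Callable, Iterable, List, Set


class StorePathRegistry:
    """Collects every Nix store path a repository rule asked Nix to realize."""

    def __init__(self) -> None:
        self._requested: Set[str] = set()

    def record(self, store_path: str) -> None:
        self._requested.add(store_path)

    def requested(self) -> Set[str]:
        return set(self._requested)


def ensure_available(
    requested: Iterable[str],
    present_on_node: Set[str],
    realize: Callable[[str], None],
) -> List[str]:
    """Realize every requested path the node lacks; return what was realized."""
    missing = sorted(set(requested) - present_on_node)
    for path in missing:
        realize(path)  # assumed transport, e.g. copy from a shared store
        present_on_node.add(path)
    return missing
```

Because the registry only sees paths that some repository rule actually requested, it naturally limits the set to legitimate dependencies, which is the property that makes this approach attractive.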

@sluongng

@aherrmann I think you are aiming for something with higher "purity" here.

My take on @Jonpez2's question is: if we are simply aiming to match the current functionality of supporting custom Docker images like this

    exec_properties = {
        "OSFamily": "Linux",
        "container-image": "docker://gcr.io/YOUR:IMAGE",
    },

with something like

    exec_properties = {
        "OSFamily": "Linux",
        "nix-pkg-revision": "17.09",
        "nix-pkg-sha256": "aaaaa...", # Optional
        "nix-shell": "//:shell.nix"
    },

Then it's simply a matter of engaging with an RBE software vendor to implement support for this on the server side.

It will have the same tradeoffs as supporting custom Docker images today: the platform will create and store a new root dir using Nix, then set it as the base root dir (and environment variables) before each action executed on the remote executor.
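A rough sketch of the worker-side bookkeeping this implies, assuming the property names from the example above and a hypothetical `build_root` callback that does the actual Nix work. Only the Nix-related properties form the cache key, so actions with the same Nix config share one prepared root dir.

```python
"""Hypothetical sketch: a remote worker prepares one Nix-based root dir per
distinct set of Nix exec_properties and reuses it for later actions.

Property names mirror the example above; build_root is an assumed callable
standing in for whatever the RBE platform does to realize the environment.
"""
from typing import Callable, Dict, Tuple

NIX_KEYS = ("nix-pkg-revision", "nix-pkg-sha256", "nix-shell")

RootKey = Tuple[Tuple[str, str], ...]


class NixRootCache:
    def __init__(self, build_root: Callable[[Dict[str, str]], str]) -> None:
        self._build_root = build_root
        self._roots: Dict[RootKey, str] = {}

    def root_for(self, exec_properties: Dict[str, str]) -> str:
        """Return the root dir for these properties, building it on first use."""
        nix_props = {k: v for k, v in exec_properties.items() if k in NIX_KEYS}
        key: RootKey = tuple(sorted(nix_props.items()))
        if key not in self._roots:
            self._roots[key] = self._build_root(nix_props)
        return self._roots[key]
```

Note that the key contains the label string `//:shell.nix`, not the file's contents, so an edited shell.nix would keep hitting the stale root dir unless the platform also tracks the file itself.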

What's impure about this?

If you update the Nix platform config, Bazel will not know about it and so will not invalidate the Action Cache.
This is also a common pain point for custom Docker image users today, but having such a feature allows our customers to adopt Bazel a lot faster and benefit from RBE much earlier.
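A toy model of that impurity, assuming the cache key is a SHA-256 over the command line plus platform properties (the real key is the digest of the REAPI Action message, but platform properties enter it only as strings either way). With only the label in exec_properties, editing shell.nix leaves the key unchanged. The `nix-shell-sha256` property shown as a workaround is hypothetical, and it pins only shell.nix itself, not any files it imports.

```python
"""Toy model: label-only exec_properties do not change when shell.nix changes.

Assumption: the cache key is a SHA-256 over argv plus platform properties.
"nix-shell-sha256" is a hypothetical property, not an existing convention.
"""
import hashlib
import json
from typing import Dict, List


def action_key(argv: List[str], exec_properties: Dict[str, str]) -> str:
    """Hash the command line together with the platform properties."""
    payload = json.dumps({"argv": argv, "platform": exec_properties}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def platform_props(shell_nix_text: str, pin_content: bool) -> Dict[str, str]:
    """Build exec_properties; shell.nix contents only matter if pin_content is set."""
    props = {"OSFamily": "Linux", "nix-shell": "//:shell.nix"}
    if pin_content:
        digest = hashlib.sha256(shell_nix_text.encode("utf-8")).hexdigest()
        props["nix-shell-sha256"] = digest
    return props


ARGV = ["gcc", "-c", "main.c"]
SHELL_V1 = "{ pkgs }: pkgs.mkShell { packages = [ pkgs.gcc12 ]; }"
SHELL_V2 = "{ pkgs }: pkgs.mkShell { packages = [ pkgs.gcc13 ]; }"

# Label only: the edit to shell.nix does not reach the key, so stale hits remain.
assert action_key(ARGV, platform_props(SHELL_V1, False)) == action_key(ARGV, platform_props(SHELL_V2, False))
# Hypothetical content digest: the edit now produces a different key.
assert action_key(ARGV, platform_props(SHELL_V1, True)) != action_key(ARGV, platform_props(SHELL_V2, True))
```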

If we could agree on this being a reasonable short-term solution for the Nix ecosystem, then I would love to collab with somebody to implement this feature as part of BuildBuddy's RBE offering.

@Jonpez2

Jonpez2 commented Mar 15, 2023 via email

@uri-canva
Contributor

#180 (comment)

@sluongng we are currently running a setup similar to that, except we have an out of band mechanism to take the shell.nix and create a container image out of it, and then we pass that container image in exec_properties.

There are two challenges I can see with what you propose:

  1. On the bazel side and in the remote execution protocol, exec_properties values currently can only be strings, not artifacts / digests (https://sourcegraph.com/github.com/bazelbuild/remote-apis@64cc5e9e422c93e1d7f0545a146fd84fcc0e8b47/-/blob/build/bazel/remote/execution/v2/remote_execution.proto?L727)
  2. Nix doesn't have a standard format for redistributable derivation definitions, the closest thing we have is flakes

Both solvable problems, and both things that are useful outside of this integration work in their respective ecosystems.

@Silic0nS0ldier
Contributor

I’m personally pretty axed to get this

  • correct (i.e. pure in both nix and bazel terms)
  • minimally invalidating
  • transparent w.r.t. local vs remote execution
  • minimally incremental in terms of infra (like a globally distributed filesystem seems kinda hard!)

I know I’m asking for a lot :(
#180 (comment)

This is something I've been ideating on for a while now. I believe such an outcome is possible, but on the Bazel side there is a major capability gap that effectively locks you into 100% local or 100% remote builds, unless you have some mechanism in place to keep local and remote perfectly in sync. Alternatively, if you are willing to accept some concessions and extra work (potentially a lot), some of this can be achieved manually with the existing spawn strategy controls (I personally don't recommend this).

The capability gap is execution platforms being entirely disconnected from spawn strategies, which can lead to scenarios where platform(...) declarations registered as extra execution platforms (be it via register_execution_platforms(...) in WORKSPACE and MODULE, or --extra_execution_platforms=... via .bazelrc and CLI) are used to configure targets/actions that are run locally despite (potentially) being incompatible.

e.g. When you have a macOS host and Linux remote executors, Bazel may (and will, if the remote is unreachable or otherwise disabled) allocate actions configured for the remote platform to the macOS host, where, depending on the inputs, they may fail.

In a setup where remote is the only enabled spawn strategy, you can get around this via RBE service-specific logic and exec_properties, e.g. with EngFlow, by defining a worker pool and adding the pool name to exec_properties so the scheduler knows where to route actions.
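For illustration, a toy scheduler that treats exec_properties as exact-match requirements against each worker's advertised properties. The `Pool` key, the worker names, and the exact-match rule are assumptions; each RBE service defines its own conventions.

```python
"""Toy scheduler: route an action to workers whose advertised properties
exactly match every exec_properties entry the action requests.

Worker names, the "Pool" key, and the exact-match rule are illustrative
assumptions; real RBE services define their own matching semantics.
"""
from typing import Dict, List

WORKERS: Dict[str, Dict[str, str]] = {
    "linux-nix-1": {"OSFamily": "Linux", "Pool": "nix"},
    "linux-nix-2": {"OSFamily": "Linux", "Pool": "nix"},
    "mac-1": {"OSFamily": "Darwin", "Pool": "macos"},
}


def eligible_workers(required: Dict[str, str]) -> List[str]:
    """Names of workers that satisfy all requested key/value pairs."""
    return [
        name
        for name, caps in WORKERS.items()
        if all(caps.get(key) == value for key, value in required.items())
    ]
```

This only helps when actions actually reach the remote scheduler; an action that Bazel decides to run locally never gets this check, which is the capability gap described above.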

There is an approved proposal to address this capability gap, which if implemented (no idea if it has been started or not) should suit multi-platform build scenarios (especially those involving Nix) much better. Execution Platforms vs. Strategies

@Jonpez2

Jonpez2 commented Mar 16, 2023

That's a very interesting doc. When I first started thinking about this whole thing (back when I was young and naive), I thought that injecting into something like the spawn strategies would make most sense as then there might be some chance of code sharing between the local and remote strategies. Then maybe we could use flakes and nix run to give us extraordinary awesomeness.
The one point that seems to be explicitly out of scope in that doc is the collection of data from dependencies for feeding into the spawn strategy, which I think is a likely requirement here?

@Silic0nS0ldier
Contributor

The one point that seems to be explicitly out of scope in that doc is the collection of data from dependencies for feeding into the spawn strategy, which I think is a likely requirement here?

Appropriate constraints added to target_compatible_with plus appropriate dependency propagation should address this issue once platforms can influence the selected spawn strategy. As for how rules_nixpkgs would need to evolve to support such mechanisms... That is TBD.

As for the spawn strategy side of the problem, it sounds like other projects currently rank higher in priority (fair enough). I may have a go at implementing the proposal.

Not really in scope for this issue, but if anyone else wants to take a crack here are the notes I made for where changes would be necessary to implement the original proposal iteration (discussion around it made tracking down relevant source easier, the approved proposal should have a lot of overlap).

@Jonpez2

Jonpez2 commented Mar 18, 2023 via email

@aaronmondal

@uri-canva Thanks for linking this issue!

I think the upcoming remote execution implementation in rules_ll might be relevant to this issue since it also interleaves Nix and Bazel. Instead of invoking Nix via Bazel, we invoke Bazel wrapped in a Nix environment. So nixpkgs dependencies come from a flake and make up the environment in which Bazel runs. Then we use remote execution like this:

  1. Wrap Bazel in an environment that sets all its runtime dependencies to something from nixpkgs: https://github.com/eomii/rules_ll/blob/main/bazel-wrapper/default.nix
  2. Create a container image that uses the exact same wrapped Bazel as input. This will cause nix store paths on local installations and in the container to be identical (as long as the system architecture is similar enough): https://github.com/eomii/rules_ll/blob/main/rbe/image.nix
  3. Use that container to generate remote execution-compatible toolchains. This will cause the generated toolchains to only refer to nix store paths: https://github.com/eomii/rules_ll/blob/14c1f431ceccd5b715a5df27639f2ff8d7506cf2/rbe/default/cc/BUILD#L135-L148

See also https://ll.eomii.org/setup/remote_execution.

Note that this functionality is not yet officially released and currently only works with upstream rules_ll, which is even less stable than the already highly volatile releases lol 😅

With this setup you can actually run the remote execution toolchain locally, because a local nix shell will install the exact same tools as are in the RBE container. This means that you can share caches between systems with a pretty much perfect cache hit rate. Pretty cool! So you can have a remote executor build something large (in our case LLVM), consume the artifacts on a laptop, and continue building custom targets locally on the laptop 😊

A drawback is that this system requires you to regenerate the entire RBE container and all toolchains if you change a dependency. If the Bazel wrapper script changes in any way the RBE container will have a different hash and the RBE toolchain will be incompatible with the previous one and require a full cache rebuild.

We have a tool that kind of does this stuff automatically, but at the moment it's a bit limited and only does the bare minimum we need for a basic toolchain.

Another drawback is that the Bazel wrapper currently only supports the Clang 15 toolchains from nixpkgs and hardcodes some flags around it. It would be desirable to make this more flexible so that it works with arbitrary (or at least "more") toolchain configs.

@sluongng

sluongng commented Sep 7, 2023

https://github.com/pdtpartners/nix-snapshotter/blob/main/docs/architecture.md is a very interesting project.
It lets you define custom nix-based layers in your OCI-compatible container image.

When the RBE worker downloads the container image, the snapshotter lets it construct the Nix layer using contents from the host's Nix store.

So this is one step closer to the RBE goal. The remaining pain point would be how to build/update and release the container image before each Bazel build if some Nix packages were updated. But I think that could be done with a Bazel wrapper? WDYT? 🤔

@fhilgers

As I understand it, the problem is that the runtime dependencies of a package installed via Nix are missing on the remote executor. Is this right?

If so, could rules_nixpkgs just implement some feature like guix relocatable packs (https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html) or use an already existing project like https://github.com/NixOS/bundlers or https://github.com/DavHau/nix-portable?

Basically, as far as I am aware, these projects use either Linux user namespaces, proot, or fakechroot to simulate the Nix store being at the usual location /nix/store while it is actually somewhere else, like the Bazel cache from the repository rule. All of those probably only work on Linux, but maybe there is a similar project for macOS that I just don't know about.

@aherrmann
Member Author

As I understand it, the problem is that the runtime dependencies of a package installed via Nix are missing on the remote executor. Is this right?

Yes, that's right.

could rules_nixpkgs just implement some feature like guix relocatable packs

For some specific use-cases that could work, such as executables used as build tools or runtime dependencies. E.g. we could import a relocatable version of hello, place the artifact into a Bazel controlled output, Bazel would treat it as any other source file, and forward it to the RE worker.

It gets trickier for other use-cases. E.g. imported shared libraries have RUNPATH entries pointing to absolute Nix store paths and since they are not themselves executable they cannot be wrapped with something that sets up namespaces or fakechroot or the like.
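To illustrate the RUNPATH problem, a small Python helper that lists the Nix store paths a RUNPATH string pins. It assumes the RUNPATH has already been extracted from the ELF dynamic section (e.g. with `patchelf --print-rpath` or `readelf -d`); parsing ELF itself is out of scope. Each path it returns must exist at that exact absolute location on whichever machine loads the library.

```python
"""Sketch: find the absolute Nix store paths that a shared library's RUNPATH pins.

Assumption: the RUNPATH string was already read from the ELF dynamic section,
e.g. via `patchelf --print-rpath`; this helper only parses that string.
"""
import re
from typing import List

# Store path = /nix/store/<32-char hash in Nix's base32 alphabet>-<name>.
# Nix base32 omits the letters e, o, t and u.
STORE_PATH = re.compile(r"^/nix/store/[0-9a-df-np-sv-z]{32}-[^/]+")


def store_paths_in_runpath(runpath: str) -> List[str]:
    """Distinct top-level store paths referenced by the RUNPATH entries, in order."""
    found: List[str] = []
    for entry in runpath.split(":"):
        match = STORE_PATH.match(entry)
        if match and match.group(0) not in found:
            found.append(match.group(0))
    return found
```

Relative entries such as `$ORIGIN/../lib` are skipped, since only absolute store references break when the library is copied to a machine without those paths.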

Toolchains are also tricky. Let's say we import a CC toolchain from Nix and manage to make it relocatable using namespaces or the like. Now we can invoke the C compiler in a Bazel build action and it will find references to absolute Nix store paths, e.g. headers or standard libraries, thanks to the namespaces setup. But, (assuming dynamic linking) it will produce a binary that references shared objects in the Nix store, e.g. glibc. So, to execute that resulting binary we now also need a relocatable wrapper that captures the transitive runtime dependencies. So, this approach is somewhat viral and invasive. And there is an issue of duplication and artifact size. If we import two separate tools from Nix, they will likely have some overlap in their runtime closure, typically at least libc. This will be duplicated in the relocatable bundles.
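A toy calculation of that duplication, with a made-up package graph and made-up sizes: each relocatable bundle carries its whole runtime closure, so anything the closures share (here glibc) is paid for once per bundle instead of once per store.

```python
"""Toy illustration of why per-tool relocatable bundles duplicate shared closure.

The package graph and sizes are made-up numbers for illustration only.
"""
from typing import Dict, List, Set

REFERENCES: Dict[str, List[str]] = {
    "hello": ["glibc"],
    "python3": ["glibc", "openssl", "zlib"],
    "openssl": ["glibc"],
    "zlib": ["glibc"],
    "glibc": [],
}

SIZE_MB: Dict[str, int] = {"hello": 1, "python3": 100, "openssl": 6, "zlib": 1, "glibc": 30}


def closure(root: str) -> Set[str]:
    """Transitive runtime closure of a package, including the package itself."""
    result: Set[str] = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg not in result:
            result.add(pkg)
            stack.extend(REFERENCES[pkg])
    return result


def bundle_sizes(tools: List[str]) -> Dict[str, int]:
    """Compare one self-contained bundle per tool against a single shared store."""
    per_tool = sum(sum(SIZE_MB[p] for p in closure(t)) for t in tools)
    shared: Set[str] = set().union(*(closure(t) for t in tools))
    return {"separate_bundles_mb": per_tool, "shared_store_mb": sum(SIZE_MB[p] for p in shared)}
```

With these numbers, bundling hello and python3 separately costs 168 MB versus 138 MB in a shared store; the 30 MB gap is the duplicated glibc, and it grows with every additional bundled tool.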

Nonetheless, it's certainly an interesting approach and could work for some specific use-cases. You might be interested to read this blog post by @filmil and follow the links to the example project and related discussions on GitHub.

@domenkozar

Note that the Tvix talk at NixCon 2024 will cover how tvix.dev has FUSE layer support, so you could share those store paths across remote nodes and other machines.

@layus
Collaborator

layus commented Oct 23, 2024 via email

Projects
Status: In Progress
Development

No branches or pull requests