Clarify intended usage #31

Open
RuRo opened this issue Jul 26, 2024 · 6 comments

@RuRo

RuRo commented Jul 26, 2024

The README seems to suggest that adding cuda-maintainers.cachix.org as a substituter and setting allowUnfree = true and cudaSupport = true is sufficient to get the prebuilt packages. However, after updating I quite often end up rebuilding some of the CUDA-enabled packages.
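For reference, the setup described above might look roughly like the following flake. This is a sketch and not taken from the README: it assumes an x86_64-linux machine, uses `python3Packages.torch` only as an example package, and the public key is a placeholder that has to be copied from the cache's Cachix page.

```nix
{
  # Ask Nix to also use the CUDA cache for this flake.
  # The key below is a PLACEHOLDER; copy the real one from the cache's Cachix page.
  nixConfig = {
    extra-substituters = [ "https://cuda-maintainers.cachix.org" ];
    extra-trusted-public-keys = [ "cuda-maintainers.cachix.org-1:<public-key>" ];
  };

  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      pkgs = import nixpkgs {
        system = "x86_64-linux";
        config = {
          # Permits evaluating unfree packages such as the CUDA toolkit.
          allowUnfree = true;
          # Changes derivation hashes, so it must match what CI built.
          cudaSupport = true;
        };
      };
    in
    {
      # Example output; substitute whichever CUDA-enabled package you need.
      packages.x86_64-linux.default = pkgs.python3Packages.torch;
    };
}
```

Nix only applies a flake's `nixConfig` after you accept it (interactively or with `--accept-flake-config`), and on a multi-user install a non-trusted user cannot add substituters this way; putting the same two settings in `nix.conf` avoids both caveats.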

I have a few questions:

  1. I currently have nixpkgs following github:nixos/nixpkgs/nixos-unstable in my flake and I run nix flake update nixpkgs every once in a while, but this seems like a bad strategy, because the CI might be lagging behind upstream and not every commit may be successfully built.

    Is there some better way to only track the nixos-unstable commits that were successfully built by nixpkgs-cuda-ci? The README links to the hercules dashboard, but it's not clear how to get the desired information from that dashboard. It also looks like most jobs are failing for some reason.

  2. The README mentions that

    We build for different cuda architectures at a different frequencies, which means that to make use of the cache you might need to import nixpkgs as e.g. import <nixpkgs> { ...; config.cudaCapabilities = [ "8.6" ]; }. Cf. the flake for details

    What are those "different frequencies" exactly?

  3. nix/overlays.nix seems to also be optionally enabling MKL versions of LAPACK/BLAS.

    Are these versions of packages also built in CI and if so, how often?

So, for example, if I set cudaCapabilities = [ "8.6" ] and enable MKL the same way as your nix/overlays.nix, how can I determine the latest nixos-unstable commit that is already available in cuda-maintainers.cachix.org?
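One way to answer this without insider knowledge, sketched here under assumptions (the file name `check-rev.nix`, the `rev` and `overlays` arguments, and the package selection are all hypothetical), is to evaluate a specific nixos-unstable commit with exactly the configuration you use and ask Nix for a dry run. Nix then reports which paths it would fetch from the configured substituters and which it would have to build locally.

```nix
# check-rev.nix (hypothetical): evaluate one nixpkgs commit with the same
# config you use locally, so that `nix-build --dry-run` can show whether the
# CUDA packages would be downloaded from a substituter or built locally.
{ rev, overlays ? [ ] }:
let
  # Unpinned fetch (no sha256); this only works outside pure evaluation mode.
  nixpkgs = builtins.fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
  pkgs = import nixpkgs {
    # Pass the same overlays you use (e.g. an MKL overlay) so hashes match.
    inherit overlays;
    config = {
      allowUnfree = true;
      cudaSupport = true;
      # Narrowing the capabilities changes store paths; this is only a cache
      # hit if CI built exactly this configuration.
      cudaCapabilities = [ "8.6" ];
    };
  };
in
{
  # Hypothetical selection: list the CUDA-enabled packages you actually use.
  inherit (pkgs.python3Packages) torch;
}
```

For example, `nix-build check-rev.nix --argstr rev <commit> --dry-run`: if nothing is listed as needing to be built, every selected package is a cache hit for that commit. Trying recent nixos-unstable commits from newest to oldest finds the newest fully cached one, at the cost of one evaluation per commit.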

@zopieux

zopieux commented Oct 11, 2024

I believe you've hit the nail on the head. I also struggle to answer those exact questions, which makes using the cache an exercise in frustration.

I have had some success asking around in the NixOS CUDA Matrix room. However, figuring out which package is successfully included in which nixpkgs-revision CI build seems basically impossible without some insider know-how. On a few occasions, SomeoneSerge was kind enough to point me to the exact build that included what I needed, but I have yet to reverse-engineer how an end user can do this without bothering the maintainers.

@SomeoneSerge
Owner

Hi! Sorry about the frustration. I've been spending less and less time on this repo. My idea for the next steps is roughly this:

  1. Confirm with the nix-community infra people that they're ready to publicly advertise their support of CUDA
  2. Update the readme to point to https://hydra.nix-community.org/jobset/nixpkgs/cuda and https://nix-community.org/cache/
  3. Archive the present repo

P.S. @RuRo Sorry for the delayed response, I actually didn't get a notification for this issue o_0

@RuRo
Author

RuRo commented Oct 14, 2024

This sounds like a great development!

I still have one question, though: if/when this repo gets archived, what would be the appropriate place to discuss / report issues with the new nix-community CUDA cache/builders? For example, a lot of the questions in my original post would also apply to the nix-community cache:

  • Are they only building CUDA with all the capabilities enabled or are there builds like cudaCapabilities = [ "8.6" ]?
  • Are there going to be builds with MKL enabled?
  • Is there some way to (automatically or semi-automatically) track nixos-unstable, but only using the subset of commits that had successfully built and cached the CUDA packages?
  • etc.

Thanks.

@SomeoneSerge
Owner

One venue would be #nix-community:nixos.org on Matrix, paralleled by https://github.com/nix-community/infra/issues on GitHub. The nix-community Hydra follows the nixos-unstable branch and builds pkgs/top-level/release-cuda.nix from it. That file controls the list of packages and capabilities, and currently it only features the "all caps" variant for x86_64 and the "all caps" sbsa (not Jetson) variant for aarch64. This can be adjusted by opening a PR against Nixpkgs, but only in coordination with the nix-community team, because such changes could dramatically increase the load on the community Hydra's build servers, which are shared with projects other than the CUDA cache.
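Following this, a system configuration aimed at the nix-community cache might look like the NixOS module below. It is a sketch: the substituter URL is assumed to be the one listed on https://nix-community.org/cache/, the key is a placeholder to copy from that page, and leaving `cudaCapabilities` unset is an inference from the Hydra jobset building only the "all caps" variant.

```nix
# Sketch of a NixOS module using the nix-community cache for CUDA packages.
# The public key is a PLACEHOLDER; copy the real one from https://nix-community.org/cache/
{ ... }:
{
  nix.settings = {
    extra-substituters = [ "https://nix-community.cachix.org" ];
    extra-trusted-public-keys = [ "nix-community.cachix.org-1:<public-key>" ];
  };

  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
    # cudaCapabilities is deliberately left unset: the Hydra jobset builds the
    # default "all capabilities" variant, and a narrower list changes store paths.
  };
}
```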

@SomeoneSerge
Copy link
Owner

Follow the links in NixOS/nixpkgs#324379

@SomeoneSerge
Owner

Also, to answer the original questions, even though they're less relevant now:

nix/overlays.nix seems to also be optionally enabling MKL versions of LAPACK/BLAS.

The overlays had two purposes: 1) to test non-default instances of packages (e.g. MPI or MKL support that was otherwise disabled), and 2) to provide executable instructions on how to get a cache hit (i.e. a matching hash) when enabling these optional features, since otherwise it's like looking for a needle in a haystack...
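As an illustration of such executable instructions, the sketch below switches BLAS/LAPACK to MKL using the `blasProvider`/`lapackProvider` overrides that the Nixpkgs manual documents. It is not necessarily identical to what nix/overlays.nix did, and the file name `mkl.nix` is hypothetical.

```nix
# mkl.nix (hypothetical): a Nixpkgs instance whose BLAS/LAPACK point at
# Intel MKL instead of the default OpenBLAS.
{ nixpkgs ? <nixpkgs> }:
let
  mklOverlay = final: prev: {
    blas = prev.blas.override { blasProvider = final.mkl; };
    lapack = prev.lapack.override { lapackProvider = final.mkl; };
  };
in
import nixpkgs {
  overlays = [ mklOverlay ];
  config = {
    allowUnfree = true; # MKL is unfree
    cudaSupport = true;
  };
}
```

Everything downstream of blas/lapack (numpy, scipy, torch, ...) then gets new store paths, so it is only a cache hit if some CI built the same overlay; `nix-build mkl.nix -A python3Packages.numpy --dry-run` shows whether that is the case for a given package.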

Over time, most parts of the overlays were merged into Nixpkgs (some guarded behind config.cudaSupport), so the overlays became less relevant.

What are those "different frequencies" exactly?

That used to be specified like so: https://github.com/SomeoneSerge/nixpkgs-cuda-ci/pull/14/files#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0L170

But then the onSchedule jobs were disabled because Hercules kept accumulating pending effects without ever running any of them, which required resetting the queue. Currently, there's just a GitHub Action that updates the lock file from time to time and triggers the default job...


3 participants