hydra: build CUDA packages for the CUDA team #1335

zimbatm · 2024-07-02T16:21:46Z

I don't know how much resources that would take, but we could give it a go.

CUDA packages are slow to build, and they are not cached upstream because upstream does not build unfree packages. This could help the team quite a bit.

zimbatm · 2024-07-02T17:08:26Z

terraform/hydra-projects.tf

+  input {
+    name              = "nixpkgs"
+    type              = "git"
+    value             = "https://github.com/NixOS/nixpkgs.git nixos-unstable"


@SomeoneSerge do you want to use another branch, like a staging-cuda or something?

We tried pushing big changes to cuda-updates first and building it prior to merging into master, but we kept coming back to targeting master directly. In our hercules we build master + nixos-unstable + the latest release: this way, by the time CI starts a round of nixos-unstable, some of it will have been cached by the master job. This alleviates some of the pain of nixos-unstable advancing without testing CUDA.

A related idea would be to pull from nix-community/nixpkgs and give the CUDA team access to it. That would limit the risks compared to giving everybody push access to NixOS/nixpkgs

For now we're building nixos-unstable-small

zowoq · 2024-07-02T23:57:09Z

I don't know how much resources that would take, but we could give it a go.

No objection. Maybe if this ends up too much for our current hardware we could consider upgrading.

Mic92 · 2024-07-03T05:41:00Z

I don't know how much resources that would take, but we could give it a go.

No objection. Maybe if this ends up too much for our current hardware we could consider upgrading.

If you tell companies that all they need to do to get CUDA packages for NixOS is to pay some monthly donations, then I am sure we'd get the funding pretty quickly. Also, we still haven't asked Hetzner for the discount they are offering to the NixOS foundation. This way we would probably still save money with bigger hardware.

zowoq · 2024-07-03T10:02:40Z

terraform/hydra-projects.tf

-  input {
-    name              = "supportedSystems"
-    type              = "nix"
-    value             = "[ \"x86_64-linux\" \"aarch64-linux\" \"aarch64-darwin\" ]"


Linux only?

value = "[ \"x86_64-linux\" \"aarch64-linux\" ]"

For now it's "x86_64-linux" only since upstream has set that value as a default.

Yes, I mean we should restrict it here to linux only as well.

We support both x86_64 and aarch64 linux, I'll update the release file

zimbatm · 2024-07-03T10:15:21Z

First run: https://hydra.nix-community.org/eval/109915

SomeoneSerge · 2024-07-03T19:31:03Z

terraform/hydra-projects.tf

@zowoq so I'm considering just building the whole `import <nixpkgs> { config.cudaSupport = true; }`; this gives me something like

❯ nix-eval-jobs --expr 'import ./pkgs/top-level/release-cuda.nix { }' --force-recurse | wc -l
...
# eval errors, eval errors
...
138452

Does that sound unreasonable? I could in principle come up with a smaller, curated set of jobs.

Hexa also raises the concern that this would be effectively mirroring the NixOS Hydra:

hexa (UTC+1)
so all of them
if there was a cache behind nix-community hydra, then you'd be mirroring cache.nixos.org effectively
SomeoneSerge (UTC+3)
Yeah... Ideally we'd have a solution that evaluates the full DAGs for vanilla and cuda nixpkgs, starts building cuda from the leaves (ehhh, the roots), and always suspends the build if its hash matches the vanilla hash
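A minimal sketch of that "skip whatever vanilla already builds" idea, in Python. It assumes both evaluations have already been reduced to plain dictionaries that map a derivation path to the derivation paths it depends on; that input format, the function name `cuda_only_build_order`, and the derivation names are all hypothetical and only serve the illustration. Since a derivation path is determined by its inputs, a path that appears in both graphs is treated as "the vanilla job will build it".

```python
from graphlib import TopologicalSorter


def cuda_only_build_order(vanilla, cuda):
    """Return the derivations unique to the CUDA graph, dependencies first.

    Both arguments map a derivation path to the list of derivation paths
    it depends on (a hypothetical, pre-extracted form of each evaluation).
    """
    shared = set(vanilla)
    # A derivation whose path also exists in the vanilla graph is skipped:
    # the assumption is that the vanilla job builds and caches it.
    cuda_only = {drv for drv in cuda if drv not in shared}
    # Keep only edges between CUDA-only derivations; shared dependencies
    # are expected to be substituted from the existing cache.
    graph = {
        drv: [dep for dep in cuda[drv] if dep in cuda_only]
        for drv in cuda_only
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical toy graphs: only torch and cudatoolkit differ from vanilla.
vanilla = {"glibc.drv": [], "python.drv": ["glibc.drv"]}
cuda = {
    "glibc.drv": [],
    "python.drv": ["glibc.drv"],
    "cudatoolkit.drv": ["glibc.drv"],
    "torch.drv": ["python.drv", "cudatoolkit.drv"],
}
print(cuda_only_build_order(vanilla, cuda))
# → ['cudatoolkit.drv', 'torch.drv']
```

This only covers the filtering step. It does not address the timing problem raised in this thread, where one Hydra starts on a commit before the other has finished it.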

import <nixpkgs> { config.cudaSupport = true; }

TBH I doubt we have the capacity to handle that without it being detrimental to the other projects using this CI. Currently we only have two builders for linux that are shared by buildbot, hercules, hydra:

### `build03`

- Provider: Hetzner
- CPU: AMD Ryzen 9 3900 12-Core Processor
- RAM: 128GB DDR4 ECC
- Drives: 2 x 1.92 TB NVME in RAID 1

### `build04`

- Provider: Hetzner
- Instance type: [RX170](https://www.hetzner.com/dedicated-rootserver/rx170)
- CPU: Ampere Altra Q80-30 80-Core Processor
- RAM: 128GB DDR4 ECC
- Drives: 2 x 960 GB NVME in RAID 0

If you do want to build everything maybe we could have dedicated machines just for this package set? Not sure if raising the money for that is feasible?

if there was a cache behind nix-community hydra, then you'd be mirroring cache.nixos.org effectively

Not sure if I've misunderstood or not, we push everything to cachix but it skips existing nixos cache paths.

Not sure if I've misunderstood or not, we push everything to cachix but it skips existing nixos cache paths.

The concern is that if there is a phase shift between NixOS and the Community Hydras, and the latter starts building a certain derivation from a certain commit before the former, we'll have wasted some storage and compute

TBH I doubt we have the capacity to handle that without it being detrimental to the other projects using this CI. Currently we only have two builders for linux that are shared by buildbot, hercules, hydra:

Roger that. I'll push a smaller jobset tomorrow, based on what we've been building in https://github.com/SomeoneSerge/nixpkgs-cuda-ci.

How is the community builder funded? @ConnorBaker was asking on matrix if there's an opencollective

The concern is that if there is a phase shift between NixOS and the Community Hydras, and the latter starts building a certain derivation from a certain commit before the former, we'll have wasted some storage and compute

Yeah, we can't really avoid that with this approach. We could try adding the free deps as blockers for nixos-unstable, or we could try something similar in a repo here: flake update PRs built with max-jobs = 0, so that merging is blocked if the paths aren't cached?
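For the "is it already cached?" part, a related check is to ask the binary cache for the path's `.narinfo` file. This is not the max-jobs = 0 mechanism itself, just a small Python sketch of the same question. The helper names `narinfo_url` and `is_cached` are hypothetical, and the store path in the usage line is made up.

```python
import urllib.error
import urllib.request


def narinfo_url(store_path, cache="https://cache.nixos.org"):
    """Build the .narinfo URL a binary cache serves for a store path."""
    # Store paths look like /nix/store/<hash>-<name>; the cache is keyed
    # by the <hash> part only.
    basename = store_path.rstrip("/").split("/")[-1]
    store_hash = basename.split("-", 1)[0]
    return f"{cache}/{store_hash}.narinfo"


def is_cached(store_path, cache="https://cache.nixos.org"):
    """Return True if the cache answers the narinfo request with HTTP 200."""
    try:
        with urllib.request.urlopen(narinfo_url(store_path, cache), timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A missing path surfaces as an HTTP error status.
        return False


# Made-up store path, used only to show the URL shape (no network access).
print(narinfo_url("/nix/store/abc123-hello-2.12"))
# → https://cache.nixos.org/abc123.narinfo
```

A CI job could call `is_cached` on the free dependencies and refuse to merge when any of them is missing. Network errors other than HTTP error statuses are left to propagate so that an unreachable cache is not mistaken for a cache miss.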

How is the community builder funded? ConnorBaker was asking on matrix if there's an opencollective

We have an opencollective: https://opencollective.com/nix-community

ConnorBaker · 2024-07-04T06:20:27Z

I don't know how much resources that would take, but we could give it a go.

No objection. Maybe if this ends up too much for our current hardware we could consider upgrading.

If you tell companies that all they need to do to get CUDA packages for NixOS is to pay some monthly donations, then I am sure we'd get the funding pretty quickly. Also, we still haven't asked Hetzner for the discount they are offering to the NixOS foundation. This way we would probably still save money with bigger hardware.

They offer discounts? My Hetzner bill for part of the CUDA CI is like $400; I’d love to consolidate some of that stuff under the community, especially if you can get a discount and we can all benefit from it!

zimbatm · 2024-07-04T10:46:43Z

Merging the current state. We can still do follow-up PRs afterwards!

zimbatm · 2024-07-05T17:35:12Z

They offer discounts? My Hetzner bill for part of the CUDA CI is like $400; I’d love to consolidate some of that stuff under the community, especially if you can get a discount and we can all benefit from it!

Yes, but they ran out of the discount budget for this year. We'll have to contact them again next year.

zimbatm · 2024-07-05T17:35:54Z

If you want to donate hardware to the cause, we are discussing what the requirements would be in #1343

zowoq · 2024-07-05T22:42:11Z

My Hetzner bill for part of the CUDA CI is like $400

Could you go into some detail please? What hardware, what is built and what is the utilization like?

zowoq · 2024-07-15T10:33:19Z

I've reverted this as it had been interfering with our other CI builds.

I'll see if I can find a way of running these builds without causing problems for our other users.

github-actions bot added the terraform label Jul 2, 2024

SomeoneSerge reviewed Jul 2, 2024

SomeoneSerge reviewed Jul 2, 2024

zimbatm commented Jul 2, 2024

zowoq reviewed Jul 3, 2024

zowoq reviewed Jul 3, 2024

zimbatm force-pushed the hydra-nixpkgs-cuda branch from 0d372d9 to df44813 July 3, 2024 10:20

zowoq reviewed Jul 3, 2024

zowoq force-pushed the hydra-nixpkgs-cuda branch from df44813 to 2068fc5 July 3, 2024 12:48

hydra: build CUDA packages for the CUDA team (commit 2608479)

zowoq force-pushed the hydra-nixpkgs-cuda branch from 2068fc5 to 2608479 July 3, 2024 13:38

SomeoneSerge mentioned this pull request Jul 3, 2024: release-cuda: build with config.cudaSupport NixOS/nixpkgs#324379 (Merged)

SomeoneSerge reviewed Jul 3, 2024

zimbatm added this pull request to the merge queue Jul 4, 2024

Merged via the queue into master with commit 4c757f9 Jul 4, 2024
38 checks passed

zimbatm deleted the hydra-nixpkgs-cuda branch July 4, 2024 10:53

ConnorBaker mentioned this pull request Jul 17, 2024: Policy for third-party hardware donation #1343 (Open)

ryan4yin added a commit to ryan4yin/nix-config that referenced this pull request Nov 11, 2024: chore: cuda is cached at nix-community.cachix.org now (dffb641)
