Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate usage of OpenGL utilties and NixOS options. #141803

Closed
jonringer opened this issue Oct 15, 2021 · 25 comments · Fixed by #306121
Closed

Deprecate usage of OpenGL utilties and NixOS options. #141803

jonringer opened this issue Oct 15, 2021 · 25 comments · Fixed by #306121
Labels
0.kind: enhancement Add something new 6.topic: cuda Parallel computing platform and API 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS
Milestone

Comments

@jonringer
Copy link
Contributor

jonringer commented Oct 15, 2021

The /run/opengl-driver/ directory contains more than just opengl libraries now. It also includes vulkan implementations, machine learning libraries, hardware-accelerated encoding and decoding, opencl implementations, and many others. Directory renaming is now out of scope

This directory is also being used as the way to get access to the host machines gpu drivers. I think it's another source of confusion for many users to have to add addOpenGLRunpath so that they can get a vulkan game / cuda workload / opencl workload to work.

I think we should deprecate this prior to the 21.11 release, and fully move it to something like /run/gpu-driver/ or something more representative of the desired affect.

I would also like if someone specifies a value for services.xserver.videoDrivers, that they would also get the gpu libraries mounted automatically instead of needing to do a separate hardware.opengl-drivers.enable = true;. This was done in 02cef04c8187

cc @ryantm

EDIT: roadmap (2.5 years later... lol):

@jonringer jonringer added the 0.kind: bug Something is broken label Oct 15, 2021
@jonringer jonringer added this to the 21.11 milestone Oct 15, 2021
@jonringer jonringer added 0.kind: enhancement Add something new and removed 0.kind: bug Something is broken labels Oct 15, 2021
@jonringer
Copy link
Contributor Author

For the transition period ( NixOS 21.11), I think keeping the old /run/opengl-driver would help alleviate a lot of the pain, would just like to get the ball rolling away from using opengl explicitly

@nrdxp
Copy link

nrdxp commented Oct 15, 2021

I agree. Also, on a related note, having to specify your graphics driver at the xserver level in the module system feels wrong to me. I'm still not exactly sure how it maps given names to driver packages either.

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Oct 16, 2021
@jonringer
Copy link
Contributor Author

I propose two changes:

  • move /run/opengl-driver/ -> /run/gpu-driver/
    • make /run/opengl-driver/ a symlink to /run/gpu-driver/ for backwards compatibility of existing derivations
  • services.xserver.videoDrivers -> hardware.videoDrivers
    • add a (mkRemovedModule services.xserver.videoDrivers hardware.videoDrivers)

@jonringer
Copy link
Contributor Author

potential future work as well:

  • addOpenGLRunpath -> addGPURunpath
    • Until 21.11 EOL, have it add both /run/opengl-driver/ and /run/gpu-driver/, to avoid system vs package invariance

@jonringer
Copy link
Contributor Author

jonringer commented Oct 16, 2021

I'm also kind of wondering whether gpu-driver is just another fallacy in naming, and we should really be exposing something like, /run/system-driver/ would be even more appropriate, as other hardware-accelerated devices such as tpus, fpga's, and others would also likely need to import "libraries which are determined by the system, and need to be available at runtime".

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-run-bevy-game-in-nixos/17486/11

@Artturin Artturin modified the milestones: 21.11, 23.05 Dec 31, 2022
@SomeoneSerge SomeoneSerge added the 6.topic: cuda Parallel computing platform and API label Mar 23, 2023
@github-project-automation github-project-automation bot moved this to 🆕 New in CUDA Team Mar 23, 2023
@SomeoneSerge SomeoneSerge moved this from 🆕 New to 📋 Backlog in CUDA Team Mar 23, 2023
@SomeoneSerge SomeoneSerge moved this from 📋 Backlog to Roadmap in CUDA Team Apr 12, 2023
@SomeoneSerge
Copy link
Contributor

Should we, in addition, consider using a libc-versioned directory? Motivated by https://discourse.nixos.org/t/python-dll-import/22594/17

@RaitoBezarius RaitoBezarius modified the milestones: 23.05, 23.11 May 31, 2023
@jonringer jonringer changed the title Move /run/opengl-driver/ to a more appropriate directory Deprecate usage of OpenGL utilties and NixOS options. Mar 20, 2024
@jonringer
Copy link
Contributor Author

Decided to rename this issue since renaming the underlying path was deemed too risky for potential breakages.

@Atemu Atemu modified the milestones: 23.11, 24.05 Mar 20, 2024
@Atemu
Copy link
Member

Atemu commented Mar 20, 2024

Could you provide some context as to why?

My naive assumption was that we'd have a new path but:

  • have both new and legacy path added to rpath via the hook; new path preferred
  • provide a legacy symlink pointing at the new path in NixOS

which would be quite trivial.

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Mar 20, 2024

TBH, I'm not sure how much value there is to migrating the filesystem path (unless we splay the paths further to increase granularity). I suspect what users see in practice is the nixos option path

@Atemu
Copy link
Member

Atemu commented Mar 20, 2024

For the NixOS module change, I came up with these terms:

  • acceleration API = an API standard which intends to provide hardware-assisted compute acceleration such as OpenGL, Vulkan or CUDA
  • driver = userspace library component which implements one or more hardware-based acceleration APIs

The usage of the NixOS options API would then look a little like this:

{
  hardware.acceleration.opengl.enable = true;
  hardware.drivers.mesa.enable = true;
}

Individual drivers would also have a package option which is a package selector for the specific package to use aswell as driver-specific settings. (More on the package selector data type when I have a more concrete implementation but it'd allow setting 64/32 bit using a single value.)

Each acceleration API would have a drivers option which is an attrset with driver names as keys and package selectors as values. These could then be overridden indivually. The intention behind this is to enable use-cases where e.g. different versions of the same driver are required for different APIs or a specific driver must be disabled for one API but enabled in all others.
All enabling a driver would do would be to insert the driver's package into the options of all APIs that it implements.

Additionally, accel APIs have a default option if the API supports setting one and of course API-specific knobs.

Here's an example of how a non-trivial driver config could look like:

{
  hardware.drivers.mesa.enable = true; # Would be default if GUI is enabled
  hardware.drivers.mesa.package = p: p.mesa_42; # Perhpaps possible at some point
  hardware.drivers.amdvlk.enable = true;
  hardware.drivers.amdgpu-pro.enable = true;
  hardware.drivers.cuda.enable = true;
  hardware.drivers.cuda.package = p: p.cudaPackages_12;

  hardware.acceleration.vulkan.enable = true;
  # Ideally you could specify the concrete driver name ("RADV", "radeonsi", "zink"
  # etc.) here but I don't think that'd be possible to implement
  hardware.acceleration.vulkan.default = "mesa"; # AMDVLK is still present but mesa's drivers are preferred
  hardware.acceleration.vulkan.drivers.amdgpu-pro = null; # Disable the vulkan driver; can't ever be used
  # Perhaps instead:
  # hardware.drivers.amdgpu-pro.apis = [ "opencl" ]; # to not insert itself into drivers other than OpenCL?

  hardware.acceleration.opencl.enable = true;
  hardware.acceleration.opencl.default = "amdgpu-pro"; # OpenCL from AMDGPU-PRO
}

It's currently not possible to know which .so implements which driver and all drivers are thrown into the same directory but the NixOS API drafted above requires this information for fine-grained control, so we will have to define an interface here. I've been thinking of defining the drivers output to contain all driver bits and subdivide it into driver subdirs like this

vulkan/lib/libvulkan_radeon.so
vulkan/share/vulkan/icd.d/radeon_icd.x86_64.json
vulkan/...
cuda/...
opengl/...

Alternatively, I had thought of using multiple outputs; one for each driver. I kinda like that approach more as it generally strikes me as cleaner and has the potential to reduce closure size (mesa.drivers is ~200M) but am not sure about potential naming conflicts or other downsides of those.

In light of this, I'd also like to propose to split the global lib dirs into accel API subdirs for clean separation.
Though as @SomeoneSerge said, the NixOS options are a lot more critical to UX and therefore the first steps. Global path name and other changes are almost entirely tangential and could happen independantly but I'd appreciate if they came after the NixOS options changes such that we can let the knowledge we gained from the NixOS options influence their design.


Anyhow, that's my current draft. Let me know what you think.

@jonringer
Copy link
Contributor Author

My naive assumption was that we'd have a new path but:
have both new and legacy path added to rpath via the hook; new path preferred
provide a legacy symlink pointing at the new path in NixOS

That was my initial take as well, but there's community projects like nixGL and others who likely have hard coded the path. And there's a potential that we could never fully remove it without breaking someone

@MattSturgeon
Copy link
Contributor

MattSturgeon commented Mar 22, 2024

nixGL and others who likely have hard coded the path. And there's a potential that we could never fully remove it without breaking someone

Forgive my ignorance, but wouldn't it be ok to make those kinda breaking changes by deprecating in one release and then removing one or two releases later?

Surely nixos isn't promising to never change paths because someone might be using the old location?

EDIT:I don't mean to imply we shouldn't consider non-breaking options where possible

@SomeoneSerge
Copy link
Contributor

That was my initial take as well, but there's community projects like nixGL and others who likely have hard coded the path

nixGL only sets the LD_LIBRARY_PATH? Although in all probability we must break someone's workflow...

hardware.drivers.cuda.package = p: p.cudaPackages_12;

Fun fact: cudaPackages don't ship the driver (except for cuda_compat for jetsons). All the relevant stuff comes with nvidia_x11 or with the datacenter driver.

hardware.acceleration.opencl.default = "amdgpu-pro"

Interesting. Would we somehow have to filter the glvnd/opencl loader/vulkan loader configs that come with the drivers to enforce that the declarative configuration be implemented?

@Atemu
Copy link
Member

Atemu commented Mar 23, 2024

there's community projects like nixGL and others who likely have hard coded the path. And there's a potential that we could never fully remove it without breaking someone

nixGL appears to be actively maintained. I don't think we need to worry about breaking it as they can simply adapt.

Also, we could simply keep providing the old path in both NixOS and binaries with a preference for the new path and that should work out fine.

Fun fact: cudaPackages don't ship the driver (except for cuda_compat for jetsons). All the relevant stuff comes with nvidia_x11 or with the datacenter driver.

Oh, okay. I'll leave it to you CUDA people to figure that out. I will only be lifting drivers to this interface on a best-effort basis and the rest will keep using their legacy interface with a minor implementation change.

Though if you forsee possible issues you could have mapping your driver to this interface, let me know so that we can design the interface better from the start.

Interesting. Would we somehow have to filter the glvnd/opencl loader/vulkan loader configs that come with the drivers to enforce that the declarative configuration be implemented?

That or simply write our own. Though I'll only be adding the default option where it is feasible; currently, I only know that Vulkan has sane mechanisms for this.


Some issues I still need to think about:

  • How to handle drivers that only support 64 bit?
    • Will have to filter that somehow
  • How to handle APIs that can only have a single exclusive driver?
    • Only have a driver option on the API, rather than drivers?
    • Have drivers but only install the default driver?

@SomeoneSerge
Copy link
Contributor

nixGL

I think NixGL doesn't care about addDriverRunpath.driverLink, it probably just exports an LD_LIBRARY_PATH?

@jonringer
Copy link
Contributor Author

The original protest against renaming the path: NixOS/rfcs#121 (comment)

I feel more strongly about the NixOS options being intuitive than the implementation consistent.

@Atemu
Copy link
Member

Atemu commented Mar 24, 2024

I think @infinisil was more concerned with it being placed under /run/current-system/ rather than some other directory in /run/ because /run/current-system/ is very much NixOS-specific while the current /run/opengl-driver/ is not.

@jonringer
Copy link
Contributor Author

I believe I got all of the packages. Once merged, I should be able to deprecate cudaPackages.addDriverRunpath

@nviets
Copy link
Contributor

nviets commented Apr 2, 2024

This thread is over my head, but I would like to check it catboost, lightgbm, and python-modules/lightgbm need the same? All three have GPU or CUDA support options.

@jonringer
Copy link
Contributor Author

This thread is over my head, but I would like to check it catboost, lightgbm, and python-modules/lightgbm need the same? All three have GPU or CUDA support options.

This is more about deprecating usage of the term OpenGL, as it's a bit of a misnomer. I don't see usage of the legacy OpenGL helpers in those packages.

@jonringer
Copy link
Contributor Author

Talked to @Atemu , we concluded that we can close this ticket and have a more "NixOS options" specific ticket to represent that work.

@jonringer
Copy link
Contributor Author

Going to do one last pass on documentation to ensure all references have been updated. I'll close this ticket after that sweep

@superherointj
Copy link
Contributor

superherointj commented Apr 18, 2024

https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/numba/default.nix#L23

I wonder if numba needs an update. I'm not offering a PR because I'm out of context.
Arrived here from a warning in nixos-rebuild. And I'm still wondering how this is being triggered [still investigating].

@jonringer
Copy link
Contributor Author

I wonder if numba needs an update. I'm not offering a PR because I'm out of context.
Arrived here from a warning in nixos-rebuild. And I'm still wondering how this is being triggered [still investigating].

It should come from the toplevel pkgs scope now, so it should never need to default to pulling them out of cudaPackages. I can update the nix expression though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: enhancement Add something new 6.topic: cuda Parallel computing platform and API 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS
Projects
Status: Done