make NNlibCUDA an extension #492
Conversation
Project.toml (Outdated)

[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
Atomix = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
I had to make cuDNN a strong dependency because it seems there is no way to make using CUDA also trigger the loading of cuDNN from the extension. This is not ideal, but the alternatives seem to be worse:
- have the extension triggered by using CUDA, cuDNN (sketched just below this comment)
- keep NNlibCUDA as a separate package
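A minimal sketch of how the first alternative would look from a user's perspective, assuming the extension module is called NNlibCUDAExt (that name is illustrative, not something fixed by this PR):

using NNlib          # base package; no CUDA-specific methods yet
using CUDA, cuDNN    # loading both weak dependencies triggers the extension

# On Julia 1.9+ one can check whether the extension was actually loaded:
if Base.get_extension(NNlib, :NNlibCUDAExt) === nothing
    @warn "NNlib's CUDA extension was not loaded"
end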
Is there some other option?
Making it a strong dep is a non-starter because cuDNN directly depends on CUDA.jl. I would say we go with option 1, but there are a couple of variations on it we could also consider:
- Make only cuDNN a weak dep and access CUDA.jl through cuDNN.CUDA (rough sketch below).
- Create separate extensions for CUDA.jl and cuDNN. Then someone can choose to only load the former if they don't need e.g. conv functionality. This doesn't make anything easier, so we should probably consider it after an extension is in place.
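A rough sketch of that first variation, assuming cuDNN.jl keeps importing CUDA.jl at top level; the extension name NNlibCUDNNExt and the method shown are purely illustrative:

module NNlibCUDNNExt

using NNlib, cuDNN
const CUDA = cuDNN.CUDA  # reach CUDA.jl through cuDNN.jl's own dependency

# CUDA-specific methods would then be written against CUDA.CuArray as usual, e.g.
# NNlib.softmax!(out::CUDA.CuArray, x::CUDA.CuArray; dims=1) = ...

end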
I also think we should go with using CUDA, cuDNN. This will also carry over to Flux's usage.
I'm annoyed by portability issues for scripts. Ideally, it should be possible to run the same script on any machine without any edits, using whichever device is available, so something like:
using Flux
gpu_backend = ... # Some hardware detection utility or Preferences.jl magic?
if gpu_backend == "cuda"
using CUDA, cuDNN
elseif gpu_backend == "amdgpu"
using AMDGPU
elseif gpu_backend == "metal"
using Metal
end
Even better, the whole loading should be done conditionally by Flux itself, but I don't think this can be done without hard dependencies.
Anyways, since this is a crucial decision, let's also ask @mcabbott and @darsnack if they are ok with having the CUDA extension here be triggered by using CUDA, cuDNN.
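One possible shape for the "Preferences.jl magic" mentioned in the snippet above. This is only a sketch: the gpu_backend preference key, stored under Flux, is hypothetical and not something this PR introduces.

using Flux, Preferences

# Read a hypothetical "gpu_backend" preference recorded for Flux in
# LocalPreferences.toml, falling back to "cuda" if it was never set.
gpu_backend = load_preference(Flux, "gpu_backend", "cuda")

if gpu_backend == "cuda"
    using CUDA, cuDNN
elseif gpu_backend == "amdgpu"
    using AMDGPU
elseif gpu_backend == "metal"
    using Metal
end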
I really don't like using Flux, CUDA, cuDNN; it seems a huge shame to have to load two extra packages, one of which is a super-obscure internal thing. I mean, I don't even know where it lives: https://www.google.com/search?q=cuDNN.jl leads me only to one package abandoned 5 years ago.
It's a huge shame to give up on using Flux, CUDA as the interface. I understand that the default use of new package extensions does not allow us to then load cuDNN. I wonder if we should seriously consider either hacks for now (can Requires or something like it load cuDNN?) or finding out if upstream can be changed, e.g. to unify cuDNN into CUDA.
Some packages might want to expose GPU-accelerated functionality without users having to depend on either CUDA or CUDNN. With Preferences, the user environment would then need to include CUDA (i.e. in the Project.toml) in order to set the CUDNN preference.
Does JuliaPackaging/Preferences.jl#24 work for package extension dependencies? The example seems to imply that packages can set preferences for their dependencies.
I'm not sure. In any case, that example also shows how the active project needs to have A as a hard dependency, either in [deps] or in [extras], which is the point I was making above.
That said, although I'm happy to consider alternative ways of loading CUDNN-like functionality, I don't see it happening soon. Without a first-class language feature, using Preferences.jl would require users to import CUDA.jl to enable the CUDNN features, which IMO is mostly the same as having them do using cuDNN. And even with a first-class feature where, say, packages could express in their Project.toml which features they request of a package, it isn't clear how that would interact with package extensions (what if a user has CUDA.jl but not CUDA.jl+CUDNN, etc.).
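To make the [deps]/[extras] point above concrete, here is a hedged sketch of setting a preference for cuDNN from a user environment; the preference name is made up for illustration:

# The active Project.toml must list cuDNN so Preferences.jl can attribute the
# preference, e.g. under [extras]:
#
#   [extras]
#   cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"

using Preferences, UUIDs

# Write an illustrative preference for cuDNN into LocalPreferences.toml.
set_preferences!(UUID("02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"), "example_preference" => true)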
By sheer coincidence, I noticed JuliaPackaging/Preferences.jl#53 was reported today.
FWIW, I'm currently too busy working on other things to consider reworking the CUDA.jl/cuDNN.jl situation again (especially now that it just stabilized a bit after introducing JLLs), but I'm not opposed to changes. So if anybody would want to explore a different mechanism for shipping CUDA libraries, feel free to open an issue or PR.
I propose we move on here with using CUDA, cuDNN triggering the extension. What should happen with Flux can be discussed elsewhere; the alternatives I see are either …
it should be ready for merge
@ToucheSir merge?
plugins:
  - JuliaCI/julia#v1:
      version: "1.6"
Without this, we have no way to tell if changes to NNlib break NNlibCUDA on Julia <1.9. So either we add something like what I mentioned in #495 (comment), or we create a separate reverse CI step/pipeline for NNlibCUDA.
I don't understand the issue here. Changes to NNlib won't affect Julia < 1.9 users, since NNlib will require Julia >= 1.9 from now on.
For a backport, if there is ever going to be one, we can test locally.
I missed that Project.toml also bumped the Julia compat. To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)? I recall there being mixed success with backport branches in FluxML before; are there any lessons learned from that so this doesn't run into the same issues? cc @FluxML/committers
Edit: I misspoke about the two backport branches. Depending on how we want to handle extensions in Flux, this may require three backport branches, right?
> To be clear then, merging this would mean we're stopping feature development for Julia <1.9 and now maintaining two backport branches (one for NNlib and one for NNlibCUDA)?

We are stopping feature development for Julia < 1.9, and we will maintain backport branches if we feel the need and care enough about backporting something, which I consider an unlikely event. In expectation, the benefits far outweigh the drawbacks.
+1 vote for moving to 1.9 soon!
Ok, in that case I'll cast my vote for this too. The rest of the PR LGTM.
supersedes #481