-
-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
register NNlibCUDA as a subpackage of NNlib #300
Comments
@JuliaRegistrator register() subdir=lib/NNlibCUDA |
Registration pull request created: JuliaRegistries/General/32119 After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version. This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:
|
@JuliaRegistrator register() subdir=lib/NNlibCUDA |
Registration pull request updated: JuliaRegistries/General/32119 After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version. This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:
|
@DilumAluthge there is also failing |
Yeah no worries at all! We can merge it manually in 3 days. |
It might make sense to register https://github.com/FluxML/NNlibCUDA.jl instead. We wouldn't have to deal with registering it oddly and it keeps the projects separate in their own little silos. Currently NNlibCUDA doesn't get used by NNlib internally and developing it means getting unintended code alongside. |
don't see any oddity here, this subpackage thing is actually quite cool and already well supported by julia's toolings
That's exactly what we don't want since the two packages are tightly coupled. When we change NNlib we want to make sure everything is fine with NNlibCUDA by running its tests, and sometimes we have to make changes that involve both packages (adding some function requiring also a cuda kernel). Having NNlibCUDA as a subpackage is much safer since it allows combined CI very easily (I already put that in place), makes contributing easier, and relieves a lot of the maintenance burden.
I don't see what's the problem in also seeing the folder lib/NNlibCUDA when developing, if anyone gets confused by that he probably shouldn't be developing NNlib at all. Actually, as I said above, the possibility to make changes to both packages in a single PR is great. |
Not really, since most of the work in NNlib proper is not focused on CuDNN at all, in fact, the code in the two achieve very different goals. You typically either work on the kernels or the plumbing (NNlibCUDA is a bridge to CUDNN), rarely both. Most new kernels shouldn't be written twice anyway. Waiting around for GPU ci when not changing any GPU code is also a thing here, since most nnlib code isn't meant for GPUs.
I don't think that's the message we want to send to contributors... |
I'm a little late to this, but can someone explain why NNlibCUDA exists as a subdir and a separate repo and why there wasn't a consensus made about which one it should be before now? Getting a whiff of some communications breakdown or personal enmity being involved and I don't like it. |
I needed something to get testing with CUDA#master, Julia 1.6 and share that work easily and quickly. See also https://github.com/FluxML/Flux.jl/tree/dg/cuda16 which gets flux+ cuda + 1.6 usable. Either way is fine in the long run as long as we get our stack functional :) |
I like to be able to do |
@ToucheSir Dhairya and I disagree on a lot of stuff, which is of course unfortunate for two people working on the same project. There is nothing about "personal" and "enmity" involved here though.
The first comment pointing about the possibility of using multi-package repos for this stuff was As someone doing a lot of maintenance here, I know very well that having a separate NNlibCUDA.jl would be an additional maintenance hassle (a finite and non-small fraction of which weighing on me), not a big deal but something I'd rather avoid if possible. I see a few pros in having the multi-repo in one package approach.
but probably that actually works. Finally, as Dhairyia said, we can split out a repo at any point, nothing here is irreversible, but the fact that someone already put the things in a functional state and with a nicely integrated CI (so better situation than before) is something that should be exploited.
We have a relevant fraction of code (e.g. activation functions) that works on both gpu and cpu and should be tested on both |
Yes. |
I think setting up ci isn't something new. We also shouldn't have nnlib depending on so many heavy deps like cuda. That makes it very heavy for the rest of the ecosystem to use. |
For what it's worth, having a package in a subdir doesn't imply a dependency. Personally I wouldn't place a subdir package inside another package (rather place them as siblings in the repo) since it means that the subdir package code will be distributed with all copies of the main package, but the dependencies of the subdir package still don't affect the main package at all. Well, unless the subdir package is a dependency of the main package, but then you would get exactly the same effect in a separate repository. |
I don't think the cuda version is a dep, so it's good that those are separate |
As @GunnarFarneback said, NNlib's doesn't depend on anything CUDA-related, NNlibCUDA is its own package (depending on NNlib). Here we just have the same scaffolding of KernelAbstractions with CUDAKernels and ROCKernels https://github.com/JuliaGPU/KernelAbstractions.jl/tree/master/lib |
So I´d like to give this package-in-a-subdir a try, there is no problem in pulling out at any moment anyway. Maybe the people from KernelAbstractions @vchuravy @jpsamaroo can tell us whether their experience with this has been positive or not? |
As I've mentioned elsewhere, moving the location of a subdirectory package is nontrivial. You have to make sure that the new location of the package includes all of the Git tree SHAs that have already been registered in the General registry. So I would recommend that you first make a definitive decision on which of the two options you want:
And then register based on that decision, with the knowledge that switching from option 1 to option 2 requires a nontrivial amount of work. |
@DilumAluthge there is something I don´t understand, from the answer to this question, the relocation process seemed quite easy... |
Idk, I've never done it myself, but here's one data point: SnoopCompileBot was originally a subdirectory package. At one point, the maintainer wanted to move it to its own repo. The maintainer found the process difficult enough that instead of moving it to its own repo, they just made a new package called CompileBot. You can probably find that conversation by searching for SnoopCompileBot and CompileBot in PRs to the General registry. Again, I've never done this personally, but if it was painful enough that one package author gave up and just made a new package, I find that concerning. I would recommend at least looking over the historical discussion for SnoopCompileBot/CompileBot. |
ok, that seems something to consider. Let´s wait for some feedback from @vchuravy and @jpsamaroo |
I personally don't think we've had enough time to evaluate the merits of the subpackage approach, since KA 0.6 was only released 11 days ago. I'm personally in favor of the separate package approach, which you can always "promote" to a subpackage later without much difficulty (there are tools to combine git histories of different repos too). |
it seems I'm the only one hyped by this subdir approach, let's move everything to the separate repo https://github.com/FluxML/NNlibCUDA.jl then. I'll make a PR removing NNlibCUDA from here soon |
Oh! Unfortunately I didn't see this discussion in time. Now it's probably too late (hopefully not?), but I am very much in favor of keeping things as a single package with sub-packages. I maybe don't overlook the whole picture, but here are two reasons / examples:
Did I get sth completely wrong? At the danger of riding a dead horse here, but maybe @timholy can share his experiences first hand? Maybe his reasons for moving it out don't apply here. |
As the maintainer of this package and others, I understand these concerns, deal with the consequences and think this was the right move. You see:
Remeber too that a design that expects an NNlibX necessitates some amount of code/ logic repetition. It's also fair to say that NNlib devs shouldn't be expected to maintain every single spin-off, thus also making the monorepo difficult to scale and QA. The GPU backend really does achieve different end goals so it makes sense to be separate. In fact, when the time comes, it would be better to remove the CUDA moniker and have it depend on GPUCompiler only, thus obviating the need for an AMD and KA flavour. The versioning and following proper semver can be handled like we do with so many dependencies in the ecosystem. Yes it would mean working over multiple repos, but the versioning standard isn't any different for sub packages, so it's the same regardless of where the code lives. Recently, a faulty NNlib was also the reason behind the breakage of the tutorials that involve plotting, which also includes all the SciML ones, so I would want to ensure some amount of stability, which does require us to separate the concerns of different sections of code. |
Ok, thank you for the write-up. I'm still not convinced, but let's see where it takes us :) |
reopening this after a few months of working with NNlibCUDA as a separate package. In the work for gather/scatter we add to ping-pong between the two libraries multiple times, it has been very annoying, so I encourage people to reconsider the option of having NNlibCUDA as a subpackage. Also see #332. Moreover, according to https://github.com/FluxML/Flux.jl/pull/1566/files we are going to have FluxCUDA, FluxAMDGPU etc.. as subpackages of Flux, so NNlib should reflect that. |
No decision has been made on that front. The issue seems to be that there isn't so much a need for the cuda package at all but that the code for |
No description provided.
The text was updated successfully, but these errors were encountered: