Adjoint of cholesky is hard-coded for the CPU #1210
Comments
The rrule in ChainRules.jl is also hardcoded on cpu so removing the |
This seems worth opening an issue in CUDA.jl.
Hi @CarloLucibello ,
Does Base have |
It appears certain functions in LinearAlgebra do call trsm! though, so if you can update the rules to work with those there may be a better chance of getting GPU compatibility.
@ToucheSir You're suggesting to call the |
Hi, @ToucheSir I've checked whether I could quickly fix this issue, and it seems that the choice of invoking |
If you can come up with a solution that doesn't require any additional dependencies (excluding CUDA.jl itself), we could add it to a block like the one at Zygote.jl/src/lib/broadcast.jl, line 256 (commit a133200).
I personally don't understand why we don't deserve the |
You'll have to bring that up on the CUDA.jl side, but my understanding is that they're not comfortable making |
Oh I'm talking about the actual upstream |
Well, you've lost me, which indicates this should probably be taken as an upstream issue :)
I guess one possible solution, if neither the Zygote nor the CR rules work with CUDA here, could also be to add a CR definition to CUDA for Of course, it would be nice if the definition in CR would also work (efficiently) on GPUs but I guess it's unavoidable that sometimes one has to specialize on the array type. |
Can you recheck, @Red-Portal? This should be fixed on the master branch now that #1114 was merged.
Uh, I'm still experiencing some issues. I'm unavailable next week, so I'll take a deeper look after that.
@devmotion I checked again and it seems to work. I don't know why I didn't get it right last time. Regardless, LGTM. Cheers to everyone who made this possible!
Hi,
I've been attempting to differentiate through a Cholesky decomposition, which is common practice in Gaussian processes. The problem is that the current adjoint for the Cholesky is hard-coded for the CPU version of trsm!. See the following minimal working example:
output:
A simple fix is to use the following snippet:
The two calls to triu are necessary to work around a performance bug in the matrix multiplication between two triangular matrices. I didn't pursue the cause further, but multiplying two triangular matrices on the GPU seems to be about 100 times slower than a plain matrix multiplication. Any thoughts on the reason for this?
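For readers following along, here is a minimal sketch of the computation a Cholesky adjoint has to perform, written with generic triangular solves (`\` and `/`) rather than a direct `BLAS.trsm!` call. It is not the code from Zygote or ChainRules, and it is not the fix proposed in this issue: the names `phi` and `cholesky_pullback` are made up for illustration, and the lower-factor convention A = L * L' is an assumption. It uses the standard reverse-mode identity Ā = L⁻ᵀ Φ(Lᵀ L̄) L⁻¹, followed by symmetrisation, where Φ keeps the lower triangle and halves the diagonal. The sketch is only exercised on CPU arrays here; whether each operation dispatches to an efficient kernel for GPU arrays is not verified.

```julia
using LinearAlgebra

# Φ(M): lower triangle of M with the diagonal halved (hypothetical helper).
phi(M::AbstractMatrix) = tril(M) - Diagonal(diag(M)) / 2

# Pullback of A -> L with A = L * L' (hypothetical name, lower-factor convention).
# L̄ is the cotangent of L; only its lower triangle influences the result.
function cholesky_pullback(L::AbstractMatrix, L̄::AbstractMatrix)
    Lt = LowerTriangular(L)
    P = phi(Lt' * L̄)
    Ā = Lt' \ (P / Lt)      # L⁻ᵀ Φ(Lᵀ L̄) L⁻¹ via generic triangular solves
    return (Ā + Ā') / 2     # symmetrise, because A itself is symmetric
end

# Check against central finite differences for f(A) = sum of the entries of L.
f(A) = sum(cholesky(Symmetric(A)).L)

A = [4.0 2.0; 2.0 3.0]
L = cholesky(Symmetric(A)).L
Ā = cholesky_pullback(L, tril(ones(2, 2)))   # ∂f/∂L is 1 on the lower triangle

E = [1.0 0.5; 0.5 2.0]                       # symmetric perturbation direction
h = 1e-6
fd = (f(A + h * E) - f(A - h * E)) / (2h)
@assert isapprox(dot(Ā, E), fd; rtol = 1e-6)
```

Because the solves go through the `LowerTriangular` wrapper instead of naming a BLAS routine, which kernel runs is decided by dispatch on the underlying array type. That is the property the hard-coded `trsm!` call lacks.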