Adjoints for functions of specialized matrices #402
I don't get the error you see, at least on Zygote v0.3.4. (I hit #393 on master today.)

EDIT: On Zygote v0.4.1 now, I do get the error. And my examples below also no longer work. (And looking at the source of 0.3.4, I don't see how they did work!)

But here are some more compact examples of the behaviour on version 0.3.4. Gradients of `Diagonal`:

```julia
julia> gradient(x -> x[1,1] + 10x[1,2], Diagonal(ones(2)))[1]
(diag = [1.0, 0.0],)

julia> gradient(x -> x[1,1] + 10x[1,2], Diagonal(ones(2,2)))[1]
(diag = [1.0, 0.0],)

julia> gradient(x -> x[1,1] + 10x[1,2], collect(Diagonal(ones(2))))[1]
2×2 Array{Float64,2}:
 1.0  10.0
 0.0   0.0
```

Gradients with `Symmetric`:

```julia
julia> gradient(x -> x[1,1] + 10x[1,2] + 100x[2,1], Symmetric(ones(2,2)))[1]
(data = [1.0 110.0; 0.0 0.0], uplo = nothing)

julia> gradient(x -> x[1,1] + 10x[1,2] + 100x[2,1], collect(Symmetric(ones(2,2))))[1]
2×2 Array{Float64,2}:
   1.0  10.0
 100.0   0.0
```

This is what you call solution B, I think, which seems correct to me. For sparse matrices the current behaviour is more like what you call solution A: it's happy to return a dense matrix. Discussion here: #163 (comment)

Unfortunately the above examples are broken by #256, which fixes other `getindex` problems, so perhaps that needs to change:

```julia
∇getindex (generic function with 1 method)

julia> gradient(x -> x[1,1] + 10x[1,2], Diagonal(ones(2,2)))[1]
ERROR: ArgumentError: cannot set off-diagonal entry (1, 2) to a nonzero value (10.0)

julia> gradient(x -> x[1,1], Diagonal(ones(2,2)))[1]
2×2 Diagonal{Float64,Array{Float64,1}}:
 1.0   ⋅
  ⋅   0.0
```
Thanks for your comments! As you note, in the current code, the examples do not work. All of them would work if the scope of the `getindex` adjoint were restricted to `Array`. All the examples you mention are solution A, with the exception of the very last one.
Ah I see, with … See what you think of this: 73fdfa9. With this approach, … In fact …
Yes, this does the trick; thanks for writing this! I did not expect that this call to … I propose to rename … Although I have not had the time to test, I think that the code for `Symmetric` should be:

```julia
_project(dx, x::Symmetric) = Symmetric((dx + transpose(dx))/2, Symbol(x.uplo))
```

or, equivalently, and more similar to the current code:

```julia
_project(dx, x::Symmetric) = Symmetric(_symmetric_back(dx, x.uplo)/2, Symbol(x.uplo))
```

note the added `/2`.
Yea it’s not quite right yet. I think the examples above should work (and are included as tests) but they would be broken by what you suggest. However I don’t like this:
The issue is that the right-hand side of this test is not right:

```julia
@test gradient(x -> x[1,1] + 10x[1,2] + 100x[2,1], Symmetric(ones(2,2)))[1] == Symmetric([1 110; 0 0])
```

Try this on 73fdfa9:

```julia
function f(A)
    As = Symmetric(A)
    y = As[1,1] + 10As[1,2] + 100As[2,1]
    return y
end
A = ones(2,2)
g = ngradient(f, A)[1]
gZ = Zygote.gradient(f, A)[1]
@test g ≈ gZ
# Test Failed:
#   Expression: g ≈ gZ
#    Evaluated: [1.0 110.0; 0.0 0.0] ≈ [1.0 220.0; 0.0 0.0]
```

BTW, I was wrong with the …

With this definition, the … I like it that the current test checks the type of the return value with:

```julia
rng = MersenneTwister(348302)
n = 3
A = rand(rng, n, n)
@test gradcheck(f, A)
```

BTW, with this out of the way, the following might be nice to add in the same spirit of returning special matrices:

```julia
for AT in [:Diagonal, :LowerTriangular, :UpperTriangular, :Symmetric]
    # @eval is needed to interpolate $AT into the method definition
    @eval @adjoint collect(x::$AT) = collect(x), Δ -> (_project(Δ, x),)
end
```

or perhaps even just widening the scope of the current adjoint:

```julia
@adjoint collect(x::AbstractArray) = collect(x), Δ -> (_project(Δ, x),)
```
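The 110 vs 220 discrepancy can be reproduced with a plain central-difference check. This is a standalone sketch: `num_gradient` below is a hypothetical helper standing in for Zygote's test utility `ngradient`, not code from the repository. It confirms that `Symmetric(A)` with the default `uplo = :U` reads only the upper triangle of `A`, so the numeric gradient accumulates everything above the diagonal:

```julia
using LinearAlgebra

# Hypothetical central-difference gradient, standing in for Zygote's `ngradient`.
function num_gradient(f, A; h=1e-6)
    g = zero(A)
    for i in eachindex(A)
        Ap = copy(A); Ap[i] += h
        Am = copy(A); Am[i] -= h
        g[i] = (f(Ap) - f(Am)) / (2h)
    end
    return g
end

f(A) = (As = Symmetric(A); As[1,1] + 10As[1,2] + 100As[2,1])

g = num_gradient(f, ones(2,2))
# Since Symmetric reads only the upper triangle (uplo = :U),
# both As[1,2] and As[2,1] depend on A[1,2] alone: g ≈ [1 110; 0 0].
```

An adjoint returning `[1.0 220.0; 0.0 0.0]` therefore double-counts the off-diagonal contribution, which is what the missing `/2` causes.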
OK now thinking more clearly I agree that what’s needed is the projection operator. Done!
Excellent! Thanks a lot! I see this is part of #256. I agree this is done. I assume we'll close this issue once #256 is merged to master? Two final comments (may be of interest to @MikeInnes, @sethaxen, too):
Yes, I thought about writing … Things like …
That's right, for … What we can do to experiment with this is to write the adjoint for …
I'm very interested in solutions for this problem. Solution B as proposed above sounds similar to some of the proposals we've made in extending AD to Riemannian manifolds for Manifolds.jl. These issues may be useful:
I'll hopefully have the time to read this carefully within the next few days.
If it would be the case that using … I have done some small experiments with and without projection, but could not produce a computational advantage yet. Maybe a more complicated example is needed. I am sharing it here in case others want to try it. The way the projection is used below is only for experimentation; if beneficial, it should be included in Zygote adjoints so that the user would not have to insert the projections.

```julia
using Zygote
using BenchmarkTools
using Random
using LinearAlgebra
using Test

ℙ(X) = X
ℙ(::Type{T}, X) where {T<:AbstractArray} = X   # fallback: no projection
ℙ(::Type{T}, X) where {T<:Diagonal} = Diagonal(X)
ℙ(::Type{T}, X) where {T<:Symmetric} = Symmetric((X + X')/2)
Zygote.@adjoint ℙ(X::T) where {T} = ℙ(X), Ȳ -> (ℙ(T, Ȳ),)

rng = MersenneTwister(2845493)
n = 20

## Test function
A = randn(rng, n, n)
D = Diagonal(randn(rng, n, n))
x = 2.0
f(x) = A*(x*D)
fℙ(x) = A*ℙ(x*D)  # equal to `f`, since ℙ is the identity in the forward pass
@test f(x) ≈ fℙ(x)

## Adjoints
Ȳ = randn(rng, n, n)
Y, B = Zygote.pullback(f, x)
x̄ = B(Ȳ)[1]
Yℙ, Bℙ = Zygote.pullback(fℙ, x)
x̄ℙ = Bℙ(Ȳ)[1]
@test x̄ ≈ x̄ℙ

## Timing
@btime B($Ȳ)
@btime Bℙ($Ȳ)
# result: similar timing for B and Bℙ
```
No, if you delete … Whether using …
If we are okay with an unconstrained adjoint, then it is correct, in the sense that it provides the correct gradients when part of a composition of functions. Just try a function like …

Don't get me wrong, I think solution B, as you have implemented now, is great and mathematically the most satisfactory. I'm fine with proceeding with it!
As mentioned previously, I am okay with working with solution B as initially implemented. However, as discussed in #256 (comment), it was decided not to use solution B. Therefore, my proposal is to, at least for now, use solution A, but now implemented simply using the definition of `_zero` in Lines 25 to 26 in 73fdfa9. So, to make it explicit, `_project` is not used until the aforementioned discussion is completed.

I'd like to show that this approach does result in the correct gradients. I will use the first of @mcabbott's examples. The gradient that is computed using this definition of `_zero` is:

```julia
using Zygote
using LinearAlgebra

Zygote._zero(xs::AbstractArray{<:Number}, T=float(eltype(xs))) =
    fill!(similar(xs, T, size(xs)), false)

g = gradient(A -> A[1,2], Diagonal([1,1]))[1]
display(collect(g))
# 2×2 Array{Int64,2}:
#  0  1
#  0  0
```

The meaning of a gradient is that its inner product with a disturbance Δx in the input gives the disturbance in the output. In this case, all disturbances Δx are diagonal matrices. The inner product of any diagonal matrix with the computed gradient is zero, so we obtain the correct result that the disturbance in the output is zero. The final result is therefore identical to the result that we would have found after projection (solution B), which results in a zero gradient:

```julia
function f2(x)
    A = Diagonal(x)
    return A[1,2]
end
g2 = gradient(f2, [1,1])[1]
display(collect(g2))
# 2-element Array{Int64,1}:
#  0
#  0
```

Which is the correct result. So, I propose to implement this small fix now. Instead of throwing an error on specialized matrices, the correct gradients are computed, which is progress!
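The inner-product argument can be checked directly. The following small standalone sketch (not code from the PR) verifies that the dense solution-A gradient of `A -> A[1,2]` has zero inner product with every diagonal perturbation, so it predicts the same first-order change as the projected solution-B gradient:

```julia
using LinearAlgebra

g = [0.0 1.0; 0.0 0.0]   # dense gradient of A -> A[1,2] under solution A

# Any admissible perturbation of a Diagonal input is itself diagonal:
for d in ([1.0, 0.0], [0.0, 1.0], [0.3, -2.0])
    Δx = Diagonal(d)
    @assert dot(g, Δx) == 0.0   # first-order change in the output is zero
end

# Projecting g onto the diagonal subspace (solution B) gives the zero gradient:
@assert Diagonal(diag(g)) == Diagonal([0.0, 0.0])
```

Both representations therefore agree on every directional derivative that a `Diagonal` input can realize.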
To clarify, this 3-arg … But that aside, the proposal is to treat the zeros of `Diagonal` as being not structurally zero, just accidentally zero. This is how sparse arrays are currently treated, for example.
Thanks! I agree that ideally this … As you say, `Diagonal` and sparse arrays are clearly different. When computing function values, `Diagonal` can only have diagonal elements; only for the gradient computation does solution A use the entire space of matrices. If there is agreement that we want to use it, I can write the code for the new proposal. Just to be clear, the result of all gradient computations will be correct, not just in "some calculations"; otherwise it would not be a good idea to implement it. We can add a couple of cases to the test suite to demonstrate this. The examples that you gave earlier are useful for that.
409: matrix exp(A) returns real-valued adjoint for real-valued A r=MikeInnes a=sdewaele

The current adjoint for the matrix `exp(A)` where `A` is real-valued can return a complex-valued matrix. The complex components are only small numerical errors, close to machine precision. However, the fact that it is complex-valued leads to issues in the adjoint computation, see below for an example. This PR converts the return value to a real-valued array for real-valued arguments.

BTW - This issue has some similarities to #402, in that:

1) Having indexing in the test code revealed the issue. For that reason, it may be an idea to add a `gradcheck_getindex`, to complement the current `gradcheck`? Or a test of compatibility of the return value of the adjoint with the input type?
2) An alternative fix would be to allow the adjoint of the indexed array to be complex-valued. This is similar to allowing unconstrained matrices as adjoints in #402. This would also require changes to the `getindex` adjoint.

```julia
# Note: A randomly generated 8 × 8 also fails (almost?) always. Using this confirmed example to be sure.
A = [ 0.0    1.0    0.0
      0.0    0.0    1.0
     -4.34 -18.31  -0.43]
x = [1.0]
f(A,x) = exp(A*x[1])
Y, Bf = Zygote.pullback(f, A, x)
Ȳ = ones(3,3)
Ā, x̄ = Bf(Ȳ)  # fails
```

Error:

```julia
ERROR: InexactError: Float64(14.999150171071268 + 9.678155693021492e-16im)
Stacktrace:
 [1] Type at .\complex.jl:37 [inlined]
 [2] convert at .\number.jl:7 [inlined]
 [3] setindex!(::Array{Float64,1}, ::Complex{Float64}, ::Int64) at .\array.jl:767
 [4] #980 at C:\Users\user\.julia\packages\Zygote\ycnjm\src\lib\array.jl:32 [inlined]
 [5] #2619#back at C:\Users\user\.julia\packages\ZygoteRules\6nssF\src\adjoint.jl:49 [inlined]
 [6] literal_getindex at C:\Users\user\.julia\packages\Zygote\ycnjm\src\lib\lib.jl:77 [inlined]
 [7] (::typeof(∂(Zygote.literal_getindex)))(::Complex{Float64}) at C:\Users\user\.julia\packages\Zygote\ycnjm\src\compiler\interface2.jl:0
 [8] f at .\REPL[7]:1 [inlined]
 [9] (::typeof(∂(f)))(::Array{Float64,2}) at C:\Users\user\.julia\packages\Zygote\ycnjm\src\compiler\interface2.jl:0
 [10] (::getfield(Zygote, Symbol("##28#29")){typeof(∂(f))})(::Array{Float64,2}) at C:\Users\user\.julia\packages\Zygote\ycnjm\src\compiler\interface.jl:38
 [11] top-level scope at none:0
```

Co-authored-by: Stijn de Waele <stijn.de.waele123@gmail.com>
Co-authored-by: sdewaele <14310676+sdewaele@users.noreply.github.com>
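The kind of complex contamination this PR fixes can be reproduced without Zygote. This standalone sketch (an illustration of the idea, not the PR's actual code) computes `exp(A)` of a real matrix through its complex eigendecomposition, the way a pullback implementation might, and shows that the leftover imaginary parts are pure roundoff that `real.()` can safely strip:

```julia
using LinearAlgebra

A = [ 0.0    1.0    0.0
      0.0    0.0    1.0
     -4.34 -18.31  -0.43]

F = eigen(A)                    # complex eigenpairs for this real matrix
λ, V = F.values, F.vectors
E = V * Diagonal(exp.(λ)) / V   # exp(A) via the eigendecomposition: complex-typed

# The imaginary parts are only numerical noise near machine precision,
# so dropping them for a real-valued input is safe (the PR's remedy):
Ereal = real.(E)
```

Converting the adjoint to real in this way avoids the `InexactError` above, which arises when a nominally complex value is stored into a `Float64` array.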
I encountered an error with the adjoint of `getindex` of a special matrix. The issue is that the current `getindex` adjoint tries to assign to an element in the special matrix, which is not always allowed. It points to a more general problem/opportunity for adjoints of functions of special matrices. Using `Diagonal` as an example, this is the error:

```julia
ERROR: ArgumentError: cannot set off-diagonal entry (1, 2) to a nonzero value (10.0)
```

Two possible solutions are:

Solution A: adjoint is a tuple with an unconstrained array
Pro: Simple.
Con: The adjoint does not return a `Diagonal`. This reduces the efficiency of the adjoint code. This can perhaps be a fallback, as we implement solution B.

Solution B: insert an orthogonal projection and return a special matrix
Pro: Returns a special matrix, so the adjoint code becomes more efficient.
Con: Some work to write the adjoint.

Both solutions provide the same gradients.

Mathematical background solution B

Suppose we have a special matrix S ∈ 𝕊. For example, for `Diagonal`, 𝕊 is the space of diagonal matrices. 𝕊 is a subspace of the space of regular matrices ℤ: 𝕊 ⊂ ℤ. The original function maps from 𝕊 to 𝕐:

Y = f(S) : 𝕊 → 𝕐

For example, for `getindex(S,i::Int,j::Int)`, 𝕐 is scalar, so 𝕐 = ℝ. If we take the adjoint through the code of `f`, we currently get an unconstrained matrix S̄ ∈ ℤ, as shown above for `getindex`. To ensure that S̄ ∈ 𝕊, we insert an orthogonal projection P from ℤ to 𝕊:

S = P(Z) : ℤ → 𝕊

Note that:

1. P(S) = S for all S ∈ 𝕊.
2. P is an orthogonal projection, so it is linear and self-adjoint.

Instead of computing the adjoint for f, we compute it for the function

g = f∘P

whose adjoint is

g' = P∘f'

using property 2, since the adjoint of P is P itself. Because of property 1, g = f for all special matrices S ∈ 𝕊.

This methodology applies to, among others: `Diagonal`, `Symmetric`, `UpperTriangular`, `LowerTriangular`, `I` (`UniformScaling`)(?). Often, P is simply the constructor (`Diagonal`, `UpperTriangular`, `LowerTriangular`). For `Symmetric`, it is

P(A) = (A + transpose(A))/2
Beyond `getindex`

In general, this approach could be used for adjoints of functions mapping from the constrained to the unconstrained space, where the adjoint is not guaranteed to map to 𝕊, notably `collect` and `Array()`. As with `getindex`, specializing the current adjoint to `Array` will probably give you solution A for these. It should be noted that the adjoints for these functions do work now, unlike `getindex`, so coding these adjoints is less urgent.

Open questions

Narrow the scope of the `getindex` adjoint to apply only to `Array` instead of `AbstractArray`? (I think it was like this in the past.) This would have the benefit that solution A is automatically used for special matrices. But I don't know what the repercussions of this are. With the current signature of the `getindex` adjoint, I don't see how I can circumvent it to autodiff through the original `getindex` code in solution B, apart from creating a duplicate, which is of course undesirable.

If there is interest in this, I am happy to start a branch to start implementing some of this. I would appreciate guidance and help from others to get it right!