
Jacobian in loss function #953

Open · dfenn opened this issue Apr 26, 2021 · 3 comments
Labels: second order zygote over zygote, or otherwise


dfenn commented Apr 26, 2021

This seems similar to #820 and FluxML/Flux.jl#1464, but the suggestions there don't seem to help in my case.

I'm trying to use the Jacobian of a specific layer's activations with respect to the network input in order to calculate a penalty in my loss function. I'm not sure if the problem is that my implementation is naive, or if what I'm trying to do just isn't supported. I'm using Flux 0.12.2 and Zygote 0.6.10 with Julia 1.6.1.

Running the following MWE

using Flux
using ForwardDiff
using Zygote
using ReverseDiff

numfeatures = 4

model = Chain(Dense(numfeatures, numfeatures, sigmoid))

ps = Flux.params(model)
opt = ADAM(0.1)

function loss(x) 
    modelOutput = model(x)
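    # swapping in Zygote.jacobian or ReverseDiff.jacobian here gives the other errors described below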
    jac = ForwardDiff.jacobian(model, x)[1]
    reg = sum(jac)
    return reg
end

train_data = rand(numfeatures, 2)

for epoch in 1:10
    local trainLoss
    gs = Flux.gradient(ps) do
        trainLoss = loss(train_data)
        return trainLoss
    end
    Flux.Optimise.update!(opt, ps, gs)
    @show trainLoss
end

I get a few different outcomes, depending on which version of jacobian I call. Calling the Zygote and ForwardDiff methods results in the error ERROR: LoadError: Mutating arrays is not supported. Calling the ReverseDiff version gives me ERROR: LoadError: Can't differentiate foreigncall expression.

Should it be possible to get a Jacobian inside the loss function like this? If not, is there a better way to do it?

Thanks in advance--I appreciate any insight you can give me.

mcabbott (Member) commented

Zygote's jacobian function isn't Zygote-differentiable. There was an alternative sketched in #865 which might be.

ForwardDiff's jacobian also won't directly be Zygote-differentiable. But you can use ForwardDiff twice, something like this:

julia> my_jacobian(x -> my_jacobian(x -> x.^3, x)[1], 1:3)[1]
3×9 Matrix{Float64}:
 6.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0   0.0
 0.0  0.0  0.0  0.0  12.0  0.0  0.0  0.0   0.0
 0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  18.0
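
(my_jacobian isn't defined anywhere in the thread; one hypothetical definition consistent with the 3×9 shape shown above would be a ForwardDiff wrapper that returns a 1-tuple, like Zygote.jacobian, but laid out inputs × outputs — a guess, not the author's actual code:)

my_jacobian(f, x) = (permutedims(ForwardDiff.jacobian(f, x)),)  # hypothetical: 1-tuple so [1] applies, transposed layout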

julia> Zygote.jacobian(x -> Zygote.forwarddiff(z -> ForwardDiff.jacobian(x -> x.^3, z), x), 1:3)[1]
9×3 Matrix{Int64}:
 6   0   0
 0   0   0
 0   0   0
 0   0   0
 0  12   0
 0   0   0
 0   0   0
 0   0   0
 0   0  18

Something is slightly wrong there (note that the two results come out as transposes of each other). Second derivatives are not really well-supported, but they can sometimes be made to work; you will have to experiment a bit, and start small.

dfenn (Author) commented May 16, 2021

Thanks for your helpful response. Working from your example, this seems to work

julia> Flux.gradient(x -> sum(Zygote.forwarddiff(z -> ForwardDiff.jacobian(model, z), x)), rand(5, 2))[1]
5×2 Matrix{Float64}:
 -0.00228453  -0.0116751
 -0.00126774  -0.00494717
  0.00138824   0.0168742
  0.0042229    0.0126724
 -0.00115428  -0.0195571

but modifying the loss function from the MWE above accordingly doesn't work:

function loss(x) 
    jac = Zygote.forwarddiff(z -> ForwardDiff.jacobian(model, z), x)
    reg = sum(jac)
    @show reg
    return reg
end

If I print out the gradients, it looks like they're missing:

julia> include("mwe.jl")
reg = 0.5245257543539097
gs[p] = nothing
gs[p] = nothing
reg = 0.5245257543539097
gs[p] = nothing
gs[p] = nothing

It's not clear to me why the first approach works, but the second doesn't.

mcabbott (Member) commented May 16, 2021

I think what you're seeing is what Zygote.forwarddiff warns about when it says "Note that the function f will drop gradients for any closed-over values". That is, you write ForwardDiff.jacobian(model, z) to compute the jacobian of the model with respect to z, but what you then ask for is the effect of the parameters inside model on that result. Those parameters are never tracked by ForwardDiff.jl, which only "perturbs" its explicit input. But I agree it's disturbing that they aren't tracked within a Zygote.gradient call either.

Smaller example, using Zygote#master which after #968 applies forwarddiff automatically:

julia> using Zygote, ForwardDiff, LinearAlgebra  # LinearAlgebra for dot

julia> W = rand(2); X = rand(2);

julia> G = gradient(Params([W,X])) do
         sum(ForwardDiff.gradient(x -> dot(x,W)^3, X))
       end
Grads(...)

julia> G[X]
2-element Vector{Float64}:
 9.779807803787289
 5.941561578834241

julia> G[W] === nothing
true

This is still true if you change it to call forward-over-reverse instead, although perhaps there is some context which ought to be passed to the inner call here, to inform it that we care about W?

julia> G2 = gradient(Params([W,X])) do
         sum(Zygote.forwarddiff(X -> Zygote.gradient(x -> dot(x,W)^3, X)[1], X))
       end
Grads(...)

julia> G2[X]
2-element Vector{Float64}:
 9.779807803787289
 5.941561578834241

julia> G2[W] === nothing
true
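
One possible workaround along those lines (an untested sketch, not something from this thread) is to make W part of the explicit input to Zygote.forwarddiff, so that the forward perturbation sees it too; the vcat packing below is just one illustrative way to do that:

using Zygote, ForwardDiff, LinearAlgebra

W = rand(2); X = rand(2);

G3 = gradient(Params([W, X])) do
    xw = vcat(X, W)              # W becomes part of the explicit input, not a closed-over constant
    g = Zygote.forwarddiff(xw) do v
        # gradient with respect to the x-part only, but computed from the full packed vector
        ForwardDiff.gradient(u -> dot(u[1:2], u[3:4])^3, v)[1:2]
    end
    sum(g)
end

G3[W]   # if the nested duals compose as expected, this is no longer nothing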
