Description
I am trying to understand a performance issue I am seeing in DynamicExpressions.jl, where using `@turbo` makes the evaluation kernels 4x faster but makes the derivative kernels 10% slower. See the detailed benchmarks here: SymbolicML/DynamicExpressions.jl#28 (comment)
My derivative kernels look like this:
```julia
@maybe_turbo turbo for j in indices((cumulator, dcumulator))
    x = op(cumulator[j])::T
    dx = diff_op(cumulator[j])::T * dcumulator[j]
    cumulator[j] = x
    dcumulator[j] = dx
end
```
(The `@maybe_turbo turbo ...` will turn into `@turbo ...` when `turbo=true`, but into `@inbounds @simd ...` otherwise. It will also remove the various type assertions in the scope.)
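For concreteness, here is a minimal, self-contained sketch of what the `turbo=false` branch looks like. The operators and arrays are placeholders I made up for illustration (the real kernel receives `op`/`diff_op` as arguments, and uses `indices` from LoopVectorization rather than `eachindex`):

```julia
# Stand-ins for the real kernel inputs: a unary operator and its derivative.
op(x)      = cos(x)
diff_op(x) = -sin(x)          # d/dx cos(x) = -sin(x)

cumulator  = collect(0.0:0.1:1.0)
dcumulator = ones(length(cumulator))

# turbo=false branch: plain @inbounds @simd, type assertions stripped.
@inbounds @simd for j in eachindex(cumulator, dcumulator)
    x  = op(cumulator[j])
    dx = diff_op(cumulator[j]) * dcumulator[j]
    cumulator[j]  = x
    dcumulator[j] = dx
end
```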
To create the `diff_op`, I generate it using Zygote.jl here:
```julia
for op in unary_operators
    diff_op(x) = gradient(op, x)[1]
    push!(diff_unary_operators, diff_op)
end
```
I can try to create an MWE for this, but I quickly wanted to check whether anything is obviously wrong in how I am using `@turbo` here that might hurt performance rather than help it. For example, perhaps this `diff_op` is not being inlined correctly, and is therefore not being optimized by `@turbo`? For the record, I am not seeing any warnings about the derivative operator being incompatible, so I'm not quite sure why this is occurring.
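One thing worth checking along these lines (a hypothesis on my part, not a confirmed diagnosis): if the generated closures end up in an abstractly typed container such as `Vector{Function}`, every call through that container is a dynamic dispatch, which blocks inlining at the call site. A minimal sketch, with hand-written derivatives standing in for the Zygote-generated ones:

```julia
# Hand-written derivatives stand in for gradient(op, x)[1]:
diff_cos(x) = -sin(x)
diff_exp(x) = exp(x)

# Abstractly typed container: the concrete function types are erased,
# so calls through it cannot be devirtualized or inlined.
abstract_ops = Function[diff_cos, diff_exp]

# Concretely typed alternative: a tuple preserves each function's type,
# so a call through it can be inlined when the index is known.
concrete_ops = (diff_cos, diff_exp)

isconcretetype(eltype(abstract_ops))   # Function is abstract
```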
Also, the `diff_op` in the benchmark is the derivative of one of `+`, `-`, `*`, `/`, `cos`, `exp`, so nothing too crazy.