Description
I am trying to understand a performance issue I am seeing in DynamicExpressions.jl where using @turbo makes the evaluation kernels 4x faster, but makes the derivative kernels 10% slower. See the detailed benchmarks here: SymbolicML/DynamicExpressions.jl#28 (comment)
My derivative kernels look like this:
```julia
@maybe_turbo turbo for j in indices((cumulator, dcumulator))
    x = op(cumulator[j])::T
    dx = diff_op(cumulator[j])::T * dcumulator[j]
    cumulator[j] = x
    dcumulator[j] = dx
end
```
(The @maybe_turbo turbo ... expands to @turbo ... when turbo=true, and to plain @inbounds @simd ... otherwise. It also removes the various type assertions in the scope.)
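For concreteness, the loop above reduces to roughly the following two forms depending on turbo. This is only a sketch, not the actual @maybe_turbo expansion: the stand-in op/diff_op pair and the wrapper function names here are made up for illustration.

```julia
using LoopVectorization

# Stand-in operator pair for illustration (not the real generated operators)
op(x) = cos(x)
diff_op(x) = -sin(x)

# turbo = true: the loop becomes a @turbo loop, with the type assertions stripped
function deriv_turbo!(cumulator, dcumulator)
    @turbo for j in indices((cumulator, dcumulator))
        x = op(cumulator[j])
        dx = diff_op(cumulator[j]) * dcumulator[j]
        cumulator[j] = x
        dcumulator[j] = dx
    end
    return nothing
end

# turbo = false: a plain @inbounds @simd loop, with the assertions kept
function deriv_simd!(cumulator::AbstractVector{T}, dcumulator::AbstractVector{T}) where {T}
    @inbounds @simd for j in eachindex(cumulator, dcumulator)
        x = op(cumulator[j])::T
        dx = diff_op(cumulator[j])::T * dcumulator[j]
        cumulator[j] = x
        dcumulator[j] = dx
    end
    return nothing
end
```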
To create the diff_op, I generate it using Zygote.jl here:
```julia
for op in unary_operators
    diff_op(x) = gradient(op, x)[1]
    push!(diff_unary_operators, diff_op)
end
```
I can try to create an MWE for this, but I quickly wanted to check whether there is anything obvious in how I am using @turbo here that might hurt performance rather than help it. For example, perhaps this diff_op is not being inlined correctly, and is therefore not being optimized by @turbo? For the record, I am not seeing any warnings about the derivative operator being incompatible, so I'm not quite sure why this is occurring.
Also, the diff_op in the benchmark is the derivative of one of +, -, *, /, cos, or exp, so nothing too crazy.
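For a quick sense of the MWE I have in mind, here is a sketch that runs the same @turbo kernel once with the Zygote-generated derivative and once with a hand-written derivative of cos. All names here (deriv_kernel!, zygote_diff_op, manual_diff_op) and the array sizes are made up for illustration; this is not the benchmark from the linked comment.

```julia
using LoopVectorization, Zygote, BenchmarkTools

# Zygote-generated derivative, as in the loop above, vs. a hand-written one
zygote_diff_op(x) = gradient(cos, x)[1]
manual_diff_op(x) = -sin(x)

# Same @turbo kernel, parameterized on the derivative operator
function deriv_kernel!(cumulator, dcumulator, diff_op::F) where {F}
    @turbo for j in indices((cumulator, dcumulator))
        dcumulator[j] = diff_op(cumulator[j]) * dcumulator[j]
    end
    return dcumulator
end

c  = rand(Float32, 10_000)
dc = ones(Float32, 10_000)

@btime deriv_kernel!($c, d, $zygote_diff_op) setup = (d = copy($dc))
@btime deriv_kernel!($c, d, $manual_diff_op) setup = (d = copy($dc))
```
If the hand-written version turned out noticeably faster under @turbo, that would suggest the Zygote closure is not being reduced to the same SIMD code, which is the inlining question above.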