Performance issue with Zygote.jl-generated function within @turbo
#489
Comments
If you only care about operators like Because Note that ChainRules are defined on I suggest poking around using
Thanks for the quick and detailed response! I've been trying my best to understand the code you sent, but it is a bit outside of my wheelhouse and many of these calls are new to me. Here's a MWE I have:

    using BenchmarkTools, ForwardDiff, LoopVectorization

    function f(x, op::F, turbo) where {F}
        y = similar(x)
        if turbo
            @turbo for i in eachindex(x)
                y[i] = op(x[i])
            end
        else
            @inbounds @simd for i in eachindex(x)
                y[i] = op(x[i])
            end
        end
        return y
    end

we can see that

    julia> @btime f(x, op, turbo) setup=(x=randn(10000); turbo=true; op=cos);
      19.250 μs (2 allocations: 78.17 KiB)
    julia> @btime f(x, op, turbo) setup=(x=randn(10000); turbo=false; op=cos);
      117.708 μs (2 allocations: 78.17 KiB)

But when I pass in

    julia> @btime f(x, op, turbo) setup=(x=randn(10000); turbo=true; op=Base.Fix1(ForwardDiff.derivative, cos));
      57.083 μs (2 allocations: 78.17 KiB)
    julia> @btime f(x, op, turbo) setup=(x=randn(10000); turbo=false; op=Base.Fix1(ForwardDiff.derivative, cos));
      143.625 μs (2 allocations: 78.17 KiB)

Maybe this is not possible and I should try Enzyme.jl instead?
I am trying to understand a performance issue I am seeing in DynamicExpressions.jl where using `@turbo` makes the evaluation kernels 4x faster, but makes the derivative kernels 10% slower. See the detailed benchmarks here: SymbolicML/DynamicExpressions.jl#28 (comment)

My derivative kernels look like this:

(The `@maybe_turbo turbo ...` will turn into `@turbo ...` when `turbo=true`, but just `@inbounds @simd ...` otherwise. It will also remove the various type assertions in the scope.)

To create the `diff_op`, I generate it using Zygote.jl here:

I can try to create a MWE for this, but I quickly wanted to check if anything was obvious in how I am using `@turbo` here that might hurt performance rather than help it. For example, perhaps this `diff_op` is not being inlined correctly, and therefore not being optimized by `@turbo`? For the record, I am not seeing any warnings about the derivative operator being incompatible, so I'm not quite sure why this is occurring.

Also, the `diff_op` in the benchmark is the derivative of one of `+, -, *, /, cos, exp`, so nothing too crazy.
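
One way to narrow this down might be to benchmark a hand-written derivative next to the AD-generated one, so that the cost of the derivative itself can be separated from the cost of the AD wrapper. The sketch below is only an illustration and is not the code from DynamicExpressions.jl: `hand_diff` and `apply_op` are hypothetical names, only the unary operators `cos` and `exp` are covered, and the loop uses plain `@inbounds @simd` from Base so that it runs without any packages.

```julia
# Hypothetical hand-written derivatives for two of the unary operators
# mentioned in the issue (names are illustrative, not from any package).
hand_diff(::typeof(cos)) = x -> -sin(x)   # d/dx cos(x) = -sin(x)
hand_diff(::typeof(exp)) = exp            # d/dx exp(x) = exp(x)

# Same loop shape as the non-turbo branch of the MWE in the comments.
# `op::F where F` forces specialization on the operator type.
function apply_op(x::AbstractVector, op::F) where {F}
    y = similar(x)
    @inbounds @simd for i in eachindex(x)
        y[i] = op(x[i])
    end
    return y
end

x = collect(range(-1.0, 1.0; length=5))
dcos = apply_op(x, hand_diff(cos))
dexp = apply_op(x, hand_diff(exp))
```

If the hand-written operator speeds up under `@turbo` while the generated `diff_op` does not, that would point at the generated closure (for example, a failure to inline) rather than at the derivative arithmetic. This is a guess about how to diagnose the problem, not a confirmed cause.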