Performance issue with Zygote.jl-generated function within @turbo #489

Open
@MilesCranmer

I am trying to understand a performance issue I am seeing in DynamicExpressions.jl where using @turbo makes the evaluation kernels 4x faster, but makes the derivative kernels 10% slower. See the detailed benchmarks here: SymbolicML/DynamicExpressions.jl#28 (comment)

My derivative kernels look like this:

@maybe_turbo turbo for j in indices((cumulator, dcumulator))
    x = op(cumulator[j])::T
    dx = diff_op(cumulator[j])::T * dcumulator[j]

    cumulator[j] = x
    dcumulator[j] = dx
end

(The @maybe_turbo turbo ... form expands to @turbo ... when turbo=true, and to plain @inbounds @simd ... otherwise. It also strips the various type assertions inside the loop body.)

To create the diff_op, I generate it using Zygote.jl here:

for op in unary_operators
    diff_op(x) = gradient(op, x)[1]
    push!(diff_unary_operators, diff_op)
end

I can try to create an MWE for this, but first I wanted to quickly check whether anything obvious in how I am using @turbo here might hurt performance rather than help it. For example, perhaps diff_op is not being inlined correctly, and is therefore not being optimized by @turbo? For the record, I am not seeing any warnings about the derivative operator being incompatible, so I'm not sure why this is occurring.

Also, the diff_op in the benchmark is the derivative of one of +, -, *, /, cos, or exp, so nothing too crazy.
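A minimal sketch of what an MWE for this could look like, under stated assumptions: it covers only the unary case with cos and exp, and the names kernel_turbo!, kernel_simd!, x0, and d0 are hypothetical stand-ins rather than anything from DynamicExpressions.jl. The two kernels mirror the @turbo and @inbounds @simd expansions described above, with the type assertions dropped in both so the loop bodies are identical.

```julia
using LoopVectorization: @turbo, indices
using Zygote: gradient

# Hypothetical stand-in for the operator list; the real one lives in
# DynamicExpressions.jl and may contain other operators.
unary_operators = [cos, exp]
diff_unary_operators = Function[]
for op in unary_operators
    # Same construction as in the issue: a closure around Zygote's gradient.
    diff_op(x) = gradient(op, x)[1]
    push!(diff_unary_operators, diff_op)
end

# Derivative kernel with @turbo (type assertions removed).
function kernel_turbo!(cumulator, dcumulator, op::F, diff_op::G) where {F,G}
    @turbo for j in indices((cumulator, dcumulator))
        x = op(cumulator[j])
        dx = diff_op(cumulator[j]) * dcumulator[j]
        cumulator[j] = x
        dcumulator[j] = dx
    end
    return nothing
end

# Same loop body with plain @inbounds @simd for comparison.
function kernel_simd!(cumulator, dcumulator, op::F, diff_op::G) where {F,G}
    @inbounds @simd for j in eachindex(cumulator, dcumulator)
        x = op(cumulator[j])
        dx = diff_op(cumulator[j]) * dcumulator[j]
        cumulator[j] = x
        dcumulator[j] = dx
    end
    return nothing
end

x0 = rand(Float64, 1_000)
d0 = rand(Float64, 1_000)

for (op, diff_op) in zip(unary_operators, diff_unary_operators)
    # First calls compile both kernels and let us check that they agree.
    a1, d1 = copy(x0), copy(d0)
    a2, d2 = copy(x0), copy(d0)
    kernel_turbo!(a1, d1, op, diff_op)
    kernel_simd!(a2, d2, op, diff_op)
    @assert a1 ≈ a2 && d1 ≈ d2

    # Rough single-shot timings on fresh inputs.
    b1, e1 = copy(x0), copy(d0)
    b2, e2 = copy(x0), copy(d0)
    t_turbo = @elapsed kernel_turbo!(b1, e1, op, diff_op)
    t_simd = @elapsed kernel_simd!(b2, e2, op, diff_op)
    println(op, ": turbo = ", t_turbo, " s, simd = ", t_simd, " s")
end
```

Single-shot @elapsed timings on 1,000 elements are noisy, so for a real comparison a benchmarking package such as BenchmarkTools.jl would be more reliable. Swapping diff_op for a hand-written derivative (for example x -> -sin(x) for cos) in the same kernels would help show whether the slowdown is specific to the Zygote-generated closure.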
