@penelopeysm penelopeysm commented Nov 6, 2025

This PR combines the changes from #1113 and #1115. I don't expect this to be merged, but I think it's worth having somewhere to compare the two.

It appears to me that FastLDF is still about 3x faster, although allocations are now equivalent (at least for evals; grads are quite different). I'm actually quite unsure where this difference comes from, because when @mhauru and I were talking about this over Zoom, we thought we had reduced the perf difference to around 1.5x or so. (Perhaps I'm just confused.) I'd be very happy if someone could try to reproduce this; the code is all below.

Benchmarks were run on Julia 1.11.7 with 1 thread, as before. Qualitatively I observe the same on 1.12.1.

Interestingly, the one model where the two approaches are almost equivalent time-wise is badvarnames. I wonder if that suggests that it's actually the NamedTuple optimisation doing a lot of the hard work here? I'm too lazy to turn that optimisation off and rerun, but I'm throwing it out there anyway.

using DynamicPPL, Distributions, LogDensityProblems, Chairmarks, LinearAlgebra
using ADTypes, ForwardDiff, ReverseDiff
@static if VERSION < v"1.12"
    using Enzyme, Mooncake
end

const adtypes = @static if VERSION < v"1.12"
    [
        ("FD", AutoForwardDiff()),
        ("RD", AutoReverseDiff()),
        ("MC", AutoMooncake()),
        ("EN", AutoEnzyme(; mode=set_runtime_activity(Reverse), function_annotation=Const)),
    ]
else
    [
        ("FD", AutoForwardDiff()),
        ("RD", AutoReverseDiff()),
    ]
end

function benchmark_ldfs(model; skip=Union{})
    vi = VarInfo(model)
    x = vi[:]
    ldf_no = DynamicPPL.LogDensityFunction(model, getlogjoint, vi)
    fldf_no = DynamicPPL.FastLDF(model, getlogjoint, vi)
    # Sanity check: both implementations should agree on the log density
    @assert LogDensityProblems.logdensity(ldf_no, x) ≈ LogDensityProblems.logdensity(fldf_no, x)
    print("LogDensityFunction: eval      ----  ")
    display(median(@be LogDensityProblems.logdensity(ldf_no, x)))
    print("           FastLDF: eval      ----  ")
    display(median(@be LogDensityProblems.logdensity(fldf_no, x)))
    for name_adtype in adtypes
        name, adtype = name_adtype
        adtype isa skip && continue
        ldf = DynamicPPL.LogDensityFunction(model, getlogjoint, vi; adtype=adtype)
        fldf = DynamicPPL.FastLDF(model, getlogjoint, vi; adtype=adtype)
        # Sanity check: gradients should agree too
        @assert LogDensityProblems.logdensity_and_gradient(ldf, x)[2] ≈ LogDensityProblems.logdensity_and_gradient(fldf, x)[2]
        print("LogDensityFunction: grad ($name) ----  ")
        display(median(@be LogDensityProblems.logdensity_and_gradient(ldf, x)))
        print("           FastLDF: grad ($name) ----  ")
        display(median(@be LogDensityProblems.logdensity_and_gradient(fldf, x)))
    end
end

@model f() = x ~ Normal()
benchmark_ldfs(f())
#=
LogDensityFunction: eval      ----  23.002 ns
           FastLDF: eval      ----  10.927 ns
LogDensityFunction: grad (FD) ----  180.833 ns (7 allocs: 272 bytes)
           FastLDF: grad (FD) ----  54.054 ns (3 allocs: 96 bytes)
LogDensityFunction: grad (RD) ----  3.875 μs (69 allocs: 2.688 KiB)
           FastLDF: grad (RD) ----  2.974 μs (46 allocs: 1.562 KiB)
LogDensityFunction: grad (MC) ----  723.308 ns (9 allocs: 736 bytes)
           FastLDF: grad (MC) ----  280.562 ns (4 allocs: 192 bytes)
LogDensityFunction: grad (EN) ----  271.429 ns (6 allocs: 240 bytes)
           FastLDF: grad (EN) ----  127.557 ns (2 allocs: 64 bytes)
=#

y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
@model function eight_schools(y, sigma)
    mu ~ Normal(0, 5)
    tau ~ truncated(Cauchy(0, 5); lower=0)
    theta ~ MvNormal(fill(mu, length(y)), tau^2 * I)
    for i in eachindex(y)
        y[i] ~ Normal(theta[i], sigma[i])
    end
    return (mu=mu, tau=tau)
end
benchmark_ldfs(eight_schools(y, sigma))
#=
LogDensityFunction: eval      ----  469.627 ns (4 allocs: 256 bytes)
           FastLDF: eval      ----  163.824 ns (4 allocs: 256 bytes)
LogDensityFunction: grad (FD) ----  1.130 μs (11 allocs: 2.031 KiB)
           FastLDF: grad (FD) ----  691.964 ns (11 allocs: 2.594 KiB)
LogDensityFunction: grad (RD) ----  41.000 μs (595 allocs: 24.859 KiB)
           FastLDF: grad (RD) ----  38.542 μs (562 allocs: 20.562 KiB)
LogDensityFunction: grad (MC) ----  3.181 μs (17 allocs: 1.484 KiB)
           FastLDF: grad (MC) ----  1.226 μs (12 allocs: 784 bytes)
LogDensityFunction: grad (EN) ----  1.908 μs (24 allocs: 1.562 KiB)
           FastLDF: grad (EN) ----  721.875 ns (13 allocs: 832 bytes)
=#

@model function badvarnames()
    N = 20
    x = Vector{Float64}(undef, N)
    for i in 1:N
        x[i] ~ Normal()
    end
end
benchmark_ldfs(badvarnames())
#=
LogDensityFunction: eval      ----  487.776 ns (2 allocs: 224 bytes)
           FastLDF: eval      ----  458.338 ns (2 allocs: 224 bytes)
LogDensityFunction: grad (FD) ----  2.867 μs (15 allocs: 4.609 KiB)
           FastLDF: grad (FD) ----  2.489 μs (11 allocs: 4.281 KiB)
LogDensityFunction: grad (RD) ----  58.167 μs (949 allocs: 34.750 KiB)
           FastLDF: grad (RD) ----  51.083 μs (773 allocs: 27.438 KiB)
LogDensityFunction: grad (MC) ----  3.734 μs (13 allocs: 1.469 KiB)
           FastLDF: grad (MC) ----  2.109 μs (28 allocs: 1.094 KiB)
LogDensityFunction: grad (EN) ----  2.317 μs (18 allocs: 4.000 KiB)
           FastLDF: grad (EN) ----  1.583 μs (5 allocs: 2.047 KiB)
=#
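As a possible control for the NamedTuple-optimisation hypothesis above, one could benchmark the same 20 dimensions packed into a single vector-valued variable, so there is only one entry for the optimisation to handle. This is a hypothetical model I haven't run (the name `goodvarnames` is made up); it reuses the imports and `benchmark_ldfs` harness defined above:

```julia
# Control experiment: 20 dimensions as one vector-valued variable, rather
# than 20 scalar variables with indexed VarNames as in badvarnames. If the
# FastLDF advantage reappears here, that points at the per-VarName overhead.
@model function goodvarnames()
    x ~ MvNormal(zeros(20), I)
end
benchmark_ldfs(goodvarnames())
```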

@model function inner()
    m ~ Normal(0, 1)
    s ~ Exponential()
    return (m=m, s=s)
end
@model function withsubmodel()
    params ~ to_submodel(inner())
    y ~ Normal(params.m, params.s)
    1.0 ~ Normal(y)
end
benchmark_ldfs(withsubmodel())
#=
LogDensityFunction: eval      ----  312.725 ns
           FastLDF: eval      ----  93.570 ns
LogDensityFunction: grad (FD) ----  602.898 ns (7 allocs: 352 bytes)
           FastLDF: grad (FD) ----  179.568 ns (3 allocs: 112 bytes)
LogDensityFunction: grad (RD) ----  13.021 μs (186 allocs: 7.969 KiB)
           FastLDF: grad (RD) ----  10.979 μs (148 allocs: 5.188 KiB)
LogDensityFunction: grad (MC) ----  4.299 μs (27 allocs: 1.234 KiB)
           FastLDF: grad (MC) ----  628.711 ns (6 allocs: 240 bytes)
LogDensityFunction: grad (EN) ----  1.906 μs (26 allocs: 1.125 KiB)
           FastLDF: grad (EN) ----  329.068 ns (2 allocs: 80 bytes)
=#

@model function typeparam(::Type{T}=Vector{Float64}) where {T}
    x = T(undef, 1)
    x1 ~ Normal()
    x[1] = x1
end
benchmark_ldfs(typeparam())
#=
LogDensityFunction: eval      ----  49.686 ns (1 allocs: 32 bytes)
           FastLDF: eval      ----  36.047 ns (1 allocs: 32 bytes)
LogDensityFunction: grad (FD) ----  251.733 ns (8 allocs: 320 bytes)
           FastLDF: grad (FD) ----  68.739 ns (4 allocs: 144 bytes)
LogDensityFunction: grad (RD) ----  3.911 μs (70 allocs: 2.719 KiB)
           FastLDF: grad (RD) ----  3.000 μs (47 allocs: 1.594 KiB)
LogDensityFunction: grad (MC) ----  837.121 ns (13 allocs: 864 bytes)
           FastLDF: grad (MC) ----  318.085 ns (6 allocs: 256 bytes)
LogDensityFunction: grad (EN) ----  285.941 ns (8 allocs: 304 bytes)
           FastLDF: grad (EN) ----  142.832 ns (3 allocs: 96 bytes)
=#


github-actions bot commented Nov 6, 2025

Benchmark Report for Commit c44bae1

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬─────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │ VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼─────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │   typed │  false │            2.4 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │   typed │  false │          393.5 │            49.3 │
│           Smorgasbord │   201 │ reversediff │   typed │   true │          458.7 │            84.7 │
│           Smorgasbord │   201 │    mooncake │   typed │   true │          452.7 │             6.4 │
│    Loop univariate 1k │  1000 │    mooncake │   typed │   true │         2305.0 │             4.7 │
│       Multivariate 1k │  1000 │    mooncake │   typed │   true │          900.5 │             9.6 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed │   true │        27132.9 │             4.3 │
│      Multivariate 10k │ 10000 │    mooncake │   typed │   true │         7865.5 │            10.7 │
│               Dynamic │    10 │    mooncake │   typed │   true │           98.4 │            17.5 │
│              Submodel │     1 │    mooncake │   typed │   true │            4.0 │             4.2 │
│                   LDA │    12 │ reversediff │   typed │   true │          847.8 │             8.3 │
└───────────────────────┴───────┴─────────────┴─────────┴────────┴────────────────┴─────────────────┘
