@penelopeysm penelopeysm commented Nov 6, 2025

This PR combines the changes from #1113 and #1115. I don't expect this to be merged, but I think it's worth having somewhere to compare the two.

It appears to me that FastLDF is still about 3x faster, although allocations are now equivalent (at least for evals; grads are quite different). I'm actually quite unsure where this difference comes from, because when @mhauru and I were talking about this over Zoom, we thought we had reduced the perf difference to around 1.5x or so. (Perhaps I'm just confused.) I'd be very happy if someone could try to reproduce this; the code is all below.

Benchmarks were run on Julia 1.11.7 with 1 thread, as before. Qualitatively I observe the same on 1.12.1.

Interestingly, the one model where the two approaches are almost equivalent time-wise is badvarnames. I wonder if that suggests that it's actually the NamedTuple optimisation doing a lot of the hard work here? I'm too lazy to turn that optimisation off and rerun, but I'm throwing it out there anyway.

using DynamicPPL, Distributions, LogDensityProblems, Chairmarks, LinearAlgebra
using ADTypes, ForwardDiff, ReverseDiff
@static if VERSION < v"1.12"
    using Enzyme, Mooncake
end

const adtypes = @static if VERSION < v"1.12"
    [
        ("FD", AutoForwardDiff()),
        ("RD", AutoReverseDiff()),
        ("MC", AutoMooncake()),
        ("EN", AutoEnzyme(; mode=set_runtime_activity(Reverse), function_annotation=Const)),
    ]
else
    [
        ("FD", AutoForwardDiff()),
        ("RD", AutoReverseDiff()),
    ]
end

function benchmark_ldfs(model; skip=Union{})
    vi = VarInfo(model)
    x = vi[:]
    ldf_no = DynamicPPL.LogDensityFunction(model, getlogjoint, vi)
    fldf_no = DynamicPPL.FastLDF(model, getlogjoint, vi)
    # Sanity check: both implementations should agree on the log density
    @assert LogDensityProblems.logdensity(ldf_no, x) ≈ LogDensityProblems.logdensity(fldf_no, x)
    print("LogDensityFunction: eval      ----  ")
    display(median(@be LogDensityProblems.logdensity(ldf_no, x)))
    print("           FastLDF: eval      ----  ")
    display(median(@be LogDensityProblems.logdensity(fldf_no, x)))
    for name_adtype in adtypes
        name, adtype = name_adtype
        adtype isa skip && continue
        ldf = DynamicPPL.LogDensityFunction(model, getlogjoint, vi; adtype=adtype)
        fldf = DynamicPPL.FastLDF(model, getlogjoint, vi; adtype=adtype)
        # Sanity check: gradients should agree too
        @assert LogDensityProblems.logdensity_and_gradient(ldf, x)[2] ≈ LogDensityProblems.logdensity_and_gradient(fldf, x)[2]
        print("LogDensityFunction: grad ($name) ----  ")
        display(median(@be LogDensityProblems.logdensity_and_gradient(ldf, x)))
        print("           FastLDF: grad ($name) ----  ")
        display(median(@be LogDensityProblems.logdensity_and_gradient(fldf, x)))
    end
end

@model f() = x ~ Normal()
benchmark_ldfs(f())
#=
LogDensityFunction: eval      ----  23.002 ns
           FastLDF: eval      ----  10.927 ns
LogDensityFunction: grad (FD) ----  180.833 ns (7 allocs: 272 bytes)
           FastLDF: grad (FD) ----  54.054 ns (3 allocs: 96 bytes)
LogDensityFunction: grad (RD) ----  3.875 μs (69 allocs: 2.688 KiB)
           FastLDF: grad (RD) ----  2.974 μs (46 allocs: 1.562 KiB)
LogDensityFunction: grad (MC) ----  723.308 ns (9 allocs: 736 bytes)
           FastLDF: grad (MC) ----  280.562 ns (4 allocs: 192 bytes)
LogDensityFunction: grad (EN) ----  271.429 ns (6 allocs: 240 bytes)
           FastLDF: grad (EN) ----  127.557 ns (2 allocs: 64 bytes)
=#

y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]
@model function eight_schools(y, sigma)
    mu ~ Normal(0, 5)
    tau ~ truncated(Cauchy(0, 5); lower=0)
    theta ~ MvNormal(fill(mu, length(y)), tau^2 * I)
    for i in eachindex(y)
        y[i] ~ Normal(theta[i], sigma[i])
    end
    return (mu=mu, tau=tau)
end
benchmark_ldfs(eight_schools(y, sigma))
#=
LogDensityFunction: eval      ----  469.627 ns (4 allocs: 256 bytes)
           FastLDF: eval      ----  163.824 ns (4 allocs: 256 bytes)
LogDensityFunction: grad (FD) ----  1.130 μs (11 allocs: 2.031 KiB)
           FastLDF: grad (FD) ----  691.964 ns (11 allocs: 2.594 KiB)
LogDensityFunction: grad (RD) ----  41.000 μs (595 allocs: 24.859 KiB)
           FastLDF: grad (RD) ----  38.542 μs (562 allocs: 20.562 KiB)
LogDensityFunction: grad (MC) ----  3.181 μs (17 allocs: 1.484 KiB)
           FastLDF: grad (MC) ----  1.226 μs (12 allocs: 784 bytes)
LogDensityFunction: grad (EN) ----  1.908 μs (24 allocs: 1.562 KiB)
           FastLDF: grad (EN) ----  721.875 ns (13 allocs: 832 bytes)
=#

@model function badvarnames()
    N = 20
    x = Vector{Float64}(undef, N)
    for i in 1:N
        x[i] ~ Normal()
    end
end
benchmark_ldfs(badvarnames())
#=
LogDensityFunction: eval      ----  487.776 ns (2 allocs: 224 bytes)
           FastLDF: eval      ----  458.338 ns (2 allocs: 224 bytes)
LogDensityFunction: grad (FD) ----  2.867 μs (15 allocs: 4.609 KiB)
           FastLDF: grad (FD) ----  2.489 μs (11 allocs: 4.281 KiB)
LogDensityFunction: grad (RD) ----  58.167 μs (949 allocs: 34.750 KiB)
           FastLDF: grad (RD) ----  51.083 μs (773 allocs: 27.438 KiB)
LogDensityFunction: grad (MC) ----  3.734 μs (13 allocs: 1.469 KiB)
           FastLDF: grad (MC) ----  2.109 μs (28 allocs: 1.094 KiB)
LogDensityFunction: grad (EN) ----  2.317 μs (18 allocs: 4.000 KiB)
           FastLDF: grad (EN) ----  1.583 μs (5 allocs: 2.047 KiB)
=#
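As a possible control for the NamedTuple-optimisation hypothesis above, one could benchmark the same 20 dimensions packed into a single vector-valued variable, so there is only one entry for the optimisation to handle. This is a hypothetical model I haven't run (the name `goodvarnames` is made up); it reuses the imports and `benchmark_ldfs` harness defined above:

```julia
# Control experiment: 20 dimensions as one vector-valued variable, rather
# than 20 scalar variables with indexed VarNames as in badvarnames. If the
# FastLDF advantage reappears here, that points at the per-VarName overhead.
@model function goodvarnames()
    x ~ MvNormal(zeros(20), I)
end
benchmark_ldfs(goodvarnames())
```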

@model function inner()
    m ~ Normal(0, 1)
    s ~ Exponential()
    return (m=m, s=s)
end
@model function withsubmodel()
    params ~ to_submodel(inner())
    y ~ Normal(params.m, params.s)
    1.0 ~ Normal(y)
end
benchmark_ldfs(withsubmodel())
#=
LogDensityFunction: eval      ----  312.725 ns
           FastLDF: eval      ----  93.570 ns
LogDensityFunction: grad (FD) ----  602.898 ns (7 allocs: 352 bytes)
           FastLDF: grad (FD) ----  179.568 ns (3 allocs: 112 bytes)
LogDensityFunction: grad (RD) ----  13.021 μs (186 allocs: 7.969 KiB)
           FastLDF: grad (RD) ----  10.979 μs (148 allocs: 5.188 KiB)
LogDensityFunction: grad (MC) ----  4.299 μs (27 allocs: 1.234 KiB)
           FastLDF: grad (MC) ----  628.711 ns (6 allocs: 240 bytes)
LogDensityFunction: grad (EN) ----  1.906 μs (26 allocs: 1.125 KiB)
           FastLDF: grad (EN) ----  329.068 ns (2 allocs: 80 bytes)
=#

@model function typeparam(::Type{T}=Vector{Float64}) where {T}
    x = T(undef, 1)
    x1 ~ Normal()
    x[1] = x1
end
benchmark_ldfs(typeparam())
#=
LogDensityFunction: eval      ----  49.686 ns (1 allocs: 32 bytes)
           FastLDF: eval      ----  36.047 ns (1 allocs: 32 bytes)
LogDensityFunction: grad (FD) ----  251.733 ns (8 allocs: 320 bytes)
           FastLDF: grad (FD) ----  68.739 ns (4 allocs: 144 bytes)
LogDensityFunction: grad (RD) ----  3.911 μs (70 allocs: 2.719 KiB)
           FastLDF: grad (RD) ----  3.000 μs (47 allocs: 1.594 KiB)
LogDensityFunction: grad (MC) ----  837.121 ns (13 allocs: 864 bytes)
           FastLDF: grad (MC) ----  318.085 ns (6 allocs: 256 bytes)
LogDensityFunction: grad (EN) ----  285.941 ns (8 allocs: 304 bytes)
           FastLDF: grad (EN) ----  142.832 ns (3 allocs: 96 bytes)
=#


github-actions bot commented Nov 6, 2025

Benchmark Report for Commit c44bae1

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬─────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │ VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼─────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │   typed │  false │            2.4 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │   typed │  false │          393.5 │            49.3 │
│           Smorgasbord │   201 │ reversediff │   typed │   true │          458.7 │            84.7 │
│           Smorgasbord │   201 │    mooncake │   typed │   true │          452.7 │             6.4 │
│    Loop univariate 1k │  1000 │    mooncake │   typed │   true │         2305.0 │             4.7 │
│       Multivariate 1k │  1000 │    mooncake │   typed │   true │          900.5 │             9.6 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed │   true │        27132.9 │             4.3 │
│      Multivariate 10k │ 10000 │    mooncake │   typed │   true │         7865.5 │            10.7 │
│               Dynamic │    10 │    mooncake │   typed │   true │           98.4 │            17.5 │
│              Submodel │     1 │    mooncake │   typed │   true │            4.0 │             4.2 │
│                   LDA │    12 │ reversediff │   typed │   true │          847.8 │             8.3 │
└───────────────────────┴───────┴─────────────┴─────────┴────────┴────────────────┴─────────────────┘
