Performance regression in RxInfer code on current master #50704
Thanks for the report and the nice reproducers. The GC heuristics were significantly changed in #50144, so it isn't too surprising that there are cases where they need to be further refined. Putting this on the milestone for now so that at least it isn't forgotten.
Thanks, glad to help! Since the reproducers are not MWEs, to give you an idea of what might be happening: I think one potential problem here is that the program (as part of the algorithm) allocates A LOT of small, short-lived objects. At a high level, the algorithm involves creating a sparse graph and passing messages between the nodes in the graph. These messages are needed only for a short period of time. In the benchmark example the messages are encoded as distributions from the Distributions.jl package. As a side note, I could not run the benchmark on …
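To illustrate the allocation pattern described above (a minimal sketch, not RxInfer's actual message types; all names here are illustrative): each message update analytically combines two Gaussians and allocates a fresh, short-lived distribution object.

using Distributions

# Analytic product of two Gaussian messages — allocates a new Normal each call.
function combine(m1::Normal, m2::Normal)
    w1, w2 = 1 / var(m1), 1 / var(m2)   # precisions
    v = 1 / (w1 + w2)                   # combined variance
    return Normal(v * (w1 * mean(m1) + w2 * mean(m2)), sqrt(v))
end

# Message passing produces many such objects that die almost immediately.
messages  = [Normal(randn(), 1.0) for _ in 1:100_000]
posterior = reduce(combine, messages)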
So I ran your code (1.10 + #50682) and I see a wall-time regression, but not necessarily a GC one, because max GC time might not be the best way to quantify it: if we do 100 runs and only one of them triggers a collection, but it's a big collection, we might see 90% max GC usage while the median/mean stay much lower. The more concerning thing is that there seems to be an overall regression in the performance of the code, and I'm not sure it's just GC related.

1.9

julia> @benchmark run_filtering(datastream, n, $v)
BenchmarkTools.Trial: 2421 samples with 1 evaluation.
Range (min … max): 1.847 ms … 5.287 ms ┊ GC (min … max): 0.00% … 60.24%
Time (median): 1.906 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.064 ms ± 612.807 μs ┊ GC (mean ± σ): 6.81% ± 12.77%
▃█
██▅▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▃▂ ▂
1.85 ms Histogram: frequency by time 4.66 ms <
Memory estimate: 2.35 MiB, allocs estimate: 63824.
julia> @benchmark run_smoothing(data, n, $v)
BenchmarkTools.Trial: 476 samples with 1 evaluation.
Range (min … max): 8.658 ms … 28.003 ms ┊ GC (min … max): 0.00% … 29.64%
Time (median): 9.164 ms ┊ GC (median): 0.00%
Time (mean ± σ): 10.508 ms ± 2.647 ms ┊ GC (mean ± σ): 10.46% ± 13.81%
▆█▅▄▄▂▁ ▁▁
███████▇███▇▆▄▅▁▅▁▄▁▁▄▄▄▄▇▇▇▆▇███▇█▇█▇█▆▇▇▅▅▆▄▅▄▅▁▁▁▁▁▁▄▁▄▄ ▇
8.66 ms Histogram: log(frequency) by time 18 ms <
Memory estimate: 10.05 MiB, allocs estimate: 220418.

1.10 + that PR

julia> @benchmark run_filtering(datastream, n, $v)
BenchmarkTools.Trial: 1642 samples with 1 evaluation.
Range (min … max): 2.746 ms … 23.783 ms ┊ GC (min … max): 0.00% … 87.46%
Time (median): 2.908 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.044 ms ± 1.398 ms ┊ GC (mean ± σ): 4.46% ± 8.07%
▃▆▅▁▆▂▄▄▃▅▄▇▆█▅▇▆▆▇██▅▆▂▅▂
▃▇███████████████████████████▇▇▇▆▅▄▃▃▃▂▂▃▂▂▁▂▁▂▂▁▁▁▁▁▁▁▁▁▂ ▅
2.75 ms Histogram: frequency by time 3.3 ms <
Memory estimate: 2.51 MiB, allocs estimate: 69825.
julia> @benchmark run_smoothing(data, n, $v)
BenchmarkTools.Trial: 416 samples with 1 evaluation.
Range (min … max): 10.883 ms … 33.700 ms ┊ GC (min … max): 0.00% … 65.24%
Time (median): 11.343 ms ┊ GC (median): 0.00%
Time (mean ± σ): 12.041 ms ± 3.190 ms ┊ GC (mean ± σ): 4.34% ± 10.21%
█▇█
███▆▁▁▁▄▆▇▄▁▄▄▁▁▁▁▁▁▄▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▄▇▄ ▆
10.9 ms Histogram: log(frequency) by time 28 ms <
Memory estimate: 10.12 MiB, allocs estimate: 222916.

So the mean GC went slightly down. But notice how the minimum time went up, meaning that there is another regression somewhere, especially because on the minimum-time run the GC didn't run at all.
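An aside on quantifying this: a BenchmarkTools.Trial stores per-sample wall times and GC times in its `times` and `gctimes` fields, so the full distribution of GC fractions can be inspected instead of only the min/max the default display shows. A small sketch, assuming `trial` holds one of the trials above:

using BenchmarkTools, Statistics

# Per-sample fraction of wall time spent in GC (both vectors are in nanoseconds).
gc_fraction = trial.gctimes ./ trial.times

# A single big collection shows up in the max but barely moves the median.
println("min/max GC fraction: ", extrema(gc_fraction))
println("median GC fraction:  ", median(gc_fraction))
println("mean GC fraction:    ", mean(gc_fraction))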
@gbaraldi I agree with you that it could be some other regression or a combination of regressions. I tend not to use mean statistics for comparison, specifically because they can be affected by background processes on the computer. I also do not immediately see how to compare the mean statistics in percentages (for GC) because the base (wall time) is different: the mean percentage could be lower simply because the absolute wall time is bigger. For example, in the filtering case the GC mean is exactly the same, with only a small improvement if you account for the wall time:
But the mean values in benchmarks are tricky anyway. I tend to compare the min/max values instead. I recall an article about that, but I read it long ago and don't have a reference; the rough idea is that background noise can only add time, so the minimum is the most stable estimate of the true cost, while the maximum captures the worst-case pauses.
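For instance, BenchmarkTools can compare the minimum estimates of two trials directly; a sketch, assuming the trials were saved to the (hypothetical) JSON files below from separate sessions on each Julia version:

using BenchmarkTools

# Trials saved on each side with BenchmarkTools.save("trial.json", trial).
trial_19  = BenchmarkTools.load("trial_1.9.json")[1]
trial_110 = BenchmarkTools.load("trial_1.10.json")[1]

# `judge` classifies the change in the minimum-time estimates as
# :regression, :improvement, or :invariant.
judge(minimum(trial_110), minimum(trial_19))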
I don't want to say that comparing the mean/median is always a bad idea, just that it is safer to compare min/max values in the majority of cases, unless you invest much more time in a proper analysis. The max GC time is very consistent across different platforms and computers (I checked my 2 computers several times, plus your benchmark); it does not look like an outlier to me. But you are also correct that GC can be only one part of the story. I checked the benchmarks with the GC disabled. These are my results:

GC.enable(false)
benchmark = @benchmark run_filtering($datastream, $n, $v) samples=100
GC.enable(true)
GC.gc(true)
benchmark

# 1.9
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 1.829 ms … 3.831 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.064 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.106 ms ± 271.007 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▂▃ ▇▅█ ▃▅▂▄▂
▅▆███▃███▅█████▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃ ▃
1.83 ms Histogram: frequency by time 3.44 ms <
Memory estimate: 2.35 MiB, allocs estimate: 63823.
# 1.10-beta1
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 2.568 ms … 4.568 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.749 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.796 ms ± 319.726 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▁ ▁▄█▆▃
███▅█████▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▁▁▁▃ ▃
2.57 ms Histogram: frequency by time 4.31 ms <
Memory estimate: 2.51 MiB, allocs estimate: 69824.

GC.enable(false)
benchmark = @benchmark run_smoothing($data, $n, $v) samples=100
GC.enable(true)
GC.gc(true)
benchmark

# 1.9
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 8.132 ms … 11.818 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.832 ms ┊ GC (median): 0.00%
Time (mean ± σ): 8.997 ms ± 758.615 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █ ▂ ▃
█▅▇█▃▃▁▅▆███▆█▄▇▄▄▃▃▁▃▆▃▅▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▃▄▁▃▁▃▁▁▁▃▁▁▁▁▁▁▁▁▃ ▃
8.13 ms Histogram: frequency by time 11.8 ms <
Memory estimate: 10.05 MiB, allocs estimate: 220417.
# 1.10-beta1
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 9.918 ms … 12.661 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 10.534 ms ┊ GC (median): 0.00%
Time (mean ± σ): 10.644 ms ± 460.803 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄█▅▃▃
▅▁▁▅▃▅▁▅▆▇▇██████▆▅█▇▆▆▃▁▅▁▁▁▁▁▁▁▃▁▁▁▃▁▃▃▃▁▁▅▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
9.92 ms Histogram: frequency by time 12.4 ms <
Memory estimate: 10.12 MiB, allocs estimate: 222915.

So yes, there is some sort of regression, which looks like it is further amplified by the GC timings. Something triggers the GC to spend even more time in the worst-case scenario, but I'm not sure how to narrow it down further. The output of the …
The GC might decide to coalesce garbage collections to bring down overall time, at the cost of potentially longer pauses.
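One way to check for that directly is per-collection logging (available since Julia 1.8), which prints the pause time and the amount collected for every GC run; a sketch using the reproducer from the script below:

# Prints one line per collection with pause time and bytes collected,
# so coalescing shows up as fewer but longer pauses.
GC.enable_logging(true)
run_smoothing(data, n, v)
GC.enable_logging(false)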
This comment was marked as off-topic.
I hid those comments because they actually belong to #50859
Based on the discussion in #50704 (comment) I don't think this should really be on the milestone. It seems to be a fairly marginal performance regression, and there isn't any analysis yet of what has gotten slower. I'll remove it from the milestone for now; depending on the results of some extra profiling work, it could potentially be added back. (Also, the link to the repro no longer works.)
Ok, just in case, here is a permanent link to the repro.
@KristofferC The performance regression is pretty hard to bisect. It is not very consistent between runs and depends heavily on the percentage of time spent in GC during the run. I did, however, bisect the increase in the number of allocations and the total size of allocations to 51e3bc3, but I'm not sure what this commit did and how it could cause a regression in our case.
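For reference, that kind of allocation bisection can be automated with `git bisect run` and a small check script; a sketch (the 11 MiB threshold and the assumption of a wrapper that rebuilds julia at each step are hypothetical):

# bisect_check.jl — exits 0 (good) below an allocation threshold, 1 (bad) above;
# intended to be called by `git bisect run` through a wrapper that rebuilds julia.
include("script.jl")              # defines run_smoothing, data, n, v (see below)

run_smoothing(data, n, v)         # warm up / compile
stats = @timed run_smoothing(data, n, v)
println("bytes allocated: ", stats.bytes)

exit(stats.bytes < 11 * 1024^2 ? 0 : 1)   # hypothetical 11 MiB threshold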
@KristofferC
BenchmarkTools.Trial: 2694 samples with 1 evaluation.
Range (min … max): 1.671 ms … 3.392 ms ┊ GC (min … max): 0.00% … 46.05%
Time (median): 1.777 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.855 ms ± 323.736 μs ┊ GC (mean ± σ): 4.24% ± 10.19%
▂▇██▇█▇▂ ▁ ▁ ▁
████████▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅███▇▇████ █
1.67 ms Histogram: log(frequency) by time 3.27 ms <
Memory estimate: 2.46 MiB, allocs estimate: 67408.
BenchmarkTools.Trial: 582 samples with 1 evaluation.
Range (min … max): 7.600 ms … 12.065 ms ┊ GC (min … max): 0.00% … 32.69%
Time (median): 7.926 ms ┊ GC (median): 0.00%
Time (mean ± σ): 8.589 ms ± 1.282 ms ┊ GC (mean ± σ): 7.81% ± 11.24%
▇█▇▅▂
▄███████▆▇▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▅▅▆▆▇▆▇▇▇█▅▇█▇▅▇▇▆▇█▇▅█▆▆▅▄▆▇▇▇ ▇
7.6 ms Histogram: log(frequency) by time 11.9 ms <
Memory estimate: 10.39 MiB, allocs estimate: 228968.
BenchmarkTools.Trial: 2277 samples with 1 evaluation.
Range (min … max): 1.939 ms … 10.002 ms ┊ GC (min … max): 0.00% … 79.32%
Time (median): 2.090 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.195 ms ± 748.385 μs ┊ GC (mean ± σ): 5.23% ± 10.61%
▅█▆
████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▅▆▇ █
1.94 ms Histogram: log(frequency) by time 6.91 ms <
Memory estimate: 2.27 MiB, allocs estimate: 61943.
BenchmarkTools.Trial: 43 samples with 1 evaluation.
Range (min … max): 115.886 ms … 128.093 ms ┊ GC (min … max): 0.00% … 8.04%
Time (median): 117.313 ms ┊ GC (median): 0.00%
Time (mean ± σ): 117.835 ms ± 2.359 ms ┊ GC (mean ± σ): 0.37% ± 1.58%
▂▅██▂ ▅▂
█████▁▅█████▅▁▁▅▅▅▁▅▁▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
116 ms Histogram: frequency by time 128 ms <
Memory estimate: 10.16 MiB, allocs estimate: 234019.
The script I used for testing is this:

Script

using RxInfer, BenchmarkTools, Random
seed = 123
rng = MersenneTwister(seed)
v = 100.0
n = 500
hidden = collect(1:n)
data = hidden + rand(rng, Normal(0.0, sqrt(v)), n);
datastream = from(data) |> map(NamedTuple{(:y, ), Tuple{Float64}}, (d) -> (y = d, ));
@model function filtering(; c::ConstVariable, v::ConstVariable)
x_prior_mean = datavar(Float64)
x_prior_var = datavar(Float64)
x_prior ~ Normal(mean = x_prior_mean, variance = x_prior_var)
x_next ~ x_prior + c
y = datavar(Float64)
y ~ Normal(mean = x_next, variance = v)
end
function run_filtering(datastream, n, v)
autoupdates = @autoupdates begin
x_prior_mean, x_prior_var = mean_var(q(x_next))
end
return infer(
model = filtering(c = 1.0, v = v),
datastream = datastream,
autoupdates = autoupdates,
initmarginals = (x_next = NormalMeanVariance(0.0, 10000.0), ),
keephistory = n,
historyvars = (x_next = KeepLast(), ),
free_energy = true,
autostart = true,
postprocess = UnpackMarginalPostprocess()
)
end
result_filtering = run_filtering(datastream, n, v)
@model function smoothing(n; c::ConstVariable, v::ConstVariable)
x_prior ~ Normal(mean = 0.0, variance = 10000.0)
x = randomvar(n)
y = datavar(Float64, n)
x_prev = x_prior
for i in 1:n
x[i] ~ x_prev + c
y[i] ~ Normal(mean = x[i], variance = v)
x_prev = x[i]
end
return x, y
end
function run_smoothing(data, n, v)
return infer(
model = smoothing(n, c = 1.0, v = v),
data = (y = data, ),
returnvars = KeepLast(),
free_energy = true,
postprocess = UnpackMarginalPostprocess()
)
end
result_smoothing = run_smoothing(data, n, v)
benchmark1 = @benchmark run_filtering($datastream, $n, $v)
benchmark2 = @benchmark run_smoothing($data, $n, $v)
show(benchmark1)
show(benchmark2)
display(benchmark1)
display(benchmark2)
I reduced the example; it seems like the problem is not in the execution of the algorithm, but in the model creation. From that perspective, model creation for both models has regressed by a large factor:
→ julia --project=. ~/.julia/dev/julia/script.jl
Trial(21.125 μs)
Trial(3.653 ms)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 21.125 μs … 1.522 ms ┊ GC (min … max): 0.00% … 94.81%
Time (median): 23.375 μs ┊ GC (median): 0.00%
Time (mean ± σ): 24.275 μs ± 29.652 μs ┊ GC (mean ± σ): 2.37% ± 1.91%
▃▃▇▇█▃▃▁
▁▁▁▁▁▁▁▁▂▆████████▆▄▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
21.1 μs Histogram: frequency by time 30.3 μs <
Memory estimate: 18.21 KiB, allocs estimate: 433.
BenchmarkTools.Trial: 1274 samples with 1 evaluation.
Range (min … max): 3.653 ms … 7.132 ms ┊ GC (min … max): 0.00% … 36.47%
Time (median): 3.721 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.924 ms ± 547.249 μs ┊ GC (mean ± σ): 3.79% ± 8.58%
▂█▅▃▁▁
██████▇▆▇▆▅▅▄▅▃▅▅▃▄▃▃▁▄▁▃▃▁▅▃▁▃▃▁▅▅▁▅▅▆▆▅▆▅▆▆▄▆▆▇▇█▆▆▄▅▄▃▁▄ ▇
3.65 ms Histogram: log(frequency) by time 5.83 ms <
Memory estimate: 4.05 MiB, allocs estimate: 103123.
→ ./julia --startup-file=no script.jl
Trial(384.250 μs)
Trial(119.892 ms)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 384.250 μs … 4.570 ms ┊ GC (min … max): 0.00% … 88.93%
Time (median): 420.312 μs ┊ GC (median): 0.00%
Time (mean ± σ): 448.273 μs ± 97.839 μs ┊ GC (mean ± σ): 0.09% ± 0.89%
▃█▇▅▃▇▆▃ ▂
▄▄██████████▇▇▆▆▇▇▆▅▆▅▄▅▆▄▅▅▄▄▄▅▄▆▄▄▅▂▅▃▄▄▅▄▄▄▇████▇▇▇▇▇▆▆▆▆ █
384 μs Histogram: log(frequency) by time 849 μs <
Memory estimate: 16.95 KiB, allocs estimate: 412.
BenchmarkTools.Trial: 38 samples with 1 evaluation.
Range (min … max): 119.892 ms … 150.380 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 134.175 ms ┊ GC (median): 0.00%
Time (mean ± σ): 134.351 ms ± 8.712 ms ┊ GC (mean ± σ): 0.10% ± 0.66%
█ ▃ ▃ █ ▃ ▃
▇▇▇▇▇▁▁▁▁▁▇▁▇▇▁▇█▇▇▇▁▇▇▁▁▁▇▁▇█▇▁▁▁█▁▁▁▁▁▇▇▇█▁▁█▇▁▇▁▁▁▁▁█▇▁▁▁▇ ▁
120 ms Histogram: frequency by time 150 ms <
Memory estimate: 3.83 MiB, allocs estimate: 99130.

script.jl

using RxInfer, BenchmarkTools, Random
@model function filtering(; c::ConstVariable, v::ConstVariable)
x_prior_mean = datavar(Float64)
x_prior_var = datavar(Float64)
x_prior ~ Normal(mean = x_prior_mean, variance = x_prior_var)
x_next ~ x_prior + c
y = datavar(Float64)
y ~ Normal(mean = x_next, variance = v)
end
function run_filtering()
return RxInfer.create_model(filtering(c = 1.0, v = 1.0))
end
@model function smoothing(n; c::ConstVariable, v::ConstVariable)
x_prior ~ Normal(mean = 0.0, variance = 10000.0)
x = randomvar(n)
y = datavar(Float64, n)
x_prev = x_prior
for i in 1:n
x[i] ~ x_prev + c
y[i] ~ Normal(mean = x[i], variance = v)
x_prev = x[i]
end
return x, y
end
function run_smoothing(n)
return RxInfer.create_model(smoothing(n, c = 1.0, v = 1.0))
end
benchmark1 = @benchmark run_filtering()
benchmark2 = @benchmark run_smoothing(500)
show(benchmark1)
show(benchmark2)
display(benchmark1)
display(benchmark2)
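To narrow down where the extra allocations in model creation come from, the allocation profiler added in Julia 1.8 might help; a sketch based on the reduced script above:

using Profile

run_smoothing(500)                                 # compile first
Profile.Allocs.@profile sample_rate=0.1 run_smoothing(500)

# Group the sampled allocations by type and show the most frequent ones.
results = Profile.Allocs.fetch()
counts = Dict{Type,Int}()
for a in results.allocs
    counts[a.type] = get(counts, a.type, 0) + 1
end
first(sort(collect(counts); by = last, rev = true), 10)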
I revisited this. RxInfer.jl had a new major release last week (…).

script.jl (function bodies elided)

using RxInfer, BenchmarkTools, Random

seed = 123
rng = MersenneTwister(seed)
v = 100.0
n = 500
hidden = collect(1:n)

@model function filtering(y, x_prior_mean, x_prior_var, c, v) … end

@model function smoothing(y, c, v) … end

function filtering_model(y) … end
function smoothing_model(y) … end

filtering_model(data)
benchmark1 = @benchmark filtering_model($data)
println("Filtering model creation")

function run_filtering(datastream, n, v) … end
function run_smoothing(data, n, v) … end

run_filtering(datastream, n, v)
benchmark3 = @benchmark run_filtering($datastream, $n, $v)
println("Filtering inference")

The good news is that the performance is actually better on version …

Benchmark comparison between `1.10.2` and `1.11-beta1`

Here are the results for `1.10.2`:

→ julia +release --startup-file=no --project=julia1.10 julia1.10/script.jl
Filtering model creation
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
Range (min … max): 7.271 μs … 539.958 μs ┊ GC (min … max): 0.00% … 96.43%
Time (median): 7.615 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.163 μs ± 12.567 μs ┊ GC (mean ± σ): 6.04% ± 3.94%
▅▂▃█▃▁▄
▁▁▁▁▁▂▂▄▆▇███████▇▇▄▄▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
7.27 μs Histogram: frequency by time 8.73 μs <
Memory estimate: 19.81 KiB, allocs estimate: 360.
Smoothing model creation
BenchmarkTools.Trial: 1650 samples with 1 evaluation.
Range (min … max): 2.752 ms … 6.942 ms ┊ GC (min … max): 0.00% … 54.15%
Time (median): 2.836 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.031 ms ± 509.820 μs ┊ GC (mean ± σ): 5.80% ± 10.60%
▃▇█▆▄▁
███████▇▆▄▅▆▄▄▄▄▁▁▁▄▄▁▁▁▁▁▄▄▁▄▅▄▁▄▄▇▆▆▇▇█▇▇▆▇█▇▇██▇█▁▇▆▅▆▅▆ █
2.75 ms Histogram: log(frequency) by time 4.69 ms <
Memory estimate: 5.69 MiB, allocs estimate: 114772.
Filtering inference
BenchmarkTools.Trial: 2588 samples with 1 evaluation.
Range (min … max): 1.848 ms … 3.529 ms ┊ GC (min … max): 0.00% … 44.64%
Time (median): 1.886 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.931 ms ± 238.478 μs ┊ GC (mean ± σ): 1.94% ± 6.81%
▇█▆▅
█████▇▆▄▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▆█▇ █
1.85 ms Histogram: log(frequency) by time 3.4 ms <
Memory estimate: 1.12 MiB, allocs estimate: 32633.
Smoothing inference
BenchmarkTools.Trial: 175 samples with 1 evaluation.
Range (min … max): 24.887 ms … 104.396 ms ┊ GC (min … max): 0.00% … 75.94%
Time (median): 27.753 ms ┊ GC (median): 9.27%
Time (mean ± σ): 28.711 ms ± 8.320 ms ┊ GC (mean ± σ): 11.20% ± 10.16%
▃█▇ ▄▁▁
▃▇████▃▃▁▃▁▄▅▄███▅▅▃▃▃▅▄▃▅▃▄▄▃▅▄▇▅▄▄▁▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
24.9 ms Histogram: frequency by time 36.5 ms <
Memory estimate: 26.59 MiB, allocs estimate: 613635. And here are the results for → julia +1.11 --startup-file=no --project=julia1.11 julia1.11/script.jl
Filtering model creation
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 6.808 μs … 1.758 ms ┊ GC (min … max): 0.00% … 98.98%
Time (median): 7.133 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.006 μs ± 24.727 μs ┊ GC (mean ± σ): 8.76% ± 3.37%
▅▆▇█▁
▁▁▂▃▆██████▆▄▄▃▃▂▂▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
6.81 μs Histogram: frequency by time 8.79 μs <
Memory estimate: 19.81 KiB, allocs estimate: 413.
Smoothing model creation
BenchmarkTools.Trial: 1632 samples with 1 evaluation.
Range (min … max): 2.770 ms … 7.357 ms ┊ GC (min … max): 0.00% … 58.78%
Time (median): 2.843 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.064 ms ± 608.889 μs ┊ GC (mean ± σ): 6.98% ± 12.24%
▅█▇▂
█████▇▆▁▄▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▁▄▃▅▅▄▅▆▇▆▆▅▆▅█▆█▇▇█▆▆▆▇▄▇▅▆▅▅▆ █
2.77 ms Histogram: log(frequency) by time 5.09 ms <
Memory estimate: 5.75 MiB, allocs estimate: 125717.
Filtering inference
BenchmarkTools.Trial: 3090 samples with 1 evaluation.
Range (min … max): 1.506 ms … 14.943 ms ┊ GC (min … max): 0.00% … 88.95%
Time (median): 1.568 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.618 ms ± 620.070 μs ┊ GC (mean ± σ): 3.08% ± 6.71%
▁▂▄▆▅▅▄▂▁▂▂▃▄▄▇█▇▅▃
▂▃▃▅████████████████████▇▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▁▁▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂ ▄
1.51 ms Histogram: frequency by time 1.73 ms <
Memory estimate: 1.07 MiB, allocs estimate: 30647.
Smoothing inference
BenchmarkTools.Trial: 227 samples with 1 evaluation.
Range (min … max): 20.095 ms … 39.775 ms ┊ GC (min … max): 0.00% … 45.73%
Time (median): 20.692 ms ┊ GC (median): 0.00%
Time (mean ± σ): 22.080 ms ± 3.530 ms ┊ GC (mean ± σ): 6.15% ± 10.73%
▃▇█▅
█████▆▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▄▁▆▄▇▁▆▁▄▆▄▆▁▆▆▁▆▁▆▁▄▁▁▁▁▁▁▁▁▁▁▄ ▆
20.1 ms Histogram: log(frequency) by time 34.8 ms <
Memory estimate: 23.68 MiB, allocs estimate: 595580.

I've also checked the recent master.

Benchmark comparison between `1.11-beta1` and `master#53f452a7216d5e8af9ccadb4c8a86bcf61466a87`

Here are the results for `1.11-beta1` (installed via `juliaup`):

→ julia +1.11 --startup-file=no --project=julia1.11 julia1.11/script.jl
Filtering model creation
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 6.808 μs … 1.758 ms ┊ GC (min … max): 0.00% … 98.98%
Time (median): 7.133 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.006 μs ± 24.727 μs ┊ GC (mean ± σ): 8.76% ± 3.37%
▅▆▇█▁
▁▁▂▃▆██████▆▄▄▃▃▂▂▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
6.81 μs Histogram: frequency by time 8.79 μs <
Memory estimate: 19.81 KiB, allocs estimate: 413.
Smoothing model creation
BenchmarkTools.Trial: 1632 samples with 1 evaluation.
Range (min … max): 2.770 ms … 7.357 ms ┊ GC (min … max): 0.00% … 58.78%
Time (median): 2.843 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.064 ms ± 608.889 μs ┊ GC (mean ± σ): 6.98% ± 12.24%
▅█▇▂
█████▇▆▁▄▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▁▄▃▅▅▄▅▆▇▆▆▅▆▅█▆█▇▇█▆▆▆▇▄▇▅▆▅▅▆ █
2.77 ms Histogram: log(frequency) by time 5.09 ms <
Memory estimate: 5.75 MiB, allocs estimate: 125717.
Filtering inference
BenchmarkTools.Trial: 3090 samples with 1 evaluation.
Range (min … max): 1.506 ms … 14.943 ms ┊ GC (min … max): 0.00% … 88.95%
Time (median): 1.568 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.618 ms ± 620.070 μs ┊ GC (mean ± σ): 3.08% ± 6.71%
▁▂▄▆▅▅▄▂▁▂▂▃▄▄▇█▇▅▃
▂▃▃▅████████████████████▇▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▁▁▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂ ▄
1.51 ms Histogram: frequency by time 1.73 ms <
Memory estimate: 1.07 MiB, allocs estimate: 30647.
Smoothing inference
BenchmarkTools.Trial: 227 samples with 1 evaluation.
Range (min … max): 20.095 ms … 39.775 ms ┊ GC (min … max): 0.00% … 45.73%
Time (median): 20.692 ms ┊ GC (median): 0.00%
Time (mean ± σ): 22.080 ms ± 3.530 ms ┊ GC (mean ± σ): 6.15% ± 10.73%
▃▇█▅
█████▆▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▄▁▆▄▇▁▆▁▄▆▄▆▁▆▆▁▆▁▆▁▄▁▁▁▁▁▁▁▁▁▁▄ ▆
20.1 ms Histogram: log(frequency) by time 34.8 ms <
Memory estimate: 23.68 MiB, allocs estimate: 595580. And here are the result for the → ~/Projects/Julia/julia/julia --startup-file=no julia1.12/script.jl
Filtering model creation
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
Range (min … max): 6.906 μs … 1.597 ms ┊ GC (min … max): 0.00% … 98.97%
Time (median): 7.260 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.114 μs ± 28.223 μs ┊ GC (mean ± σ): 8.69% ± 2.60%
▁▅▇██▆▃▂▁
▁▁▂▄▇█████████▆▅▄▄▃▄▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
6.91 μs Histogram: frequency by time 8.75 μs <
Memory estimate: 19.81 KiB, allocs estimate: 413.
Smoothing model creation
BenchmarkTools.Trial: 1604 samples with 1 evaluation.
Range (min … max): 2.802 ms … 8.084 ms ┊ GC (min … max): 0.00% … 62.53%
Time (median): 2.880 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.118 ms ± 670.430 μs ┊ GC (mean ± σ): 7.31% ± 12.73%
▅█▆▂▁
██████▆▄▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▅▁▁▄▄▅▇▄▆▆▆▆▇▆▇▇▇▆▆▆▇▆▆▆█▇▆▅▄▅ █
2.8 ms Histogram: log(frequency) by time 5.35 ms <
Memory estimate: 5.75 MiB, allocs estimate: 125587.
Filtering inference
BenchmarkTools.Trial: 3092 samples with 1 evaluation.
Range (min … max): 1.528 ms … 15.176 ms ┊ GC (min … max): 0.00% … 89.22%
Time (median): 1.559 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.616 ms ± 637.183 μs ┊ GC (mean ± σ): 3.03% ± 6.47%
▂▃▆██▆▅▄
▂▃▄▇████████▇▇▅▅▃▃▃▃▄▄▄▅▅▅▄▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▂▁▂▁▁▂▂▁▁▁▂▂▂▂▂▁▂ ▃
1.53 ms Histogram: frequency by time 1.72 ms <
Memory estimate: 1.07 MiB, allocs estimate: 30647.
Smoothing inference
BenchmarkTools.Trial: 228 samples with 1 evaluation.
Range (min … max): 19.823 ms … 41.105 ms ┊ GC (min … max): 0.00% … 48.46%
Time (median): 20.509 ms ┊ GC (median): 0.00%
Time (mean ± σ): 21.933 ms ± 3.727 ms ┊ GC (mean ± σ): 6.33% ± 11.13%
▁▅█▇▄
█████▆▆▁▁▄▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▄▄▆▆▁▁▆▆▁▆▆▄▆▁▁▆▄▄▁▆▁▁▆▁▁▁▁▆ ▆
19.8 ms Histogram: log(frequency) by time 33.4 ms <
Memory estimate: 23.67 MiB, allocs estimate: 595451.

I think the issue can be closed.
We have a set of small benchmarks to quickly test our code in RxInfer. The aim of the package is to run efficient Bayesian inference, potentially on low-power, low-memory devices like the Raspberry Pi. We just noticed that on Julia 1.10 we have quite a noticeable GC regression. Consider this notebook. It is not an MWE, but this notebook computes Bayesian posteriors in a simple linear Gaussian state-space probabilistic model. There are two settings: filtering (processing observations one at a time from a datastream) and smoothing (batch inference over the full dataset).
Here are the results on the current Julia release:

Here are the results on 1.10-beta1:
As you can see, in the case of run_filtering the maximum time jumped from 13ms to 78ms, and the max GC time from 69% to 94%. In the case of run_smoothing the situation is similar: the maximum time jumped from 29ms to 88ms, and the max GC time from 35% to 79%.

The inference procedure allocates a lot of intermediate "messages" in the form of distributions from the Distributions.jl package, but does not use any sampling. Instead, it computes analytical solutions for the posteriors. These analytical solutions also rely on dynamic multiple dispatch in many places. Eliminating dynamic multiple dispatch is not really an option; it is just how the package works, and it was quite efficient until now.

The major difference between the two functions is that run_filtering allocates a lot of information (messages) that it does not use afterwards, which could probably be freed right away, while run_smoothing retains this information until the end of the procedure. You can also see that the resulting minimum execution time is worse in both cases.

I think this is quite a severe regression, especially for the filtering case, which is supposed to run real-time Bayesian inference with as few GC pauses as possible. We can of course refine our code base, but in the meantime, can it be improved in general? What could cause this? How should we proceed to debug it? How can we help to figure this out further?
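To illustrate the dynamic-dispatch point (a toy sketch, not RxInfer's actual rule system; all names here are illustrative): update rules are selected by multiple dispatch on the message's distribution type, and since messages live in heterogeneous containers, every call site dispatches at runtime.

using Distributions

# Analytic update rule for a Gaussian prior with a known-variance Gaussian likelihood.
function rule(m::Normal, y::Real, v::Real)
    w = 1 / var(m) + 1 / v                      # posterior precision
    return Normal((mean(m) / var(m) + y / v) / w, sqrt(1 / w))
end
rule(m, y, v) = error("no analytic rule for $(typeof(m))")   # generic fallback

# Heterogeneous storage (Any) forces dynamic dispatch on every update.
messages = Any[Normal(0.0, 100.0)]
for y in randn(1000)
    push!(messages, rule(messages[end], y, 1.0))   # dispatch + a fresh Normal each step
end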