inference: revive `CachedMethodTable` mechanism #46535
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Can we come up with a nanosoldier benchmark that mirrors what ControlSystems does here, so we'd catch regressions of this form on benchmarking runs?
I dumped the cached method signatures, and AFAICT their code base involves many broadcasting operations. I'm still not sure why the code contains so many of them, but I may try to come up with a targeted benchmark example.
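Okay, I came up with the following target example (quite artificial, though):

```julia
let # see https://github.com/JuliaLang/julia/pull/45276
    n = 100
    ex = Expr(:block)
    var = gensym()
    # start with one broadcasting assignment that initializes `y`
    push!(ex.args, :(x .= rand(); y = sum(x)))
    # then append `n` more broadcasting assignments, each accumulating into `y`
    for i = 1:n
        push!(ex.args, :(x .= $i; y += sum(x)))
    end
    push!(ex.args, :(return y))
    # define `issue46535(x)` at global scope with the generated body;
    # `x` is expected to be a mutable array, e.g. `issue46535(rand(10))`
    @eval global function issue46535(x)
        $ex
    end
end
```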
This benchmark target corresponds to JuliaLang/julia/pull/46535. `method_match_cache` contains many artificially generated broadcasting operations, which lead to constant propagation and the accompanying method match analysis on call sites with the same abstract signatures, e.g.:

- `Tuple{typeof(convert), Type{Int64}, Int64}`
- `Tuple{typeof(convert), Type{Float64}, Float64}`

Since the runtime system currently doesn't cache method match results for abstract call signatures, we get the best performance by caching such abstract method match results at the Julia level, a mechanism that JuliaLang/julia/pull/46535 revives.
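A quick check of why the listed signatures count as "abstract", assuming the term here means a call signature for which `isconcretetype` returns false (this reading is an assumption, not stated in the benchmark description). The `Type{Int64}` parameter is not itself a concrete type, so a `Tuple` signature containing it is not concrete either:

```julia
# Assumption: "abstract signature" = a signature where `isconcretetype` is false.
# `Type{Int64}` is not a concrete type, so these call signatures are not concrete.
sigs = [
    Tuple{typeof(convert), Type{Int64}, Int64},
    Tuple{typeof(convert), Type{Float64}, Float64},
]
for sig in sigs
    println(sig, " => isconcretetype: ", isconcretetype(sig))   # false for both
end

# For contrast, a signature built only from concrete argument types is concrete.
println(isconcretetype(Tuple{typeof(+), Int64, Int64}))   # true
```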
Filed as JuliaCI/BaseBenchmarks.jl#299.
This PR appears to improve compile time by over 13x 🥳, but the benchmark shows only a 1.17x improvement. Is that really enough to catch regressions smaller than 13x?
I think we probably won't want to replicate the entire call graph of the
`CachedMethodTable` was removed within #44240 as we couldn't confirm any performance improvement then. However, it turns out the optimization was critical in some real-world cases (e.g. #46492), so this commit revives the mechanism with the following tweaks that should make it more effective:
- create the method table cache per inference (rather than per local inference on a function call, as in the previous implementation)
- only use the cache mechanism for abstract types (since we already cache lookup results at the next level for concrete types)

As a result, the following snippet reported at #46492 recovers the compilation performance:

```julia
using ControlSystems
a_2 = [-5 -3; 2 -9]
C_212 = ss(a_2, [1; 2], [1 0; 0 1], [0; 0])
@time norm(C_212)
```

> on master

```
julia> @time norm(C_212)
364.489044 seconds (724.44 M allocations: 92.524 GiB, 6.01% gc time, 100.00% compilation time)
0.5345224838248489
```

> on this commit

```
julia> @time norm(C_212)
 26.539016 seconds (62.09 M allocations: 5.537 GiB, 5.55% gc time, 100.00% compilation time)
0.5345224838248489
```
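The two tweaks can be illustrated with a minimal, hypothetical Julia sketch. It is not the actual `Core.Compiler` implementation: `SignatureLookupCache`, `cached_lookup`, and the `lookup` callable are illustrative names, and the real method-table query is replaced by an arbitrary function. It assumes one cache object is created per inference, and that concrete signatures are already cached at the next level, so only non-concrete signatures are memoized here:

```julia
# Hypothetical sketch (not the actual Core.Compiler code): a cache that memoizes
# lookups only for non-concrete ("abstract") call signatures.
struct SignatureLookupCache{F}
    lookup::F                 # hypothetical underlying lookup, standing in for a method-table query
    cache::IdDict{Any,Any}    # abstract signature => cached lookup result
end

# Assumption: a fresh cache is created once per inference session.
SignatureLookupCache(lookup) = SignatureLookupCache(lookup, IdDict{Any,Any}())

function cached_lookup(c::SignatureLookupCache, @nospecialize(sig::Type))
    if isconcretetype(sig)
        # Concrete signatures are assumed to be cached at the next level,
        # so forward them without memoizing here.
        return c.lookup(sig)
    end
    if haskey(c.cache, sig)
        return c.cache[sig]
    end
    result = c.lookup(sig)
    c.cache[sig] = result
    return result
end

# Usage: count how often the underlying lookup actually runs.
nlookups = Ref(0)
c = SignatureLookupCache(sig -> (nlookups[] += 1; string(sig)))

abstract_sig = Tuple{typeof(convert), Type{Int64}, Int64}
concrete_sig = Tuple{typeof(+), Int64, Int64}

cached_lookup(c, abstract_sig); cached_lookup(c, abstract_sig)   # 1 underlying lookup
cached_lookup(c, concrete_sig); cached_lookup(c, concrete_sig)   # 2 more (not memoized)
println(nlookups[])   # 3
```

In this sketch the abstract signature triggers only one underlying lookup across two calls, while the concrete one is forwarded every time, for three lookups in total.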
@nanosoldier
SGTM
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.