Do not round-trip uncached inference results through IRCode #47137
Conversation
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Well, that's disappointing. Will look into it in the morning.
So we need something like:

```diff
diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1d119660d3..443b184803 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -932,6 +932,9 @@ function typeinf_edge(interp::AbstractInterpreter, method::Method, @nospecialize
 # completely new
 lock_mi_inference(interp, mi)
 result = InferenceResult(mi)
+ if cache === :local
+ result.must_be_codeinf = true # TODO directly keep `opt.ir` for this case
+ end
 frame = InferenceState(result, cache, interp) # always use the cache for edge targets
 if frame === nothing
 # can't get the source for this, so we know nothing
@@ -1005,6 +1008,7 @@ function typeinf_frame(interp::AbstractInterpreter, method::Method, @nospecializ
 mi = specialize_method(method, atype, sparams)::MethodInstance
 ccall(:jl_typeinf_timing_begin, Cvoid, ())
 result = InferenceResult(mi)
+ result.must_be_codeinf = true
 frame = InferenceState(result, run_optimizer ? :global : :no, interp)
 frame === nothing && return nothing
 typeinf(interp, frame)
@@ -1063,8 +1067,9 @@ function typeinf_ext(interp::AbstractInterpreter, mi::MethodInstance)
 return retrieve_code_info(mi)
 end
 lock_mi_inference(interp, mi)
- frame = InferenceState(InferenceResult(mi), #=cache=#:global, interp)
- frame.result.must_be_codeinf = true
+ result = InferenceResult(mi)
+ result.must_be_codeinf = true
+ frame = InferenceState(result, #=cache=#:global, interp)
 frame === nothing && return nothing
 typeinf(interp, frame)
 ccall(:jl_typeinf_timing_end, Cvoid, ())
@@ -1107,6 +1112,7 @@ function typeinf_ext_toplevel(interp::AbstractInterpreter, linfo::MethodInstance
 ccall(:jl_typeinf_timing_begin, Cvoid, ())
 if !src.inferred
 result = InferenceResult(linfo)
+ result.must_be_codeinf = true
 frame = InferenceState(result, src, #=cache=#:global, interp)
 typeinf(interp, frame)
 @assert frame.inferred # TODO: deal with this better
```
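The intent of the diff above is to make the post-optimization conversion conditional. A minimal pseudocode sketch, assuming a finalization helper (the name `finish_result!` and the field accesses are illustrative, not the actual compiler code; only `must_be_codeinf` comes from the diff):

```julia
# Pseudocode sketch: only pay for the IRCode -> CodeInfo conversion when a
# consumer (typeinf_ext, the global cache, codegen) actually needs a CodeInfo.
function finish_result!(result::InferenceResult, opt::OptimizationState)
    if result.must_be_codeinf
        # outside consumers expect a (serializable) CodeInfo
        result.src = ir_to_codeinf!(opt)   # the expensive conversion
    else
        # const-prop results are only ever inlined again: keep the IRCode
        result.src = opt.ir                # skip the round-trip entirely
    end
    return result
end
```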
Is there ever a reason we would want this field to be a CodeInfo? It seems like only the global cache needs that (preparing for codegen and other outside consumers), while inlining would always be happier to get the IRCode directly if we had it.
typeinf_ext sometimes wants codeinfos back that it doesn't put in the global cache IIRC. |
Force-pushed from 50d9261 to 374c1bd
@nanosoldier
Your job failed.
Ugh,
Force-pushed from 374c1bd to d7b0e01
There are generally three reasons inference results end up uncached:
1. They come from typeinf_ext
2. We discover some validity limitation (generally due to recursion)
3. They are used for constant propagation

Currently, we convert all such inference results back to CodeInfo, in case they come from 1. However, for inference results of kind 3, the only thing we ever do with them is turn them back into IRCode for inlining. This round-tripping through IRCode is quite wasteful. Stop doing that. This PR is the minimal change to accomplish that by marking those inference results that actually need to be converted back (for case 1). This probably needs some tweaking for external AbstractInterpreters, but let's make sure this works and has the right performance first. This commit just adds the capability, but doesn't turn it on by default, since the performance for base didn't quite look favorable yet.
Force-pushed from d7b0e01 to 0378012
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Currently the inlining algorithm is allowed to use inferred source of const-prop'ed call that is always locally available (since const-prop' result isn't cached globally). For non const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can actually bypass this expensive deserialization when inferred source for globally-cached result is available locally, i.e. when it has been inferred in the same inference shot. Note that it would be more efficient to propagate `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode` as experimented in #47137, but currently the round-trip through `CodeInfo`-representation is necessary because it often leads to better CFG simplification and `cfg_simplify!` seems to be still expensive.
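The bypass described above can be sketched as follows. This is pseudocode: `inf_cache`, `code_cache`, and `_uncompressed_ir` stand in for the real lookup machinery and may not match the actual internal signatures:

```julia
# Pseudocode sketch: prefer source inferred in the same inference shot over
# the serialized copy sitting in the global cache.
function source_for_inlining(state::InliningState, mi::MethodInstance)
    for result in state.inf_cache          # locally available inference results
        result.linfo === mi && return result.src   # no deserialization needed
    end
    codeinst = get(code_cache(state.interp), mi, nothing)
    codeinst === nothing && return nothing
    return _uncompressed_ir(codeinst)      # round-trip through serialized source
end
```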
…sult (#51934) Currently the inlining algorithm is allowed to use inferred source of const-prop'ed call that is always locally available (since const-prop' result isn't cached globally). For non const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can improve efficiency by bypassing the serialization round-trip for newly-inferred and globally-cached frames. As these frames are never cached locally, they can be viewed as volatile. This means we can use their source destructively while inline-expanding them. The benchmark results show that this optimization achieves 2-4% allocation reduction and about a 5% speedup on the real-world-ish compilation targets (`allinference`). Note that it would be more efficient to propagate the `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode` as experimented in #47137, but currently the round-trip through the `CodeInfo` representation is necessary because it often leads to better CFG simplification while `cfg_simplify!` remains expensive (xref: #51960).
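The "volatile" observation above amounts to a simple ownership rule. A pseudocode sketch (the helper name and the `is_volatile` flag plumbing are illustrative assumptions, not the actual implementation):

```julia
# Pseudocode sketch: a newly-inferred, globally-cached frame's source is never
# read again locally, so inlining may consume it destructively; anything that
# may be read again later must be copied first.
function ir_for_inlining(src::CodeInfo, is_volatile::Bool)
    preserved = is_volatile ? src : copy(src)  # copy only when others still own it
    return inflate_ir(preserved)               # CodeInfo -> IRCode for inlining
end
```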
Currently the inlining algorithm is allowed to use inferred source of const-prop'ed call that is always locally available (since const-prop' result isn't cached globally). For non const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can actually bypass this expensive deserialization when inferred source for globally-cached result is available locally, i.e. when it has been inferred in the same inference shot. Note that it would be more efficient to propagate `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode` as experimented in JuliaLang#47137, but currently the round-trip through `CodeInfo`-representation is necessary because it often leads to better CFG simplification and `cfg_simplify!` seems to be still expensive.
…source Built on top of JuliaLang#51958, with the improved performance of `cfg_simplify!`, let's give JuliaLang#47137 another try. The aim is to retain locally cached inferred source as `IRCode`, eliminating the need for the inlining algorithm to round-trip it through the `CodeInfo` representation.