Do not round-trip uncached inference results through IRCode #47137
Conversation
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Well, that's disappointing. Will look into it in the morning.
So we need something like:

```diff
diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1d119660d3..443b184803 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -932,6 +932,9 @@ function typeinf_edge(interp::AbstractInterpreter, method::Method, @nospecialize
 # completely new
 lock_mi_inference(interp, mi)
 result = InferenceResult(mi)
+ if cache === :local
+ result.must_be_codeinf = true # TODO directly keep `opt.ir` for this case
+ end
 frame = InferenceState(result, cache, interp) # always use the cache for edge targets
 if frame === nothing
 # can't get the source for this, so we know nothing
@@ -1005,6 +1008,7 @@ function typeinf_frame(interp::AbstractInterpreter, method::Method, @nospecializ
 mi = specialize_method(method, atype, sparams)::MethodInstance
 ccall(:jl_typeinf_timing_begin, Cvoid, ())
 result = InferenceResult(mi)
+ result.must_be_codeinf = true
 frame = InferenceState(result, run_optimizer ? :global : :no, interp)
 frame === nothing && return nothing
 typeinf(interp, frame)
@@ -1063,8 +1067,9 @@ function typeinf_ext(interp::AbstractInterpreter, mi::MethodInstance)
 return retrieve_code_info(mi)
 end
 lock_mi_inference(interp, mi)
- frame = InferenceState(InferenceResult(mi), #=cache=#:global, interp)
- frame.result.must_be_codeinf = true
+ result = InferenceResult(mi)
+ result.must_be_codeinf = true
+ frame = InferenceState(result, #=cache=#:global, interp)
 frame === nothing && return nothing
 typeinf(interp, frame)
 ccall(:jl_typeinf_timing_end, Cvoid, ())
@@ -1107,6 +1112,7 @@ function typeinf_ext_toplevel(interp::AbstractInterpreter, linfo::MethodInstance
 ccall(:jl_typeinf_timing_begin, Cvoid, ())
 if !src.inferred
 result = InferenceResult(linfo)
+ result.must_be_codeinf = true
 frame = InferenceState(result, src, #=cache=#:global, interp)
 typeinf(interp, frame)
 @assert frame.inferred # TODO: deal with this better
```
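The intent of the diff above is to make the post-optimization conversion conditional. A minimal pseudocode sketch, assuming a finalization helper (the name `finish_result!` and the field accesses are illustrative, not the actual compiler code; only `must_be_codeinf` comes from the diff):

```julia
# Pseudocode sketch: only pay for the IRCode -> CodeInfo conversion when a
# consumer (typeinf_ext, the global cache, codegen) actually needs a CodeInfo.
function finish_result!(result::InferenceResult, opt::OptimizationState)
    if result.must_be_codeinf
        # outside consumers expect a (serializable) CodeInfo
        result.src = ir_to_codeinf!(opt)   # the expensive conversion
    else
        # const-prop results are only ever inlined again: keep the IRCode
        result.src = opt.ir                # skip the round-trip entirely
    end
    return result
end
```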
Is there ever a reason we would want this field to be a CodeInfo? It seems like only the global cache needs that (preparing for codegen and other outside consumers), while inlining would always be happier to get the IRCode directly if we had it.
typeinf_ext sometimes wants codeinfos back that it doesn't put in the global cache IIRC. |
Force-pushed from 50d9261 to 374c1bd
@nanosoldier
Your job failed.
Ugh,
Force-pushed from 374c1bd to d7b0e01
There are generally three reasons inference results end up uncached:
1. They come from typeinf_ext
2. We discover some validity limitation (generally due to recursion)
3. They are used for constant propagation

Currently, we convert all such inference results back to CodeInfo, in case they come from 1. However, for inference results of kind 3, the only thing we ever do with them is turn them back into IRCode for inlining. This round-tripping through IRCode is quite wasteful. Stop doing that. This PR is the minimal change to accomplish that by marking those inference results that actually need to be converted back (for case 1). This probably needs some tweaking for external AbstractInterpreters, but let's make sure this works and has the right performance first. This commit just adds the capability, but doesn't turn it on by default, since the performance for base didn't quite look favorable yet.
Force-pushed from d7b0e01 to 0378012
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Currently the inlining algorithm is allowed to use inferred source of const-prop'ed call that is always locally available (since const-prop' result isn't cached globally). For non const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can actually bypass this expensive deserialization when inferred source for globally-cached result is available locally, i.e. when it has been inferred in the same inference shot. Note that it would be more efficient to propagate `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode` as experimented in #47137, but currently the round-trip through `CodeInfo`-representation is necessary because it often leads to better CFG simplification and `cfg_simplify!` seems to be still expensive.
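The bypass described above can be sketched as follows. This is pseudocode: `inf_cache`, `code_cache`, and `_uncompressed_ir` stand in for the real lookup machinery and may not match the actual internal signatures:

```julia
# Pseudocode sketch: prefer source inferred in the same inference shot over
# the serialized copy sitting in the global cache.
function source_for_inlining(state::InliningState, mi::MethodInstance)
    for result in state.inf_cache          # locally available inference results
        result.linfo === mi && return result.src   # no deserialization needed
    end
    codeinst = get(code_cache(state.interp), mi, nothing)
    codeinst === nothing && return nothing
    return _uncompressed_ir(codeinst)      # round-trip through serialized source
end
```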
…sult (#51934) Currently the inlining algorithm is allowed to use inferred source of const-prop'ed call that is always locally available (since const-prop' result isn't cached globally). For non const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can improve efficiency by bypassing the serialization round-trip for newly-inferred and globally-cached frames. As these frames are never cached locally, they can be viewed as volatile. This means we can use their source destructively while inline-expanding them. The benchmark results show that this optimization achieves 2-4% allocation reduction and about a 5% speedup on the real-world-ish compilation targets (`allinference`). Note that it would be more efficient to propagate the `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode` as experimented in #47137, but currently the round-trip through the `CodeInfo` representation is necessary because it often leads to better CFG simplification while `cfg_simplify!` remains expensive (xref: #51960).
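The "volatile" observation above amounts to a simple ownership rule. A pseudocode sketch (the helper name and the `is_volatile` flag plumbing are illustrative assumptions, not the actual implementation):

```julia
# Pseudocode sketch: a newly-inferred, globally-cached frame's source is never
# read again locally, so inlining may consume it destructively; anything that
# may be read again later must be copied first.
function ir_for_inlining(src::CodeInfo, is_volatile::Bool)
    preserved = is_volatile ? src : copy(src)  # copy only when others still own it
    return inflate_ir(preserved)               # CodeInfo -> IRCode for inlining
end
```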
Currently the inlining algorithm is allowed to use inferred source of const-prop'ed call that is always locally available (since const-prop' result isn't cached globally). For non const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can actually bypass this expensive deserialization when inferred source for globally-cached result is available locally, i.e. when it has been inferred in the same inference shot. Note that it would be more efficient to propagate `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode` as experimented in JuliaLang#47137, but currently the round-trip through `CodeInfo`-representation is necessary because it often leads to better CFG simplification and `cfg_simplify!` seems to be still expensive.
…source Built on top of JuliaLang#51958, with the improved performance of `cfg_simplify!`, let's give JuliaLang#47137 another try. The aim is to retain locally cached inferred source as `IRCode`, eliminating the need for the inlining algorithm to round-trip it through the `CodeInfo` representation.