
Do not round-trip uncached inference results through IRCode #47137

Merged: 1 commit merged into master from kf/noinlineroundtrip on Oct 15, 2022

Conversation

@Keno (Member) commented Oct 12, 2022

There are generally three reasons inference results end up uncached:

  1. They come from `typeinf_ext`
  2. We discover some validity limitation (generally due to recursion)
  3. They are used for constant propagation

Currently, we convert all such inference results back to `CodeInfo`, in case they came from case 1. However, for inference results of kind 3, the only thing we ever do with them is turn them back into `IRCode` for inlining. This round-tripping through `IRCode` is quite wasteful. Stop doing that. This PR is the minimal change to accomplish that, by marking those inference results that actually need to be converted back (for case 1). This probably needs some tweaking for external `AbstractInterpreter`s, but let's make sure this works and has the right performance first.
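A minimal sketch of the idea in Julia (assuming the `must_be_codeinf` flag that appears in the diff further down; the function below is illustrative, not the actual Base code, though `ir_to_codeinf!` and `opt.ir` do exist in `Core.Compiler`):

```julia
# Sketch: decide, after optimization, which representation to keep for an
# uncached inference result. `result` is an InferenceResult; `opt` is an
# OptimizationState whose `opt.ir` holds the optimized IRCode.
function finish_uncached_result(result, opt)
    if result.must_be_codeinf
        # Case 1 (typeinf_ext and friends): a consumer outside the optimizer
        # needs a CodeInfo, so convert the IRCode back.
        return ir_to_codeinf!(opt)
    else
        # Case 3 (constant propagation): only inlining will ever look at this
        # result, so hand it the IRCode directly and skip the round-trip.
        return opt.ir
    end
end
```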

@Keno (Member, Author) commented Oct 12, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@Keno (Member, Author) commented Oct 12, 2022

> Your benchmark job has completed - possible performance regressions were detected

Well, that's disappointing. Will look into it in the morning.

@aviatesk (Member)

`typeinf_ext_toplevel` and `typeinf_frame` also want to have `(frame::InferenceState).src`, so we need to mark their results as well. `typeinf_edge` for callsite-inlined frames also needs special treatment.

So we need something like:

```diff
diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1d119660d3..443b184803 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -932,6 +932,9 @@ function typeinf_edge(interp::AbstractInterpreter, method::Method, @nospecialize
         # completely new
         lock_mi_inference(interp, mi)
         result = InferenceResult(mi)
+        if cache === :local
+            result.must_be_codeinf = true # TODO directly keep `opt.ir` for this case
+        end
         frame = InferenceState(result, cache, interp) # always use the cache for edge targets
         if frame === nothing
             # can't get the source for this, so we know nothing
@@ -1005,6 +1008,7 @@ function typeinf_frame(interp::AbstractInterpreter, method::Method, @nospecializ
     mi = specialize_method(method, atype, sparams)::MethodInstance
     ccall(:jl_typeinf_timing_begin, Cvoid, ())
     result = InferenceResult(mi)
+    result.must_be_codeinf = true
     frame = InferenceState(result, run_optimizer ? :global : :no, interp)
     frame === nothing && return nothing
     typeinf(interp, frame)
@@ -1063,8 +1067,9 @@ function typeinf_ext(interp::AbstractInterpreter, mi::MethodInstance)
         return retrieve_code_info(mi)
     end
     lock_mi_inference(interp, mi)
-    frame = InferenceState(InferenceResult(mi), #=cache=#:global, interp)
-    frame.result.must_be_codeinf = true
+    result = InferenceResult(mi)
+    result.must_be_codeinf = true
+    frame = InferenceState(result, #=cache=#:global, interp)
     frame === nothing && return nothing
     typeinf(interp, frame)
     ccall(:jl_typeinf_timing_end, Cvoid, ())
@@ -1107,6 +1112,7 @@ function typeinf_ext_toplevel(interp::AbstractInterpreter, linfo::MethodInstance
             ccall(:jl_typeinf_timing_begin, Cvoid, ())
             if !src.inferred
                 result = InferenceResult(linfo)
+                result.must_be_codeinf = true
                 frame = InferenceState(result, src, #=cache=#:global, interp)
                 typeinf(interp, frame)
             @assert frame.inferred # TODO: deal with this better
```

@vtjnash (Member) commented Oct 12, 2022

Is there ever a reason we would want this field to be a `CodeInfo`? Seems like only the global cache needs that (preparing for codegen and other outside consumers), but inlining would always be happier to get the `IRCode` directly if we had it.

@Keno (Member, Author) commented Oct 14, 2022

> Is there ever a reason we would want this field to be a `CodeInfo`? Seems like only the global cache needs that (preparing for codegen and other outside consumers), but inlining would always be happier to get the `IRCode` directly if we had it.

`typeinf_ext` sometimes wants `CodeInfo`s back that it doesn't put in the global cache, IIRC.

@Keno force-pushed the kf/noinlineroundtrip branch 2 times, most recently from 50d9261 to 374c1bd on October 14, 2022 at 21:10
@Keno (Member, Author) commented Oct 14, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your job failed.

@Keno (Member, Author) commented Oct 14, 2022

Ugh, `cfg_simplify!` has too many bugs and `compact!(ir, true)` is too slow. I don't want to track this down right now. I'll just switch the default to what this used to be, so I can play with this in my external `AbstractInterpreter` setup and worry about the base case later.
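A rough sketch of the trade-off described above, using the two cleanup passes named in the comment (both exist in `Core.Compiler`; the wrapper function is illustrative):

```julia
using Core.Compiler: IRCode, compact!, cfg_simplify!

# Before an optimized IRCode can be kept around for later inlining, its CFG
# needs cleaning up. As of this PR, neither available pass is satisfactory:
function cleanup_ir_for_caching(ir::IRCode)
    # `compact!` with CFG transforms enabled is correct, but "too slow" here.
    ir = compact!(ir, true)
    # `cfg_simplify!` would be cheaper, but "has too many bugs" at this point,
    # so it can't be relied on yet:
    # ir = cfg_simplify!(ir)
    return ir
end
```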

Do not round-trip uncached inference results through IRCode

There are generally three reasons inference results end up uncached:
1. They come from typeinf_ext
2. We discover some validity limitation (generally due to recursion)
3. They are used for constant propagation

Currently, we convert all such inference results back to CodeInfo,
in case they come from 1. However, for inference results of kind 3,
the only thing we ever do with them is turn them back into IRCode
for inlining. This round-tripping through IRCode is quite wasteful.
Stop doing that. This PR is the minimal change to accomplish that
by marking those inference results that actually need to be converted
back (for case 1). This probably needs some tweaking for external
AbstractInterpreters, but let's make sure this works and has the
right performance first.

This commit just adds the capability, but doesn't turn it on by default, since the performance for Base didn't quite look favorable yet.
@Keno (Member, Author) commented Oct 14, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@Keno Keno merged commit 532125d into master Oct 15, 2022
@Keno Keno deleted the kf/noinlineroundtrip branch October 15, 2022 14:53
aviatesk (Nov 9 and Nov 10, 2022) and KristofferC (Nov 13, 2022) added commits that referenced this pull request:

We can come back to exactly when we need to turn this option on once we enable this option for Base.
aviatesk added a commit that referenced this pull request Oct 30, 2023
Currently the inlining algorithm is allowed to use the inferred source of a const-prop'ed call, which is always locally available (since const-prop'ed results aren't cached globally). For non-const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source. We can actually bypass this expensive deserialization when the inferred source for a globally-cached result is available locally, i.e. when it has been inferred in the same inference shot.

Note that it would be more efficient to propagate the `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode`, as experimented with in #47137, but currently the round-trip through the `CodeInfo` representation is necessary because it often leads to better CFG simplification, and `cfg_simplify!` still seems to be expensive.
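The bypass can be pictured like this (a hedged sketch; the helper name and the `local_src` plumbing are illustrative, while `Core.Compiler._uncompressed_ir` is a real inflation entry point):

```julia
# Sketch: pick the cheapest available representation of inferred source for
# inlining. `codeinst` is the globally cached Core.CodeInstance; `local_src`
# is whatever this inference shot still holds for the same frame, if anything.
function retrieve_src_for_inlining(codeinst::Core.CodeInstance, local_src)
    if local_src isa Core.CodeInfo
        # Inferred in the same inference shot: the uncompressed CodeInfo is
        # still at hand, so deserialization can be skipped entirely.
        return local_src
    end
    inferred = codeinst.inferred
    inferred isa Core.CodeInfo && return inferred  # cached uncompressed
    # The expensive path: inflate the compressed (serialized) representation.
    return Core.Compiler._uncompressed_ir(codeinst, inferred)
end
```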
aviatesk added a commit that referenced this pull request Oct 31, 2023
Built on top of #51958, with the improved performance of `cfg_simplify!`, let's give #47137 another try. The aim is to retain locally cached inferred source as `IRCode`, eliminating the need for the inlining algorithm to round-trip it through the `CodeInfo` representation.
aviatesk added a commit that referenced this pull request Nov 6, 2023
…sult (#51934)

Currently the inlining algorithm is allowed to use the inferred source of a const-prop'ed call, which is always locally available (since const-prop'ed results aren't cached globally). For non-const-prop'ed and globally cached calls, however, it undergoes a more expensive process, making a round-trip through serialized inferred source.

We can improve efficiency by bypassing the serialization round-trip for newly-inferred and globally-cached frames. As these frames are never cached locally, they can be viewed as volatile. This means we can use their source destructively while inline-expanding them.

The benchmark results show that this optimization achieves a 2-4% allocation reduction and about a 5% speedup on the real-world-ish compilation targets (`allinference`).

Note that it would be more efficient to propagate the `IRCode` object directly and skip inflation from `CodeInfo` to `IRCode`, as experimented with in #47137, but currently the round-trip through the `CodeInfo` representation is necessary because it often leads to better CFG simplification while `cfg_simplify!` remains expensive (xref: #51960).
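The "volatile" trick amounts to skipping a defensive copy (a sketch; the function and predicate names are illustrative):

```julia
# Sketch: a newly inferred, globally cached frame is never cached locally, so
# its inferred source has exactly one consumer (the current inlining pass) and
# can be mutated in place instead of being copied first.
function src_for_inline_expansion(src::Core.CodeInfo, is_volatile::Bool)
    # Non-volatile sources may be inlined at several call sites, so each
    # expansion must work on its own copy.
    return is_volatile ? src : copy(src)
end
```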
Further commits by aviatesk referencing this pull request (Nov 6 through Dec 11, 2023) carried the same messages as above.