Semi-concrete IR interpreter #44803
Conversation
Force-pushed 66b45df to 6ff6eb8
@nanosoldier
@nanosoldier
The build failed -- re-requesting PkgEval won't fix that.
Ah yeah, I glanced at it late last night and thought it might've been a CI thing, so I asked Shuhei to re-run. Thanks for the logs.
Force-pushed 9576635 to 96e4731
Force-pushed 0f669e2 to 53cc7cd
The failing test case will be fixed by #44896.
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Let's do a re-run of failed tests to double check for real ones: @nanosoldier
Not definitive, but I just wanted to make this list of test failures that look real at first glance:
- Inference accuracy regressions:
- Allocation regressions:
- Possibly need upstream changes:
- Other:
The unreachable instruction ones are also real, as they turned up in the PkgEval on the original PR, but hopefully they are all from the same issue...
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed cac1d83 to ed737ad
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed cf35245 to 9d5a93a
This will fail some abstract union-split tests due to 4874755.
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed 9d5a93a to 52ab1f5
```julia
if isa(linfo, InferenceResult)
    ecache = get(interp.cache, linfo, nothing)
elseif isa(linfo, SemiConcreteResult)
    ecache = get(interp.cache, linfo, nothing)
```
This never returns a cached result (as `interp.cache` is an `IdDict{InferenceResult,EscapeCache}` object). We should tweak the callsites of `get_escape_cache` instead.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
```julia
ir = codeinst_to_ir(interp, code)
if isa(ir, IRCode)
    T = ir_abstract_constant_propagation(interp, mi_cache, sv, mi, ir, arginfo.argtypes)
    if !isa(T, Type) || typeintersect(T, Bool) === Union{}
```
Why do we want to have this check?
```julia
# All other effects already guaranteed effect free by construction
if is_nothrow(effects)
    ir.stmts[idx][:flag] |= IR_FLAG_EFFECT_FREE
end
```
Why do we want to handle this here, rather than letting the optimizer annotate it as in the regular absint -> optimization chain?
I'd just like to generally move towards inference annotating this information if it has it. We're already doing all the analysis, so doing it over again in the optimizer is wasteful. That said, this branch is dead anyway, since we're currently requiring nothrow, but I'll plan to address that after merge.
Is this still a consideration? I'm still not sure why abstract interpretation on
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
We would lose the ability to further constrain and refine the value, since we don't allow SSAValues to be used in the Conditional type. But as Ian mentioned, it is somewhat unlikely we would consider the program to be
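To make the refinement under discussion concrete, here is a hedged plain-Julia illustration (the function name is hypothetical, not from the PR): in regular inference, the `Conditional` lattice element refers to slots, while on SSA IR the equivalent per-branch refinement is expressed via PiNodes.

```julia
# Hypothetical example of the branch refinement being discussed.
function describe(x::Union{Int,Nothing})
    if x !== nothing
        # On this branch, x is known to be Int. In slot-based inference this
        # knowledge comes from a Conditional attached to the slot holding x;
        # in SSA IR the same refinement must be carried by a PiNode instead.
        return x + 1
    end
    return 0
end
```

Here `describe(3)` returns `4`, while `describe(nothing)` takes the unrefined branch and returns `0`.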
Force-pushed bdd0d25 to 4382c9b
@nanosoldier @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Force-pushed 0b7ac1a to 14382bb
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Force-pushed 14382bb to 3557af8
Alright, let's get this in. There is some follow-up cleanup that I have to do, but let's get this in first.
Original PR description:
Background
In #43852, the compiler learned how to make use of the codegen'd version of a function in order to speed up constant propagation by 1000x, assuming two conditions are fulfilled:
All arguments are known to be constant at inference time
The evaluation is legal according to appropriate effect conditions
This mechanism works well, but it leaves some very sharp performance edges in inference. For example, adding an unused argument to an otherwise consteval-eligible function will cause it to fall back to inference-based constant propagation and thus incur the 1000x penalty (even though the argument is entirely unused).
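As a hedged illustration of that sharp edge (the function names here are hypothetical, not taken from the PR):

```julia
# Both functions compute the same value, and all arguments are constants
# at the call sites below.
f(x) = x^2 + 1            # eligible for fast concrete evaluation
g(x, unused) = x^2 + 1    # the extra, entirely dead argument can push the
                          # compiler back to slow inference-based constprop

call_both() = f(3) + g(3, "ignored")   # both calls evaluate to 10,
                                       # but at very different inference cost
```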
This PR attempts to address this by adding a new middle-of-the-road fallback that sits between the super fast constant, concrete evaluation from #43852 and the original inference based constant propagation.
This PR
The core idea is to use inference's existing abstract interpretation lattice, but rather than performing abstract interpretation over untyped source, we perform it over the typed/optimized julia IR that the non-constant inference pass produced.
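As a toy sketch of that core idea (this is NOT Core.Compiler's actual API or IR; `SSAVal`, `Const`, and `interpret` are invented for illustration), one can walk a pre-lowered statement list with a two-level lattice in which each value is either a known constant or unknown:

```julia
# Toy semi-concrete interpreter over a flat, already-"lowered" statement list.
struct SSAVal; id::Int; end          # reference to an earlier result
struct Const; val; end               # lattice element: known constant
# Unknown values are represented by `Any` (the top element) for brevity.

# `stmts` is a vector of (f, operands) pairs; operands are SSAVal references.
# The argument lattice elements occupy the first slots of the value table.
function interpret(stmts, argelems::Vector{Any})
    ssa = copy(argelems)
    for (f, operands) in stmts
        vals = Any[ssa[op.id] for op in operands]
        if all(v -> v isa Const, vals)
            push!(ssa, Const(f((v.val for v in vals)...)))  # constant-fold
        else
            push!(ssa, Any)  # give up: widen to the top element
        end
    end
    return ssa[end]
end

# (x + 1) * 2 with x known to be the constant 3:
elems = Any[Const(3), Const(1), Const(2)]
stmts = [(+, [SSAVal(1), SSAVal(2)]),    # %4 = x + 1
         (*, [SSAVal(4), SSAVal(3)])]    # %5 = %4 * 2
interpret(stmts, elems)                   # Const(8)
```

With a non-constant first argument (e.g. `Any[Int, Const(1), Const(2)]`) the same walk yields `Any`, mirroring how the real interpreter reuses one piece of optimized IR for many different sets of constants.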
Now, this is not legal to do in general, but the `:consistent` and `:terminates_globally` effects from #43852 do ensure legality (in the current implementation we additionally require `:effect_free` for consistency with consteval, but that could easily be relaxed).

Why is this a good idea
There are several benefits to performing abstract evaluation over optimized IR rather than untyped IR:
- We do not need to perform any method lookups and thus can also avoid inference's relatively complicated tracking of potential recursions. Method lookups were performed during the non-constant inference pass, and the `:terminates_globally` effect ensures the absence of recursion (at least when proven by the compiler -- the property I documented for manual annotation is slightly weaker than what we need, so we may need to split out these effects).
- Optimized IR is likely to be smaller (in terms of the total number of instructions that need to be visited for a given set of input arguments) because some optimization/simplification/DCE has been performed.
- Optimized IR does not have any slots in it, so we avoid #43999-like problems around inference memory usage.
- The same optimized IR can be re-used many times for different sets of constants.
Threats to validity
Now, there are some challenges also. For one, it is not guaranteed that this will have the same precision as the inference-based constant propagation. I've done some testing here and precision is decent, but there are some cases where precision is worse than the inference-based constant propagation:
In the current implementation, we never perform any method matching, instead relying on methods annotated in `:invoke` statements. This mostly works, but there are certain situations (e.g. the cases that #44512 is supposed to fix) where we currently cannot produce an `:invoke`, even though inference was able to compute a full method match. My preferred solution is to extend the semantics of `:invoke` to always make it possible to generate an `:invoke` node when the full method match set is known (even if codegen does not make use of it). However, I haven't really seen this be much of a problem in my testing.

There are certain branching patterns that would require the insertion of additional Pi nodes in order to avoid overapproximation of the lattice elements. This is one of the reasons we currently don't do regular inference on SSA IR (the other being debuggability of lowered code). We've been thinking about some solutions to this, but this particular case is somewhat simpler than inference because 1) the branching patterns that inference can prove `:consistent`
are limited anyway, and 2) regular inference may have already inserted the requisite PiNodes, in which case there isn't a problem. In particular, in my testing I haven't actually seen any case of precision regressions due to this issue.

Benchmark results
The benchmarks for this are looking very attractive. On my ongoing inference torture test (https://gist.github.com/Keno/5587fbfe89bff96c863c8eeb1477defa), this gives an additional 2x improvement in inference time on top of the improvements already merged on master and those pending in #44447 and #44494. I should note that this is total inference time. By the nature of the benchmark, it basically performs inference and constant propagation once for each specialized call site. The 2x improvement here essentially comes down to the constant-propagation portion of that time dropping down to noise level.
The performance is even better on the real world benchmark that motivated https://gist.github.com/Keno/5587fbfe89bff96c863c8eeb1477defa. There constant propagation is performed multiple times with different constants for each non-constant inference, so the performance improvement is proportionally better, to the point where inference time is no longer dominating in that benchmark (down to about 30s from several hours in early January before all the recent compile-time perf improvements).
Current status
This passes tests for me locally, but I'm convinced that issues will show up during PkgEval. The re-use of the abstract interpreter pieces is also quite janky and should be factored better.