cranelift/egraphs: allow simplifying trapping arithmetic #5908

jameysharp · 2023-03-02T01:14:18Z

Feature

We currently have simplification rules in opts/algebraic.isle to rewrite x/1 to x, and in opts/cprop.isle to constant-fold division when both operands are constant. But these rules can't fire, because simplify is only called on instructions which never have side effects, and division may trap if the divisor is 0.

I'd like to be able to use simplification rules like these on trapping arithmetic.

Benefit

We could optimize away more instructions. There are comments in algebraic.isle suggesting more desired optimizations that we can't implement today either:

;; TODO: strength reduction: div to shifts
;; TODO: div/rem by constants -> magic multiplications

In addition to replacing expressions with simpler equivalents, we could sometimes remove them entirely if they're unused. We currently can't do that because if the instruction would trap then we need to ensure that trap occurs. But if we have a valid rewrite from a possibly-trapping instruction to an expression which can't trap, then it's actually safe to treat it as a pure instruction.

I suspect this might be particularly valuable for uadd_overflow_trap, which is used in bounds checks for dynamic heaps in cranelift-wasm.

Implementation

One idea is in #5796, but that approach was not selected in #5800 because it "is a little more complex than I would like". I still like the idea of a force instruction to pull otherwise-pure instructions into the side-effecting skeleton, but it's possible some other approach is better.

Alternatives

We could maybe call simplify from the is_mergeable_for_egraph branch of optimize_skeleton_inst, or introduce a new ISLE term for simplifying idempotent instructions.

The text was updated successfully, but these errors were encountered:

cfallin · 2023-03-02T02:01:11Z

Strong +1 to this -- this is the right way to represent idempotent side-effects IMHO. (Doing it the simpler way was a matter of expediency on my part as I wanted to get the optimization we disabled back in, but I fully support reworking it.)

Making side-effecting-but-idempotent ops part of the egraph has potential beyond arithmetic too, IMHO: the simplest case I can think of is actually just trapnz vN where we've rewritten vN to a constant 0. If a trapnz is really a (force (trapnz vN)) and (trapnz 0) is rewritten to 0, or (nop), or something appropriately-typed, then we can elide traps lowered/legalized from other logic as well if it's safe to do so.

jameysharp · 2023-03-02T02:30:47Z

I suspect we want to optimize trapnz the same way we do for branches, whatever that turns out to be. I suspect that will be different than this force mechanism because I think force needs to take a Value operand, and traps/branches have no results.

But definitely, I think it'll be important that we have some story for optimizing conditional traps too!

Note these rules are currently dead, pending resolution of bytecodealliance#5908

jameysharp · 2023-03-27T19:32:46Z

After discussion with @cfallin and @fitzgen today, I think this is our current plan:

Call the simplify ISLE term for instructions which have idempotent side-effects, just like we do for pure instructions.
If any of the equivalent values are produced by pure instructions, then we have some rewrite rule that proves that the side effects can't actually happen. Build the e-class using only the pure alternatives and don't add this instruction to the side-effecting skeleton.
Otherwise, put all the side-effecting alternatives in an e-class and record that this e-class must be evaluated at this point in the side-effecting skeleton if it hasn't already been evaluated at a point which dominates this one.

In earlier discussion we talked about introducing a force CLIF instruction as the way to connect the side-effecting skeleton to an e-class. But here's an alternative that I think we've agreed is reasonable:

For each basic block, keep a list of side-effects. Each side effect is either an Inst or a Value. The last side effect will be the block's terminator Inst.
Stop removing instructions from the Layout during equality saturation. Instead of tracking whether an instruction has side-effects by whether it's still in the layout, use the above explicit list.
During equality saturation, add a side-effect to the list incrementally, after processing a side-effecting instruction.
Between equality saturation and elaboration, clear all instructions out of the Layout.
During elaboration, use the side-effect lists as the depth-first traversal roots, but otherwise treat the Value case as just like any other demanded value.

Elaboration already duplicates a pure instruction if its existing placement doesn't dominate the new place where it's needed, which is exactly what we need for instructions with idempotent side effects too. Given that, GVN is safe for these instructions as well.

Open question: what should happen if a simplify rule creates a side-effecting instruction somewhere other than at top-level of the right-hand side? I think that needs to be prohibited because we wouldn't know there were still side effects somewhere in the expression tree, so we'd treat it as pure. We can use the type system in the same way as #6080 to express this constraint statically.

… option. This PR removes the LICM, GVN, and preopt passes, and associated support pieces, from `cranelift-codegen`. Not to worry, we still have optimizations: the egraph framework subsumes all of these, and has been on by default since bytecodealliance#5181. A few decision points: - Filetests for the legacy LICM, GVN and simple_preopt were removed too. As we built optimizations in the egraph framework we wrote new tests for the equivalent functionality, and many of the old tests were testing specific behaviors in the old implementations that may not be relevant anymore. However if folks prefer I could take a different approach here and try to port over all of the tests. - The corresponding filetest modes (commands) were deleted too. The `test alias_analysis` mode remains, but no longer invokes a separate GVN first (since there is no separate GVN that will not also do alias analysis) so the tests were tweaked slightly to work with that. The egrpah testsuite also covers alias analysis. - The `divconst_magic_numbers` module is removed since it's unused without `simple_preopt`, though this is the one remaining optimization we still need to build in the egraphs framework, pending bytecodealliance#5908. The magic numbers will live forever in git history so removing this in the meantime is not a major issue IMHO. - The `use_egraphs` setting itself was removed at both the Cranelift and Wasmtime levels. It has been marked deprecated for a few releases now (Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift versions) so I think this is probably OK. As an alternative if anyone feels strongly, we could leave the setting and make it a no-op.

… option. (#6167) * Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option. This PR removes the LICM, GVN, and preopt passes, and associated support pieces, from `cranelift-codegen`. Not to worry, we still have optimizations: the egraph framework subsumes all of these, and has been on by default since #5181. A few decision points: - Filetests for the legacy LICM, GVN and simple_preopt were removed too. As we built optimizations in the egraph framework we wrote new tests for the equivalent functionality, and many of the old tests were testing specific behaviors in the old implementations that may not be relevant anymore. However if folks prefer I could take a different approach here and try to port over all of the tests. - The corresponding filetest modes (commands) were deleted too. The `test alias_analysis` mode remains, but no longer invokes a separate GVN first (since there is no separate GVN that will not also do alias analysis) so the tests were tweaked slightly to work with that. The egrpah testsuite also covers alias analysis. - The `divconst_magic_numbers` module is removed since it's unused without `simple_preopt`, though this is the one remaining optimization we still need to build in the egraphs framework, pending #5908. The magic numbers will live forever in git history so removing this in the meantime is not a major issue IMHO. - The `use_egraphs` setting itself was removed at both the Cranelift and Wasmtime levels. It has been marked deprecated for a few releases now (Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift versions) so I think this is probably OK. As an alternative if anyone feels strongly, we could leave the setting and make it a no-op. * Update test outputs for remaining test differences.

… option. (bytecodealliance#6167) * Cranelift: remove non-egraphs optimization pipeline and `use_egraphs` option. This PR removes the LICM, GVN, and preopt passes, and associated support pieces, from `cranelift-codegen`. Not to worry, we still have optimizations: the egraph framework subsumes all of these, and has been on by default since bytecodealliance#5181. A few decision points: - Filetests for the legacy LICM, GVN and simple_preopt were removed too. As we built optimizations in the egraph framework we wrote new tests for the equivalent functionality, and many of the old tests were testing specific behaviors in the old implementations that may not be relevant anymore. However if folks prefer I could take a different approach here and try to port over all of the tests. - The corresponding filetest modes (commands) were deleted too. The `test alias_analysis` mode remains, but no longer invokes a separate GVN first (since there is no separate GVN that will not also do alias analysis) so the tests were tweaked slightly to work with that. The egrpah testsuite also covers alias analysis. - The `divconst_magic_numbers` module is removed since it's unused without `simple_preopt`, though this is the one remaining optimization we still need to build in the egraphs framework, pending bytecodealliance#5908. The magic numbers will live forever in git history so removing this in the meantime is not a major issue IMHO. - The `use_egraphs` setting itself was removed at both the Cranelift and Wasmtime levels. It has been marked deprecated for a few releases now (Wasmtime 6.0, 7.0, upcoming 8.0, and corresponding Cranelift versions) so I think this is probably OK. As an alternative if anyone feels strongly, we could leave the setting and make it a no-op. * Update test outputs for remaining test differences.

fitzgen · 2024-09-28T22:32:07Z

I'd love to get to this one day.

My Wasm GC work is producing two patterns that require implementing the optimization of trapping arithmetic to get good code:

(uadd_overflow_trap (iconst a) (iconst b)) should turn into either an unconditional trap, or more likely, into (iconst a+b)
(uadd_overflow_trap (uextend a) (uextend b)) should turn into (iadd a b)

FWIW, these patterns can be matched and optimized during lowering, where we don't distinguish between side effectful vs non-side effectful lowerings. For example the Pulley backend optimizes the lowering of (trap{z,nz} (iconst ...)):

wasmtime/cranelift/codegen/src/isa/pulley_shared/lower.isle

Lines 98 to 114 in 5c8cd22

    
           (rule 2 (lower (trapz (iconst (u64_from_imm64 (u64_nonzero _))) code)) 
        
                 (output_none)) 
        
           (rule 2 (lower (trapnz (iconst (u64_from_imm64 0)) code)) 
        
                 (output_none)) 
        
           ;; TODO: These rules are disabled because they insert a block terminator into 
        
           ;; the middle of the current block, which leads to regalloc errors. We should 
        
           ;; ideally be able to lower conditional traps that will always trap into 
        
           ;; unconditional traps though. This isn't very high priority though because 
        
           ;; traps, pretty much by definition, are not hot paths. 
        
           ;; 
        
           ;; (rule 3 (lower (trapnz (iconst (u64_from_imm64 (u64_nonzero _))) code)) 
        
           ;;       (side_effect (pulley_trap code))) 
        
           ;; 
        
           ;; (rule 3 (lower (trapz (iconst (u64_from_imm64 0)) code)) 
        
           ;;       (side_effect (pulley_trap code)))

However, we still can't insert block terminators early into the block (which should presumably truncate the rest of the block's lowering) so we can't turn (trapz (const 0)) into (trap), only (trapz (iconst (non_zero ...))) into the empty instruction sequence.

Additionally, when performed at lowering time, these rules' results cannot feed into other rules to create cascading optimizations.

jameysharp added cranelift:goal:optimize-speed Focus area: the speed of the code produced by Cranelift. cranelift:mid-end clif-to-clif related passes, legalizations, etc... labels Mar 2, 2023

Kmeakin added a commit to Kmeakin/wasmtime that referenced this issue Mar 17, 2023

cranelift: rewrite urem of constants

1f98223

Note these rules are currently dead, pending resolution of bytecodealliance#5908

Kmeakin mentioned this issue Mar 17, 2023

cranelift: rewrite urem of constants #6048

Closed

Kmeakin added a commit to Kmeakin/wasmtime that referenced this issue Mar 17, 2023

cranelift: rewrite urem of constants

e7bf0fa

Note these rules are currently dead, pending resolution of bytecodealliance#5908

This was referenced Mar 20, 2023

Port divide-by-constant magic number optimizations from simple_preopt to ISLE mid-end #6049

Open

cranelift/egraphs: Allow rewriting to unconditional trap #6080

Open

Redefine select_spectre_guard #5206

Open

This was referenced Mar 27, 2023

cranelift/egraphs: Optimize instructions without results #6106

Open

cranelift: Integrate "remove constant phis" pass into egraph pass #6109

Open

Branch folding #5851

Open

cfallin mentioned this issue Apr 6, 2023

Cranelift: remove non-egraphs optimization pipeline and use_egraphs option. #6167

Merged

afonso360 mentioned this issue Aug 9, 2023

Cranelift: Optimize op+splat into splat+op in the mid-end #6828

Closed

fitzgen mentioned this issue Apr 25, 2024

div to shift for positive power of 2 divs #8474

Closed

fitzgen mentioned this issue Oct 1, 2024

List of future optimizations for Wasm GC performance #9351

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cranelift/egraphs: allow simplifying trapping arithmetic #5908

cranelift/egraphs: allow simplifying trapping arithmetic #5908

jameysharp commented Mar 2, 2023

cfallin commented Mar 2, 2023

jameysharp commented Mar 2, 2023

jameysharp commented Mar 27, 2023

fitzgen commented Sep 28, 2024 •

edited

Loading

cranelift/egraphs: allow simplifying trapping arithmetic #5908

cranelift/egraphs: allow simplifying trapping arithmetic #5908

Comments

jameysharp commented Mar 2, 2023

Feature

Benefit

Implementation

Alternatives

cfallin commented Mar 2, 2023

jameysharp commented Mar 2, 2023

jameysharp commented Mar 27, 2023

fitzgen commented Sep 28, 2024 • edited Loading

fitzgen commented Sep 28, 2024 •

edited

Loading