Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISLE: rewrite and/or of icmp #6095

Merged
merged 3 commits into from
Mar 29, 2023
Merged

Conversation

Kmeakin
Copy link
Contributor

@Kmeakin Kmeakin commented Mar 23, 2023

Adds ISLE rewrites for logical combinations of icmp

(band ty (icmp ty cc1 x y) (icmp ty cc2 x y)):

band eq ne ult ule ugt uge slt sle sgt sge
eq eq false false eq false eq false eq false eq
ne false ne ult ult ugt ugt slt slt sgt sgt
ult false ult ult ult false false N/A N/A N/A N/A
ule eq ult ult ule false eq N/A N/A N/A N/A
ugt false ugt false false ugt ugt N/A N/A N/A N/A
uge eq uge false eq ugt uge N/A N/A N/A N/A
slt false slt N/A N/A N/A N/A slt slt false false
sle eq slt N/A N/A N/A N/A slt sle false eq
sgt false sgt N/A N/A N/A N/A false false sgt sgt
sge eq sgt N/A N/A N/A N/A false eq sgt sge

(bor ty (icmp ty cc1 x y) (icmp ty cc2 x y)):

bor eq ne ult ule ugt uge slt sle sgt sge
eq eq true ule ule uge uge sle sle sge sge
ne true ne ne true ne true ne true ne true
ult ule ne ult ule ne true N/A N/A N/A N/A
ule ule true ule ule true true N/A N/A N/A N/A
ugt uge ne ne true ugt uge N/A N/A N/A N/A
uge uge true true true uge uge N/A N/A N/A N/A
slt sle ne N/A N/A N/A N/A slt sle ne true
sle sle true N/A N/A N/A N/A sle sle true true
sgt sge ne N/A N/A N/A N/A ne true sgt sge
sge sge true N/A N/A N/A N/A true true sge sge

Tests were generated by a script

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 23, 2023
@Kmeakin Kmeakin marked this pull request as ready for review March 23, 2023 20:21
@Kmeakin Kmeakin changed the title [WIP] ISLE: rewrite and/or of icmp ISLE: rewrite and/or of icmp Mar 23, 2023
@Kmeakin Kmeakin marked this pull request as draft March 23, 2023 21:12
@Kmeakin Kmeakin force-pushed the icmp-and-or branch 2 times, most recently from 5b7f8af to 0bb6e69 Compare March 23, 2023 23:38
@Kmeakin Kmeakin marked this pull request as ready for review March 23, 2023 23:49
@jameysharp
Copy link
Contributor

I think we can factor these rules in a different way to reduce a bunch of the duplication. But I haven't figured out the details. I was fiddling with ideas like the following, though I'm not sure if this actually turns out any shorter:

(decl pure partial eq_implies (Value Value Value) bool)
(rule (eq_implies x y (eq x y)) $true)
(rule (eq_implies x y (eq y x)) $true)
(rule (eq_implies x y (ule x y)) $true)
(rule (eq_implies x y (uge x y)) $true)
(rule (eq_implies x y (sle x y)) $true)
(rule (eq_implies x y (sge x y)) $true)
(rule (eq_implies x y (ne x y)) $false)
(rule (eq_implies x y (ne y x)) $false)
(rule (eq_implies x y (ult x y)) $false)
(rule (eq_implies x y (ugt x y)) $false)
(rule (eq_implies x y (slt x y)) $false)
(rule (eq_implies x y (sgt x y)) $false)

(decl pure partial ult_implies (Value Value Value) bool)
(rule (ult_implies x y (eq x y)) $false)
(rule (ult_implies x y (eq y x)) $false)
(rule (ult_implies x y (ne x y)) $true)
(rule (ult_implies x y (ne y x)) $true)
(rule (ult_implies x y (ult x y)) $true)
(rule (ult_implies x y (ule x y)) $true)
(rule (ult_implies x y (uge x y)) $false)
(rule (ult_implies x y (ugt x y)) $false)

(rule (simplify (band ty (eq x y) other))
  (if-let $true (eq_implies x y other))
  (subsume (iconst ty 1)))

(rule (simplify (band ty other (eq x y)))
  (if-let $true (eq_implies x y other))
  (subsume (iconst ty 1)))

(rule (simplify (band ty (eq x y) other))
  (if-let $false (eq_implies x y other))
  (subsume (iconst ty 0)))

@Kmeakin
Copy link
Contributor Author

Kmeakin commented Mar 24, 2023

I think we can factor these rules in a different way to reduce a bunch of the duplication. But I haven't figured out the details. I was fiddling with ideas like the following, though I'm not sure if this actually turns out any shorter:

I think this would work out to being basically the same length. We could remove half the cases by adding rules to make band and bor commutative. Not sure if that would lead to a performance penalty though.

@jameysharp
Copy link
Contributor

I've been unable to let go of this and just enjoy my weekend, so:

Can we do it this way? Decompose both condition codes into separate "less than", "equal to", and "greater than" booleans, then pairwise apply "and"/"or" to the booleans according to whether the comparisons were combined with band or bor, and finally compose the booleans back into a condition code.

(decl pure decompose_intcc (IntCC) u64)
(rule (decompose_intcc IntCC.Equal) 1)
(rule (decompose_intcc IntCC.UnsignedLessThan) 2)
(rule (decompose_intcc IntCC.UnsignedLessThanOrEqual) 3)
(rule (decompose_intcc IntCC.UnsignedGreaterThan) 4)
(rule (decompose_intcc IntCC.UnsignedGreaterThanOrEqual) 5)
(rule (decompose_intcc IntCC.NotEqual) 6)

(decl pure partial compose_intcc (u64) IntCC)
; inverse of `decompose_intcc`

(rule (simplify (band ty (icmp ty cc1 x y) (icmp ty cc2 x y)))
  (if-let cc (compose_intcc (u64_and (decompose_intcc cc1) (decompose_intcc cc2))))
  (icmp ty cc x y))
(rule (simplify (band ty (icmp ty cc1 x y) (icmp ty cc2 x y)))
  (if-let 0 (u64_and (decompose_intcc cc1) (decompose_intcc cc2)))
  (subsume (iconst ty (imm64 0))))

(rule (simplify (bor ty (icmp ty cc1 x y) (icmp ty cc2 x y)))
  (if-let cc (compose_intcc (u64_or (decompose_intcc cc1) (decompose_intcc cc2))))
  (icmp ty cc x y))
(rule (simplify (bor ty (icmp ty cc1 x y) (icmp ty cc2 x y)))
  (if-let 7 (u64_or (decompose_intcc cc1) (decompose_intcc cc2)))
  (subsume (iconst ty (imm64 1))))

That doesn't handle signedness but I think fixing that shouldn't be too much more involved.

Adding the cases where one of the comparisons has its operands flipped (icmp ty cc2 y x) should be easy to handle this way too; we just need to use intcc_reverse on cc2.

Also I think this extends in the obvious way to xor, which may not be important but seems nice to have.

@jameysharp
Copy link
Contributor

The signedness condition can work in a similar way:

(decl pure intcc_class (IntCC) u64)
(rule (intcc_class IntCC.UnsignedLessThan) 0b01)
(rule (intcc_class IntCC.UnsignedLessThanEqual) 0b01)
(rule (intcc_class IntCC.UnsignedGreaterThan) 0b01)
(rule (intcc_class IntCC.UnsignedGreaterThanEqual) 0b01)
(rule (intcc_class IntCC.SignedLessThan) 0b10)
(rule (intcc_class IntCC.SignedLessThanEqual) 0b10)
(rule (intcc_class IntCC.SignedGreaterThan) 0b10)
(rule (intcc_class IntCC.SignedGreaterThanEqual) 0b10)
(rule (intcc_class IntCC.Equal) 0b11)
(rule (intcc_class IntCC.NotEqual) 0b11)

(decl pure intcc_comparable (IntCC IntCC) Unit)
(rule (intcc_comparable x y)
  (if-let (u64_nonzero _) (u64_and (intcc_class x) (intcc_class y)))
  (unit))

Then add (if-let _ (intcc_comparable cc1 cc2)) to the simplify rules that I wrote out earlier.

The decompose_intcc rules might be easier to understand with binary literals instead of decimal, too: Equal is 0b001, LessThan is 0b010, LessThanOrEqual is 0b011, etc. At the least all of this needs comments identifying which bit means what; I just wanted to write it out quickly to illustrate the idea.

@cfallin
Copy link
Member

cfallin commented Mar 26, 2023

That's a really incredibly neat way of seeing the rewrite!

I was trying to work out for myself why it should work, since it's non-obvious (at least to me!) why AND of the bitfield representation should be AND of the icmp results. I think both OR and AND work because of the following (let's take just unsigneds for now):

  • An IntCC condition is a disjunction (OR) of up to three "fundamental" comparisons: e.g. UnsignedLessThanOrEqual is logically LessThan OR Equal.
  • As such, your three-bit encoding could be seen as a Boolean polynomial that describes the icmp result. Take x := LT, y := EQ, z: GT, then one might have icmp(..) = x + y (011) or icmp(..) = x + z (101) or just icmp(..) = y (010).
  • The OR of the icmp results is then the OR of these polynomials, and Boolean logic tells us that they add in exactly the way that the OR of the bitfield representation would give us. So if result = icmp(cc1, ..) + icmp(cc2, ..) and we have the substitutions result = (x + y) + (x + z), then we get result = x + y + z and convert that back into a condition code.
    • Incidentally, x + y + z here reduces to constant true which corresponds to your special-case rule for 7!
  • The AND is where it gets interesting! This looks like: result = icmp(cc1, ..) * icmp(cc2, ..) = (x + y) * (x + z), and AND distributes over OR so we can do the binomial expansion x*x + x*z + y*x + y*z, then use identities (x AND x = x) and a fact about our inputs, namely that x, y, and z are disjoint (x AND z = 0, y AND z = 0, x AND y = 0), to realize that the simplified product x is exactly the sum of terms that appear in both sides of the product. That's the bitwise AND.

The above line of reasoning doesn't quite work directly for XOR (XOR does not distribute over OR in general) but I think given the disjointness property one could argue that it does in this case.

@jameysharp
Copy link
Contributor

That's a nice way to prove it! I was thinking about it differently, by conditioning on the actual relationship between the inputs. For example, assume that x is less than y. Then and-of-icmp is true iff both icmp conditions include less-than. Similar reasoning applies to the case where x is equal to y, for any other boolean operator, and so on.

On further thought I think there are even more concise ways to factor this:

(decl pure partial intcc_comparable (IntCC IntCC) bool)
(rule (intcc_comparable x y)
  (if-let (u64_nonzero class) (u64_and (intcc_class x) (intcc_class y)))
  (u64_eq 0b10 class))

(decl pure compose_icmp (Type u64 bool Value Value) Value)
(rule (compose_icmp ty 0 _ _ _) (subsume (iconst ty (imm64 0))))
(rule (compose_icmp ty 1 _ x y) (icmp ty IntCC.Equal x y))
(rule (compose_icmp ty 2 $false x y) (icmp ty IntCC.UnsignedLessThan x y))
(rule (compose_icmp ty 2 $true x y) (icmp ty IntCC.SignedLessThan x y))
; ...
(rule (compose_icmp ty 6 _ x y) (icmp ty IntCC.NotEqual x y))
(rule (compose_icmp ty 7 _ _ _) (subsume (iconst ty (imm64 1))))

(rule (simplify (band ty (icmp ty cc1 x y) (icmp ty cc2 x y)))
  (if-let signed (intcc_comparable cc1 cc2))
  (compose_icmp ty (u64_and (decompose_intcc cc1) (decompose_intcc cc2)) signed x y))

(rule (simplify (bor ty (icmp ty cc1 x y) (icmp ty cc2 x y)))
  (if-let signed (intcc_comparable cc1 cc2))
  (compose_icmp ty (u64_or (decompose_intcc cc1) (decompose_intcc cc2)) signed x y))

@Kmeakin
Copy link
Contributor Author

Kmeakin commented Mar 28, 2023

Thank you @jameysharp and @cfallin. I have updated the patch with your contributions

@jameysharp
Copy link
Contributor

Nice! This means the changes to prelude.isle and the various corresponding Rust sources are no longer necessary, right?

I meant to ask before: could you commit the script you used to generate the tests too? We have precedent for that in cranelift/filetests/filetests/wasm/load-store/make-load-store-tests.sh.

Also, I guess now we need to either merge #6111 before finishing this PR, or decide we want these tests to not check the full output and back out the precise-output changes. I haven't thought about this enough to have an opinion on that question yet.

@Kmeakin
Copy link
Contributor Author

Kmeakin commented Mar 28, 2023

Nice! This means the changes to prelude.isle and the various corresponding Rust sources are no longer necessary, right?

Good point, I forgot to remove those

I meant to ask before: could you commit the script you used to generate the tests too? We have precedent for that in cranelift/filetests/filetests/wasm/load-store/make-load-store-tests.sh.

I used a rust program but I can convert it to a shell script

Also, I guess now we need to either merge #6111 before finishing this PR, or decide we want these tests to not check the full output and back out the precise-output changes. I haven't thought about this enough to have an opinion on that question yet.

I would prefer to wait for #6111

@jameysharp
Copy link
Contributor

You can remove the changes in cranelift/codegen/src/ir/condcodes.rs too, right?

And the changes in cranelift/codegen/src/isle_prelude.rs aren't necessary; I think I prefer the existing way. If somebody added a condition code for some reason, using a wildcard match here hides potential bugs.

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's merge this!

Here's some future work I'd like to see, but I'm not going to delay this PR further for any of this:

  • Comments explaining why decompose/compose work the way they do, and what intcc_comparable is doing. Right now I know why this is correct but I'll probably be confused when I look at this in a month, even though it was my suggestion.
  • Flipped versions of the rules for when icmp x y is combined with icmp y x. That's now just one extra rule for band plus one extra rule for bor, to handle all possible commutative cases.
  • Rules for bxor, because we might as well.
  • Potentially, rules matching ne of two icmps, which is equivalent to bxor in this context; similarly for eq.

@jameysharp jameysharp added this pull request to the merge queue Mar 29, 2023
Merged via the queue into bytecodealliance:main with commit dcf0ea9 Mar 29, 2023
@Kmeakin Kmeakin deleted the icmp-and-or branch January 30, 2024 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants