Cranelift: avoid integer-shift optimization with invalid reduces on vector types. #10413
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
An optimization rule in our mid-end implemented the following equivalence:
where
T1
is a smaller integer type andT2
is a larger integer type, andN
is the difference in bitwidths between them. In other words, when a left-then-right-shift clears the upper bits (or sign-extends them), we can implement that by reducing to a smaller type then extending back to the original type.Unfortunately,
ireduce
is not valid on vector-of-integer types. A fuzzbug reported in #10409 shows that this pattern results in invalid CLIF when optimizing this pattern with ani64x2
as input.This PR tightens the rule's LHS to only apply to integer types.
Note that there is another optimization rule that can also implement this behavior with a bitwise AND with a mask. Previously the two resulting expressions had the same cost in the egraph, and whatever perturbation occurred in the ISLE compilation of the rule (which is a multi-constructor because this is the mid-end, so is returning multiple results) caused that rule's result to be picked instead during extraction. I've also tweaked the costs to make reduces/extends cheaper than ordinary arithmetic as a result, to keep the same optimizer outputs after extraction. One x64 test in particular showed why this is correct: the mask-with-AND resulted in two instructions (move, AND-with-immediate) while a uextend can be done with one (zero-extending move,
movzbl
), and aarch64 has similar builtin extends in many cases.Fixes #10409.