👋 Hey,

This PR implements SIMD `icmp` for RISC-V. These rules are implemented as a combination of two steps: mask generation and mask expansion. Our comparison rules only return their results as a mask register, so we need to expand the mask into lane-sized elements.

We have 20 (!) comparison instructions, nearly the full table of all IntCC codes in VV, VX and VI formats. However, there are some holes in this table.
They are:

- `vmsltu.vi` (Less than Unsigned (Vec-Imm))
- `vmslt.vi` (Less than (Vec-Imm))
- `vmsgtu.vv` (Greater than Unsigned (Vec-Vec))
- `vmsgt.vv` (Greater than (Vec-Vec))
- `vmsgeu.*` (Greater than or equal Unsigned (All formats))
- `vmsge.*` (Greater than or equal (All formats))

Most of these can be replaced with the inverted IntCC instruction. To minimize the size of this initial PR, I've only implemented rules for the opcodes for which we have a direct translation.
However, in order to get all IntCCs working, I've implemented some of the inverted instructions, namely `vmsgtu.vv`, `vmsgt.vv`, `vmsgeu.vv` and `vmsge.vv`. These are implemented as aliases of their inverted counterparts (with the inputs swapped).

I'm planning on adding a follow-up commit with the rest of the VX and VI rules on both the LHS and RHS sides. We should end up with 5 rules per IntCC once this is all done.
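
The operand swap behind the `vmsgt*.vv` and `vmsge*.vv` aliases can be sketched in plain Rust. This is only an illustrative model: `Cc`, `direct_vv`, `swap_operands`, `select_vv` and `eval` are hypothetical names, not Cranelift's types or the ISLE rules in this PR.

```rust
// Hypothetical model of the alias selection; not Cranelift's actual code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cc { Eq, Ne, Ult, Slt, Ule, Sle, Ugt, Sgt, Uge, Sge }

/// Vector-vector opcode with a direct encoding, if one exists.
fn direct_vv(cc: Cc) -> Option<&'static str> {
    match cc {
        Cc::Eq => Some("vmseq.vv"),
        Cc::Ne => Some("vmsne.vv"),
        Cc::Ult => Some("vmsltu.vv"),
        Cc::Slt => Some("vmslt.vv"),
        Cc::Ule => Some("vmsleu.vv"),
        Cc::Sle => Some("vmsle.vv"),
        // Holes in the table: no .vv encoding for these.
        Cc::Ugt | Cc::Sgt | Cc::Uge | Cc::Sge => None,
    }
}

/// Condition code that yields the same result once the operands are swapped.
fn swap_operands(cc: Cc) -> Cc {
    match cc {
        Cc::Eq => Cc::Eq,
        Cc::Ne => Cc::Ne,
        Cc::Ult => Cc::Ugt,
        Cc::Ugt => Cc::Ult,
        Cc::Slt => Cc::Sgt,
        Cc::Sgt => Cc::Slt,
        Cc::Ule => Cc::Uge,
        Cc::Uge => Cc::Ule,
        Cc::Sle => Cc::Sge,
        Cc::Sge => Cc::Sle,
    }
}

/// Returns the opcode to emit and whether the inputs must be swapped.
fn select_vv(cc: Cc) -> (&'static str, bool) {
    match direct_vv(cc) {
        Some(op) => (op, false),
        None => {
            let op = direct_vv(swap_operands(cc))
                .expect("the swapped condition has a .vv encoding");
            (op, true)
        }
    }
}

/// Scalar reference semantics on one 8-bit lane, used to check the swap.
fn eval(cc: Cc, a: u8, b: u8) -> bool {
    let (sa, sb) = (a as i8, b as i8);
    match cc {
        Cc::Eq => a == b,
        Cc::Ne => a != b,
        Cc::Ult => a < b,
        Cc::Slt => sa < sb,
        Cc::Ule => a <= b,
        Cc::Sle => sa <= sb,
        Cc::Ugt => a > b,
        Cc::Sgt => sa > sb,
        Cc::Uge => a >= b,
        Cc::Sge => sa >= sb,
    }
}
```

In this model, `select_vv(Cc::Sgt)` returns `("vmslt.vv", true)`: a signed greater-than is emitted as a signed less-than with its two inputs exchanged.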
I've split the actual mask expansion into its own separate rule, since we are going to need it for the `fcmp` rules as well.

The instruction selection for `icmp` is in a separate rule simply because the rules end up less verbose than if they were inlined directly into the `icmp` rule.
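
As a rough picture of the two steps, here is a scalar Rust sketch of mask generation followed by mask expansion. It assumes four 32-bit lanes and assumes that a true lane expands to all ones and a false lane to all zeros; the function names are hypothetical and do not correspond to the ISLE rules themselves.

```rust
/// Step 1: mask generation. Produces one bit per lane, in the way a
/// `vmslt.vv`-style compare writes its result to a mask register.
fn mask_slt(a: &[i32; 4], b: &[i32; 4]) -> u8 {
    let mut mask = 0u8;
    for i in 0..4 {
        if a[i] < b[i] {
            mask |= 1 << i;
        }
    }
    mask
}

/// Step 2: mask expansion. Turns each mask bit into a lane-sized element
/// (assumption: all ones for true, all zeros for false).
fn expand_mask(mask: u8) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        if mask & (1 << i) != 0 {
            out[i] = -1;
        }
    }
    out
}

/// The full comparison is the composition of the two steps. Keeping
/// `expand_mask` separate lets other comparisons reuse it.
fn icmp_slt(a: &[i32; 4], b: &[i32; 4]) -> [i32; 4] {
    expand_mask(mask_slt(a, b))
}
```

For example, `icmp_slt(&[1, 5, -3, 7], &[2, 5, 0, 6])` produces the mask `0b0101` in the first step and `[-1, 0, -1, 0]` after expansion.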