Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simd_extmul_* support for x64 #3084

Merged
merged 3 commits into from
Jul 15, 2021

Conversation

jlb6740
Copy link
Contributor

@jlb6740 jlb6740 commented Jul 14, 2021

No description provided.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen cranelift:wasm labels Jul 14, 2021
@jlb6740 jlb6740 marked this pull request as ready for review July 14, 2021 02:29
Copy link
Contributor

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me with the minor comments about inlining and reordering the lowerings. @akirilov-arm, the code_translator.rs part should look OK for the aarch64 backend, right?

insn: swiden1_high,
input: 0,
},
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why swiden_input is needed: couldn't the InsnInputs be created inline?

dst,
));
}
_ => panic!("Unsupported extmul_low_signed type"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought that this matching was too greedy--that it would match sequences that it shouldn't be and then crash because the types are wrong. I think now that it is OK because swiden_high only really allows the following types: I8X16, I16X8, I32X4. I think it might be worthwhile to mention this assumption in some comments at the top so that future readers aren't confused.

v0 = swiden.i64

@@ -1662,7 +1662,348 @@ fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(

Opcode::Imul => {
let ty = ty.unwrap();
if ty == types::I64X2 {

// First check for ext_mul_* instructions. Where possible ext_mul_* lowerings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the order of the lowerings should be reversed: scalar lowerings first, then vector lowerings, then pattern matching lowerings. As it stands in this PR, every imul.i64 is going to require a bunch of matching calls and type comparisons before it ever gets lowered. I think scalar multiplication is going to be the most common, then plain vector multiplication, so I would think we would want to make those common paths shorter. (Not sure exactly how much this affects compile time but hopefully you see what I mean).

Copy link
Contributor Author

@jlb6740 jlb6740 Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure .. I agree. That was my original thought but I never reverted the initial implementation. Upon looking at this again I see why I naturally put the extmul first. It is because when we share all these instructions with imul the current checks such as types::I64x2, and ty.lane_count() > 1 are entered before we have a chance to check for an opcode source. I think the best we can do is have a matches_input_any at the top and have a branch from there. That is what I will do before merging. I supposed if there is another/better solution we can refactor later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note, the code must remain at the top as the fall through target since I am using "if let Some() .." to check for op sources and there is no if not let Some() ..". Still there is just one branch to get to the other lowerings.

@akirilov-arm
Copy link
Contributor

@abrown I will let @sparker-arm comment because he has done the same changes in PR #3070.

@jlb6740 jlb6740 requested a review from abrown July 15, 2021 07:31
@jlb6740 jlb6740 merged commit 2452a4c into bytecodealliance:main Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift:wasm cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants