x64: lower iabs.i64x2 using a single AVX512 instruction when possible #2819

abrown · 2021-04-08T17:45:12Z

This adds a mechanism for encoding EVEX instructions to the new backend (ported, with slight changes, from the old backend) and then uses it to reduce the lowering of iabs.i64x2 from 5 instructions to 1 (i.e. VPABSQ).

abrown · 2021-04-08T17:57:02Z

I'm introducing this PR as a draft to solicit opinions about how best to integrate this:

- Any preference for switching to a builder pattern (e.g. Evex::new().v128().prefix(...).map(...).reg(...).rm(...).w(true).opcode(0x1F).encode(), matching the manual's syntax, EVEX.128.66.0F38.W1 1F /r) over the current encode_evex(...)?
- Is there a better integration with Inst than XmmUnaryRmREvex { op: Avx512Opcode, ... }? I cannot just add a boolean evex field to XmmUnaryRmR because some AVX512 instructions (e.g. VPABSQ) have no SSE counterpart, so they should not be included in SseOpcode. But the XmmUnaryRmREvex approach is not going to scale well to a bunch of other instructions.

bnjbvr · 2021-04-12T09:24:18Z

> Any preference for switching to a builder pattern (e.g. Evex::new().v128().prefix(...).map(...).reg(...).rm(...).w(true).opcode(0x1F).encode(), matching the manual's syntax, EVEX.128.66.0F38.W1 1F /r) over the current encode_evex(...)?

Yeah, this could be nice! If the flags don't have any relationship with each other, it might be possible to pass a &sink to the new() function and emit the bytes as we call the builder functions. Alternatively, deferring it all and building in encode(&sink) sounds good too! And if this works out well, we could refactor the previous encode functions a bit; their API really looks a bit too C-ish.

> Is there a better integration with Inst than XmmUnaryRmREvex { op: Avx512Opcode, ... }? I cannot just add a boolean evex field to XmmUnaryRmR because some AVX512 instructions (e.g. VPABSQ) have no SSE counterpart, so they should not be included in SseOpcode. But the XmmUnaryRmREvex approach is not going to scale well to a bunch of other instructions.

Why wouldn't this scale well, out of curiosity? In general I think we should tend toward APIs that make it impossible to create invalid instructions. That might not be the case for the existing APIs, which ideally should be refactored...
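To make the "impossible to create invalid instructions" point concrete, here is a minimal sketch of how separate opcode enums keep an EVEX-only opcode out of the SSE variant. The variant and enum names come from the discussion above; the opcode subsets, the `Reg` alias, and the `mnemonic` helper are assumptions made for illustration and are not the backend's actual definitions.

```rust
/// Opcodes with a legacy SSE encoding (illustrative subset only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum SseOpcode {
    Pabsb,
    Pabsw,
    Pabsd,
}

/// Opcodes that only have an EVEX encoding (illustrative subset only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Avx512Opcode {
    Vpabsq,
}

/// Assumed stand-in for a register; the real backend uses richer types.
type Reg = u8;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    /// Can only carry an opcode that has an SSE encoding.
    XmmUnaryRmR { op: SseOpcode, src: Reg, dst: Reg },
    /// Can only carry an opcode that has an EVEX encoding.
    XmmUnaryRmREvex { op: Avx512Opcode, src: Reg, dst: Reg },
}

impl Inst {
    /// Hypothetical helper: both matches are exhaustive, so adding an opcode
    /// forces every encoder and printer to handle it.
    fn mnemonic(&self) -> &'static str {
        match self {
            Inst::XmmUnaryRmR { op, .. } => match op {
                SseOpcode::Pabsb => "pabsb",
                SseOpcode::Pabsw => "pabsw",
                SseOpcode::Pabsd => "pabsd",
            },
            Inst::XmmUnaryRmREvex { op, .. } => match op {
                Avx512Opcode::Vpabsq => "vpabsq",
            },
        }
    }
}
```

With this shape, `Inst::XmmUnaryRmR { op: Avx512Opcode::Vpabsq, .. }` is a type error rather than a runtime check, which is the property a boolean evex field would give up.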

abrown · 2021-04-13T18:58:25Z

@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call .encode(sink) to emit later because some methods, e.g. .reg(...), modify both the prefix and the ModR/M byte, which are emitted at different times.
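For readers following along, a simplified sketch of that hybrid shape is below. It is not the PR's EvexInstruction: the struct name `EvexSketch`, the constants, and the restriction to reg-reg operands with registers 0..=15, no masking, and no broadcast are all assumptions made to keep the example short. The prefix payload bytes are updated as each builder method runs, and nothing is written until `encode`, because `reg` and `rm` touch both the prefix and the ModR/M byte.

```rust
/// EVEX `pp` field value for an implied 0x66 prefix.
const PP_66: u8 = 0b01;
/// EVEX `mm` field value for the 0F38 opcode map.
const MAP_0F38: u8 = 0b10;

/// Hypothetical, simplified builder (reg-reg only, registers 0..=15).
struct EvexSketch {
    /// P0, P1, P2: the three payload bytes that follow the 0x62 escape byte.
    p: [u8; 3],
    opcode: u8,
    modrm: u8,
}

impl EvexSketch {
    fn new() -> Self {
        // Inverted fields (R, X, B, R', vvvv, V') start as all ones, meaning
        // "no extension"; bit 2 of P1 is fixed to 1; ModR/M mod = 0b11.
        EvexSketch { p: [0xF0, 0x7C, 0x08], opcode: 0, modrm: 0xC0 }
    }

    /// 128-bit vector length: L'L = 00.
    fn v128(mut self) -> Self {
        self.p[2] &= !0x60;
        self
    }

    fn prefix(mut self, pp: u8) -> Self {
        self.p[1] = (self.p[1] & !0x03) | (pp & 0x03);
        self
    }

    fn map(mut self, mm: u8) -> Self {
        self.p[0] = (self.p[0] & !0x03) | (mm & 0x03);
        self
    }

    fn w(mut self, w: bool) -> Self {
        self.p[1] = (self.p[1] & !0x80) | ((w as u8) << 7);
        self
    }

    fn opcode(mut self, opcode: u8) -> Self {
        self.opcode = opcode;
        self
    }

    /// Touches both the prefix (inverted R bit) and ModR/M.reg.
    fn reg(mut self, enc: u8) -> Self {
        assert!(enc < 16, "sketch handles registers 0..=15 only");
        let r_inverted = !(enc >> 3) & 1;
        self.p[0] = (self.p[0] & !0x80) | (r_inverted << 7);
        self.modrm = (self.modrm & !0x38) | ((enc & 7) << 3);
        self
    }

    /// Touches both the prefix (inverted B bit) and ModR/M.rm.
    fn rm(mut self, enc: u8) -> Self {
        assert!(enc < 16, "sketch handles registers 0..=15 only");
        let b_inverted = !(enc >> 3) & 1;
        self.p[0] = (self.p[0] & !0x20) | (b_inverted << 5);
        self.modrm = (self.modrm & !0x07) | (enc & 7);
        self
    }

    /// Emission is deferred to here: escape byte, payload, opcode, ModR/M.
    fn encode(self, sink: &mut Vec<u8>) {
        sink.push(0x62);
        sink.extend_from_slice(&self.p);
        sink.push(self.opcode);
        sink.push(self.modrm);
    }
}
```

Under these assumptions, `EvexSketch::new().v128().prefix(PP_66).map(MAP_0F38).w(true).opcode(0x1F).reg(8).rm(2).encode(&mut sink)` mirrors EVEX.128.66.0F38.W1 1F /r for VPABSQ with xmm8 as the destination and xmm2 as the source. Each setter masks its field before writing, so calling one twice simply overwrites the earlier value.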

Being a bit paranoid about performance, I benchmarked the encode_evex function approach against the EvexInstruction builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:

test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench:          17 ns/iter (+/- 0)
test isa::x64::inst::encoding::evex::tests::encode_with_builder  ... bench:           6 ns/iter (+/- 0)

I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is OK to review and merge, with the understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).

> Why wouldn't this scale well, out of curiosity?

I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right Inst variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.

abrown · 2021-04-13T19:05:05Z

cranelift/codegen/src/isa/x64/inst/emit_tests.rs

@@ -4276,6 +4282,7 @@ fn test_x64_emit() {
    let mut isa_flag_builder = x64::settings::builder();
    isa_flag_builder.enable("has_ssse3").unwrap();
    isa_flag_builder.enable("has_sse41").unwrap();
+    isa_flag_builder.enable("has_avx512f").unwrap();


I wish there were a way to avoid this here and instead specify it up above when necessary--suggestions welcome because I can't think of a great way to do this without rewriting this entire file.

Instead of having a single array with all the testing tuples, we could have several, one for each flag combination we'd like to test. Then we could refactor this function so that it takes the flags and an array of test tuples as parameters, or something like this?

Also, maybe not something to worry too much about, since there's no actual instruction selection test here. What was your concern?

Yeah, I guess that's true. It just looked weird to lump the AVX512 flag in with the baseline flags. Maybe in a follow-on PR I will try to factor out the "check these instructions" code so that I can do as you suggest above.
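A rough sketch of what that follow-on refactoring could look like is below: one helper that takes a flag set plus a group of test tuples, so the AVX512 cases can live in their own group. Everything here is hypothetical, including `check_group`, the `Case` tuple shape, and the `toy_encode` stand-in (the real test drives the backend's emitter, which this example does not depend on). The hex string for the VPABSQ case was worked out by hand from EVEX.128.66.0F38.W1 1F /r.

```rust
/// Assumed tuple shape: (instruction text, expected hex encoding).
type Case = (&'static str, &'static str);

/// Checks one group of cases under one flag combination and returns a
/// description of every mismatch (empty means the group passed).
fn check_group(
    flags: &[&str],
    cases: &[Case],
    encode: impl Fn(&[&str], &str) -> Option<String>,
) -> Vec<String> {
    let mut failures = Vec::new();
    for (inst, expected) in cases {
        match encode(flags, *inst) {
            Some(actual) if actual == *expected => {}
            Some(actual) => {
                failures.push(format!("{}: expected {}, got {}", inst, expected, actual))
            }
            None => failures.push(format!("{}: not encodable with {:?}", inst, flags)),
        }
    }
    failures
}

/// Toy stand-in for the emitter, only here to exercise the helper: it knows a
/// single instruction and refuses to encode it unless the flag is enabled.
fn toy_encode(flags: &[&str], inst: &str) -> Option<String> {
    match inst {
        "vpabsq %xmm2, %xmm8" if flags.contains(&"has_avx512f") => {
            Some("6272FD081FC2".to_string())
        }
        _ => None,
    }
}
```

A baseline group would then call `check_group(&["has_ssse3", "has_sse41"], ...)` and the AVX512 group would add "has_avx512f" only where it is needed.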

abrown · 2021-04-14T16:09:50Z

@bnjbvr, can you take another look at this when you get a chance?

bnjbvr · 2021-04-14T16:20:20Z

> @bnjbvr, can you take another look at this when you get a chance?

Yep, happy to take a look in the next few days!

bnjbvr

I'm fine with the builder approach. I'd like to note that while nothing prevents a caller from writing the same bits twice in a row, that's not actually an issue because the second write just overwrites the first.

It'd be nice if we could avoid an additional CodeSink trait, since we already have one. I only skimmed the encoding bit-twiddling; as long as there are tests that validate that the right encoding is generated, this patch looks fine!

bnjbvr · 2021-04-15T14:37:11Z

Instead of having a single array with all the testing tuples, we could have several, one for each flag combination we'd like to test. Then we could refactor this function so that it takes the flags and an array of test tuples as parameters, or something like this?

bnjbvr · 2021-04-15T14:47:40Z

cranelift/codegen/src/isa/x64/inst/encoding/evex.rs

+
+#[derive(Copy, Clone, Default)]
+pub struct Register(u8);
+impl From<u8> for Register {


So the way it works right now is that it goes from the RealReg hardware encoding into a u8 and then into this Register structure. Could the methods taking Register take a RealReg instead?

I was under the impression that Rust will pare away those abstractions so I didn't worry too much about overhead. I am trying to keep this encoding code dependency-free (especially of external dependencies) so, like with CodeSink, I am using slightly different types/traits.
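As a small illustration of that trade-off, the sketch below keeps the encoder's register type a plain newtype over the hardware encoding, so callers convert at the boundary and the encoding module never names RealReg. The `From<u8>` impl mirrors the diff above; the `debug_assert!`, the reverse conversion, and the `modrm_reg_reg` helper are assumptions added for the example.

```rust
/// Encoder-local register type: just the hardware encoding.
#[derive(Copy, Clone, Default, Debug, PartialEq)]
pub struct Register(u8);

impl From<u8> for Register {
    fn from(reg: u8) -> Self {
        // Assumption for this sketch: only registers 0..=15 are supported.
        debug_assert!(reg < 16);
        Register(reg)
    }
}

impl From<Register> for u8 {
    fn from(reg: Register) -> u8 {
        reg.0
    }
}

/// Hypothetical helper: builds a reg-reg ModR/M byte (mod = 0b11) from anything
/// convertible into `Register`, so a caller can pass a raw hardware encoding.
pub fn modrm_reg_reg(reg: impl Into<Register>, rm: impl Into<Register>) -> u8 {
    let reg: u8 = reg.into().into();
    let rm: u8 = rm.into().into();
    0xC0 | ((reg & 7) << 3) | (rm & 7)
}
```

Since `Register` is a single-field wrapper around `u8`, the conversions are trivial moves that an optimizing build would be expected to inline away.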

bnjbvr · 2021-04-15T14:50:29Z

cranelift/codegen/src/isa/x64/inst/encoding/mod.rs

 pub mod rex;
+pub mod vex;
+
+pub trait CodeSink {


Isn't there already another CodeSink trait somewhere in this crate too? I wonder if we could avoid this one, to reduce the number of concepts, and just read out of the MachBuffer internal vector, for testing purposes?

Yes, here, but as you can see it would force this module to implement things it has no idea about and I wasn't too sure the old backend trait would be around forever. Also, I am trying to keep this module as dependency-free as possible so that in the future it could be used elsewhere--if I want to use this in the future to just encode instructions I won't want or need the additional MachBuffer methods and fields.

Maybe we could name the trait something else -- ByteSink or something?

It might even make sense to do the factoring in the other direction -- split out ByteSink from CodeSink and make the former a constraint on the latter (trait CodeSink : ByteSink { ... } with just the additional methods), but that's probably out-of-scope for this PR...
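A minimal sketch of that factoring, assuming a single `put1` method on the byte-level trait and one invented `offset` method standing in for whatever extra methods CodeSink would keep (neither signature is taken from the crate):

```rust
/// Byte-level sink: all the encoding module would need.
pub trait ByteSink {
    fn put1(&mut self, value: u8);
}

/// A plain vector is enough for encoding tests.
impl ByteSink for Vec<u8> {
    fn put1(&mut self, value: u8) {
        self.push(value);
    }
}

/// The richer trait requires `ByteSink` and adds only the extra methods.
/// `offset` is a hypothetical placeholder for those extras.
pub trait CodeSink: ByteSink {
    fn offset(&self) -> usize;
}

/// Assumed stand-in for a full code buffer.
pub struct Buffer {
    pub bytes: Vec<u8>,
}

impl ByteSink for Buffer {
    fn put1(&mut self, value: u8) {
        self.bytes.push(value);
    }
}

impl CodeSink for Buffer {
    fn offset(&self) -> usize {
        self.bytes.len()
    }
}

/// Encoders written against `ByteSink` accept both kinds of sink.
pub fn emit_bytes<S: ByteSink + ?Sized>(sink: &mut S, bytes: &[u8]) {
    for &b in bytes {
        sink.put1(b);
    }
}
```

The encoding module would then depend only on `ByteSink`, while anything that needs the richer interface asks for `CodeSink` and still gets `put1` through the supertrait bound.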

bnjbvr · 2021-04-15T14:55:24Z

Also, maybe not something to worry too much about, since there's no actual instruction selection test here. What was your concern?

abrown requested review from cfallin, bnjbvr and jlb6740 April 8, 2021 17:57

github-actions bot added the cranelift and cranelift:area:x64 labels Apr 8, 2021

abrown force-pushed the inst-format-3 branch from e2f8baf to 52ad0a7 Compare April 13, 2021 18:36

abrown marked this pull request as ready for review April 13, 2021 19:00

abrown force-pushed the inst-format-3 branch from 52ad0a7 to 377253b Compare April 13, 2021 19:28

bnjbvr approved these changes Apr 15, 2021

abrown added 2 commits April 15, 2021 08:36

x64: add EVEX encoding mechanism

48d7c77

Also, includes an empty stub module for the VEX encoding.

x64: lower abs.i64x2 to VPABSQ when available

2f73d66

abrown force-pushed the inst-format-3 branch from 377253b to 9bbf3c2 Compare April 15, 2021 15:37

x64: refactor EVEX encodings to use EvexInstruction

edb142b

This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked.

abrown force-pushed the inst-format-3 branch from 9bbf3c2 to edb142b Compare April 15, 2021 15:41

x64: rename encoding CodeSink to ByteSink

addaed3

abrown merged commit 0acc145 into bytecodealliance:main Apr 15, 2021

abrown deleted the inst-format-3 branch April 15, 2021 18:54

abrown restored the inst-format-3 branch April 15, 2021 18:54

abrown mentioned this pull request May 10, 2021

x64: add x64 encoding benchmarks #2888

Merged

abrown deleted the inst-format-3 branch May 17, 2021 17:20
