-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x64: lower iabs.i64x2 using a single AVX512 instruction when possible #2819
Conversation
I'm introducing this PR as a draft to solicit opinions about how best to integrate this:
|
Yeah, this could be nice! If the flags don't have any relationship with each other, it might be possible to pass a
Why wouldn't this scale well, out of curiosity? In general I think we should tend to APIs that make it impossible to create invalid instructions. It might not be the case for existing APIs, which should be refactored ideally... |
@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call Being a bit paranoid about performance, I benchmarked the
I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is OK to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).
I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right |
@@ -4276,6 +4282,7 @@ fn test_x64_emit() { | |||
let mut isa_flag_builder = x64::settings::builder(); | |||
isa_flag_builder.enable("has_ssse3").unwrap(); | |||
isa_flag_builder.enable("has_sse41").unwrap(); | |||
isa_flag_builder.enable("has_avx512f").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish there were a way to avoid this here and instead specify it up above when necessary--suggestions welcome because I can't think of a great way to do this without rewriting this entire file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a single array with all the testing tuples, we could have several ones, one for each Flag combination we'd like to test. Then we could refactor this function so that it takes flags
from a parameter and an array of test tuples, or something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe not something to worry too much about, since there's no actual instruction selection test here. What was your concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess that's true. It just looked weird to lump the AVX512 flag in with the baseline flags. Maybe in a follow-on PR I will try to factor out the "check these instructions" code so that I can do as you suggest above.
@bnjbvr, can you take another look at this when you get a chance? |
Yep, happy to take a look in the next few days! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the Builder approach. I'd like to note that while there's nothing preventing from writing the same bits twice in a row, that's actually not an issue because it'll just overwrite them instead.
It'd be nice if we could avoid an additional CodeSink
trait, since we already have one. I only skimmed the encodings bit twiddling; as long as there are tests that validate that the right encoding is generated, this patch looks fine!
@@ -4276,6 +4282,7 @@ fn test_x64_emit() { | |||
let mut isa_flag_builder = x64::settings::builder(); | |||
isa_flag_builder.enable("has_ssse3").unwrap(); | |||
isa_flag_builder.enable("has_sse41").unwrap(); | |||
isa_flag_builder.enable("has_avx512f").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a single array with all the testing tuples, we could have several ones, one for each Flag combination we'd like to test. Then we could refactor this function so that it takes flags
from a parameter and an array of test tuples, or something like this?
|
||
#[derive(Copy, Clone, Default)] | ||
pub struct Register(u8); | ||
impl From<u8> for Register { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the way it works right now is that it goes from the RealReg
hardware encoding into an u8 then into this Register
structure. Could instead the methods taking Register
take a RealReg
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that Rust will pare away those abstractions so I didn't worry too much about overhead. I am trying to keep this encoding code dependency-free (especially of external dependencies) so, like with CodeSink
, I am using slightly different types/traits.
pub mod rex; | ||
pub mod vex; | ||
|
||
pub trait CodeSink { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there already another CodeSink
trait somewhere in this crate too? I wonder if we could avoid this one, to reduce the number of concepts, and just read out of the MachBuffer
internal vector, for testing purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here, but as you can see it would force this module to implement things it has no idea about and I wasn't too sure the old backend trait would be around forever. Also, I am trying to keep this module as dependency-free as possible so that in the future it could be used elsewhere--if I want to use this in the future to just encode instructions I won't want or need the additional MachBuffer
methods and fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could name the trait something else -- ByteSink
or something?
It might even make sense to do the factoring in the other direction -- split out ByteSink
from CodeSink
and make the former a constraint on the latter (trait CodeSink : ByteSink { ... }
with just the additional methods), but that's probably out-of-scope for this PR...
@@ -4276,6 +4282,7 @@ fn test_x64_emit() { | |||
let mut isa_flag_builder = x64::settings::builder(); | |||
isa_flag_builder.enable("has_ssse3").unwrap(); | |||
isa_flag_builder.enable("has_sse41").unwrap(); | |||
isa_flag_builder.enable("has_avx512f").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe not something to worry too much about, since there's no actual instruction selection test here. What was your concern?
Also, includes an empty stub module for the VEX encoding.
This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked.
…bytecodealliance#2819) * x64: add EVEX encoding mechanism Also, includes an empty stub module for the VEX encoding. * x64: lower abs.i64x2 to VPABSQ when available * x64: refactor EVEX encodings to use `EvexInstruction` This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked. * x64: rename encoding CodeSink to ByteSink
This introduces the mechanism for encoding EVEX instructions to the new backend (ported with slight changes from the old) and then uses it to improve the lowering of
iabs.i64x2
from 5 instructions to 1 instruction (i.e.VPABSQ
).