Determine how to expose hwintrinsics that are only supported in 64-bit mode #10617
Comments
FYI. @CarolEidt, @fiigii, @eerhardt, @terrajobst This tracks the discussion raised here: dotnet/coreclr#18734 (comment)
At least
The JIT is better positioned to choose an optimal solution than user code: it can load directly from memory with MOVQ, spill the low and high registers and then use MOVQ, use 2x MOVD + PUNPCKLDQ, or use MOVD + PINSRD (with SSE4.1). For the reverse, it can choose between storing directly to memory with MOVQ and moving to registers with MOVD + PEXTRD (with SSE4.1).
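As a rough illustration of the instruction sequences listed above, here is a minimal sketch in C using the SSE2 intrinsics from `<emmintrin.h>`, not the .NET API under discussion. The function names are hypothetical, and a compiler (like the JIT) remains free to pick different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: the 64-bit value is already in memory,
   so a single MOVQ-style load of the low 64 bits is enough. */
static __m128i load_i64_from_memory(const int64_t *p) {
    return _mm_loadl_epi64((const __m128i *)p);
}

/* Hypothetical helper: the value lives in two 32-bit halves
   (as a long does in 32-bit mode). Move each half into a vector
   register (2x MOVD) and interleave them (PUNPCKLDQ). */
static __m128i load_i64_from_halves(uint32_t lo, uint32_t hi) {
    __m128i vlo = _mm_cvtsi32_si128((int)lo);
    __m128i vhi = _mm_cvtsi32_si128((int)hi);
    return _mm_unpacklo_epi32(vlo, vhi);  /* lanes: lo, hi, 0, 0 */
}

/* Hypothetical helper for the reverse direction: store the low
   64 bits of the vector directly to memory (MOVQ-style store). */
static int64_t store_i64_to_memory(__m128i v) {
    int64_t out;
    _mm_storel_epi64((__m128i *)&out, v);
    return out;
}
```

Which path is cheaper depends on where the value currently lives (memory or a register pair), which is exactly the information user code does not have.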
The general principle of the intrinsics so far is that they are not emulated, ever. This simplifies the implementation overall and provides some very nice guarantees about what the intrinsics will do. The downside is that the consumer ends up having to do a bit more, but I believe that can/will be alleviated by having some separate "helper" library. Such a library could do things like:
These proposed libraries would be sub-optimal because a long is internally two int registers or a memory location, but managed code can't access these registers directly nor affect register allocation/spilling and has to either always spill or always load in two parts. If fallbacks for intrinsics are undesirable, then at least there could be a helper intrinsic in
This is something that is much more general than just hardware intrinsics. If there is an issue here, it is probably worth addressing more generally (or via some separate intrinsic).
There is a general issue of the JIT not being able to determine whether it is best to use a value from memory, the "natural" register type for the value (generally 2 int regs for a long), or an xmm register. As @tannergooding points out, this is a general issue, though certainly of greater impact for code like this. As a design issue, I think it's clearly best to keep the intrinsics as true to the hw as possible. Then, as we identify scenarios (esp. scenarios that can be shown to have some impact on real-world code) that aren't well addressed, we can determine how best to get the desired performance. Creating more and more intrinsics doesn't feel like the right answer to me, but we'd clearly have to do some more analysis first.
We discussed and resolved this as part of dotnet/corefx#32721.
There are several hwintrinsics exposed that are only emittable in 64-bit mode and throw a PlatformNotSupportedException (PNSE) if invoked in 32-bit mode (regardless of the fact that the general Isa.IsSupported check returns true).
We should have a deeper discussion on how to properly expose this data to the consumer of these APIs.