Bmi2 MultiplyNoFlags2 #44926
Comments
Tagging subscribers to this area: @tannergooding, @jeffhandley
Issue Details
namespace System.Runtime.Intrinsics.X86
{
    public abstract class Bmi2 : X86Base
    {
        public static (uint, uint) MultiplyNoFlags2(uint left, uint right);
        public abstract class X64 : X86Base.X64
        {
            public static (ulong, ulong) MultiplyNoFlags2(ulong left, ulong right);
        }
    }
}
Based on work Carol did in #37928
|
Is this a workaround for JIT not being able to optimize |
It would be useful to follow the API review template for the top post. |
Yes. Currently the only way the JIT can handle a node that produces multiple registers is if it produces a struct. An alternative to explicitly adding an API that returns a struct ( Note that, by adding this API and having |
Should consider giving names to the returned tuple elements, e.g., (int Hi, int Lo). |
I agree, but prefer |
I updated the description to include |
Is it that hard to remove the address taken attribute in the example? I would think that it should be pretty easy for the JIT to prove that the address does not leak anywhere. |
From my understanding, it would be quite tricky. The JIT doesn't have any mechanism at the moment to prove that an address is not leaking when it's passed to an intrinsic for the same reasons as when it's passed to a function that was not inlined. |
Also, it is not sufficient to prove that the address doesn't leak. It's also problematic to transform the IR into something that allows the backend to put both results in a register, when the IR has it as a reference. Really, in order to do that, by the time it gets to the backend it needs to be a single instruction that defines two registers. And for now, the best way (really, the only way without major changes) is to have it produce a two-field struct. |
Yeah, I figured that the real problem is the representation and not just the address taken that the description at the top highlights. |
Well, they're both real problems. It would be somewhat easier to transform just the backend to accommodate this (though still requiring major changes), but by that point we would have already forced the variable to live on the stack. |
I propose naming the new methods |
namespace System.Runtime.Intrinsics.X86
{
    public abstract class Bmi2 : X86Base
    {
        public static (uint Lower, uint Upper) MultiplyNoFlags2(uint left, uint right);
        public abstract class X64 : X86Base.X64
        {
            public static (ulong Lower, ulong Upper) MultiplyNoFlags2(ulong left, ulong right);
        }
    }
} |
@terrajobst, did you mean to close this? Is there a separate issue tracking "we should also do a pass of all the APIs that we believe need out or tuple overloads"? Or for that matter a separate issue tracking doing the perf fix for "Performance issue, which we can solve with a private intrinsic that uses tuples."? |
Background and Motivation
In .NET Core 3.1 we introduced the following intrinsics to expose the mulx x86 instruction:
uint MultiplyNoFlags(uint left, uint right, uint* low)
ulong MultiplyNoFlags(ulong left, ulong right, ulong* low)
where low is an out parameter used to return the lower 32-bit/64-bit part of the 64-bit/128-bit result of the left * right multiplication, while the return value contains the upper part.
When the intrinsics are used, the JIT produces sub-optimal code because low has the "address-taken" attribute.
For example, the following C# methods
will be compiled down to the following code by the current implementation of the JIT
mulx
mulx_64
However, if Bmi2.MultiplyNoFlags were implemented instead as
the JIT as in #37928 would inline MultiplyNoFlags and be able to remove the address-taken attribute from the local corresponding to low:
mulx
mulx_64
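To make the calling pattern discussed above concrete, here is a minimal sketch of a wrapper over the existing pointer-based intrinsics that returns both halves as a named tuple. The class name MulxSketch and the method name Multiply are hypothetical and are not part of this proposal; the sketch only shows where the address of the local is taken, and it does not by itself change what the JIT generates. It assumes the project is compiled with unsafe code enabled and targets .NET 5 or later (for the Math.BigMul overload used in the fallback).

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not the proposed API.
public static class MulxSketch
{
    // 32-bit operands: returns the 64-bit product split into two halves.
    public static unsafe (uint Lower, uint Upper) Multiply(uint left, uint right)
    {
        if (Bmi2.IsSupported)
        {
            uint lower;
            // Existing intrinsic: returns the upper half and writes the lower
            // half through the pointer. Taking &lower is what marks the local
            // as address-taken.
            uint upper = Bmi2.MultiplyNoFlags(left, right, &lower);
            return (lower, upper);
        }

        // Software fallback when BMI2 is unavailable.
        ulong product = (ulong)left * right;
        return ((uint)product, (uint)(product >> 32));
    }

    // 64-bit operands: returns the 128-bit product split into two halves.
    public static unsafe (ulong Lower, ulong Upper) Multiply(ulong left, ulong right)
    {
        if (Bmi2.X64.IsSupported)
        {
            ulong lower;
            ulong upper = Bmi2.X64.MultiplyNoFlags(left, right, &lower);
            return (lower, upper);
        }

        // Software fallback: Math.BigMul returns the high half and outputs the low half.
        ulong high = Math.BigMul(left, right, out ulong low);
        return (low, high);
    }
}
```

A caller can then deconstruct the result, for example `var (lo, hi) = MulxSketch.Multiply(a, b);`, which is the shape a tuple-returning intrinsic would offer directly without the intermediate pointer.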
Proposed API
Based on work Carol did in #37928
cc @CarolEidt @tannergooding