Improve APIs for Intel string handling intrinsics #957
Comments
@eerhardt @fiigii @tannergooding @4creators - This is intended to be the follow-on discussion from dotnet/coreclr#17637. Please feel free to modify the description or title, and add any context I've missed.
I think it would be simpler to control everything via an enum rather than have a large number of methods for arbitrary combinations of flags; otherwise I'll have to keep a mental translation of which method maps to which specific instruction variant. Is this still part of the 3.0 milestone? I'm very keen to start using these.
@CarolEidt, @terrajobst, @eerhardt, @jkotas I've moved this to … I think someone will need to sit down and take a much deeper look at the instructions and how we should expose them, but I don't think I'll have time to do this before 3.0, so I've marked it as …
Heads up: the CoreFX links at the top of the page are broken. I tried to add canonical links here but wasn't sure which branch they should now point to.
Hi all, first of all I want to say a big thanks for all the hard work on hardware intrinsics. I've been following this progress for many years, and clearly the work was not as easy as I assumed it would be. But I want to raise this specific issue again: the string comparison intrinsics are still missing. I've been waiting for them for nearly 10 years, it was very disappointing not to see them in .NET 5, and after some digging it now seems unlikely they will appear at all. From my point of view, a working version of these is far better than the status quo (nothing!). My answer to the three open questions is: choose anything. I'll be happy with it as long as I can use string intrinsics in .NET code. Thank you.
Is there something specific you'd like to implement that you're having trouble with? Perhaps we can help with a workaround. Aside from coming up with a .NET-like API for these instructions, there are other issues. Namely:
I have a lot of long-string comparison code where I'd like to use PcmpEstrI: it does exactly what I need, comparing 128 bits at once and giving me the index of the byte where the strings differ.
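For readers unfamiliar with the instruction, the single mode described here can be modelled in a few lines of scalar C#. This is a sketch based on the Intel SDM description of PCMPESTRI, assuming unsigned bytes, "equal each" aggregation, negative polarity and least-significant-index output. PcmpestriModel and EqualEachNegativeLeastIndex are hypothetical names (this is not a .NET API), and the model ignores the other aggregation modes, the flag outputs and the hardware's handling of negative lengths.

```csharp
using System;

// Hypothetical scalar model of ONE PCMPESTRI mode (unsigned bytes,
// "equal each" aggregation, negative polarity, least-significant index),
// written from the Intel SDM description. Not a .NET API.
public static class PcmpestriModel
{
    public static int EqualEachNegativeLeastIndex(
        ReadOnlySpan<byte> a, int lengthA,
        ReadOnlySpan<byte> b, int lengthB)
    {
        // Only 16 byte lanes exist; lengths are clamped to [0, 16] here
        // (and to the span sizes, so the model never reads out of bounds).
        int la = Math.Clamp(lengthA, 0, Math.Min(16, a.Length));
        int lb = Math.Clamp(lengthB, 0, Math.Min(16, b.Length));

        for (int i = 0; i < 16; i++)
        {
            bool validA = i < la;
            bool validB = i < lb;

            // "Equal each": both valid -> compare the bytes;
            // both invalid -> forced true; exactly one invalid -> forced false.
            bool equal = (validA && validB) ? a[i] == b[i] : validA == validB;

            // Negative polarity inverts the per-lane result, so the first lane
            // that is NOT equal is the first set bit, i.e. the returned index.
            if (!equal)
                return i;
        }

        // No set bit: the instruction reports the lane count (16 for bytes).
        return 16;
    }
}
```

With "hello world" and "hello there" and both lengths set to 11, the model returns 6, the first lane where the bytes differ; if the common prefix is identical and the lengths are equal, it returns 16.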
In general, such a comparison can be performed with … You can see an example of my second point above by looking at popular projects that use … In that code, there's really no benefit to using …
OK, thank you. I'll give AVX2 a go. I wasn't keen on it because it can affect clock speeds, but it's worth a shot since there isn't any easier option.
To be clear, you don't have to use AVX2 to work around the absence of the SSE4.2 instructions. You can use SSE2 instructions to build the same algorithm 128 bits at a time if you find that AVX throttling causes performance issues. I was mostly using the AVX2 example as evidence that the string comparison instructions aren't strictly necessary to accelerate string comparisons, and that they don't have a future; otherwise they would exist as 256-bit instructions in AVX2 as well.
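To make the suggestion concrete, here is a minimal C# sketch of a "first differing byte" search that uses AVX2 when available, falls back to SSE2 at 128 bits per step, and finishes with a scalar loop. It assumes .NET Core 3.0 or later (System.Runtime.Intrinsics.X86 and System.Numerics.BitOperations); VectorMismatch and FirstMismatch are illustrative names, not framework APIs. Because CompareEqual sets a lane when the bytes are equal, the mask is inverted before taking the trailing-zero count.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class VectorMismatch
{
    // Returns the index of the first byte where left and right differ, or
    // Math.Min(left.Length, right.Length) if the common prefix is identical.
    public static int FirstMismatch(ReadOnlySpan<byte> left, ReadOnlySpan<byte> right)
    {
        int length = Math.Min(left.Length, right.Length);
        int offset = 0;

        if (Avx2.IsSupported)
        {
            // 32 bytes per iteration.
            while (length - offset >= Vector256<byte>.Count)
            {
                var l = MemoryMarshal.Read<Vector256<byte>>(left.Slice(offset));
                var r = MemoryMarshal.Read<Vector256<byte>>(right.Slice(offset));
                // Bit i of equalMask is set when l[i] == r[i].
                uint equalMask = (uint)Avx2.MoveMask(Avx2.CompareEqual(l, r));
                if (equalMask != uint.MaxValue)
                    return offset + BitOperations.TrailingZeroCount(~equalMask);
                offset += Vector256<byte>.Count;
            }
        }

        if (Sse2.IsSupported)
        {
            // 16 bytes per iteration; only the low 16 bits of the mask are used.
            while (length - offset >= Vector128<byte>.Count)
            {
                var l = MemoryMarshal.Read<Vector128<byte>>(left.Slice(offset));
                var r = MemoryMarshal.Read<Vector128<byte>>(right.Slice(offset));
                int equalMask = Sse2.MoveMask(Sse2.CompareEqual(l, r));
                if (equalMask != 0xFFFF)
                    return offset + BitOperations.TrailingZeroCount(~equalMask);
                offset += Vector128<byte>.Count;
            }
        }

        // Scalar tail (and the full fallback on hardware without SSE2/AVX2).
        for (; offset < length; offset++)
        {
            if (left[offset] != right[offset])
                return offset;
        }
        return length;
    }
}
```

On hardware with neither instruction set, the scalar loop handles everything, so the method stays correct everywhere; whether the AVX2 or the SSE2 path is faster on a given machine is something to benchmark.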
Makes sense, thank you for the explanation. I might benchmark both and see which one works best for me.
The instruction itself actually performs fairly poorly: it is a macro-op that is converted into a large number of micro-ops, it has fairly high latency, and it occupies a large number of ports: … The single instruction actually represents several different "APIs", ranging from UTF-8 to UTF-16 to various kinds of specialized masking, and that's part of why the underlying instruction is comparatively expensive. Generally speaking, you can emulate the behavior with …
If you're directly needing …
So, the following logic is generally what you want:
// Bit i of mask is set when left[i] == right[i].
var mask = Sse2.MoveMask(Sse2.CompareEqual(left, right));
if (mask != 0)
{
    return offset + BitOperations.TrailingZeroCount(mask);
}
offset += Vector128<byte>.Count;
It's similar to the implementation of …
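Wrapped in a complete loop, the same CompareEqual / MoveMask / TrailingZeroCount pattern can also search for a single byte by broadcasting it into every lane with Vector128.Create; here a non-zero mask (rather than an inverted one) signals the hit. This is a minimal sketch assuming .NET Core 3.0 or later; VectorSearch.IndexOfByte is a hypothetical helper, and in real code MemoryExtensions.IndexOf already provides a vectorized search.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class VectorSearch
{
    // Returns the index of the first occurrence of value in data, or -1.
    public static int IndexOfByte(ReadOnlySpan<byte> data, byte value)
    {
        int offset = 0;

        if (Sse2.IsSupported)
        {
            // Broadcast the byte we are looking for into all 16 lanes.
            Vector128<byte> needle = Vector128.Create(value);

            while (data.Length - offset >= Vector128<byte>.Count)
            {
                var chunk = MemoryMarshal.Read<Vector128<byte>>(data.Slice(offset));
                // Bit i of mask is set when chunk[i] == value.
                int mask = Sse2.MoveMask(Sse2.CompareEqual(chunk, needle));
                if (mask != 0)
                    return offset + BitOperations.TrailingZeroCount(mask);
                offset += Vector128<byte>.Count;
            }
        }

        // Scalar tail for the last < 16 bytes (or everything without SSE2).
        for (; offset < data.Length; offset++)
        {
            if (data[offset] == value)
                return offset;
        }
        return -1;
    }
}
```

For the ASCII bytes of "hello, vectorized world", searching for 'z' is resolved in the vector loop (index 14), while 'w' (index 18) is found by the scalar tail.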
Just wanted to post here that AVX2 worked very nicely; thank you for the suggestion. I hope AVX-512 support will be added soon, and I'm keeping an eye on those threads.
See dotnet/coreclr#17637 for some discussion and context.
The fundamental issues revolve around how much of the behavior should be controlled by enums passed into the intrinsic method, versus separating the different behaviors into differently-named intrinsics.
The current API includes both StringComparisonMode (https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs#L864) and ResultsFlag (https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs#L875).
Some of the open questions are:
- Should ResultsFlag, which determines the comparison results for which a true result is returned, become part of the method name?
- … CFlag (as in the current API), or something more descriptive like CompareNonZeroMask?
- … enum or remain separate?