-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorize IndexOfAnyExcept<T>(T value) #73488
Conversation
Tagging subscribers to this area: @dotnet/area-system-memory Issue DetailsContributes to #67942 [Params(1, 4, 16, 64, 256, 1024)]
public int Length { get; set; }
private byte[] _zeros;
[GlobalSetup]
public void Setup() => _zeros = new byte[Length];
[Benchmark]
public bool AllZeroTrue() => _zeros.AsSpan().IndexOfAnyExcept((byte)0) < 0;
The one thing that makes me a tad hesitant is the case of the except value being found at the very beginning, in which case this does incur a (very small) penalty even for really long inputs, e.g. [Params(1024)]
public int Length { get; set; }
private byte[] _allBitsSet;
[GlobalSetup]
public void Setup() => _allBitsSet = Enumerable.Repeat((byte)0xFF, Length).ToArray();
[Benchmark]
public bool AllBitsSetLookingForNon0() => _allBitsSet.AsSpan().IndexOfAnyExcept((byte)0) < 0;
I'm not sure what if anything to do about it.
|
Failure is #73247 |
@adamsitnik, can you please review this? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @stephentoub ! I really like the namings you use + lack of gotos ;)
Thanks for reviewing, Adam. |
I think this is an acceptable tradeoff. The measured difference is just over 1ns which would be about 2-4 clock cycles on most modern computers, this is likely representative of the jump to In practice, the actual spilling of args, call, and general memory latency is more than this so most users likely won't see a difference. It's likewise a general avenue we might want to look at improved JIT support around, for being able to ensure that cost for such checks can be minimized. One think that might help would be making it so that you have a pattern like the following and marking the containing method as if (length < Vector128<T>.Count)
{
ScalarImpl();
}
else
{
VectorImpl();
} This would theoretically allow the JIT to hoist, constant fold, or otherwise mitigate the branch for various input kinds. |
{ | ||
for (int i = 0; i < length; i++) | ||
{ | ||
if (!Unsafe.Add(ref searchSpace, i).Equals(value0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, do you have the disassembly the JIT emits for this?
I'd expect it emits a cmp reg, [addr]
(where reg
contains value0
), but I know there were some issues with Unsafe.Add
being converted to an [addr]
before.
To be clear, I wasn't referring to the case where there are fewer than Count elements, rather the case where there are more than Count elements and the very first element doesn't match the specified value. I expect that to be a fairly common occurrence with some uses of this method. |
By a nanosecond or two, right? That's largely expected. I'd expect IsEmpty to get a tad slower when the dictionary isn't empty and faster when it is empty; basically an extra branch or two in the check. |
I totally agree and didn't file an issue for it, it's just needed for perf study report |
Contributes to #67942
The one thing that makes me a tad hesitant is the case of the except value being found at the very beginning, in which case this does incur a (very small) penalty even for really long inputs, e.g.
I'm not sure what if anything to do about it.