-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Vectorize and use ROSpan.LastIndexOf as the workhorse for string.LastIndexOf #17370
Conversation
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
| } | ||
| } | ||
|
|
||
| public static unsafe int LastIndexOf(ref char firstChar, int startIndex, char value, int length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the startIndex argument? Isn't it always length - 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think this should be length, just like for IndexOf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how that would work.
Let's say we change the caller to pass in length for startIndex instead:
public int LastIndexOf(char value) => SpanHelpers.LastIndexOf(ref _firstChar, Length, value, Length);For a string of length 1, we would end up with the incorrect result (-1 instead of 0).
'a'.LastIndexOf('a'); // Expected: 0, Actual: -1We have to pass in Length - 1 as the startIndex since the character at position startIndex is included in the search. Which means, we have to special case Length == 0, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexOf and LastIndexOf implementation should be pretty similar. IndexOf helper takes 3 arguments, so LastIndexOf helper should take 3 arguments as well.
The arguments of the SpanHelpers.LastIndexOf method should be the block of memory (ie pointer + length) and the value you are looking for. The method should always start searching at length - 1.
The naive non-vectorized algorithm would be:
static int LastIndexOf(ref char searchSpace, char value, int length)
{
for (int i = length - 1; i >= 0; i--)
if (Unsafe.Add(ref searchSpace, i) == value) return i;
return -1;
}And the call from int IndexOf(char value, int startIndex, int count) should be:
static int LastIndexOf(char value, int startIndex, int count)
{
...
int endIndex = startIndex + 1 - count;
int result = SpanHelpers.LastIndexOf(ref Unsafe.Add(_firstChar, endIndex), count, value);
return result == -1 ? result : result + endIndex;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks. That is what I ended up doing :)
| public int LastIndexOf(char value) | ||
| { | ||
| return LastIndexOf(value, this.Length - 1, this.Length); | ||
| if (Length == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not have Length == 0 check for IndexOf, we should not have one here either.
| if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2) | ||
| { | ||
| const int elementsPerByte = sizeof(ushort) / sizeof(byte); | ||
| length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to be Vector<byte>? I think either you don't need elementsPerByte, or you should use Vector<byte>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't Vector<byte>.Count and Unsafe.SizeOf<Vector<ushort>>() equivalent in value? In either case, we would still need elementsPerByte, no?
What about IndexOf: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/SpanHelpers.Char.cs#L96?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. I was thinking Vector<T>.Count, not Unsafe.SizeOf. Please ignore.
| while (length > 0) | ||
| { | ||
| char* pStart = pCh - Vector<ushort>.Count + 1; | ||
| Vector<ushort> vMatches = Vector.Equals(vComparison, Unsafe.ReadUnaligned<Vector<ushort>>(pStart)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use Unsafe.Read in IndexOf, but Unsafe.ReadUnaligned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if pStart is guaranteed to be vector aligned here (unlike the case of IndexOf). I will have to reason about it to convince myself. If it is aligned, I will change it to Unsafe.Read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be guaranteed, or else the preamble that we are doing is worthless. The whole point of the first SequentialScan is to get our pointer aligned. That's what:
length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1;
that line is doing.
Once pCh is aligned, pStart will be aligned because:
char* pStart = pCh - Vector<ushort>.Count + 1;
| while (length > Vector<ushort>.Count - 1) | ||
| { | ||
| char* pStart = pCh - Vector<ushort>.Count; | ||
| // Using Unsafe.ReadUnaligned instead of Read since it isn't gauranteed that pStart is vector aligned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not same as in IndexOf?
// Using Unsafe.Read instead of ReadUnaligned since the search space is pinned and pCh is always vector aligned
Debug.Assert(((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) == 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahsonkhan - I thought we discussed this last night, and I showed you why pStart is guaranteed to be aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, char* pStart = pCh - Vector<ushort>.Count + 1; is always aligned, but in the current implementation, that causes test failures. For correctness, I had to change it to char* pStart = pCh - Vector<ushort>.Count; (note: no + 1) which isn't aligned.
I will investigate how to fix the test failures.
As part of this change, I also had to change the sequential scan (and the goto logic) for correctness. We decrement pch at the start of the loop before comparing with value rather than doing it at the end of the loop.
| if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2) | ||
| { | ||
| const int elementsPerByte = sizeof(ushort) / sizeof(byte); | ||
| length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A case I think this is missing is when pCh is already at the correct position for alignment. With the current code, we will do a whole SequentialScan on the first vector's length of chars before we start the vectorized section. We can skip that SequentialScan when pCh is already at the correct position for alignment.
Using my back of the napkin pseudo code, I think you can accomplish that with adding the line:
length = length & (Vector<ushort>.Count - 1);
This is also what the SpanHelpers.Byte.LastIndexOf version is also doing -
| nLength = (IntPtr)(((length & (Vector<byte>.Count - 1)) + unaligned) & (Vector<byte>.Count - 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pCh is already aligned, adding that line won't skip the SequentialScan (due to the + 1):
// ((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) = 0
// 0 / elementsPerByte = 0
// length = 0 + 1 = 1
length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1;
length = length & (Vector<ushort>.Count - 1); // length = 1There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not pCh that needs to be aligned. It's pStart below. Like I said above, the case where pCh is at the correct position for alignment.
A B C D | E F G H | I J K
^ ^
| pCh
pStart
It's pStart that needs to be aligned.
So you are checking for when pCh is at the end of the alignment. i.e. length = Vector<ushort>.Count.
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
Any other feedback before I merge this? |
eerhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. This is cleaner and more readable than the way I had it. I just needed to shift my thinking that pCh points to the char you've already checked through each iteration. But in doing that, it makes the math around it easier to understand.
The extra span.Length checks (required by the way string.LastIndexOf was implemented) ends up adding overhead for small spans.TODO: Investigate a way to avoid that overhead without having two separate implementations.Related PR: #17284
cc @jkotas, @tarekgh, @eerhardt