Vectorize and use ROSpan.LastIndexOf as the workhorse for string.LastIndexOf #17370

ahsonkhan · 2018-03-31T06:05:26Z

~~The extra span.Length checks (required by the way string.LastIndexOf was implemented) end up adding overhead for small spans.~~

TODO: Investigate a way to avoid that overhead without having two separate implementations.

Version 1 of the PR (out-dated)

Related PR: #17284

cc @jkotas, @tarekgh, @eerhardt

…IndexOf

ahsonkhan · 2018-03-31T06:23:00Z

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

jkotas · 2018-03-31T06:29:15Z

src/mscorlib/shared/System/SpanHelpers.Char.cs

            }
        }

+        public static unsafe int LastIndexOf(ref char firstChar, int startIndex, char value, int length)


Why do we need the startIndex argument? Isn't it always length - 1?

I see. I think this should be length, just like for IndexOf.

I don't see how that would work.

Let's say we change the caller to pass in length for startIndex instead:

public int LastIndexOf(char value) => SpanHelpers.LastIndexOf(ref _firstChar, Length, value, Length);

For a string of length 1, we would end up with the incorrect result (-1 instead of 0).

"a".LastIndexOf('a'); // Expected: 0, Actual: -1

We have to pass in Length - 1 as the startIndex since the character at position startIndex is included in the search. That means we also have to special-case Length == 0.

IndexOf and LastIndexOf implementation should be pretty similar. IndexOf helper takes 3 arguments, so LastIndexOf helper should take 3 arguments as well.

The arguments of the SpanHelpers.LastIndexOf method should be the block of memory (i.e. pointer + length) and the value you are looking for. The method should always start searching at length - 1.

The naive non-vectorized algorithm would be:

    static int LastIndexOf(ref char searchSpace, char value, int length)
    {
        // Walk backwards from the last element of the block.
        for (int i = length - 1; i >= 0; i--)
            if (Unsafe.Add(ref searchSpace, i) == value)
                return i;
        return -1;
    }

And the call from int LastIndexOf(char value, int startIndex, int count) should be:

    int LastIndexOf(char value, int startIndex, int count)
    {
        ...
        // The search covers count chars ending at startIndex, so the block starts at endIndex.
        int endIndex = startIndex + 1 - count;
        int result = SpanHelpers.LastIndexOf(ref Unsafe.Add(ref _firstChar, endIndex), value, count);
        return result == -1 ? result : result + endIndex;
    }

Got it. Thanks. That is what I ended up doing :)
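For readers following the thread, here is a minimal safe-code sketch of the shape agreed on above. It assumes a hypothetical class `LastIndexOfSketch` and uses `ReadOnlySpan<char>` slicing in place of the `ref` arithmetic; argument validation is omitted, and it is not the code that was merged. The helper only sees a block and a value, and the caller maps `startIndex` and `count` onto that block.

```csharp
using System;

public static class LastIndexOfSketch
{
    // Hypothetical helper: searches the whole block backwards, starting at Length - 1.
    public static int LastIndexOf(ReadOnlySpan<char> searchSpace, char value)
    {
        for (int i = searchSpace.Length - 1; i >= 0; i--)
        {
            if (searchSpace[i] == value)
                return i;
        }
        return -1;
    }

    // Hypothetical caller with the same parameters as string.LastIndexOf(char, int, int).
    // Validation of startIndex and count is omitted in this sketch.
    public static int LastIndexOf(string s, char value, int startIndex, int count)
    {
        // The search covers count chars ending at startIndex.
        int endIndex = startIndex + 1 - count;
        int result = LastIndexOf(s.AsSpan(endIndex, count), value);
        // Translate the block-relative index back to a string index.
        return result == -1 ? result : result + endIndex;
    }
}
```

With this shape, the empty string needs no special case in the sketch: `startIndex = -1` and `count = 0` give `endIndex = 0` and an empty block, so the helper returns -1.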

jkotas · 2018-03-31T06:30:47Z

src/mscorlib/shared/System/String.Searching.cs

        public int LastIndexOf(char value)
        {
-            return LastIndexOf(value, this.Length - 1, this.Length);
+            if (Length == 0)


If we do not have Length == 0 check for IndexOf, we should not have one here either.

eerhardt · 2018-04-02T23:07:00Z

src/mscorlib/shared/System/SpanHelpers.Char.cs

+                if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2)
+                {
+                    const int elementsPerByte = sizeof(ushort) / sizeof(byte);
+                    length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1;


Doesn't this need to be Vector<byte>? I think either you don't need elementsPerByte, or you should use Vector<byte>.

Aren't Vector<byte>.Count and Unsafe.SizeOf<Vector<ushort>>() equivalent in value? In either case, we would still need elementsPerByte, no?

What about IndexOf: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/SpanHelpers.Char.cs#L96?

You're correct. I was thinking Vector<T>.Count, not Unsafe.SizeOf. Please ignore.
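The arithmetic behind this exchange can be checked in isolation. The sketch below assumes a hypothetical helper `CharsPastBoundary` that takes a raw byte address as a `long`; it only illustrates why the divisor (named `elementsPerByte` in the PR, although it is really the number of bytes per element) is still needed when the mask comes from `Unsafe.SizeOf<Vector<ushort>>()`.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

public static class AlignmentSketch
{
    // Hypothetical helper: how many chars lie between the previous
    // vector-size boundary and the given byte address.
    public static int CharsPastBoundary(long address)
    {
        // Named elementsPerByte in the PR; the value is bytes per char (2).
        const int bytesPerChar = sizeof(ushort) / sizeof(byte);

        // Size of one vector in bytes; equal in value to Vector<byte>.Count.
        int vectorSizeInBytes = Unsafe.SizeOf<Vector<ushort>>();

        // The mask yields a byte offset, so divide to get an offset in chars.
        int byteOffset = (int)(address & (vectorSizeInBytes - 1));
        return byteOffset / bytesPerChar;
    }
}
```

Because the mask works in bytes while the pointer moves in chars, dropping the division would overshoot by a factor of two.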

eerhardt · 2018-04-02T23:12:36Z

src/mscorlib/shared/System/SpanHelpers.Char.cs

+                    while (length > 0)
+                    {
+                        char* pStart = pCh - Vector<ushort>.Count + 1;
+                        Vector<ushort> vMatches = Vector.Equals(vComparison, Unsafe.ReadUnaligned<Vector<ushort>>(pStart));


We use Unsafe.Read in IndexOf, but Unsafe.ReadUnaligned here.

I am not sure if pStart is guaranteed to be vector aligned here (unlike the case of IndexOf). I will have to reason about it to convince myself. If it is aligned, I will change it to Unsafe.Read.

It should be guaranteed, or else the preamble that we are doing is worthless. The whole point of the first SequentialScan is to get our pointer aligned. That is what this line is doing:

length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1;

Once pCh is aligned, pStart will be aligned because:

char* pStart = pCh - Vector<ushort>.Count + 1;

jkotas · 2018-04-04T04:14:53Z

src/mscorlib/shared/System/SpanHelpers.Char.cs

+                    while (length > Vector<ushort>.Count - 1)
+                    {
+                        char* pStart = pCh - Vector<ushort>.Count;
+                        // Using Unsafe.ReadUnaligned instead of Read since it isn't gauranteed that pStart is vector aligned


Why is this not same as in IndexOf?

    // Using Unsafe.Read instead of ReadUnaligned since the search space is pinned and pCh is always vector aligned
    Debug.Assert(((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) == 0);

@ahsonkhan - I thought we discussed this last night, and I showed you why pStart is guaranteed to be aligned.

Yes, char* pStart = pCh - Vector<ushort>.Count + 1; is always aligned, but in the current implementation, that causes test failures. For correctness, I had to change it to char* pStart = pCh - Vector<ushort>.Count; (note: no + 1) which isn't aligned.

I will investigate how to fix the test failures.

As part of this change, I also had to change the sequential scan (and the goto logic) for correctness. We now decrement pCh at the start of the loop, before comparing with value, rather than at the end of the loop.

eerhardt · 2018-04-04T15:24:49Z

src/mscorlib/shared/System/SpanHelpers.Char.cs

+                if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2)
+                {
+                    const int elementsPerByte = sizeof(ushort) / sizeof(byte);
+                    length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1;


A case I think this is missing is when pCh is already at the correct position for alignment. With the current code, we will do a whole SequentialScan on the first vector's length of chars before we start the vectorized section. We can skip that SequentialScan when pCh is already at the correct position for alignment.

Using my back of the napkin pseudo code, I think you can accomplish that with adding the line:

length = length & (Vector<ushort>.Count - 1);

This is also what the SpanHelpers.Byte.LastIndexOf version does:

coreclr/src/mscorlib/shared/System/SpanHelpers.Byte.cs

Line 261 in d1f49cc

nLength = (IntPtr)(((length & (Vector<byte>.Count - 1)) + unaligned) & (Vector<byte>.Count - 1));

If pCh is already aligned, adding that line won't skip the SequentialScan (due to the + 1):

    // ((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) = 0
    // 0 / elementsPerByte = 0
    // length = 0 + 1 = 1
    length = (((int)pCh & (Unsafe.SizeOf<Vector<ushort>>() - 1)) / elementsPerByte) + 1;
    length = length & (Vector<ushort>.Count - 1); // length = 1

It's not pCh that needs to be aligned. It's pStart below. As I said above, I mean the case where pCh is already at the correct position for alignment.

    A B C D | E F G H | I J K
              ^     ^
              |     pCh
              pStart

It's pStart that needs to be aligned.

So you are checking for when pCh is at the end of an aligned block, i.e. length = Vector<ushort>.Count.
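To make the two positions in this thread concrete, here is a small model of the preamble arithmetic. It is a sketch under stated assumptions: `p` is a hypothetical char index of pCh relative to a vector-aligned base, `n` stands in for `Vector<ushort>.Count` (a power of two), and `pStart = pCh - n + 1` as in the earlier revision of the PR. None of these helper names come from the PR.

```csharp
using System;

public static class PreambleSketch
{
    // Number of chars the sequential preamble would scan, in this model.
    public static int SequentialScanLength(int p, int n)
    {
        // The PR's preamble length: char offset past the boundary, plus one.
        int length = (p & (n - 1)) + 1;

        // The suggested extra mask: a full vector's worth (n) becomes 0,
        // so the preamble is skipped when pStart is already aligned.
        return length & (n - 1);
    }

    // True when pStart = p - n + 1 falls on a vector boundary.
    public static bool IsPStartAligned(int p, int n)
    {
        return ((p - n + 1) & (n - 1)) == 0;
    }
}
```

In the diagram's terms (n = 4, A at index 0), pCh at H is index 7: the masked length is 0 and pStart lands on E, which is aligned. pCh at I is index 8, an aligned address, yet the preamble still scans one char, which agrees with the objection that an aligned pCh does not skip the scan.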

ahsonkhan · 2018-04-05T03:17:09Z

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

ahsonkhan · 2018-04-05T17:51:58Z

Any other feedback before I merge this?

eerhardt

Nice work. This is cleaner and more readable than the way I had it. I just needed to shift my thinking that pCh points to the char you've already checked through each iteration. But in doing that, it makes the math around it easier to understand.
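The overall structure discussed in this review (a backwards vector loop plus a sequential scan) can be sketched in safe code. The version below is an illustration only, not the merged implementation: it assumes a hypothetical class `VectorizedLastIndexOfSketch`, reads vectors from span slices without any alignment preamble, and tracks `end`, the index one past the last unchecked element, which plays a role similar to pCh pointing at the char already checked.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class VectorizedLastIndexOfSketch
{
    public static int LastIndexOf(ReadOnlySpan<char> span, char value)
    {
        // Reinterpret the chars as ushorts so they can be loaded into Vector<ushort>.
        ReadOnlySpan<ushort> data = MemoryMarshal.Cast<char, ushort>(span);
        int n = Vector<ushort>.Count;
        int end = data.Length; // everything at index >= end has been checked

        if (Vector.IsHardwareAccelerated)
        {
            var vComparison = new Vector<ushort>((ushort)value);
            while (end >= n)
            {
                // Unaligned load of the last n unchecked elements.
                var block = new Vector<ushort>(data.Slice(end - n, n));
                if (Vector.EqualsAny(block, vComparison))
                    break; // a match is within [end - n, end)
                end -= n;
            }
        }

        // Sequential scan: locates the match inside the block found above,
        // or handles the remaining elements that do not fill a vector.
        for (int i = end - 1; i >= 0; i--)
        {
            if (data[i] == value)
                return i;
        }
        return -1;
    }
}
```

The sketch trades the aligned reads debated above for simplicity, so it says nothing about the performance of the real change.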

Vectorize and use ROSpan.LastIndexOf as the workhorse for string.Last…

0ac309b

…IndexOf

ahsonkhan added the area-System.Memory label Mar 31, 2018

ahsonkhan self-assigned this Mar 31, 2018

ahsonkhan requested review from eerhardt, jkotas and tarekgh March 31, 2018 06:05

jkotas reviewed Mar 31, 2018

View reviewed changes

eerhardt reviewed Apr 2, 2018

View reviewed changes

ahsonkhan mentioned this pull request Apr 3, 2018

Created performance tests for ROS First and TryGet methods dotnet/corefx#28760

Closed

Address PR feedback, remove Length == 0 checks where unnecessary.

cae8bdc

jkotas reviewed Apr 4, 2018

View reviewed changes

eerhardt reviewed Apr 4, 2018

View reviewed changes

Use aligned vector read just like IndexOf

770b9e7

eerhardt approved these changes Apr 5, 2018

View reviewed changes

ahsonkhan merged commit f561c10 into dotnet:master Apr 5, 2018

ahsonkhan deleted the UseSpanLastIndexOf branch April 5, 2018 18:36
