Add ReadOnlySpan string-like Equals/CompareTo/IndexOf/Contains API with globalization support #16467
Why? |
|
|
||
| public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType) | ||
| { | ||
| StringSpanHelpers.CheckStringComparison(comparisonType); |
Is this check needed?
|
|
||
| public static int CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType) | ||
| { | ||
| StringSpanHelpers.CheckStringComparison(comparisonType); |
Is this check needed?
|
|
||
| case StringComparison.Ordinal: | ||
| matchedLength = defaultMatchLength; | ||
| return IndexOfOrdinalHelper(span, value, false); |
Nit: It'd be nice to specify the name of the bool parameter. Same for a few other calls in this file below.
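A minimal sketch of what the nit asks for, using a C# named argument at the call site. The helper body here is a hypothetical stand-in written for illustration, not the PR's actual IndexOfOrdinalHelper.

```csharp
using System;

static class NamedArgumentSketch
{
    // Hypothetical stand-in for the PR's helper; only the signature shape matters here.
    public static int IndexOfOrdinalHelper(ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase)
    {
        return ignoreCase
            ? span.IndexOf(value, StringComparison.OrdinalIgnoreCase)
            : span.IndexOf(value);
    }

    public static int Find(ReadOnlySpan<char> span, ReadOnlySpan<char> value)
    {
        // Naming the bool makes the call readable without looking up the signature.
        return IndexOfOrdinalHelper(span, value, ignoreCase: false);
    }
}
```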
| if (span.Length != value.Length) | ||
| return false; | ||
| if (value.Length == 0) | ||
| return true; |
Why does this return true?
Here span.Length == value.Length == 0.
Right, I misread the previous condition, if (span.Length != value.Length).
| if (span.Length != value.Length) | ||
| return false; | ||
| if (value.Length == 0) | ||
| return true; |
Why does this return true?
Same reason here.
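The reasoning in both threads can be sketched as follows: once the lengths are known to be equal, an empty value implies an empty span, so returning true is correct. The class and method names are illustrative assumptions, and SequenceEqual stands in for whatever comparison the PR performs after these checks.

```csharp
using System;

static class OrdinalEqualsSketch
{
    public static bool EqualsOrdinal(ReadOnlySpan<char> span, ReadOnlySpan<char> value)
    {
        // Different lengths can never be ordinally equal.
        if (span.Length != value.Length)
            return false;

        // Lengths are equal here, so value.Length == 0 means both are empty.
        if (value.Length == 0)
            return true;

        // Stand-in for the element-by-element comparison.
        return span.SequenceEqual(value);
    }
}
```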
| return CompareOrdinalHelper(span, value); | ||
|
|
||
| case StringComparison.OrdinalIgnoreCase: | ||
| return CompareInfo.CompareOrdinalIgnoreCase(span, value); |
return CompareInfo.CompareOrdinalIgnoreCase(span, value);
Why don't you have the condition
if (span.Length == 0 || value.Length == 0)
return span.Length - value.Length;
here, as you do in the ordinal case?
It is already checked in here: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/CompareInfo.cs#L596
| matchedLength = 0; | ||
| return -1; | ||
| } | ||
|
|
Do we do this check in the string APIs? This looks wrong to me.
| } | ||
| } | ||
|
|
||
| internal static unsafe int IndexOfCultureHelper(ReadOnlySpan<char> span, ReadOnlySpan<char> value, int* matchLengthPtr, bool invariantCulture = false) |
bool invariantCulture = false
You don't need this. You are already checking this value; instead, you can just check CompareInfo.Name.Length == 0. Remember that the current culture can be invariant too :-)
you don't need this.
What are you referring to?
I don't understand what you mean.
You don't need the invariantCulture parameter.
Just check CompareInfo.Name.Length == 0 inside the method.
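A minimal sketch of the suggested check, assuming the helper has access to the CompareInfo it is about to use. The class and method names are hypothetical; the check relies on the invariant culture's CompareInfo having an empty Name.

```csharp
using System.Globalization;

static class InvariantCheckSketch
{
    // Returns true when the given CompareInfo belongs to the invariant culture.
    // This also covers the case where the current culture happens to be invariant,
    // which a caller-supplied bool flag would miss.
    public static bool IsInvariant(CompareInfo compareInfo)
    {
        return compareInfo.Name.Length == 0;
    }
}
```

A caller could then write InvariantCheckSketch.IsInvariant(CultureInfo.CurrentCulture.CompareInfo) instead of threading an extra parameter through the helpers.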
| CultureInfo.CurrentCulture.CompareInfo.IndexOf(span, value, CompareOptions.None, matchLengthPtr); | ||
| } | ||
|
|
||
| internal static unsafe int IndexOfCultureIgnoreCaseHelper(ReadOnlySpan<char> span, ReadOnlySpan<char> value, int* matchLengthPtr, bool invariantCulture = false) |
bool invariantCulture = false
Same comment as the one on IndexOfCultureHelper.
|
|
||
| // Since we know that the first two chars are the same, | ||
| // we can increment by 2 here and skip 4 bytes. | ||
| // This leaves us 8-byte aligned, which results |
This leaves us 8-byte
@jkotas, is 4-byte alignment considered 8-byte aligned too?
This comment and optimization were true for the String implementation. They do not hold for the Span implementation.
| while (length > 0) | ||
| { | ||
| if (*(int*)a != *(int*)b) | ||
| goto DiffNextInt; |
Is this correct? What happens if we have length == 1?
Again, this is invalid copy&paste from the String implementation.
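To illustrate the concern: reading an int through a char pointer touches two chars at once, which a String can tolerate at an odd trailing char because of its null terminator, but a span of length 1 has no such guarantee. The sketch below is a deliberately simple char-by-char ordinal comparison that stays within bounds for any length; it is not the PR's code, and the names are assumptions.

```csharp
using System;

static class OrdinalCompareSketch
{
    public static int CompareOrdinal(ReadOnlySpan<char> a, ReadOnlySpan<char> b)
    {
        int minLength = Math.Min(a.Length, b.Length);

        // Compare one char at a time, so nothing past either span's end is ever read.
        for (int i = 0; i < minLength; i++)
        {
            int diff = a[i] - b[i];
            if (diff != 0)
                return diff;
        }

        // All shared chars match; the shorter span sorts first.
        return a.Length - b.Length;
    }
}
```

A vectorized or multi-char-at-a-time version would need its own handling of the final odd char rather than relying on String's layout.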
| return -1; | ||
| } | ||
|
|
||
| internal static unsafe int IndexOfOrdinalCore(ReadOnlySpan<char> source, ReadOnlySpan<char> value, bool ignoreCase) |
IndexOfOrdinalCore
Why duplicate the code here? Wouldn't it be better to have such methods take char* and avoid the code duplication?
If we have a method that takes a char*, we would have to pin the span/string to get the pointer in all cases (unnecessarily) instead of only around the OS call. Given this change - #16434, I want to re-measure the cost of casting string to span and see if we can address the code duplication that way. I will address this outside this PR.
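A rough sketch of the trade-off described above: the span-based method handles trivial cases in managed code and pins only around the pointer-based call. NativeIndexOf is a hypothetical stand-in for the OS call, not a real API, and the code needs unsafe code enabled (AllowUnsafeBlocks) to compile.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinningSketch
{
    // Hypothetical stand-in for a pointer-based OS or native call.
    private static unsafe int NativeIndexOf(char* source, int sourceLength, char value)
    {
        for (int i = 0; i < sourceLength; i++)
        {
            if (source[i] == value)
                return i;
        }
        return -1;
    }

    public static unsafe int IndexOf(ReadOnlySpan<char> source, char value)
    {
        // Managed fast path: no pinning needed when there is nothing to search.
        if (source.IsEmpty)
            return -1;

        // Pin only for the duration of the pointer-based call.
        fixed (char* p = &MemoryMarshal.GetReference(source))
        {
            return NativeIndexOf(p, source.Length, value);
        }
    }
}
```

A shared char* helper would instead force every caller, including those on the managed fast path, to pin first.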
| Debug.Assert(!_invariantMode); | ||
| Debug.Assert(source.Length != 0); | ||
| Debug.Assert(target.Length != 0); | ||
| Debug.Assert((options == CompareOptions.None || options == CompareOptions.IgnoreCase)); |
Debug.Assert((options == CompareOptions.None || options == CompareOptions.IgnoreCase));
Although this holds for the calls coming from Span today, we may support spans in CompareInfo, which would allow passing more options than we currently pass, so we would need to revisit this code if we did that. I would prefer to write this code on the assumption that it may receive more options than the span APIs currently pass.
What options should this Debug.Assert check then?
I believe Contains can be implemented using IndexOf, but the problem would be in the APIs that need to return matchedLength, as there is no easy way to get this length from the currently exposed APIs. We may have the following options:
I prefer the second option. |
My question was, why do we need matchedLength at all? Why is that required in order to support slow span? |
See https://github.com/dotnet/corefx/issues/21395#issuecomment-359980642
If the user uses target.Length (i.e. 1) to slice the source to keep going, that would be incorrect. |
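To make the point concrete, here is a sketch of a "find and keep going" loop that advances by the length actually matched in the source rather than by the length of the value. It assumes the CompareInfo.IndexOf overload with an out matchLength parameter that is available in .NET 5 and later, which postdates this discussion; the class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

static class MatchedLengthSketch
{
    public static int CountMatches(ReadOnlySpan<char> source, ReadOnlySpan<char> value)
    {
        if (value.IsEmpty)
            return 0;

        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        int count = 0;

        while (true)
        {
            int index = compareInfo.IndexOf(source, value, CompareOptions.None, out int matchLength);

            // Stop on no match, or on a zero-length match that would never advance.
            if (index < 0 || matchLength == 0)
                break;

            count++;

            // Advance by the number of source chars consumed by the match,
            // which under linguistic comparison may differ from value.Length.
            source = source.Slice(index + matchLength);
        }

        return count;
    }
}
```

Under a culture-sensitive comparison, a precomposed character in the value may match a base character plus combining mark in the source (or the reverse), so slicing by value.Length could cut the source in the wrong place.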
|
Maybe I misunderstood the question. @ahsonkhan wrote: "Can we expose string.IndexOf with out int matchedLength parameter? Can we expose string.Contains with StringComparison comparisonType parameter? If not, these APIs cannot be implemented for portable span." To me that sounds like he's saying unless we have |
To clarify, we can probably implement it, but it would require doing a lot of the work that the string APIs already do, and it will be slow. How would we correctly (and with reasonable perf) implement this API with the existing string public surface area (for portable span)?
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchedLength)
I was hoping for something like this:
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchedLength)
{
    return span.AsReadOnlySpan().IndexOf(value.AsReadOnlySpan(), comparisonType, out matchedLength);
}
But we can't do that. And it doesn't look like we can add the String APIs to netfx to allow this. |
|
@ahsonkhan I have proposed how we can do that in #16467 (comment) |
@tarekgh, can you elaborate? You're proposing re-implementing that support in System.Memory for the slow span implementation? What about on Unix where we call through System.Globalization.Native, require ICU, etc... we'd be signing up to make sure that this implementation works with downlevel System.Globalization.Natives? |
Ok, I was missing that you were talking about adding a new String.IndexOf to .NET Framework. Even if that were to happen, it would be for a future release, and that won't help the System.Memory implementation running on whatever netstandard version it's targeting. |
My suggestion was basically to get the matched length with the OS's help, so the suggestion is scoped only to when we are doing the IndexOf call. As a side point, I can see there is some complication here, and I have already talked offline with @ahsonkhan, so it may be OK for now not to provide this functionality, and we can consider it in future releases. |
|
@dotnet-bot test Windows_NT arm Cross Checked corefx_baseline Build and Test |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline |
|
This failure is not related to the PR (see dotnet/corefx#27339 (comment)): cc @JeremyKuhne |
|
Any other feedback? |
No, this isn't true for spans. I will address it outside this PR. |
|
I don't have any other feedback. Thanks for getting this ready. I assume you have tested this as we are testing the same functionality with strings. LGTM |
Part of https://github.com/dotnet/corefx/issues/21395#issuecomment-359906138
(fast span only)
(fast span only)
TODO:
Verify correctness on Unix
Can we expose string.IndexOf with out int matchedLength parameter? Can we expose string.Contains with StringComparison comparisonType parameter? If not, these APIs cannot be implemented for portable span.
Related corefx PR: dotnet/corefx#27319
cc @jkotas, @stephentoub, @KrzysztofCwalina, @tarekgh, @JeremyKuhne, @joshfree