Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@ahsonkhan
Copy link

@ahsonkhan ahsonkhan commented Feb 21, 2018

Part of https://github.com/dotnet/corefx/issues/21395#issuecomment-359906138

  • Equals
  • CompareTo
  • IndexOf (fast span only)
  • Contains (fast span only)

TODO:

  • Add more tests in corefx
  • Verify correctness on Unix
  • Can we expose string.IndexOf with out int matchedLength parameter? Can we expose string.Contains with StringComparison comparisonType parameter? If not, these APIs cannot be implemented for portable span.

Related corefx PR: dotnet/corefx#27319

cc @jkotas, @stephentoub, @KrzysztofCwalina, @tarekgh, @JeremyKuhne, @joshfree

@stephentoub
Copy link
Member

Can we expose string.IndexOf with out int matchedLength parameter? Can we expose string.Contains with StringComparison comparisonType parameter? If not, these APIs cannot be implemented for portable span.

Why?


public static bool Equals(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType)
{
StringSpanHelpers.CheckStringComparison(comparisonType);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check needed?


public static int CompareTo(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType)
{
StringSpanHelpers.CheckStringComparison(comparisonType);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check needed?


case StringComparison.Ordinal:
matchedLength = defaultMatchLength;
return IndexOfOrdinalHelper(span, value, false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It'd be nice to specify the name of the bool parameter. Same for a few other calls in this file below.

if (span.Length != value.Length)
return false;
if (value.Length == 0)
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this return true?

Copy link
Author

@ahsonkhan ahsonkhan Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here span.Length == value.Length == 0 #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. I misread the previous condition if (span.Length != value.Length)


In reply to: 169812349 [](ancestors = 169812349)

if (span.Length != value.Length)
return false;
if (value.Length == 0)
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this return true?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason here.

return CompareOrdinalHelper(span, value);

case StringComparison.OrdinalIgnoreCase:
return CompareInfo.CompareOrdinalIgnoreCase(span, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return CompareInfo.CompareOrdinalIgnoreCase(span, value); [](start = 19, length = 58)

why you don't have the condition

if (span.Length == 0 || value.Length == 0)
return span.Length - value.Length;

as you have it in ordinal case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchedLength = 0;
return -1;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we do this check with the strings APIs? this looks wrong to me.

Copy link
Author

@ahsonkhan ahsonkhan Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for confirming


In reply to: 169812892 [](ancestors = 169812892)

}
}

internal static unsafe int IndexOfCultureHelper(ReadOnlySpan<char> span, ReadOnlySpan<char> value, int* matchLengthPtr, bool invariantCulture = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool invariantCulture = false [](start = 128, length = 29)

you don't need this. you already checking this value, instead you can just check CompareInfo.Name.Length == 0. remember current culture can be invariant too :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this.

What are you referring to?

I don't understand what you mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the parameter invariantCulture.

just check CompareInfo.Name.Length == 0 inside the method


In reply to: 169813408 [](ancestors = 169813408)

CultureInfo.CurrentCulture.CompareInfo.IndexOf(span, value, CompareOptions.None, matchLengthPtr);
}

internal static unsafe int IndexOfCultureIgnoreCaseHelper(ReadOnlySpan<char> span, ReadOnlySpan<char> value, int* matchLengthPtr, bool invariantCulture = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool invariantCulture = false [](start = 138, length = 29)

same comment as the one on IndexOfCultureHelper


// Since we know that the first two chars are the same,
// we can increment by 2 here and skip 4 bytes.
// This leaves us 8-byte aligned, which results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves us 8-byte [](start = 23, length = 21)

@jkotas does 4-bytes alignments considered as 8-bytes aligned too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment and optimization was true for String implementation. It does not hold for Span implementation.

while (length > 0)
{
if (*(int*)a != *(int*)b)
goto DiffNextInt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct? what happen if we have length == 1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is invalid copy&paste from the String implementation.

return -1;
}

internal static unsafe int IndexOfOrdinalCore(ReadOnlySpan<char> source, ReadOnlySpan<char> value, bool ignoreCase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexOfOrdinalCore [](start = 35, length = 18)

why duplicating the code here? isn't better to have such methods take char* and avoid the code duplication?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a method that takes a char*, we would have to pin the span/string to get the pointer in all cases (unnecessarily) instead of only around the OS call. Given this change - #16434, I want to re-measure the cost of casting string to span and see if we can address the code duplication that way. I will address this outside this PR.

Debug.Assert(!_invariantMode);
Debug.Assert(source.Length != 0);
Debug.Assert(target.Length != 0);
Debug.Assert((options == CompareOptions.None || options == CompareOptions.IgnoreCase));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug.Assert((options == CompareOptions.None || options == CompareOptions.IgnoreCase)); [](start = 11, length = 88)

Although this may look true from the calls coming from Span but we may support spans in CompareInfo which will allow passing more options than what we are currently passing. so we'll need to re-visit this code again if we did that in the future. I prefer writing this code here with the assumptions you may get more options than what we are currently passing from the span APIs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What options should this Debug.Assert check then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the assert.


In reply to: 169814071 [](ancestors = 169814071)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarekgh
Copy link
Member

tarekgh commented Feb 21, 2018

@stephentoub @ahsonkhan

Can we expose string.IndexOf with out int matchedLength parameter? Can we expose string.Contains with StringComparison comparisonType parameter? If not, these APIs cannot be implemented for portable span.

Why?

I believe Contains can be implemented using IndexOf. but the problem would be in the APIs that need to return the matchedLength as there is no easy way to get this length from the currently exposed APIs.

We may have the following options:

  • We can implement getting the matched length with a very slow path, by repeating the sub-string comparisons until we get the right match length.
  • We may think in doing the interop with the OS as we are doing coreclr. this is not a big deal to do.
  • We decide not supporting these APIs for the portable.

I prefer the second option.

@stephentoub
Copy link
Member

but the problem would be in the APIs that need to return the matchedLength as there is no easy way to get this length from the currently exposed APIs

My question was, why do we need matchedLength at all? Why is that required in order to support slow span?

@ahsonkhan
Copy link
Author

ahsonkhan commented Feb 21, 2018

My question was, why do we need matchedLength at all?

See https://github.com/dotnet/corefx/issues/21395#issuecomment-359980642

Example: imagine the source span have the characters A ̀ and we are searching for Á with culture aware operation. IndexOf will return the index of the match and will return the matched length is 2 and not just 1.

If the user uses target.Length (i.e. 1) to slice the source to keep going, that would be incorrect.

@stephentoub
Copy link
Member

Maybe I misunderstood the question. @ahsonkhan wrote:

"Can we expose string.IndexOf with out int matchedLength parameter? Can we expose string.Contains with StringComparison comparisonType parameter? If not, these APIs cannot be implemented for portable span."

To me that sounds like he's saying unless we have out int matchedLength in the signature of IndexOf, we can't implement IndexOf at all. Was that not the question?

@ahsonkhan
Copy link
Author

To me that sounds like he's saying unless we have out int matchedLength in the signature of IndexOf, we can't implement IndexOf at all. Was that not the question?

To clarify, we can probably implement it but it would require doing a lot of the work that the string APIs already do and it will be slow.

How would we correctly (and with reasonable perf) implement this API with the existing string public surface area (for portable span)?

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchedLength)

I was hoping for something like this:

public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchedLength)
{
 return span.AsReadOnlySpan().IndexOf(value.AsReadOnlySpan(), comparisonType, out matchedLength);
}

But, we can't do that. And it doesn't look like we can add the String APIs to netfx to allow this.

@ahsonkhan ahsonkhan closed this Feb 21, 2018
@ahsonkhan ahsonkhan reopened this Feb 21, 2018
@tarekgh
Copy link
Member

tarekgh commented Feb 21, 2018

@ahsonkhan I have proposed how we can do that in #16467 (comment)

@stephentoub
Copy link
Member

stephentoub commented Feb 21, 2018

We may think in doing the interop with the OS as we are doing coreclr. this is not a big deal to do.

@tarekgh, can you elaborate? You're proposing re-implementing that support in System.Memory for the slow span implementation? What about on Unix where we call through System.Globalization.Native, require ICU, etc... we'd be signing up to make sure that this implementation works with downlevel System.Globalization.Natives?

@stephentoub
Copy link
Member

But, we can't do that. And it doesn't look like we can add the String APIs to netfx to allow this.

Ok, I was missing that you were talking about adding a new String.IndexOf to .NET Framework. Even if that were to happen, it would be for a future release, and that won't help the System.Memory implementation running on whatever netstandard version it's targeting.

@tarekgh
Copy link
Member

tarekgh commented Feb 21, 2018

@stephentoub

can you elaborate? You're proposing re-implementing that support in System.Memory for the slow span implementation? What about on Unix where we call through System.Globalization.Native, require ICU, etc... we'd be signing up to make sure that this implementation works with downlevel System.Globalization.Natives?

My suggestion was basically getting the matched length using the OS help. so the suggestion is scoped only when we doing IndexOf call.
I am seeing most of Linux users will just use net core and not use the portable library. I know this statement is not 100% accurate but expecting the majority of user will do that. The portable library will be more interesting for Windows users. If this right, then we'll just need to do only one call to Windows to get the matched length. for Linux users who want to use the portable library, we can provide the slow implementation and ask them to move to 2.1 to get the perf gain.

side point, I can see there is some complication here and I already talked offline with @ahsonkhan, so it may be ok for now not providing this functionality and we can consider it in the future releases.

@ahsonkhan
Copy link
Author

@dotnet-bot test Windows_NT arm Cross Checked corefx_baseline Build and Test
@dotnet-bot test Windows_NT x64_arm64_altjit Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline
@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Windows_NT x86_arm_altjit Checked corefx_baseline
@dotnet-bot test Windows_NT x86 Checked corefx_baseline

@ahsonkhan
Copy link
Author

/mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/bin/tests/System.Net.Security.Tests/netcoreapp-Linux-Release-x64/RunTests.sh: line 85:  2694 Aborted                 (core dumped) $RUNTIME_PATH/dotnet xunit.console.netcore.exe System.Net.Security.Tests.dll -xml testResults.xml -notrait Benchmark=true -notrait category=nonnetcoreapptests -notrait category=nonlinuxtests -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing
23:13:44   /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/src/System.Net.Security/tests/FunctionalTests
23:13:44   ----- end 07:13:44 ----- exit code 134 ----------------------------------------------------------
23:13:44 /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/Tools/tests.targets(492,5): warning MSB3073: The command "/mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/bin/tests/System.Net.Security.Tests/netcoreapp-Linux-Release-x64/RunTests.sh /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/bin/testhost/netcoreapp-Linux-Release-x64/" exited with code 134. [/mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/src/System.Net.Security/tests/FunctionalTests/System.Net.Security.Tests.csproj]
23:13:44 
/mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/Tools/tests.targets(500,5): error : One or more tests failed while running tests from 'System.Net.Security.Tests' please check /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/bin/tests/System.Net.Security.Tests/netcoreapp-Linux-Release-x64/testResults.xml for details! 

@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@ahsonkhan
Copy link
Author

This failure is not related to the PR (see dotnet/corefx#27339 (comment)):

 System.IO.FileSystem.DriveInfoTests.DriveInfoWindowsTests.Ctor_InvalidPath_ThrowsArgumentException(driveName: \"\", paramName: \"path\") [FAIL]
23:01:15         Assert.Equal() Failure
23:01:15         Expected: path
23:01:15         Actual:   (null)
23:01:15         Stack Trace:
23:01:16            D:\j\workspace\x64_checked_w---d7295605\_\fx\src\CoreFx.Private.TestUtilities\src\System\AssertExtensions.cs(40,0): at System.AssertExtensions.Throws[T](String netCoreParamName, String netFxParamName, Action action)
23:01:16   Finished:    System.IO.FileSystem.DriveInfo.Tests

cc @JeremyKuhne

@ahsonkhan
Copy link
Author

Any other feedback?

@tarekgh
Copy link
Member

tarekgh commented Feb 22, 2018

            // PERF: This depends on the fact that the String objects are always zero terminated 

Is this true for spans? I mean the spans will be always null terminated?


Refers to: src/mscorlib/shared/System/Span.NonGeneric.cs:392 in b34cfd0. [](commit_id = b34cfd0, deletion_comment = False)

@ahsonkhan
Copy link
Author

ahsonkhan commented Feb 22, 2018

Is this true for spans? I mean the spans will be always null terminated?

No, this isn't true for spans. I will address it outside this PR.

@tarekgh
Copy link
Member

tarekgh commented Feb 22, 2018

I don't have any other feedback. thanks for getting this ready. I assume you have tested this as we are testing the same functionality with strings.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants