Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Remove IsFastSort optimizations from String #26759

Merged
merged 7 commits into from
Sep 23, 2019

Conversation

jkotas
Copy link
Member

@jkotas jkotas commented Sep 18, 2019

Fixes #26758

@jkotas
Copy link
Member Author

jkotas commented Sep 18, 2019

Performance results:

The current microbenchmarks in performance repo are running the same tests on the same string, so they are helped by the cached state. For example, results for System.Tests.Perf_String.IndexOf:

Before:

Method options Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
IndexOf OrdinalIgnoreCase 131.131 us 0.8094 us 0.7175 us 130.867 us 130.497 us 132.775 us - - - -

After:

Method options Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
IndexOf OrdinalIgnoreCase 134.921 us 0.7286 us 0.6459 us 135.016 us 134.007 us 135.810 us - - - -

The scenario tested by the microbenchmarks is not realistic. It is unlikely that real workloads are going to be running the same IndexOf on two exact same string instances repeatedly. When the microbenchmark is modified to create a fresh instance of string using string.Copy, the version in this PR is faster.

Before:

Method options Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
IndexOf OrdinalIgnoreCase 139.426 us 1.4603 us 1.3660 us 138.840 us 137.604 us 142.475 us 1.6741 - - 19.83 KB

After:

Method options Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
IndexOf OrdinalIgnoreCase 138.191 us 0.9291 us 0.8237 us 138.240 us 136.397 us 139.309 us 1.6304 - - 19.83 KB

@adamsitnik Is it worth it to update the micro-benchmarks for IndexOf and similar operations to create string clones to be more realistic?

@jkotas jkotas changed the title Remove string state Remove IsFastSort optimizations from String Sep 18, 2019
@adamsitnik
Copy link
Member

Is it worth it to update the micro-benchmarks for IndexOf and similar operations to create string clones to be more realistic?

Yes. I am going to implement more realistic benchmarks dotnet/performance#885

@adamsitnik
Copy link
Member

The Windows failures seems to be not related. I am going to re-run the tests

/azp run

@adamsitnik
Copy link
Member

adamsitnik commented Sep 18, 2019

The Linux failures might have exposed differences between the string and ReadOnlySpan<char> text operations. I am going to try one thing to see if this is true.

Edit: I did not find any significant difference for StartsWith (see dotnet/corefx#41169)

Copy link
Member

@GrabYourPitchforks GrabYourPitchforks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Tarek's comments re: improving test cases, but otherwise LGTM.

@jkotas
Copy link
Member Author

jkotas commented Sep 19, 2019

@tarekgh The following test case is failing with my changes:

IsPrefix(compareInfo: CompareInfo - , source: "o\u0308", value: "o", options: None, expected: False)

We return true with my change because of o matches and it is in the ASCII set. We never look at \u0308. How is this supposed to work? Do we need to look at the following character in source, even after we find a match?

The same bug exists in Span version of StartsWith. We have test coverage hole for the Span version of StartsWith.

@tarekgh
Copy link
Member

tarekgh commented Sep 19, 2019

@jkotas

How is this supposed to work? Do we need to look at the following character in source, even after we find a match?

ö is not equal to o so the expected result is right (which is false). We'll need to look at the next character. we can look if the next character is NonSpacingMark character but still this may not be enough as we can encounter some characters with no sorting weight which I'll not be sure what would be the behavior (I can try that). Instead, I suggest if the next char is not Ascii, then just fallback to the slow path. If you want, I can try to think in a better way if we need to avoid going to the slow path.

@jkotas
Copy link
Member Author

jkotas commented Sep 19, 2019

I suggest if the next char is not Ascii, then just fallback to the slow path.

I think that's fine. Thanks!

@jkotas
Copy link
Member Author

jkotas commented Sep 20, 2019

I am fixing the test coverage hole for Span-based globalization methods in dotnet/corefx#41224

EgorBo and others added 3 commits September 20, 2019 10:04
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes #26758
@jkotas jkotas force-pushed the remove-string-state branch from b092d28 to 4dc388e Compare September 20, 2019 17:08
@adamsitnik
Copy link
Member

@jkotas could you please wait with merging this PR until I finish dotnet/performance#892 ? I just want to make sure that this change is tracked properly in our performance reporting system. It should be a matter of hours from now

@adamsitnik
Copy link
Member

@jkotas thanks for reviewing dotnet/performance#892 , I've merged it just right now. We run all benchmarks every 4 hours and it takes +-1h to run all Unix benchmarks (they are distributed to multiple machines). So 5h from now we can merge this PR. I hope it's not a problem!

@jkotas
Copy link
Member Author

jkotas commented Sep 23, 2019

No problem. Thank you!

@jkotas jkotas merged commit 5d1f15f into dotnet:master Sep 23, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Sep 23, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes dotnet#26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Sep 24, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes dotnet#26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Sep 24, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes #26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request Sep 24, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes #26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
marek-safar pushed a commit to mono/mono that referenced this pull request Sep 24, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes #26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Sep 25, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes #26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas added a commit to dotnet/corert that referenced this pull request Sep 25, 2019
* Remove IsFastSort optimizations from String
- Fix bugs in the Span-based globalization fast paths. Note that some of these bug fixes are going to affect performance of the globalization fast paths.
- Use Span-based globalization fast paths for strings
- Avoid static array allocation for HighCharTable

Fixes #26758

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@jkotas jkotas deleted the remove-string-state branch October 1, 2019 16:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider removing IsFastSort optimizations from String
8 participants