String.StartsWith slower on Linux with some characters #30716

kevingosse · 2019-08-29T16:28:44Z

string.StartsWith on Linux becomes 2 orders of magnitude slower when the string contains a dash (-).

On Linux:

BenchmarkDotNet=v0.11.3, OS=centos 7
Intel Xeon CPU E5-2630L v3 1.80GHz, 2 CPU, 32 logical and 16 physical cores
  [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
  Job-UBBGCZ : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT

Runtime=Core  Toolchain=netcoreapp3.0

         Method |        Mean |      Error |     StdDev |
--------------- |------------:|-----------:|-----------:|
     StartsWith |    35.79 ns |  0.1069 ns |  0.0948 ns |
 StartsWithDash | 4,411.13 ns | 35.0054 ns | 29.2311 ns |

On Windows (only for reference, the hardware is not the same):

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.18362
Intel Xeon CPU E3-1271 v3 3.60GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview8-013656
  [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT


         Method |     Mean |     Error |    StdDev |
--------------- |---------:|----------:|----------:|
     StartsWith | 69.42 ns | 0.2523 ns | 0.2236 ns |
 StartsWithDash | 69.47 ns | 1.4200 ns | 1.6904 ns |

Benchmark code:

    public class StartsWithBenchmark
    {
        private string _str1 = "aaaaaaaaaz";
        private string _str2 = "aaaaaaaaa-";

        [Benchmark]
        public bool StartsWith()
        {
            return _str1.StartsWith("i");
        }

        [Benchmark]
        public bool StartsWithDash()
        {
            return _str2.StartsWith("i");
        }
    }

The performance issue does not occur if using ordinal comparison.

The text was updated successfully, but these errors were encountered:

kevingosse · 2019-08-29T16:58:20Z

It seems to affect characters outside of the [9], [11-12], [32-38], [40-44], and [47-126] ranges.

danmoseley · 2019-08-29T17:11:25Z

@krwq that's an extraordinary multiplier. I am curious what this is about.

krwq · 2019-08-29T18:07:17Z

Yes this is interesting. @kevingosse what's you current culture set to? Looking at the code it behaves differently depending on that.

@tarekgh do you know if we (or ICU) have any special optimizations for some ranges?

tarekgh · 2019-08-29T18:14:54Z

I believe Linux treat '-' with some sort weights which can affect the operation with different cultures. on Windows, '-' is treated as normal ASCII and we optimize the call as ordinal at that time.

jkotas · 2019-08-29T18:15:07Z

do you know if we (or ICU) have any special optimizations for some ranges?

We do have special optimization for specific ranges. Look for IsFastSort in https://github.com/dotnet/coreclr/blob/52651008a5fb72eb678467cd2eb42aac4e5334e8/src/System.Private.CoreLib/shared/System/Globalization/CompareInfo.Unix.cs#L242 . - prevents us from taking this fast path because of https://github.com/dotnet/coreclr/blob/ab2f9caaf35c96d029b96aa171ee65d04253cf7c/src/utilcode/util_nodependencies.cpp#L910

This "optimization" is used on Unix only. We used to have it on Windows as well, but removed it there because of it was not very helpful.

One way to fix this problem is to remove this "optimization" on Unix too (PR: dotnet/coreclr#24400) and replace it with fast path and slow path with smaller degradation.

tarekgh · 2019-08-29T18:18:36Z

we need to be careful here as there is some characters in ASCII range is treated differently than Windows in ICU. trying to change that will start deviate us from the ICU sorting behavior and may not be a good thing to do.

tarekgh · 2019-08-29T18:21:19Z

Also, in such corner cases, it is usually better to use the ordinal operations and not the linguistic operations.

kevingosse · 2019-08-29T19:03:33Z

Yes this is interesting. @kevingosse what's you current culture set to? Looking at the code it behaves differently depending on that.

en-US. With other cultures the performance is bad with any set of character but this is a known issue since 1.0 (https://github.com/dotnet/coreclr/issues/5612)

Also, in such corner cases, it is usually better to use the ordinal operations and not the linguistic operations.

Yes, we're currently adding StringComparer.Ordinal everywhere in our codebase as this performance drop is a blocker for us. The problem is that developers on .net framework or .net core/windows aren't really used to adding StringComparer.Ordinal, since the performance of culture-based string comparisons is pretty good on Windows (so it's mostly seen as a micro optimization). And then it breaks when we start using those bits on Linux.
I wish there was a way to default all string comparisons to ordinal. There is the invariant mode, but it also breaks explicit culture-based comparisons, which we need in some parts of our code.

tarekgh · 2019-08-29T21:11:48Z

Note that this issue happens only in case of using some specific characters in the ASCII range while the whole string is ASCII too. These characters are mainly control characters which not really used in main stream cases. The only characters that is interesting are the - and '. So we are talking about a corner case here and not really main stream scenarios.

You can look at the special characters here https://github.com/dotnet/coreclr/blob/ab2f9caaf35c96d029b96aa171ee65d04253cf7c/src/utilcode/util_nodependencies.cpp#L856

mnmr · 2019-08-30T06:18:32Z

@tarekgh It’s not truly helpful to simply label this as a corner case. People expect consistent performance and if you have a method that is suddenly magnitudes slower due to certain characters being used it’s like dropping time bombs into peoples code.

tarekgh · 2019-08-30T16:30:15Z

People expect consistent performance and if you have a method that is suddenly magnitudes slower due to certain characters being used it’s like dropping time bombs into peoples code.

How this is different than inserting non ASCII character in the string even on Windows? you will get the same performance hit. Again, this is why we provide Ordinal option to let users control the behavior.

The issue here is the correctness or sort behavior against the perf. we can fix the perf by not specializing the hyphen on Linux but we should be clear this for sure can be a problem for some expected sort behavior. we can try to find a middle ground here which may work but we need to detect the cases we need to allow ordinal even when hyphen exist. just need some more investigation.

tarekgh · 2019-08-30T18:32:06Z

Looking more, looks on Windows we don't even try to optimize for ASCII cases and always call the underlying OS. so what @jkotas mentioned before looks reasonable to try.

filipnavara · 2019-09-01T20:05:26Z

On a related note, we are running dotnet/performance benchmarks on CoreCLR and Mono/netcore with LLVM JIT. I was investigating the spots where Mono performed especially poorly. One of them is the Perf_String.Contains benchmark with StringComparison.CurrentCultureIgnoreCase. It eventually hits fast-sort the optimization here.

The difference in the benchmark between Mono (which does not have this code path) and CoreCLR (which has it) was quite enormous:

Slower	diff/base	Base Median (ns)	Diff Median (ns)
System.Tests.Perf_String.StartsWith(size: 1000)	714.91	37.47	26784.84
System.Tests.Perf_String.StartsWith(size: 100)	311.68	26.41	8233.04
System.Tests.Perf_String.Contains(text: "This is a very nice sentence", value: "	41.56	190.40	7913.76
System.Tests.Perf_String.Contains(text: "This is a very nice sentence", value: "	27.03	276.06	7462.35

However, the numbers are actually lying because the IsFastSort/IsAscii optimization is computed during the benchmark initialization and not accounted for in the benchmark run. To accurately measure the impact we would likely have to get rid of the cache in the object header first and see what happens. Another question is whether people run StartsWith/Contains on the same string more than once.

kevingosse · 2019-09-02T12:33:02Z

Looking more, looks on Windows we don't even try to optimize for ASCII cases and always call the underlying OS. so what @jkotas mentioned before looks reasonable to try.

Unless I misunderstand, @jkotas's suggestion assumes that the perf disparity comes from the IsFastSort check before fallbacking to the slow path. I think the whole problem is how incredibly slow the ICU fallback actually is.

I've done a few more benchmarks to get more data. I'm measuring the cost of return string.Concat(new string('a', length), "-").StartsWith("i"); (length being the number appended to the name of the benchmark). I'm creating a new string every time to make sure we don't fall into the corner-case mentioned by @filipnavara. The CreateString benchmark just measures the cost of the creation of the string, to make sure it's negligible.

                Method |         Mean |      Error |     StdDev |
---------------------- |-------------:|-----------:|-----------:|
        CreateString_1 |     34.68 ns |  0.0902 ns |  0.0753 ns |
      CreateString_800 |    452.03 ns |  1.4944 ns |  1.3979 ns |
          StartsWith_1 |     76.85 ns |  0.4202 ns |  0.3725 ns |
        StartsWith_800 |  1,095.46 ns |  4.7791 ns |  4.4703 ns |
 StartsWithOrdinal_800 |  1,085.14 ns |  6.9067 ns |  6.1226 ns |
      StartsWithDash_1 |  4,269.29 ns | 29.9301 ns | 27.9966 ns |
     StartsWithDash_80 |  7,548.37 ns | 57.8701 ns | 51.3003 ns |
    StartsWithDash_160 | 10,363.72 ns | 45.1756 ns | 42.2573 ns |
    StartsWithDash_800 | 34,426.77 ns | 74.0593 ns | 65.6516 ns |

We see that StartsWith_800 and StartsWithOrdinal_800 are pretty much as fast (10ns absolute difference, I'd say 20ns if we take the worst of the confidence interval). So the cost of the prior IsFastSort is negligible.

On the other hand, we see how the performance degrades with the length of the string. We have an initial 4µs cost whenever we take the slow path, no matter the length of the string. Then we see the cost increasing linearly (3µs for 80, 6µs for 160, 30µs for 800).

So we really have two issues:

The initial cost of 4µs is ridiculously expensive. I see there has been a lot of optimizations on the ICU path, but clearly there's still something wrong
The performance shouldn't degrade linearly with the length of the string, since the first character doesn't match (so StartsWith should exit immediately). Looking at the implementation, I'd say it's because GlobalizationNative_StartsWith is implemented with the same logic as an IndexOf operation, and therefore will inspect the full string until the pattern is found. The ICU API is a mess, but I'm sure we can find a more efficient way

filipnavara · 2019-09-02T12:40:29Z

The performance shouldn't degrade linearly with the length of the string, since the first character doesn't match (so StartsWith should exit immediately). Looking at the implementation, I'd say it's because GlobalizationNative_StartsWith is implemented with the same logic as an IndexOf operation, and therefore will inspect the full string until the pattern is found. The ICU API is a mess, but I'm sure we can find a more efficient way

I am sure there are some edge cases I didn't consider but it should be relatively easy to implement GlobalizationNative_StartsWith in terms of ucol_strcoll call.

adamsitnik · 2019-09-02T20:20:23Z

I've profiled following app:

class Program
{
    static void Main()
    {
        while (true)
        {
            Consume(string.Concat(new string('a', 512), "-").StartsWith("i"));
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Consume<T>(in T _) { }
}

with PerfCollect and the most expensive methods are:

Name	Exc %	Exc	Exc Ct	Inc %	Inc
libicui18n.so.60.2!unknown	22.1	6,811	6,811	22.2	6,834
libicui18n.so.60.2!icu_60::CollationElementIterator::next	14.2	4,383	4,383	14.3	4,388
libicui18n.so.60.2!icu_60::UCollationPCE::processCE	9.0	2,755	2,755	9.0	2,757
libicui18n.so.60.2!icu_60::UCollationPCE::nextProcessed	8.5	2,615	2,615	8.5	2,617
libicui18n.so.60.2!usearch_search_60	7.0	2,154	2,154	7.0	2,157
libicui18n.so.60.2!icu_60::UTF16CollationIterator::handleNextCE32	6.5	2,005	2,005	6.5	2,006
libicuuc.so.60.2!u_strlen_60	6.5	1,984	1,984	6.5	1,986
libc-2.27.so!unknown	3.3	1,029	1,029	3.4	1,054
libicui18n.so.60.2!icu_60::CollationElementIterator::getOffset	2.5	777	777	2.5	778
libicuuc.so.60.2!unknown	1.9	596	596	1.9	597
libicui18n.so.60.2!icu_60::UTF16CollationIterator::getOffset	1.7	513.291	514	1.7	514.291
libcoreclr.so!StringObject::InternalCheckHighChars	1.3	399	399	1.3	399
libicui18n.so.60.2!usearch_openFromCollator_60	1.1	349	349	1.1	349
libc-2.27.so!cfree	0.9	280	280	0.9	280
libc-2.27.so!malloc	0.9	263	263	0.9	263
libicui18n.so.60.2!ucol_primaryOrder_60	0.7	214	214	0.7	214
libicui18n.so.60.2!icu_60::PCEBuffer::reset	0.7	204	204	0.7	204
libicui18n.so.60.2!ucol_tertiaryOrder_60	0.6	196	196	0.6	197
libicui18n.so.60.2!ucol_secondaryOrder_60	0.6	190	190	0.6	190
libicuuc.so.60.2!icu_60::CharString::append	0.5	165	165	0.5	165

So IsFastSort is not a problem, but the ICU part is.

adamsitnik · 2019-09-02T20:21:13Z

@tarekgh is there any reason why we should not implement StartsWith in the following way:

bool StartsWith(string source, string prefix, StringComparison stringComparison)
  => CompareString(source.AsSpan(start: 0, length: prefix.Length), prefix, stringComparison);

Edit: nevermind, I've got an answer from @kevingosse in dotnet/coreclr#26481 ;)

tarekgh · 2019-09-03T16:24:22Z

is there any reason why we should not implement StartsWith in the following way:

@kevingosse is right this will not work except for ordinal cases.

tarekgh · 2019-09-03T16:29:39Z

@kevingosse thanks for you measurements. I believe @adamsitnik PR is going to help some with the StartsWith scenario.

adamsitnik · 2019-09-30T17:33:19Z

dotnet/coreclr#26759 and dotnet/coreclr#26621 combined together have fixed this problem.

Fun fact: while working on improving the performance of StartsWith on Linux we have found and fixed an 18 year old bug in ICU unicode-org/icu#840 ;)

adamsitnik self-assigned this Sep 3, 2019

adamsitnik closed this as completed Sep 30, 2019

msftgits transferred this issue from dotnet/corefx Feb 1, 2020

msftgits added this to the 5.0 milestone Feb 1, 2020

ghost locked as resolved and limited conversation to collaborators Dec 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

String.StartsWith slower on Linux with some characters #30716

String.StartsWith slower on Linux with some characters #30716

kevingosse commented Aug 29, 2019

kevingosse commented Aug 29, 2019 •

edited

Loading

danmoseley commented Aug 29, 2019

krwq commented Aug 29, 2019

tarekgh commented Aug 29, 2019

jkotas commented Aug 29, 2019 •

edited

Loading

tarekgh commented Aug 29, 2019

tarekgh commented Aug 29, 2019

kevingosse commented Aug 29, 2019 •

edited

Loading

tarekgh commented Aug 29, 2019

mnmr commented Aug 30, 2019

tarekgh commented Aug 30, 2019

tarekgh commented Aug 30, 2019

filipnavara commented Sep 1, 2019

kevingosse commented Sep 2, 2019 •

edited

Loading

filipnavara commented Sep 2, 2019

adamsitnik commented Sep 2, 2019

adamsitnik commented Sep 2, 2019 •

edited

Loading

tarekgh commented Sep 3, 2019

tarekgh commented Sep 3, 2019

adamsitnik commented Sep 30, 2019

String.StartsWith slower on Linux with some characters #30716

String.StartsWith slower on Linux with some characters #30716

Comments

kevingosse commented Aug 29, 2019

kevingosse commented Aug 29, 2019 • edited Loading

danmoseley commented Aug 29, 2019

krwq commented Aug 29, 2019

tarekgh commented Aug 29, 2019

jkotas commented Aug 29, 2019 • edited Loading

tarekgh commented Aug 29, 2019

tarekgh commented Aug 29, 2019

kevingosse commented Aug 29, 2019 • edited Loading

tarekgh commented Aug 29, 2019

mnmr commented Aug 30, 2019

tarekgh commented Aug 30, 2019

tarekgh commented Aug 30, 2019

filipnavara commented Sep 1, 2019

kevingosse commented Sep 2, 2019 • edited Loading

filipnavara commented Sep 2, 2019

adamsitnik commented Sep 2, 2019

adamsitnik commented Sep 2, 2019 • edited Loading

tarekgh commented Sep 3, 2019

tarekgh commented Sep 3, 2019

adamsitnik commented Sep 30, 2019

kevingosse commented Aug 29, 2019 •

edited

Loading

jkotas commented Aug 29, 2019 •

edited

Loading

kevingosse commented Aug 29, 2019 •

edited

Loading

kevingosse commented Sep 2, 2019 •

edited

Loading

adamsitnik commented Sep 2, 2019 •

edited

Loading