New SearchValues<string> API doesn't use vectorized equality comparison once it finds a candidate #96142

andrewjsaid · 2023-12-18T17:42:20Z

Problem

The new SearchValues<string> has incredible strategies to find candidates where it suspects there might be a match but once it finds a match, uses a simple loop to confirm. Specifically implementations of StringSearchValuesHelper.ICaseSensitivity use ScalarEquals which is a for loop comparing chars.

This is a missed opportunity as Vectorized equality would be perfect here, especially considering the implementations of ICaseSensitivity already are able to transform vectors with some clever optimizations of their own.

It would be unfortunate if basic uses of this API are actually slower compared to string.IndexOf(string) in certain cases.

Benchmark Code

Consider the following benchmark code run in my local copy of dotnet/performance. It's an extreme example designed to prove the point - discussion later.

[BenchmarkCategory(Categories.Libraries)]
public class SearchValuesStringBenchmarks
{
    private SearchValues<string> _searchValuesCaseInsensitive;
    private SearchValues<string> _searchValuesCaseSensitive;
    private string _text = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
    private string _queryUpper = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private string _queryLower;

    [GlobalSetup]
    public void Setup()
    {
        _queryUpper = _text.ToUpperInvariant();
        _queryLower = _queryUpper.ToLowerInvariant(); // Force a new string to be created
        _searchValuesCaseSensitive = SearchValues.Create([_queryLower], StringComparison.Ordinal);
        _searchValuesCaseInsensitive = SearchValues.Create([_queryUpper], StringComparison.OrdinalIgnoreCase);
    }

    [Benchmark]
    public int RegularCaseInsensitive() => _text.IndexOf(_queryUpper, StringComparison.OrdinalIgnoreCase);

    [Benchmark]
    public int OptimizedCaseInsensitive() => _text.AsSpan().IndexOfAny(_searchValuesCaseInsensitive);

    [Benchmark]
    public bool EqualsCaseInsensitive() => _text.Equals(_queryUpper, StringComparison.OrdinalIgnoreCase);

    [Benchmark]
    public int RegularCaseSensitive() => _text.IndexOf(_queryLower, StringComparison.Ordinal);

    [Benchmark]
    public int OptimizedCaseSensitive() => _text.AsSpan().IndexOfAny(_searchValuesCaseSensitive);

    [Benchmark]
    public bool EqualsCaseSensitive() => _text.Equals(_queryLower, StringComparison.Ordinal);

}

Benchmark Results

// * Summary *

BenchmarkDotNet v0.13.11-nightly.20231126.107, Windows 11 (10.0.22621.2861/22H2/2022Update/SunValley2)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-alpha.1.23608.3
  [Host]     : .NET 9.0.0 (9.0.23.57707), X64 RyuJIT AVX2
  Job-ULGCAN : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

| Method                   | Mean      | Error     | StdDev    | Median    | Min       | Max       | Allocated |
|------------------------- |----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| RegularCaseInsensitive   | 15.893 ns | 0.1417 ns | 0.1184 ns | 15.849 ns | 15.815 ns | 16.195 ns |         - |
| OptimizedCaseInsensitive | 46.448 ns | 0.2249 ns | 0.2104 ns | 46.425 ns | 46.060 ns | 46.764 ns |         - |
| EqualsCaseInsensitive    |  9.194 ns | 0.1949 ns | 0.1823 ns |  9.149 ns |  8.813 ns |  9.432 ns |         - |
| RegularCaseSensitive     |  9.122 ns | 0.1665 ns | 0.1558 ns |  9.166 ns |  8.922 ns |  9.363 ns |         - |
| OptimizedCaseSensitive   | 46.394 ns | 0.4518 ns | 0.4226 ns | 46.197 ns | 45.825 ns | 47.110 ns |         - |
| EqualsCaseSensitive      |  4.295 ns | 0.0433 ns | 0.0405 ns |  4.279 ns |  4.242 ns |  4.366 ns |         - |

Analysis

It's clear that the new SearchValues API is slower than we would like in the above cases. This is because it's optimized to find the first candidate character position which in the benchmark is index 0 which negates the benefit. Thus the remainder of the work is verifying the candidate at the position, which I would guess is vectorized in the string.IndexOf/string.Equal approaches.

I also couldn't find any benchmarks of this API in dotnet/performance, so maybe I've done something wrong.

Please note I'm not trying to bash this amazing API, just to point out where I saw a possible improvement. The strategy selection and reading about Teddy was truly inspiring.

The text was updated successfully, but these errors were encountered:

ghost · 2023-12-18T17:47:13Z

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Problem

The new SearchValues<string> has incredible strategies to find candidates where it suspects there might be a match but once it finds a match, uses a simple loop to confirm. Specifically implementations of StringSearchValuesHelper.ICaseSensitivity use ScalarEquals which is a for loop comparing chars.

This is a missed opportunity as Vectorized equality would be perfect here, especially considering the implementations of ICaseSensitivity already are able to transform vectors with some clever optimizations of their own.

It would be unfortunate if basic uses of this API are actually slower compared to string.IndexOf(string) in certain cases.

Benchmark Code

Consider the following benchmark code run in my local copy of dotnet/performance. It's an extreme example designed to prove the point - discussion later.

[BenchmarkCategory(Categories.Libraries)]
public class SearchValuesStringBenchmarks
{
    private SearchValues<string> _searchValuesCaseInsensitive;
    private SearchValues<string> _searchValuesCaseSensitive;
    private string _text = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
    private string _queryUpper = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private string _queryLower;

    [GlobalSetup]
    public void Setup()
    {
        _queryUpper = _text.ToUpperInvariant();
        _queryLower = _queryUpper.ToLowerInvariant(); // Force a new string to be created
        _searchValuesCaseSensitive = SearchValues.Create([_queryLower], StringComparison.Ordinal);
        _searchValuesCaseInsensitive = SearchValues.Create([_queryUpper], StringComparison.OrdinalIgnoreCase);
    }

    [Benchmark]
    public int RegularCaseInsensitive() => _text.IndexOf(_queryUpper, StringComparison.OrdinalIgnoreCase);

    [Benchmark]
    public int OptimizedCaseInsensitive() => _text.AsSpan().IndexOfAny(_searchValuesCaseInsensitive);

    [Benchmark]
    public bool EqualsCaseInsensitive() => _text.Equals(_queryUpper, StringComparison.OrdinalIgnoreCase);

    [Benchmark]
    public int RegularCaseSensitive() => _text.IndexOf(_queryLower, StringComparison.Ordinal);

    [Benchmark]
    public int OptimizedCaseSensitive() => _text.AsSpan().IndexOfAny(_searchValuesCaseSensitive);

    [Benchmark]
    public bool EqualsCaseSensitive() => _text.Equals(_queryLower, StringComparison.Ordinal);

}

Benchmark Results

// * Summary *

BenchmarkDotNet v0.13.11-nightly.20231126.107, Windows 11 (10.0.22621.2861/22H2/2022Update/SunValley2)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-alpha.1.23608.3
  [Host]     : .NET 9.0.0 (9.0.23.57707), X64 RyuJIT AVX2
  Job-ULGCAN : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

| Method                   | Mean      | Error     | StdDev    | Median    | Min       | Max       | Allocated |
|------------------------- |----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| RegularCaseInsensitive   | 15.893 ns | 0.1417 ns | 0.1184 ns | 15.849 ns | 15.815 ns | 16.195 ns |         - |
| OptimizedCaseInsensitive | 46.448 ns | 0.2249 ns | 0.2104 ns | 46.425 ns | 46.060 ns | 46.764 ns |         - |
| EqualsCaseInsensitive    |  9.194 ns | 0.1949 ns | 0.1823 ns |  9.149 ns |  8.813 ns |  9.432 ns |         - |
| RegularCaseSensitive     |  9.122 ns | 0.1665 ns | 0.1558 ns |  9.166 ns |  8.922 ns |  9.363 ns |         - |
| OptimizedCaseSensitive   | 46.394 ns | 0.4518 ns | 0.4226 ns | 46.197 ns | 45.825 ns | 47.110 ns |         - |
| EqualsCaseSensitive      |  4.295 ns | 0.0433 ns | 0.0405 ns |  4.279 ns |  4.242 ns |  4.366 ns |         - |

Analysis

It's clear that the new SearchValues API is slower than we would like in the above cases. This is because it's optimized to find the first candidate character position which in the benchmark is index 0 which negates the benefit. Thus the remainder of the work is verifying the candidate at the position, which I would guess is vectorized in the string.IndexOf/string.Equal approaches.

I also couldn't find any benchmarks of this API in dotnet/performance, so maybe I've done something wrong.

Please note I'm not trying to bash this amazing API, just to point out where I saw a possible improvement. The strategy selection and reading about Teddy was truly inspiring.

Author:	andrewjsaid
Assignees:	-
Labels:	`area-System.Buffers`, `tenet-performance`, `untriaged`
Milestone:	-

MihaZupan · 2023-12-18T19:10:50Z

Thanks for raising the issue.
If memory serves, I was mostly looking at shorter needles (and potentially less common ones, or cases with more false positives on candidate matches), so scenarios where an inlined scalar loop can do better.

But I agree a vectorized verification does make a lot of sense for scenarios like yours.
Looking back at the PR that added the logic (#88394), I initially had an optimization that did specialize based on the length of the value, but I removed it in 2e70415 as it wasn't that beneficial for the cases I was testing with. Looks like something worth looking into again.

andrewjsaid added the tenet-performance Performance related issue label Dec 18, 2023

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 18, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Dec 18, 2023

vcsjones added area-System.Buffers and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 18, 2023

MihaZupan mentioned this issue Jan 2, 2024

Use SearchValues<string> for prefix searching in RegexCompiler / source generator #96402

Merged

stephentoub added this to the 9.0.0 milestone Jan 2, 2024

ghost removed the untriaged New issue has not been triaged by the area owner label Jan 2, 2024

MihaZupan self-assigned this Jan 2, 2024

MihaZupan mentioned this issue Jan 3, 2024

Specialize by length in single-value SearchValues<string> #96429

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 3, 2024

MihaZupan closed this as completed in #96429 Jan 3, 2024

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 3, 2024

github-actions bot locked and limited conversation to collaborators Feb 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New SearchValues<string> API doesn't use vectorized equality comparison once it finds a candidate #96142

New SearchValues<string> API doesn't use vectorized equality comparison once it finds a candidate #96142

andrewjsaid commented Dec 18, 2023 •

edited

Loading

ghost commented Dec 18, 2023

Problem

Benchmark Code

Benchmark Results

Analysis

MihaZupan commented Dec 18, 2023

New SearchValues<string> API doesn't use vectorized equality comparison once it finds a candidate #96142

New SearchValues<string> API doesn't use vectorized equality comparison once it finds a candidate #96142

Comments

andrewjsaid commented Dec 18, 2023 • edited Loading

Problem

Benchmark Code

Benchmark Results

Analysis

ghost commented Dec 18, 2023

Problem

Benchmark Code

Benchmark Results

Analysis

MihaZupan commented Dec 18, 2023

andrewjsaid commented Dec 18, 2023 •

edited

Loading