Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special-case Enumerable.SequenceEqual for byte[] #48287

Merged
merged 1 commit into from
Feb 23, 2021

Conversation

stephentoub
Copy link
Member

[Params(1, 100, 1_000_000)]
public int Count { get; set; }

private byte[] _arr1, _arr2;

[GlobalSetup]
public void Setup()
{
    _arr1 = RandomNumberGenerator.GetBytes(Count);
    _arr2 = (byte[])_arr1.Clone();
}

[Benchmark]
public bool SequenceEquals() => _arr1.SequenceEqual(_arr2);
Method Toolchain Count Mean Ratio
SequenceEquals \master\corerun.exe 1 44.769 ns 1.00
SequenceEquals \pr\corerun.exe 1 7.577 ns 0.17
SequenceEquals \master\corerun.exe 100 622.219 ns 1.00
SequenceEquals \pr\corerun.exe 100 9.766 ns 0.02
SequenceEquals \master\corerun.exe 1000000 5,882,686.440 ns 1.000
SequenceEquals \pr\corerun.exe 1000000 32,118.099 ns 0.005

@ghost
Copy link

ghost commented Feb 15, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details
[Params(1, 100, 1_000_000)]
public int Count { get; set; }

private byte[] _arr1, _arr2;

[GlobalSetup]
public void Setup()
{
    _arr1 = RandomNumberGenerator.GetBytes(Count);
    _arr2 = (byte[])_arr1.Clone();
}

[Benchmark]
public bool SequenceEquals() => _arr1.SequenceEqual(_arr2);
Method Toolchain Count Mean Ratio
SequenceEquals \master\corerun.exe 1 44.769 ns 1.00
SequenceEquals \pr\corerun.exe 1 7.577 ns 0.17
SequenceEquals \master\corerun.exe 100 622.219 ns 1.00
SequenceEquals \pr\corerun.exe 100 9.766 ns 0.02
SequenceEquals \master\corerun.exe 1000000 5,882,686.440 ns 1.000
SequenceEquals \pr\corerun.exe 1000000 32,118.099 ns 0.005
Author: stephentoub
Assignees: -
Labels:

area-System.Linq

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Feb 15, 2021

It optimizes Enumerable.SequenceEqual only for bytes but it should be possible to do it for all primitives if we expose #46017

@stephentoub
Copy link
Member Author

stephentoub commented Feb 15, 2021

It optimizes Enumerable.SequenceEqual only for bytes but it should be possible to do it for all primitives if we expose #46017

That's not necessary. As noted in the code comments, we could use ROS.SequenceEqual after doing a JIT- eliminated value type and assignable from check, but I don't see a good way to convince the C# compiler to let us call it due to the generic constraint.

@EgorBo
Copy link
Member

EgorBo commented Feb 15, 2021

A possible fix I mentioned: EgorBo@6a91bf3
(it only needs a few more changes in the jit) and Enumerable.SequenceEqual will be able to use the fast ROS.SequenceEqual for all primitives without any API changes (e.g. changing constraints)

@stephentoub
Copy link
Member Author

stephentoub commented Feb 15, 2021

I don't want that level of unsafe code in LINQ :-) If this is a problem worth solving, I'd rather add a non-constrained variant to MemoryExtensions, e.g. an overload that takes a nullable comparer.

@eiriktsarpalis
Copy link
Member

How common is using Enumerable.SequenceEqual for byte arrays in particular? How performance critical are such applications in the first place? Given that both inputs need to be arrays, performance conscious users would already know to use ROS.SequenceEqual instead. For all others, wouldn't the performance inconsistency between byte arrays and all other primitive types be a source of confusion?

@benaadams
Copy link
Member

Would a new overload be worth adding to cover all arrays? Or would that drop too many where they are passed as IEnumerable<TSource> rather than TSource[]

public static bool SequenceEqual<TSource>(this TSource[] first, TSource[] second) where TSource : IEquatable<TSource>
    => ((ReadOnlySpan<TSource>)firstArr).SequenceEqual(secondArr);

@stephentoub
Copy link
Member Author

I'd rather add an ROS-based overload to MemoryExtensions that takes a comparer rather than being constrained and then light up with that here.

@stephentoub
Copy link
Member Author

How common is using Enumerable.SequenceEqual for byte arrays in particular? How performance critical are such applications in the first place?

We've hit this (and probably are still hitting this) in dotnet/runtime, in particular in our tests. In the past we've had some innocuous looking tests that did Assert.Equal(expectedResults, actualResults); where these were byte[] arrays, and the test suites ended up taking literally minutes longer than they should have, because under the covers this was doing an Enumerable.SequenceEqual on them.

For all others, wouldn't the performance inconsistency between byte arrays and all other primitive types be a source of confusion?

I would like to expand this to use MemoryExtensions for any array, but that's not feasible right now. So, yes, with this there's an inconsistency between byte[] and other types, but there's already an inconsistency between Enumerable.SequenceEqual and MemoryExtensions.SequenceEqual, and this ends up removing that inconsistency for at least one type that, in my experience, has been the most problematic, at least in our uses.

@stephentoub
Copy link
Member Author

I opened #48304.

@GrabYourPitchforks
Copy link
Member

I would like to expand this to use MemoryExtensions for any array, but that's not feasible right now.

If you have a List<byte> you can get the underlying Span<byte> via the new CollectionMarshal API, so it's possible to special-case List<T> here as well if you want. Or did I misunderstand your statement here?

@stephentoub
Copy link
Member Author

stephentoub commented Feb 15, 2021

Or did I misunderstand your statement here?

My statement was that I'd like to be able to write:

if (first is TSource[] firstArray && second is TSource[] secondArray)
{
    return ((ReadOnlySpan<TSource>)firstArray).SequenceEqual(secondArray); // or with a `, comparer` argument
}

in order to work for any array, not just byte[], but that's not possible today as here TSource is unconstrained and the T generic parameter to MemoryExtensions.SequenceEqual is constrained with where T : IEquatable<T>.

@stephentoub
Copy link
Member Author

@eiriktsarpalis, would you like me to close this PR or merge it? Either way, we can update the code if the discussed API is added.

@eiriktsarpalis
Copy link
Member

From my perspective it would be preferable if we solved this in the general case, special casing on the element type feels odd to me.

@stephentoub
Copy link
Member Author

it would be preferable if we solved this in the general case

Agreed. If we do that, we'll replace this, obviously. But if we don't end up doing that (or don't end up doing that in .NET 6), your preference would then be to do nothing here at all? If so, we can close this.

@eiriktsarpalis
Copy link
Member

I don't have a strong opinion, if it is expected to improve our test performance then it makes perfect sense to merge.

@stephentoub
Copy link
Member Author

Ok.

@stephentoub stephentoub merged commit 24f1acd into dotnet:master Feb 23, 2021
@stephentoub stephentoub deleted the sequenceequalbyte branch February 23, 2021 01:43
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants