-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Add Span<T>.Sort(...) by changing Array.Sort internals to be Span based #24419
Conversation
ref byte byteRef = ref Unsafe.As<T, byte>(ref array.GetPinnableReference()); | ||
fixed (byte* ptr = &byteRef) | ||
{ | ||
if (Array.TrySZSort(new IntPtr(ptr), typeof(T), IntPtr.Zero, typeof(T), index, index + length - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a way to either call TrySZSort
here for spans as suggested here
fixed (byte* keysPtr = &keysRef) | ||
fixed (byte* itemsPtr = &itemsRef) | ||
{ | ||
if (Array.TrySZSort(new IntPtr(keysPtr), typeof(TKey), new IntPtr(itemsPtr), typeof(TValue), index, index + length - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a way to either call TrySZSort
here for spans as suggested here
This general approach makes sense to me for 3.0, as long as it can be done without regressing array sorting perf. If array perf doesn't regress, then at least we end up with something functional for spans, and it can be optimized for subsequent releases. |
I agree with @stephentoub that a simple, safe, less-optimized implementation is a good way to start on this. |
The array to Span conversion throws for co-variant array. There may be a bit of unsafe code needed to avoid throwing for Sort on co-variant arrays. |
|
||
if (length > 1) | ||
{ | ||
#if CORECLR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be fine to skip this optimization in the initial implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the #if CORECLR, note the comment though, this will mean Span sort might be different for equal keys since the different Sort variants are not 100% identical as discussed in #16986
Could you please collect some performance numbers on this? |
@stephentoub @jkotas good :) Your initial thoughts mirror mine, we are facing two issues with changing the "backbone" of sort with Span-based versions:
@jkotas yes exactly, which is why I was wondering if this was a viable path, could you expand on the "bit of unsafe code" needed? I assume we have to convert to a Span with the same type as the underlying array, but what about the custom comparers then?
@stephentoub right that is the question. Lets for now assume that the creation of the Spans for the input arrays has neglible overhead even for small numbers of elements. The biggest problem here then is whether the Span based sorting code will not regress performance. I haven't done a lot One thing I can say though, from the experience from the previous PRs, if Span API methods does not call |
@jkotas I assume there are existing benchmarks in place I can use or? |
I think we need to convert
There is basic one in https://github.com/dotnet/performance repo: |
@jkotas I should have updated to do that for all array to span conversion. Let me know if it looks ok. :) |
src/System.Private.CoreLib/shared/System/Collections/Generic/ArraySortHelper.cs
Show resolved
Hide resolved
src/System.Private.CoreLib/shared/System/Collections/Generic/ArraySortHelper.cs
Outdated
Show resolved
Hide resolved
@jkotas now that I have added the proper Span Sort(...) methods, the first issues arise. And things are coming back to me from a year ago :) The span Sort methods take a Overall, the array sorting API and code is not exactly consistent in API surface and underlying code across situations. The Span sort API is. But we then face how to use existing sort code and if we can change that or have to add/duplicate sorting code, which means bigger changes. How do you think we should proceed? I think we should limit changes and do initial benchmarks as soon as possible to be sure these changes do not lead to performance regressions. But would like at least to have "complete" code that supports both Array and Span APIs so changes needed for that are present and hence benchmarked. |
That's because delegate calls are slightly faster than interface calls. I suppose you can replace the delegate with That said, the whole thing with |
@jkotas ah it appears the master had changed which was the reason for nullables changes, I have merged with latest master to resolve conflict. |
ci build fails with:
i.e. |
I also have problems building locally after merging with master... even
|
@@ -1578,7 +1578,9 @@ public static void Sort<T>(T[] array, int index, int length, System.Collections. | |||
return; | |||
} | |||
|
|||
ArraySortHelper<TKey, TValue>.Default.Sort(keys, items, index, length, comparer); | |||
var spanKeys = new Span<TKey>(ref Unsafe.As<byte, TKey>(ref keys.GetRawArrayData()), keys.Length); | |||
var spanItems = new Span<TValue>(ref Unsafe.As<byte, TValue>(ref items!.GetRawArrayData()), items!.Length); // TODO-NULLABLE: https://github.com/dotnet/csharplang/issues/538 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please change the comment to:
// TODO-NULLABLE: Remove ! when [DoesNotReturn] respected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -154,7 +152,8 @@ private static void SwapIfGreater(T[] keys, Comparison<T> comparer, int a, int b | |||
} | |||
} | |||
|
|||
private static void Swap(T[] a, int i, int j) | |||
[MethodImpl(MethodImplOptions.AggressiveInlining)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was found to important on Swap, I'm curious that it wasn't found to be important on SwapIfGreater? Is that because SwapIfGreater already wasn't getting inlined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because SwapIfGreater already wasn't getting inlined?
@stephentoub yes. And depending on input SwapIfGreater
is not called nearly as often. Depends a lot on specific input data "median of three killer sequence" would probably change that statement. There are plenty of things that can be done to improve the performance of existing sorting code, as my previous PRs did. Although, .NET Core has also moved since so some things might no longer be needed.
|
||
/// <summary> | ||
/// Sorts the elements in the entire <see cref="Span{T}" /> | ||
/// using the <see cref="IComparable" /> implementation of each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is going to link to the non-generic IComparable rather than the generic one. Same for other occurrences.
/// Sorts the elements in the entire <see cref="Span{T}" /> | ||
/// using the <typeparamref name="TComparer" />. | ||
/// </summary> | ||
public static void Sort<T, TComparer>(this Span<T> span, TComparer comparer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be TComparer? comparer
instead of TComparer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually the desired / approved shape, having Sort be generic on both the T and the comparer type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved shape, having Sort be generic on both the T and the comparer type?
@stephentoub yes, this is the approved API. It was specifically designed to allow for value type comparers and hence specific code paths for that. Yes that means "code bloat" that is the purpose of it. :) In fact, it was a specific design goal of mine for the API after struggling for years with the restrictive and somewhat inconsistent Array.Sort
API, where we have often had to simply resort to put stl::sort
in a native dll and interop to it, given we had native memory and a memory copy would be too expensive, and had specific sorting needs.
There already is specific code paths and "code bloat" for value types in current sorting code for IComparable<T>
where T : struct
, nothing new here, this simply adds orthogonality to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was specifically designed to allow for value type comparers and hence specific code paths for that
Is that really a common case? This implementation also doesn't appear to be benefiting from that at all, and worse, hides the fact that it's allocating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that really a common case?
@stephentoub in my view yes. I have seen enough Array.Reverse
too. In my view there is no downside to this, most users will simply use Sort()
or Sort((x, y) => x.CompareTo(y))
, advanced users who need this can use the value type comparer.
doesn't appear to be benefiting from that at all, and worse, hides the fact that it's allocating.
@stephentoub we agreed to do this step wise. One step at a time. I intend to fix this in later PRs, this will require code duplication etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub note that Array.Sort
already does allocs for a custom comparer, converting it to Comparison<T>
. I have noted this before and shown this in benchmarks. Can't run the latest ones now since I can't build coreclr anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to omit this shape of the API from this step.
The value prop of this special API is better performance. If the implementation does not actually provide better performance, it is just going to cause confusion.
/// One or more elements do not implement the <see cref="IComparable" /> interface. | ||
/// </exception> | ||
public static void Sort<T>(this Span<T> span) => | ||
Sort(span, (IComparer<T>)null!); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This !
shouldn't be necessary once the below change is made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It'll need to be (IComparer?)
for the cast.)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.comparison); | ||
if (span.Length > 1) | ||
{ | ||
ArraySortHelper<T>.Sort(span, 0, span.Length, comparison!); // TODO-NULLABLE: https://github.com/dotnet/csharplang/issues/538 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment nit as earlier.
/// element of the <see cref= "Span{TKey}"/>. | ||
/// </summary> | ||
public static void Sort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items) => | ||
Sort(keys, items, (IComparer<TKey>)null!); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, the ! shouldn't be necessary.
/// using the <typeparamref name="TComparer" />. | ||
/// </summary> | ||
public static void Sort<TKey, TValue, TComparer>(this Span<TKey> keys, | ||
Span<TValue> items, TComparer comparer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, this TComparer comparer
should be TComparer? comparer
.
{ | ||
// TODO: Add Comparison<T> overload that is not available in existing Sort code. | ||
// Hence Comparison<T> is wrapped in ComparisonComparer<T> for now. | ||
var comparisonComparer = new ComparisonComparer<TKey>(comparison!); // TODO-NULLABLE: https://github.com/dotnet/csharplang/issues/538 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment nit.
// might differ in both performance and sorting for equal keys | ||
|
||
// TODO: Add TComparer support until then value type comparers will be boxed | ||
// and performance will be affected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a TODO for when... as part of this PR, or subsequently? If subsequently, please make sure there's an open issue for it and include the issue # here in the TODO. But this also contributes to my question about whether the TComparer needs to be generic.
{ | ||
// Span based Sort does not call `TrySZSort`, but instead | ||
// uses the managed generic code only. This means Span based Sort | ||
// might differ in both performance and sorting for equal keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't affect this PR, but have you measure to see what the difference is between an array-based sort that does use TrySZSort and a span-based sort that doesn't? How far off are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub comparing int
vs IntStruct
can indicate this (showing the reverse in this case).
int
Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Array | \coreclr-base | 512 | 5.665 us | 0.0367 us | 0.0343 us | 5.670 us | 5.608 us | 5.725 us | 1.00 | 0.00 | - | - | - | - |
Array | \coreclr | 512 | 5.697 us | 0.0393 us | 0.0368 us | 5.693 us | 5.636 us | 5.779 us | 1.01 | 0.01 | - | - | - | - |
IntStruct
Method | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Array | \coreclr-base\ | 512 | 3.785 us | 0.1062 us | 0.1222 us | 3.714 us | 3.684 us | 4.021 us | 1.00 | 0.00 | - | - | - | - |
Array | \coreclr\ | 512 | 3.641 us | 0.0299 us | 0.0280 us | 3.646 us | 3.581 us | 3.701 us | 0.96 | 0.03 | - | - | - | - |
Quite a difference here in favor of the managed code path.
Note that this is not just about perf, native sorting code does a NaNPrepass
(if I remember correctly) for floats, that the span version would then not do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above is for
BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17763.503 (1809/October2018Update/Redstone5)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview5-011568
[Host] : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
Job-JTKJDV : .NET Core ? (CoreCLR 4.6.27803.0, CoreFX 4.700.19.22901), 64bit RyuJIT
Job-DBBNCI : .NET Core ? (CoreCLR 4.6.27808.0, CoreFX 4.700.19.22901), 64bit RyuJIT
Runtime=Core InvocationCount=10000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub you can see some more benchmarks in the gist here:
https://gist.github.com/nietras/ccf8199a9722a396731a0597d55916bf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub after merging with master I have rerun the benchmarks with higher invocation count (25000). This is more inline with what I expected with regards to native (int
) vs managed code (IntStruct
). native being a bit faster with the caveats that this is based on using IntStruct
as a replacement for int
.
coreclr-base = 5af77fa (master)
coreclr = 11dd34a (from this PR branch)
Int32
BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17763.503 (1809/October2018Update/Redstone5)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview5-011568
[Host] : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
Job-VKBMMT : .NET Core ? (CoreCLR 4.700.19.30301, CoreFX 4.700.19.27908), 64bit RyuJIT
Job-SABOCD : .NET Core ? (CoreCLR 4.700.19.30301, CoreFX 4.700.19.27908), 64bit RyuJIT
Runtime=Core InvocationCount=25000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1
Method | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Array | \coreclr-base | 512 | 3.495 us | 0.1296 us | 0.1387 us | 3.469 us | 3.323 us | 3.753 us | 1.00 | 0.00 | - | - | - | - |
Array | \coreclr | 512 | 3.640 us | 0.2213 us | 0.2460 us | 3.573 us | 3.317 us | 4.222 us | 1.05 | 0.08 | - | - | - | - |
Array_ComparerClass | \coreclr-base | 512 | 19.367 us | 0.1069 us | 0.0948 us | 19.357 us | 19.207 us | 19.560 us | 1.00 | 0.00 | - | - | - | 64 B |
Array_ComparerClass | \coreclr | 512 | 19.474 us | 0.1196 us | 0.1060 us | 19.452 us | 19.321 us | 19.695 us | 1.01 | 0.01 | - | - | - | 64 B |
Array_ComparerStruct | \coreclr-base | 512 | 22.870 us | 0.1593 us | 0.1490 us | 22.820 us | 22.696 us | 23.192 us | 1.00 | 0.00 | - | - | - | 88 B |
Array_ComparerStruct | \coreclr | 512 | 23.718 us | 0.0634 us | 0.0593 us | 23.709 us | 23.614 us | 23.859 us | 1.04 | 0.01 | - | - | - | 88 B |
Array_Comparison | \coreclr-base | 512 | 18.615 us | 0.0971 us | 0.0861 us | 18.598 us | 18.519 us | 18.826 us | 1.00 | 0.00 | - | - | - | - |
Array_Comparison | \coreclr | 512 | 19.760 us | 0.2582 us | 0.2415 us | 19.647 us | 19.556 us | 20.369 us | 1.06 | 0.01 | - | - | - | - |
List | \coreclr-base | 512 | 3.385 us | 0.0916 us | 0.0980 us | 3.335 us | 3.298 us | 3.607 us | 1.00 | 0.00 | - | - | - | - |
List | \coreclr | 512 | 3.759 us | 0.4917 us | 0.5465 us | 3.534 us | 3.279 us | 5.224 us | 1.10 | 0.17 | - | - | - | - |
IntStruct
BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17763.503 (1809/October2018Update/Redstone5)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview5-011568
[Host] : .NET Core 3.0.0-preview5-27626-15 (CoreCLR 4.6.27622.75, CoreFX 4.700.19.22408), 64bit RyuJIT
Job-VKBMMT : .NET Core ? (CoreCLR 4.700.19.30301, CoreFX 4.700.19.27908), 64bit RyuJIT
Job-SABOCD : .NET Core ? (CoreCLR 4.700.19.30301, CoreFX 4.700.19.27908), 64bit RyuJIT
Runtime=Core InvocationCount=25000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1
Method | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Array | \coreclr-base | 512 | 3.729 us | 0.0498 us | 0.0416 us | 3.731 us | 3.635 us | 3.786 us | 1.00 | 0.00 | - | - | - | - |
Array | \coreclr | 512 | 3.925 us | 0.2357 us | 0.2522 us | 3.913 us | 3.654 us | 4.505 us | 1.07 | 0.08 | - | - | - | - |
Array_ComparerClass | \coreclr-base | 512 | 20.788 us | 0.0535 us | 0.0474 us | 20.782 us | 20.669 us | 20.867 us | 1.00 | 0.00 | - | - | - | 64 B |
Array_ComparerClass | \coreclr | 512 | 21.212 us | 0.0889 us | 0.0788 us | 21.192 us | 21.101 us | 21.398 us | 1.02 | 0.00 | - | - | - | 64 B |
Array_ComparerStruct | \coreclr-base | 512 | 23.963 us | 0.0966 us | 0.0856 us | 23.949 us | 23.830 us | 24.183 us | 1.00 | 0.00 | - | - | - | 88 B |
Array_ComparerStruct | \coreclr | 512 | 22.890 us | 0.1253 us | 0.1047 us | 22.894 us | 22.684 us | 23.034 us | 0.96 | 0.01 | - | - | - | 88 B |
Array_Comparison | \coreclr-base | 512 | 20.161 us | 0.1313 us | 0.1164 us | 20.157 us | 19.995 us | 20.401 us | 1.00 | 0.00 | - | - | - | - |
Array_Comparison | \coreclr | 512 | 21.104 us | 0.0972 us | 0.0910 us | 21.101 us | 20.995 us | 21.292 us | 1.05 | 0.01 | - | - | - | - |
List | \coreclr-base | 512 | 4.891 us | 1.0011 us | 1.1528 us | 4.679 us | 3.741 us | 7.906 us | 1.00 | 0.00 | - | - | - | - |
List | \coreclr | 512 | 4.047 us | 0.1858 us | 0.1988 us | 4.076 us | 3.765 us | 4.503 us | 0.84 | 0.17 | - | - | - | - |
Are you missing
You can try deleting |
@jkotas @stephentoub added a gist with the latest benchmark run (without string). https://gist.github.com/nietras/c206d4a7d9f173cc9271eabc8d01b1e6 This is for: coreclr-base = 5af77fa (master) Notably,
I'll try to run it with more invocations later at some point. |
I'm taking a look at fixing this up. |
@@ -1578,7 +1578,9 @@ public static void Sort<T>(T[] array, int index, int length, System.Collections. | |||
return; | |||
} | |||
|
|||
ArraySortHelper<TKey, TValue>.Default.Sort(keys, items, index, length, comparer); | |||
var spanKeys = new Span<TKey>(ref Unsafe.As<byte, TKey>(ref keys.GetRawArrayData()), keys.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be GetRawSzArrayData()
@stephentoub sounds good :) It took a long time before the changes/fixes to benchmarks got added to https://github.com/dotnet/performance/ and at that time I unfortunately did not have time to revisit this. Sorry about that. Let me know if you have any questions or need review or similar :) |
After the exercise in futility of my previous PRs #16986 (and dotnet/corefx#26859), where I tried adding
Sort(...)
in as performant a way as possible in C# code only (usingref
s throughout and with the intent to replace existing array sort code too), and these being declined due to security concerns, this PR tries to addSort(...)
for spans by doing as minimal as possible changes to existingArray.Sort
code while trying to avoid yet another duplication of sorting code.The PR starts with exploring whether these minimal changes are a valid path forward, hence changes are cursory (just to explore the path forward). What is missing is whether it is possible to make an overload for
TrySZSort
that works forSpan<T>
, is it?CC: @danmosemsft @jkotas
This is a work in progress and here are some TODOs:
https://github.com/dotnet/performance/blob/2296c92879ed0744cec9b983ac28ac208e7018bd/src/benchmarks/micro/corefx/System.Collections/Sort.cs#L40 .
TComparer
forTKey
only (currently based onComparison<TKey>
)Comparison<TKey>
forTKey, TValue
case (no such version exists)ComparisonComparer<T>
, later PR can add proper support