-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Add structural equality and order comparison for common collections #77209
Comments
Tagging subscribers to this area: @dotnet/area-system-collections Issue DetailsBackground and motivationThis proposal attempts to consolidate a number of issues requesting API Proposalnamespace System.Collections.Generic;
public static class EqualityComparer
{
// Equality comparison using sequence equality
public static IEqualityComparer<IEnumerable<T>> CreateEnumerableComparer<T>(IEqualityComparer<T> elementComparer = null);
public static IEqualityComparer<ReadOnlyMemory<T>> CreateMemoryComparer<T>(IEqualityComparer<T> elementComparer = null);
public static IEqualityComparer<IReadOnlyList<T>> CreateListComparer<T>(IEqualityComparer<T> elementComparer = null);
// Equality comparison à la HashSetEqualityComparer
// cf. https://github.com/dotnet/runtime/blob/5ef4c68b08b5f55df91ce62cca86babfdb321162/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs
public static IEqualityComparer<IReadOnlySet<T>> CreateSetComparer<T>(IEqualityComparer<T> elementComparer = null);
public static IEqualityComparer<IReadOnlyDictionary<TKey, TValue>> CreateDictionaryComparer<TKey, TValue>(IEqualityComparer<TKey> keyComparer= null, IEqualityComparer<TValue> valueComparer = null);
}
public static class Comparer
{
// Comparison using Lexicographic ordering
public static IComparer<IEnumerable<T>> CreateEnumerableComparer<T>(IComparer<T> elementComparer = null);
public static IComparer<ReadOnlyMemory<T>> CreateMemoryComparer<T>(IComparer<T> elementComparer = null);
public static IComparer<IReadOnlyList<T>> CreateListComparer<T>(IComparer<T> elementComparer = null);
// Optional additions -- cheap (O(n) time, O(1) space) when specialized to SortedSet/SortedDictionary and friends,
// expensive (O(n log n) time, O(n) space) for hashtable implementations.
public static IComparer<IReadOnlySet<T>> CreateSetComparer<T>(IComparer<T> elementComparer = null);
public static IComparer<IReadOnlyDictionary<TKey, TValue>> CreateDictionaryComparer<TKey, TValue>(IComparer<TKey> keyComparer= null, IComparer<TValue> valueComparer = null);
} API UsageIEqualityComparer<string[]> stringArrayComparer = EqualityComparer.CreateEnumerableComparer<string>(StringComparer.InvariantCultureIgnoreCase);
var dictionary = new Dictionary<string[], object>(stringArrayComparer);
dictionary.Add(new string[] { "key" }, null);
dictionary.ContainsKey(new string[] { "KEY" }); // true IComparer<string[]> stringArrayComparer = Comparer.CreateEnumerableComparer<string>(StringComparer.InvariantCultureIgnoreCase);
var dictionary = new SortedDictionary<string[], object>(stringArrayComparer);
dictionary.Add(new string[] { "key" }, null);
dictionary.ContainsKey(new string[] { "KEY" }); // true Alternative DesignsMaking the combinators generic on the collection type would allow construction-time specialization, e.g. public static IEqualityComparer<TEnumerable> CreateEnumerableComparer<TEnumerable, TElement>(IEqualityComparer<TElement> elementComparer = null)
where TEnumerable : IEnumerable<TElement>
{
elementComparer ??= EqualityComparer<T>.Default;
if (typeof(TEnumerable) = typeof(TElement[]))
{
return (IEqualityComparer<TEnumerable>)(object)(new ArraySequenceComparer<TElement>(elementComparer));
}
if (typeof(TEnumerable) = typeof(List<TElement>))
{
return (IEqualityComparer<TEnumerable>)(object)(new ListSequenceComparer<TElement>(elementComparer));
}
return new EnumerableSequenceComparer<TElement>(elementComparer);
} But obviously that would make calling the combinators more complicated: IEqualityComparer<int[]> arrayComparer = EqualityComparer.CreateEnumerableComparer<int[], int>(); RisksNo response
|
Looks great! A couple quick notes:
Essentially, the issue for the second point is: record MyModel(int X, string Y, SomeEquatableType Z, ImmutableArray<AnotherEquatableType> W); Records like these are extremely common in incremental generation steps, and they will silently break incrementality because the codegen for records will just compare those arrays by reference equality for the underlying array, which of course will not work. Right now the only solution is to manually implement full equality for the type, which defeats the whole point of using a record. I'm not sure I know how a solution for this could look like though, as this might just be considered a limitation with Roslyn's codegen for records. @RikkiGibson any thoughts here? I'd be happy to help opening a proposal for csharplang if needed, but I guess my question here is more about, I'm not actually sure what (if anything) could be done to improve the situation here 🤔 |
I'm hoping that having a generic specialization for a
It is unfortunate, but this it's beyond the scope of this proposal. We might continue this conversation either in #77183 or csharplang if necessary. |
Awesome to see this happening. What about https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1?view=net-6.0 The sets of derived types have quite a few unique items on both sides. As an example, JSON types caught my eye. |
I wanted to avoid exposing combinators for overtly mutable types, since using these with hash tables can be dangerous. I do get that read-only does not imply immutable, and that mutable types implement read-only collection interfaces, but I suspect there's self-documentation value in constraining the APIs to read-only types. It's the same reason why I didn't add a Assuming we use the API variant in "Alternative Designs", it might be possible that we specialize
As of .NET 7 it is possible to wrap an |
I see your reasoning. The Changing this in later releases could be a compatibility problem. Maybe the documentation should say:
That way, there is documented leeway to optimize this. |
Casting-based specialization adds constant overhead for all cases but that might be better than falling off a performance cliff when people compare byte arrays and such through A complication with this is that hash codes must be the same even if different
This effectively forces per-element hashing and hashcode aggregation, which implies certain inefficiencies. The |
Note that the return type is Obviously this means that different hashcodes would be returned depending on what type |
namespace System.Collections.Generic;
public static class EqualityComparer
{
// Equality comparison using sequence equality
public static IEqualityComparer<TEnumerable> CreateEnumerableComparer<TEnumerable, T>(IEqualityComparer<T>? elementComparer = null)
where TEnumerable : IEnumerable<T> { }
public static IEqualityComparer<TSet> CreateSetComparer<TSet, T>(IEqualityComparer<T>? elementComparer = null)
where TSet : IReadOnlySet<T> { }
public static IEqualityComparer<TDictionary> CreateDictionaryComparer<TDictionary, TKey, TValue>(IEqualityComparer<TKey>? keyComparer = null, IEqualityComparer<TValue>? valueComparer = null)
where TDictionary : IReadOnlyDictionary<TKey, TValue> { }
}
public static partial class Comparer
{
// Comparison using lexicographic ordering
public static IComparer<TEnumerable> CreateEnumerableComparer<TEnumerable, T>(IComparer<T>? elementComparer = null)
where TEnumerable : IEnumerable<T> { }
} |
Relevant issue: #23950. I'm marking that as ready-for-review so we can revisit in light of this discussion. |
I love the optimizations that implementation does, but falling back to O(N^2) comparisons seems suboptimal. Wouldn't a 2*N approach like this be just as correct? Its premise is: if (and only if) each set considers the other to satisfy each of its own elements, then the sets must be equal. With constant-time lookups, that can be evaluated in linear time. In fact, the latter implementation offers another advantage over |
This is an exciting proposal.
Is this still the plan? I don't see an overload for it in the last update, but hopefully...
...applies here?
I believe this one might be worth optimizing for as well. Specifically, I give you It would not surprise me if some other internal collection implementations similarly restrict themselves to I realize that we have to select our special cases carefully. But since such (In my own implementation, I chose to runtime-special-case |
Speaking of Also, I believe that my earlier comment on set equality applies to lookups as well. (Here is an example implementation.) |
Be careful, the
|
Yeah, we should not be adding new non-generic EqualityComparer and Comparer types to System.Collections.Generic. This needs to be revisited. |
I think we could just cut the |
We should not add We already have |
I don't think there is a
I don't think that would work well for the dictionary combinator, which requires at least two type parameters bound by generic constraints. It certainly wouldn't work with the current design of using an added generic parameter for the collection type. |
You're right. But even so, as there is a non-generic Comparer in System.Collections, it's strange to add a non-generic EqualityComparer to System.Collections.Generic which already has a generic one. I'd much rather see Create overloads or CreateXx methods on the existing type. It's where devs now look for this kind of functionality; they shouldn't need to look to If others on fdc think it's fine, fine. But we need to re-review it. |
Using non-generic static classes to host factory methods for generic types is standard practice as far I can tell (the static classes in System.Collections.Immutable jump to mind).
Retrofitting the existing generic class with combinators creates problems when type constraints come into play, for example the following is not valid C#: public partial class EqualityComparer<T>
{
public static EqualityComparer<T> CreateEnumerableComparer<TElement>(IEqualityComparer<TElement>? elementComparer = null)
where T : IEnumerable<TElement> { }
} We could just ignore the class-level type parameter: public partial class EqualityComparer<T>
{
public static EqualityComparer<TEnumerable> CreateEnumerableComparer<TEnumerable, TElement>(IEqualityComparer<TElement>? elementComparer = null)
where TEnumerable : IEnumerable<TElement> { }
} but this would be too awkward to use.
I'd be inclined to think of |
Yes, but it's not standard to have have both a non-generic and generic of the same name both with factory methods. |
It provides a Create method that lets you fully customize it. That is not default equality. |
Hence why this needs work. The 99% case here is needing the APIs on the generic class, and this proposal suggests creating a non-generic right next to it with very niche support. So a dev comes along, types EqualityComparer, dots of it, and is now in a pit of failure for the 99% case. Essentially this niche proposal makes using the existing golden path functionality harder to discover. |
This was added recently. It might be possible to reconsider its location so that all factories are located consistently in the non-generic class: public class EqualityComparer<T>
{
- public EqualityComparer<T> Create(Func<T?, T?, bool> equals, Func<T, int>? getHashCode = null);
}
+public static class EqualityComparer
+{
+ public EqualityComparer<T> Create<T>(Func<T?, T?, bool> equals, Func<T, int>? getHashCode = null);
+
+ public static EqualityComparer<TEnumerable> CreateEnumerableComparer<TEnumerable, T>(IEqualityComparer<T>? elementComparer = null)
+ where TEnumerable : IEnumerable<T> { }
+
+ public static EqualityComparer<TSet> CreateSetComparer<TSet, T>(IEqualityComparer<T>? elementComparer = null)
+ where TSet : IReadOnlySet<T> { }
+
+ public static EqualityComparer<TDictionary> CreateDictionaryComparer<TDictionary, TKey, TValue>(IEqualityComparer<TKey>? keyComparer = null, IEqualityComparer<TValue>? valueComparer = null)
+ where TDictionary : IReadOnlyDictionary<TKey, TValue> { }
+} |
Keying on buffers using sequence equality is fairly common, we currently don't have a built-in implementation for it. |
It's way, way less common than just needing a simple comparer for a T, and if it were just about sequence equality, it could be added to the existing type. |
You could, but it would necessarily forsake the
I'm not sure how existence of similarly named types constitutes a pit of failure. It is fairly common for a particular named component to distribute functionality across a generic/nongeneric, class/interface matrix. |
Background and motivation
This proposal attempts to consolidate a number of issues requesting
IEqualityComparer<T>
andIComparer<T>
combinators acting on common collection types, e.g. #33873, #44796, #77183 (comment) and #19644. Defining structural equality/order comparison on collection types is a common requirement, for example when working with incremental source generators.API Proposal
API Usage
Alternative Designs
Making the combinators generic on the collection type would allow construction-time specialization and help with devirtualization, e.g.
But obviously that would make calling the combinators more complicated:
Full alternative proposal:
Risks
No response
The text was updated successfully, but these errors were encountered: