
Improve ImmutableArray.RemoveRange #65186

Conversation

lateapexearlyspeed (Contributor):

Fixes #65069
Will mark as ready when it is done.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 11, 2022
ghost commented Feb 11, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixed #65069
Will mark as ready when it is done.

Author: lateapexearlyspeed
Assignees: -
Labels:

area-System.Collections

Milestone: -

private struct MultiSet
{
    private readonly IEqualityComparer<T> _equalityComparer;
    private readonly Dictionary<int, LinkedList<(T value, int count)>> _dictionary = new();
Member:

Since you're encoding a hashtable implementation here, wouldn't it make sense to use an array of buckets directly to avoid all the linked list allocations?

Contributor Author:

Got it; changed to an array-based implementation.
I borrowed the basic concept (a bucket-headers array plus an entries array) from the generic Dictionary and HashSet and simplified it for the ImmutableArray.RemoveRange scenario; for example, there is no need to reuse free holes, because Remove() only happens after all Add() calls are done.
Please correct my understanding:

- TryRemove() is called only after all Add() calls, and entries are not removed from the array, so the free-hole optimization from Remove is unnecessary.
- ImmutableArray.RemoveRange knows the item count up front, so the set cannot exceed its initial capacity (no Resize needed).
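The design described in the two points above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the type name, fields, and method shapes are assumptions based on the description (bucket-headers array, flat append-only entries array, count-decrementing TryRemove).

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only. Capacity is fixed up front because RemoveRange
// knows the item count, so no Resize logic is needed.
internal struct MultiSet<T>
{
    private struct Entry { public int HashCode; public int Next; public T Value; public int Count; }

    private readonly IEqualityComparer<T> _comparer;
    private readonly int[] _buckets;   // bucket headers; -1 means empty
    private readonly Entry[] _entries; // flat storage, only ever appended to
    private int _entryCount;

    public MultiSet(int capacity, IEqualityComparer<T>? comparer)
    {
        _comparer = comparer ?? EqualityComparer<T>.Default;
        _buckets = new int[capacity];
        Array.Fill(_buckets, -1);
        _entries = new Entry[capacity];
        _entryCount = 0;
    }

    private int GetBucket(int hashCode) => (int)((uint)hashCode % (uint)_buckets.Length);

    public void Add(T item)
    {
        int hashCode = item is null ? 0 : _comparer.GetHashCode(item);
        int bucket = GetBucket(hashCode);
        for (int i = _buckets[bucket]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].HashCode == hashCode && _comparer.Equals(_entries[i].Value, item))
            {
                _entries[i].Count++;
                return;
            }
        }

        // Append a new entry; no free-hole reuse is needed because removals
        // only start after all additions are done.
        _entries[_entryCount] = new Entry { HashCode = hashCode, Next = _buckets[bucket], Value = item, Count = 1 };
        _buckets[bucket] = _entryCount++;
    }

    public bool TryRemove(T item)
    {
        int hashCode = item is null ? 0 : _comparer.GetHashCode(item);
        for (int i = _buckets[GetBucket(hashCode)]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].HashCode == hashCode && _comparer.Equals(_entries[i].Value, item) && _entries[i].Count > 0)
            {
                _entries[i].Count--; // entry stays in place; count 0 marks it consumed
                return true;
            }
        }
        return false;
    }
}
```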
@lateapexearlyspeed lateapexearlyspeed marked this pull request as ready for review February 16, 2022 13:25
lateapexearlyspeed (Contributor Author) commented Feb 17, 2022:

@eiriktsarpalis it is ready for review now; please also correct my understanding of the MultiSet implementation for ImmutableArray.RemoveRange.

ICollection<int>? indicesToRemove = items.TryGetCount(out _) ? self.FindOrderedIndicesToRemoveByMultiSet(items, equalityComparer) : self.FindOrderedIndicesToRemove(items, equalityComparer);

return self.RemoveAtRange(indicesToRemove);
}
Member:

I'm not sure I understand why we need to fall back to the old algorithm if we can't determine the size?

Contributor Author:

See the previous comment (61196).
Here TryGetCount is used to test the collection type.
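For context, a TryGetCount-style helper typically works along the following lines. This is a hedged sketch, not the runtime's actual internal helper, which may check additional interfaces:

```csharp
using System.Collections;
using System.Collections.Generic;

internal static class EnumerableSizeExtensions
{
    // Report the count without enumerating when the sequence is a
    // materialized collection; otherwise signal that the size is unknown.
    internal static bool TryGetCount<T>(this IEnumerable<T> sequence, out int count)
    {
        switch (sequence)
        {
            case ICollection<T> genericCollection:
                count = genericCollection.Count;
                return true;
            case ICollection nonGenericCollection:
                count = nonGenericCollection.Count;
                return true;
            default:
                count = 0;
                return false; // a "true enumerable" of unknown size
        }
    }
}
```

Only materialized collections report a count; lazy sequences (iterator methods, LINQ queries) hit the default branch, which is why the multiset path is skipped for them.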

Member:

Maybe add a comment explaining this? What about collections with very large counts?

Contributor Author:

> Maybe add a comment explaining this?

Added.

> What about collections with very large counts?

Not sure. My feeling is that the fallback code targets true enumerables, which suggests a streaming scenario. If the input is already a large materialized collection, then an allocation of the same order of magnitude has already succeeded. Please advise.

Member:

One possible approach might be to fall back to the slow implementation once the multiset exceeds a certain capacity dynamically:

        private ICollection<int>? FindOrderedIndicesToRemoveByMultiSet(IEnumerable<T> items, IEqualityComparer<T>? equalityComparer)
        {
            var multiSet = new MultiSet(equalityComparer);
            int i = 0;
            foreach (T item in items)
            {
                multiSet.Add(item);
                if (++i == 100) // threshold way above what I would expect is the typical input enumerable.
                {
                      return FindOrderedIndicesToRemove(items, equalityComparer);
                }
            }

            return GetOrderedIndicesToRemoveFor(multiSet);
        }

I would strongly encourage you to make any decision based on benchmark numbers, however.

lateapexearlyspeed (Contributor Author) commented Mar 1, 2022:

            // No size check
            public ICollection<int>? Fast_NoLimit_FindOrderedIndicesToRemoveByMultiSet(IEnumerable<T> items, IEqualityComparer<T>? equalityComparer)
            {
                var multiSet = new MultiSet(equalityComparer);
                foreach (T item in items)
                {
                    multiSet.Add(item);
                }

                return GetOrderedIndicesToRemoveFor(multiSet);
            }

            // First try to read the collection's count directly; otherwise fall back to iterating with a dynamic size-limit check
           public ICollection<int>? Fast_TryGetCountThenIterateCheckCount(IEnumerable<T> items, IEqualityComparer<T>? equalityComparer)
            {
                if (items.TryGetCount(out int count))
                {
                    if (count <= ThrottleValue)
                    {
                        return Fast_NoLimit_FindOrderedIndicesToRemoveByMultiSet(items, equalityComparer);
                    }

                    return SlowFindOrderedIndicesToRemove(items, equalityComparer);
                }

                return Fast_IterateCheckCount(items, equalityComparer);
            }

            // First try to read the collection's count directly; otherwise fall back to iterating with a size limit compared against the actual dictionary count (which deduplicates equal values)
            public ICollection<int>? Fast_TryGetCountThenIterateCheckCountByDicLength(IEnumerable<T> items, IEqualityComparer<T>? equalityComparer)
            {
                if (items.TryGetCount(out int count))
                {
                    if (count <= ThrottleValue)
                    {
                        return Fast_NoLimit_FindOrderedIndicesToRemoveByMultiSet(items, equalityComparer);
                    }

                    return SlowFindOrderedIndicesToRemove(items, equalityComparer);
                }

                return Fast_IterateCheckCountByDicLength(items, equalityComparer);
            }

            // Iterate with a size limit compared against the actual dictionary count (which deduplicates equal values)
            public ICollection<int>? Fast_IterateCheckCountByDicLength(IEnumerable<T> items, IEqualityComparer<T>? equalityComparer)
            {
                var multiSet = new MultiSet(equalityComparer);
                foreach (T item in items)
                {
                    if (multiSet.ItemCount > ThrottleValue)
                    {
                        return SlowFindOrderedIndicesToRemove(items, equalityComparer);
                    }
                    multiSet.Add(item);
                }

                return GetOrderedIndicesToRemoveFor(multiSet);
            }

          // Iterate to trigger size limit dynamically
            public ICollection<int>? Fast_IterateCheckCount(IEnumerable<T> items, IEqualityComparer<T>? equalityComparer)
            {
                var multiSet = new MultiSet(equalityComparer);
                int i = 0;
                foreach (T item in items)
                {
                    if (++i > ThrottleValue)
                    {
                        return SlowFindOrderedIndicesToRemove(items, equalityComparer);
                    }
                    multiSet.Add(item);
                }

                return GetOrderedIndicesToRemoveFor(multiSet);
            }

           private readonly struct MultiSet
           {
                public int ItemCount => _dictionary.Count;
           }

Benchmark conditions: ImmutableArray length = 100, items length = 100.

  1. Items is a true enumerable: [benchmark results image]
  2. Items is an array instance: [benchmark results image]

Result:

- The pure fast method is roughly twice as fast as the slow method.
- When the throttle value is not hit, the pure fast method takes almost the same time as the throttled fast variants.
- When the throttle value is hit and execution falls back to the slow path, all throttled fast methods are much slower than the original slow method, except Fast_TryGetCountThenIterateCheckCount() and Fast_TryGetCountThenIterateCheckCountByDicLength() when items is an array instance, because they detect the fallback case early; even those remain slower than the original slow method.

So if we need a size limit, Fast_TryGetCountThenIterateCheckCountByDicLength() is probably the better choice, but I am not sure which throttle value is best: the LOH threshold is 85K bytes, but we would only detect it after the Dictionary has already grown its internal arrays (EnsureCapacity). Guidance appreciated.
Otherwise, the pure fast method gives the shortest time in all cases.

Member:

Thanks, I think Fast_TryGetCountThenIterateCheckCountByDicLength is OK, although I would be interested in seeing the numbers for various input-enumerable sizes, particularly low values like 1, 2, or 10 elements. It might be that for small enough inputs, creating a MultiSet actually makes things slower. You can run benchmarks across multiple sizes using BenchmarkDotNet's ParamsAttribute.

Contributor Author:

With a 100-element ImmutableArray, the fast method is slower until the items length reaches ~30.
[benchmark results images]

Member:

Hmmm, these numbers make me think this optimization is likely not worth our while. I don't think it's worth optimizing for inputs > 30; it's unlikely that this method would be used with such large inputs. I'd probably just close this PR. Thanks for driving the performance investigation!

eiriktsarpalis (Member) left a comment:

Would it be possible to write a few benchmarks capturing performance gains (if any) brought by this change?

@stephentoub do you think this performance improvement justifies the inclusion & maintenance of yet another hashtable implementation? It should be noted that it was added because the regular Dictionary does not support null keys, which are needed here.

stephentoub (Member):

> @stephentoub do you think this performance improvement justifies the inclusion & maintenance of yet another hashtable implementation?

If no one (besides me asking whether it was an issue) has complained about the algorithmic complexity involved, probably not.

lateapexearlyspeed (Contributor Author):

> If no one (besides me asking whether it was an issue) has complained about the algorithmic complexity involved, probably not.

@eiriktsarpalis @stephentoub Then is there any chance to proceed? Maybe move this MultiSet into the ImmutableArray class to indicate it is a hashtable implementation specific to ImmutableArray, or use the approach from the previous commit (reusing Dictionary)?

eiriktsarpalis (Member) commented Feb 23, 2022:

@lateapexearlyspeed come to think of it, you might be able to work around the null key restriction using something like the following:

public readonly struct MaybeNullKeyWrapper<TKey>
{
    public TKey? Key { get; init; }
}

public class MaybeNullKeyComparer<TKey> : IEqualityComparer<MaybeNullKeyWrapper<TKey>>
{
    private readonly IEqualityComparer<TKey> _keyComparer;

    public MaybeNullKeyComparer(IEqualityComparer<TKey> keyComparer)
    {
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public int GetHashCode(MaybeNullKeyWrapper<TKey> wrapper) => _keyComparer.GetHashCode(wrapper.Key!); // Key! suppresses the [DisallowNull] warning; the Default comparer returns 0 for null
    public bool Equals(MaybeNullKeyWrapper<TKey> left, MaybeNullKeyWrapper<TKey> right) => _keyComparer.Equals(left.Key, right.Key);
}

It should then be possible to use a regular dictionary without issue:

var multiset = new Dictionary<MaybeNullKeyWrapper<T>, int>(new MaybeNullKeyComparer<T>(equalityComparer));
...
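As a usage sketch (hypothetical, assuming the wrapper and comparer types above), counting occurrences of items including null keys would then look like:

```csharp
// A regular Dictionary can now count null keys, which
// Dictionary<TKey, TValue> would otherwise reject with an exception.
var multiset = new Dictionary<MaybeNullKeyWrapper<string>, int>(
    new MaybeNullKeyComparer<string>(EqualityComparer<string>.Default));

foreach (string? item in new[] { "a", null, "a", null })
{
    var key = new MaybeNullKeyWrapper<string> { Key = item };
    multiset[key] = multiset.TryGetValue(key, out int count) ? count + 1 : 1;
}
// Each distinct key ("a" and null) ends up with a count of 2.
```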


public int GetHashCode(NullableKeyWrapper obj)
{
    return obj.Key == null ? 0 : _keyComparer.GetHashCode(obj.Key);
}
Member:

The null check would violate custom equality comparers that equate null with non-null keys.

Contributor Author:

Yes, but the struggle here is the following; please help decide:

  1. The regular HashSet handles null the same way (regardless of comparer), so it would hit the issue you mention as well, unless the custom comparer's GetHashCode(null) also returns 0.
  2. IEqualityComparer<T>.GetHashCode([DisallowNull] T obj) is declared to disallow null arguments, and the compiler's nullable analysis flags this, although we know the Default comparer supports null and returns 0 at least.

Member:

> interface declaration not allow null argument and compiler nullable check will also find it; although we know Default compare supports and return 0 at least.

I think that's fine, I would probably just add a suppression. A good guide is the current behavior of ImmutableArray.Index, which works with null elements if no equality comparer is specified and will surface any null exceptions a custom equality comparer might throw (it only exercises the Equals method, not GetHashCode).

Contributor Author:

Fixed.

// Avoid to allocate for true enumerable

return self.RemoveAtRange(indicesToRemove);
}
Member:

Suggested change:
- // Avoid to allocate for true enumerable
+ // Avoid building a multiset for enumerables of unknown size.

{
#if NET6_0_OR_GREATER
ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(_dictionary, item, out _);
count++;
Member:

Might need to guard against overflow?

Suggested change:
- count++;
+ checked { count++; }

ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(_dictionary, item, out _);
count++;
#else
_dictionary[item] = _dictionary.TryGetValue(item, out int count) ? count + 1 : 1;
Member:

Suggested change:
- _dictionary[item] = _dictionary.TryGetValue(item, out int count) ? count + 1 : 1;
+ _dictionary[item] = _dictionary.TryGetValue(item, out int count) ? checked(count + 1) : 1;

lateapexearlyspeed (Contributor Author):

Closing this because the new implementation cannot beat the old implementation for low item counts.

eiriktsarpalis (Member):

Thanks for driving this investigation @lateapexearlyspeed. It was a good learning exercise!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 3, 2022
Labels
area-System.Collections community-contribution Indicates that the PR has been added by a community member
Development
Successfully merging this pull request may close these issues.

ImmutableArray RemoveRange methods are currently O(M*N)
3 participants