Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removes the outer loop for CountBy #106111

Closed
wants to merge 6 commits into from

Conversation

AlexRadch
Copy link
Contributor

Close #106110

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 8, 2024
@WeihanLi
Copy link
Contributor

WeihanLi commented Aug 8, 2024

maybe better with a benchmark comparison

@AlexRadch
Copy link
Contributor Author

maybe better with a benchmark comparison

Where is the source code for Linq benchmarks?

Co-authored-by: Weihan Li <weihanli@outlook.com>
@am11
Copy link
Member

am11 commented Aug 8, 2024

maybe better with a benchmark comparison

Where is the source code for Linq benchmarks?

https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Linq

@eiriktsarpalis
Copy link
Member

See #106110 (comment)

return ((IEnumerable<KeyValuePair<TKey, int>>)[]).GetEnumerator();
}

return BuildCountDictionary(enumerator, keySelector, keyComparer).GetEnumerator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue I'm finding with this approach is that this is returning an enumerator whose implementation is public. Couldn't this result users taking a dependency on that implementation details which might prevent us from evolving the method in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dictionary enumerator does not have any useful method or property, and a dictionary enumerator is hidden by the IEnumerator interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, wouldn't this cast work?

(Dictionary<T, int>.Enumerator)source.CountBy(x => x).GetEnumerator();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, wouldn't this cast work?

Yes, it will work. But Dictionary.Enumerator does not have any specific method or property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the implementation to not use dictionary would result in this cast failing for anybody who might depend on this. This is why we never return enumerable/enumerator implementations that are public unless they are specifically present in the type signature of the method (as is the case with ToList() or the inner array in Chunk()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen a code analyzer that would check this rule, I doubt it is so, why isn't there a code analyzer for this rule?


Assert.Throws<InvalidOperationException>(() => enumerator.MoveNext());
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator().MoveNext());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you meant to assert this?

Suggested change
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator().MoveNext());
Assert.Throws<InvalidOperationException>(() => countBy.GetEnumerator());

I don't think this is an acceptable behavioural change. Generally speaking GetEnumerator() calls don't do any meaningful work other than instantiate the enumerator; everything else should be reserved for the first MoveNext() call, including any nested GetEnumerator() calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think countBy.GetEnumerator().MoveNext() is independent of how the implementation is done; If the implementation changes, the test will not have to change.

Copy link
Member

@eiriktsarpalis eiriktsarpalis Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MoveNext() part is irrelevant in this case, it is countBy.GetEnumerator() that is throwing when it previously wasn't. GetEnumerator shouldn't be doing any work or throwing exceptions, however with this change it is clear that GetEnumerator is doing all the work (since it is this method that is enumerating the entire source enumerable to build the dictionary).

Copy link
Contributor Author

@AlexRadch AlexRadch Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, in the IQueryable interface it is the GetEnumerator method that does the work of initializing the query, creating an SQL query, sending the SQL query to the server, passing the query parameters, creating a cursor to obtain the result using the MoveNext method, etc. And it can also throw exceptions.

Therefore, I think that the IEnumerable.GetEnumerator method can work even before the first call to MoveNext, including throwing exceptions.

I think that the work can be done both as in the GetEnumerator method and as in the first call to MoveNext. These are already implementation details, so I wrote the tests so that they do not depend on the subtleties of the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an apples-to-apples comparison. IQueryable uses a very different execution model, especially when building SQL queries. But even if we look at how the IEnumerable implementation of built-in query provider works the same invariant holds:

IQueryable<int> queryable = new ThrowingGetEnumerator().AsQueryable()
    .Select(x => x + 1)
    .Where(x => x > 0);

IEnumerator<int> enumerator = queryable.GetEnumerator(); // No exception
enumerator.MoveNext(); // Exception

public class ThrowingGetEnumerator : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator() => throw new NotSupportedException();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

These are already implementation details, so I wrote the tests so that they do not depend on the subtleties of the implementation.

I strongly disagree that that it's an implementation detail. Changing the behavior can break user code that depends on this method not enumerating the source or throwing exceptions; it might not be important if you just foreach over the result, but it is in code that explicitly handles enumerators in bespoke ways.

Copy link
Contributor Author

@AlexRadch AlexRadch Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that this indicates an issue with the GroupBy implementation. GetEnumerator() should not be incurring side-effects in principle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess GetEnumerator can start to execute work before the first call MoveNext and it depends on implementation.

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the benefits of this approach outweigh its trade-offs.

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we should take this change, on account of the behavioural breaking changes and leaking of implementation detail. Thank you for your contribution regardless.

@stephentoub
Copy link
Member

Closing per #106111 (review). Thanks, though!

@stephentoub stephentoub closed this Sep 5, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Linq community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecessary outer loop in CountBy
5 participants