Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Analyzer] Prefer .Length/Count/IsEmpty over Any() #75933

Closed
Tracked by #77391
stephentoub opened this issue Sep 20, 2022 · 22 comments · Fixed by dotnet/roslyn-analyzers#6236
Closed
Tracked by #77391

[Analyzer] Prefer .Length/Count/IsEmpty over Any() #75933

stephentoub opened this issue Sep 20, 2022 · 22 comments · Fixed by dotnet/roslyn-analyzers#6236
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer partner-impact This issue impacts a partner who needs to be kept updated
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Sep 20, 2022

If .Any() is used on an expression strongly-type as an ICollection<T> or an implementing type with a Count property (or a T[] with a Length property), the code should be using .Count != 0/.Length != 0 instead (or using !IsEmpty if it has an IsEmpty property).

(I really thought we had this analyzer, but I can't find any record of it now.)

@stephentoub stephentoub added area-System.Collections code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer labels Sep 20, 2022
@stephentoub stephentoub added this to the 8.0.0 milestone Sep 20, 2022
@ghost
Copy link

ghost commented Sep 20, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

If .Any() is used on an expression strongly-type as an ICollection/ICollection<T> or an implementing type with a Count property (or a T[] with a Length property), the code should be using .Count != 0/.Length != 0 instead (or using !IsEmpty if it has an IsEmpty property).

(I really thought we had this analyzer, but I can't find any record of it now.)

Author: stephentoub
Assignees: -
Labels:

area-System.Collections, code-analyzer, code-fixer

Milestone: 8.0.0

@stephentoub stephentoub added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Sep 27, 2022
@terrajobst
Copy link
Member

terrajobst commented Sep 27, 2022

Video

  • Looks good as proposed

Severity: Info
Category: Performance

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 27, 2022
@CollinAlpert
Copy link
Contributor

I'd like to do this one.

@stephentoub
Copy link
Member Author

Excellent. Thanks!

@CollinAlpert
Copy link
Contributor

I just found this Analyzer. Should I rather extend that one's functionality or create a new Analyzer only for .Length and .IsEmpty?

@stephentoub
Copy link
Member Author

stephentoub commented Oct 24, 2022

If extending that analyzer results in cleaner code overall, then I think that's probably fine, e.g. if you'd otherwise be duplicating the majority of that analyzer's code and the new analysis requires just a few additional lines. But regardless of whether it's augmenting an existing analyzer or adding a new one, it should have its own new diagnostic ID; which component raises that ID is an implementation detail.

@fowl2
Copy link

fowl2 commented Oct 25, 2022

Isn't there some way to make the runtime performance of Any() equivalent to .Length != 0 etc. rather than encouraging code churn and obscuring intent of all the calling code?'

Default interface implementation plus inlining attribute? Some sort of "stronger" inlining attribute? Enhancing source generators?

@stephentoub
Copy link
Member Author

stephentoub commented Oct 25, 2022

Isn't there some way to make the runtime performance of Any() equivalent to .Length != 0 etc. rather than encouraging code churn and obscuring intent of all the calling code?'

I don't believe if (array.Length != 0) is obscuring intent. If anything, I personally think if (array.Any()) is more obscure... I have to mentally translate that, and my thought process is along the lines of "array.Any()? Oh, does the array contain anything? Ok, so this is checking whether the length is not 0? Or... is it checking for non-default/non-null values?" As for runtime performance, Any() will do a type check for ICollection<T> and return its .Count != 0, but there's still the Any method invocation (it's unlikely to be inlined nor would forcing it be a good idea in the majority case), there's a type check, and there's an interface invocation. It's a whole lot better to just access .Length/Count, and IMHO, clearer on the intent as well.

@fowl2
Copy link

fowl2 commented Nov 3, 2022

I don't believe if (array.Length != 0) is obscuring intent. If anything, I personally think if (array.Any()) is more obscure... I have to mentally translate that, and my thought process is along the lines of "array.Any()? Oh, does the array contain anything?

I guess this is more about what you're familiar with then. To me, Any() expresses "Is there anything in this collection?" better than "Is the Length a particular value? Or is it Count? Is it 'not is empty'?".

Or... is it checking for non-default/non-null values?"

Is there a common library or other platform that uses a Any() method in this way?

As for runtime performance, Any() [...]

Yeah I was hoping for something to make compile it down to the same runtime cost no matter which syntax is used. I guess that would have to be in the compiler proper at this point and probably not considered to be a big enough gain for the complexity there.

A fixer like this pushes the complexity cost out by scattering it as code churn across the vastly greater quantity of codebases.

@RenderMichael
Copy link
Contributor

Would it work to try using TryGetNonEnumeratedCount() in Any(), or would that impact performance too much where that method returns false?

@Lancelotbronner
Copy link

Could bool IsEmpty => Count/Length is 0; be added to arrays and collections? I've always wondered why it was missing and now that spans have an IsEmpty I wonder even more.

@pedrofr
Copy link

pedrofr commented Nov 29, 2022

Could bool IsEmpty => Count/Length is 0; be added to arrays and collections? I've always wondered why it was missing and now that spans have an IsEmpty I wonder even more.

In the same vein, can't IEnumerable (the non generic one) have an extension method, bundled with Linq, called bool IsEmpty()? Would it be possibly inlined?

@stephentoub
Copy link
Member Author

For new API requests, please open new issues. Thanks.

@Applesauce314
Copy link
Contributor

Applesauce314 commented Nov 29, 2022

Isn't there some way to make the runtime performance of Any() equivalent to .Length != 0 etc. rather than encouraging code churn and obscuring intent of all the calling code?'

Default interface implementation plus inlining attribute? Some sort of "stronger" inlining attribute? Enhancing source generators?

I get that a code analyzer is a lot easier to implement than then improving the runtime performance of Any(), but the interaction with CA1827 - Do not use Count/LongCount when Any can be used will need to be carefully considered.

CA1827 plus this new rule will result in these instruction to programmers:

  • use Any instead of Count
  • use Count or Length instead of Any

obviously in the background the rules are actually:

  • use Any() instead of Count() (the function call one)
  • use Count or Length instead of Any() (the property ones)
    But I am sure this will confuse more than one person.

I general I have been using Any() instead of Count or Length as it makes the "is there anything in this list? (as in a bunch of items, not a C# list)" check have the same appearance whether you are checking a List<T> or an IEnumerable<T>. But I have always wondered if I was paying a performance penalty for this.

@pedrofr
Copy link

pedrofr commented Nov 29, 2022

@stephentoub, my comment wasn't exactly a request, but rather exactly what @Applesauce314 is talking about.

The proposed analyzer would be confusing considering CA1827 and the fact that different, but similar in theme, types would have different best practices: Length/Count/Any() for T[]/ICollection<T>/IEnumerable<T>, respectively.

In my opinion this analyzer shouldn't exist if a common, objective and performant API is possible.

@aKzenT
Copy link

aKzenT commented Dec 16, 2022

I agree that discouraging Any() in favour of Length and Count hurts readability and hides real user intent. It is also reasonable that there are implementations of collection interfaces where computing Count is more expensive than just checking if a list is empty. E.g. think of some kind of linked list implementation. So hiding real attempt to get performance gains in limitted cases is not a good solution.

On the other hand checking if a collection is empty or not is a very common scenario that should have very low overhead in performance terms. So if there is no easy way to teach the compiler/runtime to optimize the call away, I think having a dedicated IsEmpty property is the best solution overall and would also solve the issue arround confusing and seemingly conflicting rules.

To this end I have created an API proposal to get the discussion started: #79769

@Timovzl
Copy link

Timovzl commented Dec 19, 2022

I agree that discouraging Any() in favour of Length and Count hurts readability and hides real user intent.

@aKzenT, I'm with Stephen Toub on this one:

I don't believe if (array.Length != 0) is obscuring intent. If anything, I personally think if (array.Any()) is more obscure... I have to mentally translate that, and my thought process is along the lines of "array.Any()? Oh, does the array contain anything? Ok, so this is checking whether the length is not 0? Or... is it checking for non-default/non-null values?"

Moreover, the Length or Count property gives a clear hint that this information is directly accessible. As a counterexample, an IEnumerable<T> does not expose such a property, since it may use deferred enumeration and/or may not support multiple enumeration. Therefore, the presence and use of a Length or Count property gives me confirmation that no enumeration is happening and that the count can indeed be easily checked.

It is also reasonable that there are implementations of collection interfaces where computing Count is more expensive than just checking if a list is empty. E.g. think of some kind of linked list implementation.

Any such implementation that I have seen exposes a Count property that it simply updates at modification time. But for the sake of argument, if a collection is so specific that it does not, then we are probably dealing with extreme performance (or concurrency) constraints, and using that collection type's dedicated features (most likely in the form of its own IsEmpty property) is in order.

Wouldn't a general IsEmpty property expose yet another way of doing things? It also reduces consistency with checking for other population counts, such as Count == 1.

@sfiruch
Copy link
Contributor

sfiruch commented Jan 11, 2023

This wouldn't be even an issue in an ideal world, right? The performance of .Any() should be strictly better than the alternative. Is it impossible to add an explicit .Any() method to arrays and List<T>?

The general approach of using analyzers to work around usability issues in the frameworks seems wrong to me.

@stephentoub
Copy link
Member Author

stephentoub commented Jan 11, 2023

Is it impossible to add an explicit .Any() method to arrays and List?

Impossible? No, but we're then marching down a path of adding one for every collection type, which isn't viable, and in the limit actually is impossible since we don't control the implementations of nor can we reference most collection implementations in the world.

The performance of .Any() should be strictly better than the alternative.

Why? Please share the implementation of Any() that would achieve this.

@sfiruch
Copy link
Contributor

sfiruch commented Jan 12, 2023

Is it impossible to add an explicit .Any() method to arrays and List?

Impossible? No, but we're then marching down a path of adding one for every collection type, which isn't viable, and in the limit actually is impossible since we don't control the implementations of nor can we reference most collection implementations in the world.

The alternatives appear to be worse:
a) implement analyzer rules for most collection implementations in the world
b) implement a general rule in an analyzer which might be wrong for half of most collection implementations in the world. I have several collection implementations in my code bases where counting is more expensive than checking whether the collection is empty.

The performance of .Any() should be strictly better than the alternative.

Why? Please share the implementation of Any() that would achieve this.

You're absolutely right. I should've written equal or better instead of strictly better. The reasoning is obvious: .Any() computes less information than .Length or .Count. Practically speaking, adding

public bool Any() => Count!=0;

to the most common collection types appears to be a simple solution, which appears to have only upsides:

  • Usability: No analyzer/training/documentation/configuration/brainpower required
  • Efficiency: It's efficient no matter which way developers check
  • Productivity: Developers are not asked to change existing code
  • Effort: Low (but I'm probably missing some reflection-related compatibility nightmares)

In my view, an analyzer should be only considered if there's not easy way to make the framework do "the right thing" otherwise. Otherwise we should consider a "replace all your code with hand-optimized assembly" analyzer 😉

@stephentoub
Copy link
Member Author

stephentoub commented Jan 12, 2023

I have several collection implementations in my code bases where counting is more expensive than checking whether the collection is empty.

Regardless of this discusson, that's asking for perf problems as your consumers assume Count is cheap. And if you're your only consumer and you don't like the suggestion offered by this diagnostic id, you can turn off the suggestion.

adding to the most common collection types appears to be a simple solution, which appears to have only upsides:

It does not have only upside. For example, how do you propose handling ICollection<T>.Any()? If the answer is "don't", that's a large gap as it's very common to pass around IList<T>, ICollection<T>, etc., and if the answer is DIMs, that's very likely a breaking change in the making due the nature of DIMs and the diamond pattern. And if the answer is add an extension on each of the interfaces, that's also likely to lead to breaking changes due to ambiguity.

Besides that, every single collection needs a new Any() method? That's not upside. And why stop there? What about the rest of LINQ? Add each of those hundreds of methods to every collection?

Usability

You've cited usability as a benefit here. I continue to disagree with that notion. I understand some devs think that .Any() is clearer than Count > 0 or the like; I and others see the opposite. That's especially true with a collection like List<bool>, where I see Any() as being ambiguous and Count > 0 as being unambiguous.

Efficiency

It's efficient if the collection exposes it. The vast majority of collection types will not, nor will any new methods be added to the core collections in past releases, in netstandard2.0, etc. And now anyone reading code needs to know the exact type of the variable in question and know whether foo.Any() will resolve to an instance method or LINQ extension.

Productivity

No one is being asked to do anything. The tool is making a suggestion. Not a warning, not an error, a suggestion. If you don't like the suggestion, you can ignore it, disable it. The suggestion isn't even made in command-line tooling and builds. (And if everyone who defines a collection needs to also add an Any() method, that's a hit in productivity.) If you don't care about perf this much, disable all the perf analyzer suggestions for that matter.

@sfiruch
Copy link
Contributor

sfiruch commented Jan 12, 2023

Regardless of this discusson, that's asking for perf problems as your consumers assume Count is cheap. And if you're your only consumer and you don't like the suggestion offered by this diagnostic id, you can turn off the suggestion.

Right, but in practice there are cases like ConcurrentDictionary, BitArray or lazily produced collections. For those .Any() should be faster than .Count.

You've cited usability as a benefit here. I continue to disagree with that notion. I understand some devs think that .Any() is clearer than Count > 0 or the like; I and others see the opposite. That's especially true with a collection like List<bool>, where I see Any() as being ambiguous and Count > 0 as being unambiguous.

Here I wasn't talking about the specifics of .Any() vs. Count, but instead about the usability of .NET itself. When there are multiple obvious ways to write something, all these ways should perform well. Better analyzers, training or documentation are just work-arounds for a more fundamental problem. Such problems should be fixed one time in the framework, instead of requiring thousands of changes in thousands of repos, and brain cells from your average developer.

Personally, I find .Any() to be clearer and better express my intent. Please note, that I'm not the only one with this view. But it's fine to disagree.

No solution will work perfectly all the time, but shouldn't we try to remove sharp corners and focus on the pit of success?

Besides that, every single collection needs a new Any() method? That's not upside. And why stop there? What about the rest of LINQ? Add each of those hundreds of methods to every collection?

Not every single collection. Commonly used collections should have a performant .Any() implementation. It's a one time cost, for developers which are intimately familiar with the performance details of .NET.

Ideally, all of LINQ would be performant. There are ways to achieve this (ask Joe Duffy or check https://github.com/NetFabric/LinqBenchmarks).

@ghost ghost locked as resolved and limited conversation to collaborators Mar 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer partner-impact This issue impacts a partner who needs to be kept updated
Projects
None yet
Development

Successfully merging a pull request may close this issue.