Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

any/all short circuiting #426

Closed
sbrugman opened this issue May 16, 2022 · 8 comments · Fixed by #427
Closed

any/all short circuiting #426

sbrugman opened this issue May 16, 2022 · 8 comments · Fixed by #427

Comments

@sbrugman
Copy link
Contributor

sbrugman commented May 16, 2022

Description

List comprehension can postpone the evaluation of any/all, which can hurt performance for larger iterables. Surprisingly, this was not yet included here as rule.

Example:

def hi():
    print('hi')
    return True

>>> any(hi() for num in [1, 2, 3, 4])
hi

>>> any([hi() for num in [1, 2, 3, 4]])
hi
hi
hi
hi

From this answer

Proposal

Extend the current set of rules with C418 and C419 (any, all), similar to C403 and C404

@adamchainz
Copy link
Owner

Thanks for the suggestion.

In the past I removed some rules that suggested using generators over comprehensions, because the increase in laziness is not always a 100% compatible change: https://github.com/adamchainz/flake8-comprehensions/blob/main/HISTORY.rst#340-2021-03-18 . In the case of any/all, any side effects from the comprehension would be lost. I prefer to have rules don't have any false positives.

Also generator expressions are not always faster. For small-ish collections, list comprehensions often win.

I'd be open to having a set of default-disabled rules that one opts into though. Perhaps there could be a setting ("suggest-generators" or something) that users add with the understanding that semantics could change...

Thoughts?

@sbrugman
Copy link
Contributor Author

Thanks for the useful pointers.

The scope of C407 used to be much broader, considering all builtins - frequently resulting in false positives. For any/all, there are clear cases where the generator expression is preferable.

The choice for generator or comprehension should indeed be made consciously (and flake8-comprehensions should help with pointing that out). Opting-in and having a proper disclaimer seems like a great solution.

Notes on performance, for future reference:

A fully exhausted generator is slower than an equivalent comprehension. However, any and all will terminate early when the first True or False value respectively is encountered. As a rough estimate, if a fully exhausted generator is 50% slower (reasonable based on benchmarks), compared to an equivalent comprehension, then on average, a generator may be preferable when an early termination value is expected before 2/3rds of the sequence is processed. Note that the generator variant is more memory efficient.

@Skylion007
Copy link
Contributor

@adamchainz Here is some performance bench-marking specifically for any / all: https://www.katrin-affolter.ch/Programming/performance_of_all_and_any

@Skylion007
Copy link
Contributor

FYI for any who wants to use this rule, it's implemented in flake8-pie as PIE802.

@adamchainz
Copy link
Owner

Thanks @Skylion007 , since i'ts implemented elsewhere I think we can close this. I think I'll avoid adding rules that change laziness to this plugin, it'll just be easier to maintain and understand.

@adamchainz
Copy link
Owner

Actually there's a PR open, I think we cxan review and add this.

@adamchainz adamchainz reopened this Mar 19, 2023
@Skylion007
Copy link
Contributor

There is one counter example here, but the potential speedups are potentially worth it: astral-sh/ruff#3259 (comment)

@adamchainz
Copy link
Owner

The counterexample isn't for any/all ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants