Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs in W0640 cell-var-from-loop checker #4827

Merged

Conversation

david-yz-liu
Copy link
Contributor

Type of Changes

Type
βœ“ πŸ› Bug fix

Description

I reviewed some of the issues related to W0640, and since they were all related to the same method (VariablesChecker._check_late_binding_closure) I tackled them in one go, reviewing the existing code and adding some additional tests too.

  1. False positive for cell-var-from-loopΒ #3107. The current code checks for names which are used as default arguments as follows:

    https://github.com/PyCQA/pylint/blob/b8021d33a4f34b410d73fa389621e2c4ae5e22c0/pylint/checkers/variables.py#L1820-L1821

    This only works for names used as simple default arguments, e.g. lambda x=i : x, but not ones with a more complex node structure like lambda x=i[0] : x. The obvious change is to use utils.is_default_argument, but I realized that there are cases where a default argument is a cell var defined from a loop, it just requires two levels of function nesting, as in the following (new) test case:

    def bad_case10():
        """Detect when a loop variable is the default argument for a nested function"""
        lst = []
        for i in range(10):
            def func():
                def func2(arg=i):  # [cell-var-from-loop]
                    return arg
    
                return func2
    
            lst.append(func)
        return lst

    So I modified the algorithm a little to "jump up" a scope level when encountering a name in a default argument, rather than just returning right away.

  2. cell-var-from-loop misses case with nested loopΒ #2846 The method currently uses the following to decide to skip the cell-var-from-loop check:

    https://github.com/PyCQA/pylint/blob/b8021d33a4f34b410d73fa389621e2c4ae5e22c0/pylint/checkers/variables.py#L1817-L1819

    This results in a false negative when the cell variable appears inside a comprehension (itself inside a function body), e.g.

    TEST_B = [
        (lambda: [n for _ in range(3)])
        for n in range(3)
    ]

    My fix here was simply to use node.frame() rather than node.scope(), since the former skips comprehension scopes. A bit further below, I also added an explicit node.name in node_scope.locals to skip the remaining work if the node is known to not be a cell variable at all.

  3. I modified the algorithm so that instead of taking an assignment_node from the caller (which was the first AssignName in the "current consumer" in VariablesChecker.visit_name), it instead uses astroid's lookup method to find assignments associated with the name, taking into account basic control flow. This fixes a false negative like the following (also added as new test case):

    def bad_case9():
        """Detect when loop variable shadows an outer assignment."""
        lst = []
        i = 100
        for i in range(10):
            lst.append(lambda: i)  # [cell-var-from-loop]
        return lst
  4. Previously, this method was called in the following location in visit_name:

    https://github.com/PyCQA/pylint/blob/b8021d33a4f34b410d73fa389621e2c4ae5e22c0/pylint/checkers/variables.py#L1054-L1057

    This meant that if both undefined-variable and used-before-assignment were disabled, there would be many false negatives for cell-var-from-loop as well. I'm pretty sure this was unintentional. For example:

    $ python -m pylint --disable=undefined-variable,used-before-assignment tests/functional/c/cellvar_escaping_loop.py
    ************* Module functional.c.cellvar_escaping_loop
    tests\functional\c\cellvar_escaping_loop.py:70:27: W0640: Cell variable i defined in loop (cell-var-from-loop)
    
    ------------------------------------------------------------------
    Your code has been rated at 9.85/10 (previous run: 9.91/10, -0.06)

    I moved the check out of the if statement. (This is in a separate commit.)

I also did a little bit of code cleanup to use checker.utils a bit more, hope that's okay. But if not happy to revert some of those changes.

Closes #2846
Closes #3107

Comment: Also, I think #2121 is fixed on main?

1. Handle cell var appearing in node inside function.
2. Handle cell var appearing in non-trivial default argument expression.
3. Use astroid's lookup method to account for variable shadowing.
Previously, this check would have many false negatives when both unused-variable and
used-before-assignment were disabled.
@coveralls
Copy link

coveralls commented Aug 10, 2021

Pull Request Test Coverage Report for Build 1122866118

  • 14 of 14 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.007%) to 92.63%

Totals Coverage Status
Change from base Build 1121519958: 0.007%
Covered Lines: 13284
Relevant Lines: 14341

πŸ’› - Coveralls

@Pierre-Sassoulas Pierre-Sassoulas self-requested a review August 11, 2021 07:29
Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this MR ! Closes multiple issues, very well tested and also super clean, which is on par with that what you usually do πŸ˜„

@@ -455,6 +455,11 @@ def is_ancestor_name(
return False


def is_being_called(node: astroid.node_classes.NodeNG) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this style of doing things better than having super big conditional without explanation. Even though I know this won't be re-used it's nice to have a function if only for clarity.


Special cases where we don't care about the error:
1. When the node's function is immediately called, e.g. (lambda: i)()
2. When the node's function is returned from within the loop, e.g. return lambda: i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ‘

Comment on lines +161 to +162
x # [cell-var-from-loop]
+ y) # [cell-var-from-loop]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
x # [cell-var-from-loop]
+ y) # [cell-var-from-loop]
x+ y) # [cell-var-from-loop, cell-var-from-loop]

I don't know if it's intentional or if you want to differentiate x and y to see that a message is send for each, but you could also do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pierre-Sassoulas I actually wasn't aware you could test for multiple error occurrences in the same line, good to know! πŸ˜…

Since you've merged in this PR I'm assuming you don't actually need me to make this change, but I'll keep it in mind for the future.

@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.10.0 milestone Aug 12, 2021
@Pierre-Sassoulas Pierre-Sassoulas merged commit 678d4f1 into pylint-dev:main Aug 12, 2021
@david-yz-liu david-yz-liu deleted the cell-var-from-loop-fixes branch August 12, 2021 12:38
@kowalcj0
Copy link

Hi,

I'm little confused by this warning which pylint 2.12.2 found in the following example.py:

"""cell-var-from-loop example."""

grouped_messages = {
    "category_a": [
        {"key_a": 3},
        {"key_a": 1},
        {"key_a": 2},
    ],
    "category_b": [
        {"key_b": 5},
        {"key_b": 3},
        {"key_b": 8},
    ],
}
sorting_keys = {
    "category_a": "key_a",
    "category_b": "key_b",
}

sorted_messages = {}

for category, messages in grouped_messages.items():
    sorted_messages[category] = sorted(
        (message for message in messages),
        key=lambda msg: msg[sorting_keys[category]]
    )

print(sorted_messages)
$ pylint example.py 
************* Module example
example.py:25:41: W0640: Cell variable category defined in loop (cell-var-from-loop)

------------------------------------------------------------------
Your code has been rated at 8.33/10 (previous run: 8.33/10, +0.00)

I got rid of this warning by replacing key=lambda msg: msg[sorting_keys[category]] with key=operator.itemgetter(sorting_keys[category]).
TBH I'm not sure why pylint complains about category being defined in loop.
Especially, because both versions of the code, give exactly the same result:

python3 example.py 
{'category_a': [{'key_a': 1}, {'key_a': 2}, {'key_a': 3}], 'category_b': [{'key_b': 3}, {'key_b': 5}, {'key_b': 8}]}

@DanielNoord
Copy link
Collaborator

This looks like a false positive to me. In any case, could you open a new issue for this? Continuing the discussion in closed PRs can lead to the discussion being lost (as they are not easily visible in the issue list).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug πŸͺ² Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

False positive for cell-var-from-loop cell-var-from-loop misses case with nested loop
5 participants