Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup variables checker 2 #9985

Closed

Conversation

nickdrozd
Copy link
Contributor

Type of Changes

Type
βœ“ πŸ”¨ Refactoring

Description

Continuing #9978

Variables checker code is difficult to read because the lines are too long. One easy way to shorten lines is to turn static methods into free functions. This saves a lot of indentation and makes the functions easier to call. Also shorten the name of VariableVisitConsumerAction, because that is 27 characters long.

Other problems that make the code hard to read are that there are too many statements and variables are defined in too many places. Things can be consolidated by using ternary expressions and walrus operator. Some variables can be done away with entirely.

This MR should make no functional changes, except that in a few cases some variables are defined later. This saves a tiny bit of time.

@nickdrozd nickdrozd added the Skip news πŸ”‡ This change does not require a changelog entry label Sep 29, 2024
Copy link

codecov bot commented Sep 29, 2024

Codecov Report

Attention: Patch coverage is 98.53372% with 5 lines in your changes missing coverage. Please review.

Project coverage is 95.79%. Comparing base (c0ecd70) to head (6839c62).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pylint/checkers/variables.py 98.53% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #9985      +/-   ##
==========================================
- Coverage   95.80%   95.79%   -0.02%     
==========================================
  Files         174      174              
  Lines       18934    18870      -64     
==========================================
- Hits        18140    18076      -64     
  Misses        794      794              
Files with missing lines Coverage Ξ”
pylint/checkers/variables.py 97.08% <98.53%> (-0.14%) ⬇️

Copy link
Contributor

πŸ€– According to the primer, this change has no effect on the checked open source code. πŸ€–πŸŽ‰

This comment was generated for commit 6839c62

@nickdrozd
Copy link
Contributor Author

Some of the changed lines are not covered. But I think those lines were already not covered.

@jacobtylerwalls jacobtylerwalls self-requested a review September 30, 2024 11:52
Copy link
Member

@jacobtylerwalls jacobtylerwalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with extracting pure functions, might be wise to get a +1 from someone else too on that one.

I would prefer not to add more multiline ternaries.

@@ -73,7 +73,11 @@
)


class VariableVisitConsumerAction(Enum):
def register(linter: PyLinter) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other checkers seem to all place register() at the end of the file, so I suggest keeping that pattern here.

Comment on lines +238 to +253
name
if import_module_name == "*"
else
# Most likely something like 'xml.etree',
# which will appear in the .locals as 'xml'.
# Only pick the name if it wasn't consumed.
(
import_module_name
if (
import_module_name.startswith(name)
and import_module_name.find(".") > -1
)
or name in imports
else None
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid adding multi-line ternaries. I find this refactor worse than the original form. Some of the shorter ones are at best a wash.

Comment on lines +1605 to +1613
confidence=(
INFERENCE
if node.name in current_consumer.names_under_always_false_test
else (
CONTROL_FLOW
if node.name in current_consumer.consumed_uncertain
else HIGH
)
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one would probably be a wash for me, were it not for the fact that now we lose resolution on coverage stats because this is now a single statement.

Comment on lines -2523 to +1792
and not self._in_lambda_or_comprehension_body(node, frame)
and not _in_lambda_or_comprehension_body(node, frame)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of change is okay with me: the VariablesChecker and NamesConsumer are long enough that having all these methods in a closure makes it harder to see what they do. But let's get a +1 from someone else.

)


def _is_only_type_assignment(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go through with this, let's add this commit to the git ignore revs list, and we'll be sure to merge it with a merge commit to avoid rewriting the hash.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your enthusiasm and your work on pylint, Nick.

I was waiting for Jacob's opinion on this. If he thinks the result is better, then let's merge.

My first instinct is that I don't see a lot of value, we're moving code around but it's not reducing the complexity overall. Some changes are opinionated (ternary even when it's on dozen of lines). I don't think favoring ternaries was discussed beforehand (the ternary checker is an extension). Last time I reviewed 1000+ line was for the unicode checker, which was adding a massive new feature. Here we basically have a style MR. Reviewing so much changes in details is very hard. I have to trust that the code is moved without modifications, which would allow for underhanded changes if we're in a xz scenario. The git blame is also going to be harder to read and investigate.

I'd rather focus on bug fixes / features / simplifications that reduce complexity intrinsically, than on pure style refactor without necessity / consensus.

@jacobtylerwalls
Copy link
Member

Thank you for your enthusiasm and your work on pylint, Nick.

+1


I agree with Pierre: I had to round up to get to -0 on the staticmethod refactors, so I'm really more of a -0.5 (but -1 on the ternaries). I think the staticmethods are not what makes the checker hard to work on: it's the fact that the overall strategy is opaque. There is no documentation anywhere about what the overall strategy is. It's like solving an escape room where the themes of the clues keep changing.

@nickdrozd
Copy link
Contributor Author

I heard all this the last time I tried making changes. Too hard to review, confuses the Git history, too much churn, need consensus, just style change, etc. So I stopped trying to make changes.

What happened since then? Pretty much nothing. There are still a bunch of used-before-assignment bugs, and Pylint is still not type-correct. Why haven't these things been fixed yet? What is taking so long?

I will tell you what I think is the problem: the code is way too complicated. Nobody knows how it works and everybody is afraid to touch it. "Small, targeted changes" only serve to exacerbate this problem, because it just means adding a little bit here and there without clarifying the overall picture.

Look at what lints are disabled in this file:

7 matches for "disable" in buffer: variables.py
   1179:# pylint: disable=too-many-public-methods
   1765:    # pylint: disable = too-many-return-statements, too-many-branches
   2312:                    isinstance(  # pylint: disable=too-many-boolean-expressions
   2527:    # pylint: disable-next=too-many-branches,too-many-statements
   2548:            not astmts  # pylint: disable=too-many-boolean-expressions
   2674:    # pylint: disable = too-many-branches
   3183:    # pylint: disable = too-many-branches

These are flashing red alarms warning that the complexity is out of control.

By the way, in the course of working on this MR, I found an undocumented side effect that affects a warning from a totally different checker. That's not good!

Anyway, in the short term, I would like to get #9964 finished. And that in turn depends on #8893 (at least).

So is there a plan to fix #8893 in a timely fashion? And more broadly, is there a plan / timeline to get Pylint and Astroid correctly typed and compiled with Mypyc?

@jacobtylerwalls
Copy link
Member

I don't think anyone is defending the complexity of the variables checker. I'm hearing skepticism that moving methods out of a closure meaningfully reduces that complexity.

So is there a plan to fix #8893 in a timely fashion? And more broadly, is there a plan / timeline to get Pylint and Astroid correctly typed and compiled with Mypyc?

If you want help fixing #8893, you are welcome to ask for advice, but suggesting that we are not doing things in a timely fashion doesn't promote volunteerism. And no, there are no fixed timelines, but as of today both astroid and pylint are cut over to 4.0-dev so we can start considering breaking API changes that might be necessary to make them type-correct. We have a number of new contributors that are taking an interest in astroid's internals, and I find that welcome and invigorating.

If you have ideas about further variables checker refactors, you're welcome to open issues and sound them out.

@jacobtylerwalls
Copy link
Member

By the way, in the course of working on this MR, I found an undocumented side effect that affects a warning from a totally different checker. That's not good!

Thanks for finding this. Can you call this out, ideally opening an issue, to increase its visibility? Thanks!

@Pierre-Sassoulas
Copy link
Member

It's not impossible to merge a refactor in pylint. But it should be consensual, or discussed beforehand, or reasonable and small. Here's an example of one that took 2mn to be reviewed, 10mn to be merged : #4872 (many such examples in the history). If you open a PR that remove 200 lines of crap and the tests are still green, it's going to be merged before dawn.

Your intention are laudable, but I feel you're pushing for opinionated changes that do not make the situation better and I feel like we're wasting time bikesheding around discussing ternaries, class function that could be staticmethod or independent functions, or typing import guards. Actually making the variable checker easier to develop on is going to be HARD. No one feels productive by reading code for 2 hours straight in a style they don't like then discussing with the stakeholder for a month asynchronously, but this is the way. Moving code around and opening style MR is not going to cut it. Jacob is the one with the more experience in this part of the code, he's going to know what the pain points are and the possible way to make this better or where to investigate. Did you try to discuss the design with him beforehand ?

@nickdrozd
Copy link
Contributor Author

I don't think anyone is defending the complexity of the variables checker. I'm hearing skepticism that moving methods out of a closure meaningfully reduces that complexity.

There are a thousand and one changes that need to be made to simplify things. It's easy to point to each individual change and say that it doesn't improve anything. But in aggregate the small changes add up. And further, each change facilitates further changes. For example, if a variable is defined in one place instead of four, it's much easier to move that definition to a different place.

... suggesting that we are not doing things in a timely fashion doesn't promote volunteerism.

The usual pattern in open source is that some pushy user comes in demanding changes. "When can we expect such and such to be addressed?" The extremely reasonable maintainer response is "Nobody has worked on it yet, but you are welcome to."

That is very much not what is happening here. Instead, I have been repeatedly told no no no to all sorts of different changes. So given that my volunteer efforts have been turned down over and over for years, what choice do I have but to switch to demanding-user mode?

I have a plan to fix things. But my plan is not accepted. Okay, fine, so what is the alternative? And why is it taking so long?

It's not like I'm looking for problems where there are none. If these issues were fixed already, we wouldn't be having this conversation. I'm more than happy to do the work myself. And if someone else wants to do it, that's great too. But the work isn't getting done, and that makes me not so happy.

I feel like we're wasting time bikesheding around discussing ternaries, class function that could be staticmethod or independent functions, or typing import guards.

Yes, I certainly agree with that πŸ˜†

Let's talk about these static methods. Why are they static methods instead of free functions? Other than status quo bias, what is the advantage of using static methods? Here are the downsides: it adds extra indentation and it requires more text to make calls. What are the upsides?

Did you try to discuss the design with him beforehand?

He has described the checker as "forbidding". That is not a good sign.

Moving code around and opening style MR is not going to cut it.

Okay, is there a different approach you would prefer? I've been using atomic commits that make clear specific changes. Would more you prefer more dramatic changes done all at once? I would be happy to do that too.

@jacobtylerwalls
Copy link
Member

He has described the checker as "forbidding". That is not a good sign.

What exactly is not a good sign? The status quo of the checker, or my comments?

@jacobtylerwalls
Copy link
Member

The extremely reasonable maintainer response is "Nobody has worked on it yet, but you are welcome to."

I gave a version of that response on #8893 earlier today.

@nickdrozd
Copy link
Contributor Author

What exactly is not a good sign? The status quo of the checker, or my comments?

Sorry, I meant the status quo is not good. Your assessment of the checker as "forbidding" and "opaque" is totally accurate and well-informed! And for exactly that reason, I think the cautious conservative approach is not going to be successful. A big overhaul will be required, and the first step to that is clearing out years of cruft. Once that is done, it will be a lot easier to have a fruitful design discussion, because it will be possible to see what's currently going on.

@Pierre-Sassoulas
Copy link
Member

I have a plan to fix things. But my plan is not accepted.

I'm sorry can you tell us more about your plan ? Because all I see in this MR is a change of style that is absolutely impossible to review in a reasonable time in the current format. I don't think it's necessary at all to fix anything. It's also not something that is making the code better in a meaningful way, and not a bug fix of the numerous existing false positives / false negatives. To quote Jacob:

I think the staticmethods are not what makes the checker hard to work on: it's the fact that the overall strategy is opaque

So what would your new improved strategy be like ? Feel free to start the checker from scratch and provide a better implementation, it's going to be WAY WAY more efficient than changing the coding style of the whole checker to your taste first.

@jacobtylerwalls
Copy link
Member

At the risk of veering a little off-topic, my prescription for the variables checker would be for somebody to write a killer docstring that summarizes (in a bulleted list of about half a dozen lines of pseudocode) what it actually does.

Right now when PRs come in for the variables checker, I do my best to say "please don't add this special case for walrus/if/else/except over here, please add it over there where we happen to handle very similar cases", but as we all know, that risks adding bloat on top of bloat. If we had that six line summary of the strategy, we would probably notice "ugh, we are repeating O(n) scans here, here, and here" or "we're rechecking type guarding blocks here, here, and here", etc., and then we'd be able to get people start rowing in the same direction in terms of agreeing on refactors.

@jacobtylerwalls
Copy link
Member

@zenlyj πŸ‘‹ might also have valuable input on variables checker refactors or on bug fix strategies for #8893. @zenlyj, any thoughts?

@zenlyj
Copy link
Contributor

zenlyj commented Sep 30, 2024

i was just about to submit a PR for #8893. at first glance, it seems like a regression from #8431, but i'll need to confirm

@zenlyj
Copy link
Contributor

zenlyj commented Oct 1, 2024

on the topic of refactoring, i agree that the variables checker could benefit from refactoring to improve code quality / readability.

i've made a few contributions to this checker in the past, but i am still not confident to say that i have a complete understanding of its internals. @nickdrozd's recent work on trying to kick things off and make changes in this area is laudable and a step in the right direction imo.

but to go even further and make things clean and easier to contribute, one would need to deep dive and have a good understanding of how things are working. that means going through and reviewing years of development, patches on top of patches and finally rewriting it in a way that makes better sense.

from my limited experience in working on this checker, i vaguely recall seeing undefined-variable and used-before-assignment checks having tight coupling, sometimes even interdependent. i am unsure of that is so, perhaps for performance reasons? but if these two checks can be split up and encapsulated into separate functions (without any cost in performance of course), it will help to reduce cognitive overload significantly.

@nickdrozd nickdrozd closed this Oct 1, 2024
@nickdrozd nickdrozd deleted the cleanup-variables-checker branch October 7, 2024 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Skip news πŸ”‡ This change does not require a changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants