Cleanup variables checker 2 #9985

nickdrozd · 2024-09-29T18:28:51Z

Type of Changes

	Type
✓	🔨 Refactoring

Description

Continuing #9978

Variables checker code is difficult to read because the lines are too long. One easy way to shorten lines is to turn static methods into free functions. This saves a lot of indentation and makes the functions easier to call. Also shorten the name of VariableVisitConsumerAction, because that is 27 characters long.

Other problems that make the code hard to read are that there are too many statements and variables are defined in too many places. Things can be consolidated by using ternary expressions and walrus operator. Some variables can be done away with entirely.

This MR should make no functional changes, except that in a few cases some variables are defined later. This saves a tiny bit of time.

codecov · 2024-09-29T18:38:45Z

Codecov Report

Attention: Patch coverage is 98.53372% with 5 lines in your changes missing coverage. Please review.

Project coverage is 95.79%. Comparing base (c0ecd70) to head (6839c62).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
pylint/checkers/variables.py	98.53%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9985      +/-   ##
==========================================
- Coverage   95.80%   95.79%   -0.02%     
==========================================
  Files         174      174              
  Lines       18934    18870      -64     
==========================================
- Hits        18140    18076      -64     
  Misses        794      794

Files with missing lines	Coverage Δ
pylint/checkers/variables.py	`97.08% <98.53%> (-0.14%)`	⬇️

github-actions · 2024-09-29T18:51:18Z

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 6839c62

nickdrozd · 2024-09-29T20:40:03Z

Some of the changed lines are not covered. But I think those lines were already not covered.

jacobtylerwalls

I'm good with extracting pure functions, might be wise to get a +1 from someone else too on that one.

I would prefer not to add more multiline ternaries.

jacobtylerwalls · 2024-09-30T12:16:31Z

pylint/checkers/variables.py

@@ -73,7 +73,11 @@
 )


-class VariableVisitConsumerAction(Enum):
+def register(linter: PyLinter) -> None:


The other checkers seem to all place register() at the end of the file, so I suggest keeping that pattern here.

jacobtylerwalls · 2024-09-30T12:17:27Z

pylint/checkers/variables.py

+                    name
+                    if import_module_name == "*"
+                    else
+                    # Most likely something like 'xml.etree',
+                    # which will appear in the .locals as 'xml'.
+                    # Only pick the name if it wasn't consumed.
+                    (
+                        import_module_name
+                        if (
+                            import_module_name.startswith(name)
+                            and import_module_name.find(".") > -1
+                        )
+                        or name in imports
+                        else None
+                    )
+                )


I would prefer to avoid adding multi-line ternaries. I find this refactor worse than the original form. Some of the shorter ones are at best a wash.

jacobtylerwalls · 2024-09-30T12:24:40Z

pylint/checkers/variables.py

+            confidence=(
+                INFERENCE
+                if node.name in current_consumer.names_under_always_false_test
+                else (
+                    CONTROL_FLOW
+                    if node.name in current_consumer.consumed_uncertain
+                    else HIGH
+                )
+            ),


This one would probably be a wash for me, were it not for the fact that now we lose resolution on coverage stats because this is now a single statement.

jacobtylerwalls · 2024-09-30T12:26:12Z

pylint/checkers/variables.py

-            and not self._in_lambda_or_comprehension_body(node, frame)
+            and not _in_lambda_or_comprehension_body(node, frame)


This kind of change is okay with me: the VariablesChecker and NamesConsumer are long enough that having all these methods in a closure makes it harder to see what they do. But let's get a +1 from someone else.

jacobtylerwalls · 2024-09-30T12:26:53Z

pylint/checkers/variables.py

+    )
+
+
+def _is_only_type_assignment(


If we go through with this, let's add this commit to the git ignore revs list, and we'll be sure to merge it with a merge commit to avoid rewriting the hash.

Pierre-Sassoulas

Thank you for your enthusiasm and your work on pylint, Nick.

I was waiting for Jacob's opinion on this. If he thinks the result is better, then let's merge.

My first instinct is that I don't see a lot of value, we're moving code around but it's not reducing the complexity overall. Some changes are opinionated (ternary even when it's on dozen of lines). I don't think favoring ternaries was discussed beforehand (the ternary checker is an extension). Last time I reviewed 1000+ line was for the unicode checker, which was adding a massive new feature. Here we basically have a style MR. Reviewing so much changes in details is very hard. I have to trust that the code is moved without modifications, which would allow for underhanded changes if we're in a xz scenario. The git blame is also going to be harder to read and investigate.

I'd rather focus on bug fixes / features / simplifications that reduce complexity intrinsically, than on pure style refactor without necessity / consensus.

jacobtylerwalls · 2024-09-30T13:21:01Z

Thank you for your enthusiasm and your work on pylint, Nick.

+1

I agree with Pierre: I had to round up to get to -0 on the staticmethod refactors, so I'm really more of a -0.5 (but -1 on the ternaries). I think the staticmethods are not what makes the checker hard to work on: it's the fact that the overall strategy is opaque. There is no documentation anywhere about what the overall strategy is. It's like solving an escape room where the themes of the clues keep changing.

nickdrozd · 2024-09-30T13:58:25Z

I heard all this the last time I tried making changes. Too hard to review, confuses the Git history, too much churn, need consensus, just style change, etc. So I stopped trying to make changes.

What happened since then? Pretty much nothing. There are still a bunch of used-before-assignment bugs, and Pylint is still not type-correct. Why haven't these things been fixed yet? What is taking so long?

I will tell you what I think is the problem: the code is way too complicated. Nobody knows how it works and everybody is afraid to touch it. "Small, targeted changes" only serve to exacerbate this problem, because it just means adding a little bit here and there without clarifying the overall picture.

Look at what lints are disabled in this file:

7 matches for "disable" in buffer: variables.py
   1179:# pylint: disable=too-many-public-methods
   1765:    # pylint: disable = too-many-return-statements, too-many-branches
   2312:                    isinstance(  # pylint: disable=too-many-boolean-expressions
   2527:    # pylint: disable-next=too-many-branches,too-many-statements
   2548:            not astmts  # pylint: disable=too-many-boolean-expressions
   2674:    # pylint: disable = too-many-branches
   3183:    # pylint: disable = too-many-branches

These are flashing red alarms warning that the complexity is out of control.

By the way, in the course of working on this MR, I found an undocumented side effect that affects a warning from a totally different checker. That's not good!

Anyway, in the short term, I would like to get #9964 finished. And that in turn depends on #8893 (at least).

So is there a plan to fix #8893 in a timely fashion? And more broadly, is there a plan / timeline to get Pylint and Astroid correctly typed and compiled with Mypyc?

jacobtylerwalls · 2024-09-30T14:25:44Z

I don't think anyone is defending the complexity of the variables checker. I'm hearing skepticism that moving methods out of a closure meaningfully reduces that complexity.

So is there a plan to fix #8893 in a timely fashion? And more broadly, is there a plan / timeline to get Pylint and Astroid correctly typed and compiled with Mypyc?

If you want help fixing #8893, you are welcome to ask for advice, but suggesting that we are not doing things in a timely fashion doesn't promote volunteerism. And no, there are no fixed timelines, but as of today both astroid and pylint are cut over to 4.0-dev so we can start considering breaking API changes that might be necessary to make them type-correct. We have a number of new contributors that are taking an interest in astroid's internals, and I find that welcome and invigorating.

If you have ideas about further variables checker refactors, you're welcome to open issues and sound them out.

jacobtylerwalls · 2024-09-30T14:35:53Z

By the way, in the course of working on this MR, I found an undocumented side effect that affects a warning from a totally different checker. That's not good!

Thanks for finding this. Can you call this out, ideally opening an issue, to increase its visibility? Thanks!

Pierre-Sassoulas · 2024-09-30T15:59:51Z

It's not impossible to merge a refactor in pylint. But it should be consensual, or discussed beforehand, or reasonable and small. Here's an example of one that took 2mn to be reviewed, 10mn to be merged : #4872 (many such examples in the history). If you open a PR that remove 200 lines of crap and the tests are still green, it's going to be merged before dawn.

Your intention are laudable, but I feel you're pushing for opinionated changes that do not make the situation better and I feel like we're wasting time bikesheding around discussing ternaries, class function that could be staticmethod or independent functions, or typing import guards. Actually making the variable checker easier to develop on is going to be HARD. No one feels productive by reading code for 2 hours straight in a style they don't like then discussing with the stakeholder for a month asynchronously, but this is the way. Moving code around and opening style MR is not going to cut it. Jacob is the one with the more experience in this part of the code, he's going to know what the pain points are and the possible way to make this better or where to investigate. Did you try to discuss the design with him beforehand ?

nickdrozd · 2024-09-30T16:31:12Z

I don't think anyone is defending the complexity of the variables checker. I'm hearing skepticism that moving methods out of a closure meaningfully reduces that complexity.

There are a thousand and one changes that need to be made to simplify things. It's easy to point to each individual change and say that it doesn't improve anything. But in aggregate the small changes add up. And further, each change facilitates further changes. For example, if a variable is defined in one place instead of four, it's much easier to move that definition to a different place.

... suggesting that we are not doing things in a timely fashion doesn't promote volunteerism.

The usual pattern in open source is that some pushy user comes in demanding changes. "When can we expect such and such to be addressed?" The extremely reasonable maintainer response is "Nobody has worked on it yet, but you are welcome to."

That is very much not what is happening here. Instead, I have been repeatedly told no no no to all sorts of different changes. So given that my volunteer efforts have been turned down over and over for years, what choice do I have but to switch to demanding-user mode?

I have a plan to fix things. But my plan is not accepted. Okay, fine, so what is the alternative? And why is it taking so long?

It's not like I'm looking for problems where there are none. If these issues were fixed already, we wouldn't be having this conversation. I'm more than happy to do the work myself. And if someone else wants to do it, that's great too. But the work isn't getting done, and that makes me not so happy.

I feel like we're wasting time bikesheding around discussing ternaries, class function that could be staticmethod or independent functions, or typing import guards.

Yes, I certainly agree with that 😆

Let's talk about these static methods. Why are they static methods instead of free functions? Other than status quo bias, what is the advantage of using static methods? Here are the downsides: it adds extra indentation and it requires more text to make calls. What are the upsides?

Did you try to discuss the design with him beforehand?

He has described the checker as "forbidding". That is not a good sign.

Moving code around and opening style MR is not going to cut it.

Okay, is there a different approach you would prefer? I've been using atomic commits that make clear specific changes. Would more you prefer more dramatic changes done all at once? I would be happy to do that too.

jacobtylerwalls · 2024-09-30T16:39:20Z

He has described the checker as "forbidding". That is not a good sign.

What exactly is not a good sign? The status quo of the checker, or my comments?

jacobtylerwalls · 2024-09-30T16:42:02Z

The extremely reasonable maintainer response is "Nobody has worked on it yet, but you are welcome to."

I gave a version of that response on #8893 earlier today.

nickdrozd · 2024-09-30T17:55:19Z

What exactly is not a good sign? The status quo of the checker, or my comments?

Sorry, I meant the status quo is not good. Your assessment of the checker as "forbidding" and "opaque" is totally accurate and well-informed! And for exactly that reason, I think the cautious conservative approach is not going to be successful. A big overhaul will be required, and the first step to that is clearing out years of cruft. Once that is done, it will be a lot easier to have a fruitful design discussion, because it will be possible to see what's currently going on.

Pierre-Sassoulas · 2024-09-30T19:03:04Z

I have a plan to fix things. But my plan is not accepted.

I'm sorry can you tell us more about your plan ? Because all I see in this MR is a change of style that is absolutely impossible to review in a reasonable time in the current format. I don't think it's necessary at all to fix anything. It's also not something that is making the code better in a meaningful way, and not a bug fix of the numerous existing false positives / false negatives. To quote Jacob:

I think the staticmethods are not what makes the checker hard to work on: it's the fact that the overall strategy is opaque

So what would your new improved strategy be like ? Feel free to start the checker from scratch and provide a better implementation, it's going to be WAY WAY more efficient than changing the coding style of the whole checker to your taste first.

jacobtylerwalls · 2024-09-30T19:23:34Z

At the risk of veering a little off-topic, my prescription for the variables checker would be for somebody to write a killer docstring that summarizes (in a bulleted list of about half a dozen lines of pseudocode) what it actually does.

Right now when PRs come in for the variables checker, I do my best to say "please don't add this special case for walrus/if/else/except over here, please add it over there where we happen to handle very similar cases", but as we all know, that risks adding bloat on top of bloat. If we had that six line summary of the strategy, we would probably notice "ugh, we are repeating O(n) scans here, here, and here" or "we're rechecking type guarding blocks here, here, and here", etc., and then we'd be able to get people start rowing in the same direction in terms of agreeing on refactors.

jacobtylerwalls · 2024-09-30T19:33:43Z

@zenlyj 👋 might also have valuable input on variables checker refactors or on bug fix strategies for #8893. @zenlyj, any thoughts?

zenlyj · 2024-09-30T19:54:56Z

i was just about to submit a PR for #8893. at first glance, it seems like a regression from #8431, but i'll need to confirm

zenlyj · 2024-10-01T06:06:12Z

on the topic of refactoring, i agree that the variables checker could benefit from refactoring to improve code quality / readability.

i've made a few contributions to this checker in the past, but i am still not confident to say that i have a complete understanding of its internals. @nickdrozd's recent work on trying to kick things off and make changes in this area is laudable and a step in the right direction imo.

but to go even further and make things clean and easier to contribute, one would need to deep dive and have a good understanding of how things are working. that means going through and reviewing years of development, patches on top of patches and finally rewriting it in a way that makes better sense.

from my limited experience in working on this checker, i vaguely recall seeing undefined-variable and used-before-assignment checks having tight coupling, sometimes even interdependent. i am unsure of that is so, perhaps for performance reasons? but if these two checks can be split up and encapsulated into separate functions (without any cost in performance of course), it will help to reduce cognitive overload significantly.

nickdrozd added 3 commits September 29, 2024 11:19

Move static methods to free functions

01d0226

Shorten name of ConsumerAction

fb5ab45

Use some conditional expressions and inline some variables

6839c62

nickdrozd added the Skip news 🔇 This change does not require a changelog entry label Sep 29, 2024

jacobtylerwalls self-requested a review September 30, 2024 11:52

jacobtylerwalls requested changes Sep 30, 2024

View reviewed changes

Pierre-Sassoulas reviewed Sep 30, 2024

View reviewed changes

nickdrozd closed this Oct 1, 2024

nickdrozd deleted the cleanup-variables-checker branch October 7, 2024 14:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanup variables checker 2 #9985

Cleanup variables checker 2 #9985

nickdrozd commented Sep 29, 2024

codecov bot commented Sep 29, 2024 •

edited

Loading

github-actions bot commented Sep 29, 2024

nickdrozd commented Sep 29, 2024

jacobtylerwalls left a comment

jacobtylerwalls Sep 30, 2024

jacobtylerwalls Sep 30, 2024

jacobtylerwalls Sep 30, 2024

jacobtylerwalls Sep 30, 2024

jacobtylerwalls Sep 30, 2024

Pierre-Sassoulas left a comment

jacobtylerwalls commented Sep 30, 2024

nickdrozd commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

Pierre-Sassoulas commented Sep 30, 2024

nickdrozd commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

nickdrozd commented Sep 30, 2024

Pierre-Sassoulas commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

zenlyj commented Sep 30, 2024

zenlyj commented Oct 1, 2024

		and not self._in_lambda_or_comprehension_body(node, frame)
		and not _in_lambda_or_comprehension_body(node, frame)

		)


		def _is_only_type_assignment(

Cleanup variables checker 2 #9985

Cleanup variables checker 2 #9985

Conversation

nickdrozd commented Sep 29, 2024

Type of Changes

Description

codecov bot commented Sep 29, 2024 • edited Loading

Codecov Report

github-actions bot commented Sep 29, 2024

nickdrozd commented Sep 29, 2024

jacobtylerwalls left a comment

Choose a reason for hiding this comment

jacobtylerwalls Sep 30, 2024

Choose a reason for hiding this comment

jacobtylerwalls Sep 30, 2024

Choose a reason for hiding this comment

jacobtylerwalls Sep 30, 2024

Choose a reason for hiding this comment

jacobtylerwalls Sep 30, 2024

Choose a reason for hiding this comment

jacobtylerwalls Sep 30, 2024

Choose a reason for hiding this comment

Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

jacobtylerwalls commented Sep 30, 2024

nickdrozd commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

Pierre-Sassoulas commented Sep 30, 2024

nickdrozd commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

nickdrozd commented Sep 30, 2024

Pierre-Sassoulas commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

jacobtylerwalls commented Sep 30, 2024

zenlyj commented Sep 30, 2024

zenlyj commented Oct 1, 2024

codecov bot commented Sep 29, 2024 •

edited

Loading