Cleanup variables checker 2 #9985
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9985      +/-   ##
==========================================
- Coverage   95.80%   95.79%   -0.02%
==========================================
  Files         174      174
  Lines       18934    18870      -64
==========================================
- Hits        18140    18076      -64
  Misses        794      794
🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉 This comment was generated for commit 6839c62
Some of the changed lines are not covered. But I think those lines were already not covered.
I'm good with extracting pure functions, might be wise to get a +1 from someone else too on that one.
I would prefer not to add more multiline ternaries.
@@ -73,7 +73,11 @@
)

class VariableVisitConsumerAction(Enum):
def register(linter: PyLinter) -> None:
The other checkers seem to all place register() at the end of the file, so I suggest keeping that pattern here.
    name
    if import_module_name == "*"
    else
    # Most likely something like 'xml.etree',
    # which will appear in the .locals as 'xml'.
    # Only pick the name if it wasn't consumed.
    (
        import_module_name
        if (
            import_module_name.startswith(name)
            and import_module_name.find(".") > -1
        )
        or name in imports
        else None
    )
)
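To make the readability trade-off concrete, here is a minimal sketch that wraps the quoted expression in a hypothetical helper, `pick_name_ternary`, next to an equivalent statement form, `pick_name_statements`. Both function names and the plain `set[str]` used for `imports` are assumptions for illustration; in pylint this logic sits inline in the variables checker, not in standalone functions.

```python
from __future__ import annotations


# Hypothetical helpers: the parameter names mirror the quoted snippet,
# but neither function exists in pylint.
def pick_name_ternary(name: str, import_module_name: str, imports: set[str]) -> str | None:
    """Nested-ternary form, as in the refactor under review."""
    return (
        name
        if import_module_name == "*"
        else (
            import_module_name
            if (
                import_module_name.startswith(name)
                and import_module_name.find(".") > -1
            )
            or name in imports
            else None
        )
    )


def pick_name_statements(name: str, import_module_name: str, imports: set[str]) -> str | None:
    """Equivalent statement form: one outcome per return."""
    if import_module_name == "*":
        return name
    # Most likely something like 'xml.etree',
    # which will appear in the .locals as 'xml'.
    # Only pick the name if it wasn't consumed.
    # ('"." in s' is equivalent to 's.find(".") > -1'.)
    if (import_module_name.startswith(name) and "." in import_module_name) or name in imports:
        return import_module_name
    return None
```

The statement form keeps the original comment next to the branch it explains and gives each outcome its own line.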
I would prefer to avoid adding multi-line ternaries. I find this refactor worse than the original form. Some of the shorter ones are at best a wash.
confidence=(
    INFERENCE
    if node.name in current_consumer.names_under_always_false_test
    else (
        CONTROL_FLOW
        if node.name in current_consumer.consumed_uncertain
        else HIGH
    )
),
This one would probably be a wash for me, were it not for the fact that now we lose resolution on coverage stats because this is now a single statement.
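The coverage point can be illustrated with a small sketch. The two hypothetical functions below compute the same confidence level. The constants are plain strings standing in for pylint's confidence values, and the parameters `always_false` and `uncertain` stand in for `current_consumer.names_under_always_false_test` and `current_consumer.consumed_uncertain`. As far as line coverage is concerned, a multi-line expression counts as one statement, so the ternary version can look fully covered even if no test ever reaches `CONTROL_FLOW`; the statement version gives each outcome its own line.

```python
from __future__ import annotations

# Stand-ins for pylint's confidence levels (assumption: plain strings,
# not the objects defined in pylint.interfaces).
HIGH = "HIGH"
CONTROL_FLOW = "CONTROL_FLOW"
INFERENCE = "INFERENCE"


def confidence_ternary(name: str, always_false: set[str], uncertain: set[str]) -> str:
    # A single statement: it is marked as executed as soon as any branch runs.
    return (
        INFERENCE
        if name in always_false
        else (CONTROL_FLOW if name in uncertain else HIGH)
    )


def confidence_statements(name: str, always_false: set[str], uncertain: set[str]) -> str:
    # One return per outcome: a coverage report shows which ones tests reach.
    if name in always_false:
        return INFERENCE
    if name in uncertain:
        return CONTROL_FLOW
    return HIGH
```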
- and not self._in_lambda_or_comprehension_body(node, frame)
+ and not _in_lambda_or_comprehension_body(node, frame)
This kind of change is okay with me: the VariablesChecker and NamesConsumer are long enough that having all these methods in a closure makes it harder to see what they do. But let's get a +1 from someone else.
)

def _is_only_type_assignment(
If we go through with this, let's add this commit to the .git-blame-ignore-revs list, and we'll be sure to merge it with a merge commit to avoid rewriting the hash.
Thank you for your enthusiasm and your work on pylint, Nick.
I was waiting for Jacob's opinion on this. If he thinks the result is better, then let's merge.
My first instinct is that I don't see a lot of value: we're moving code around, but it's not reducing the complexity overall. Some changes are opinionated (ternaries even when they span a dozen lines). I don't think favoring ternaries was discussed beforehand (the ternary checker is an extension). The last time I reviewed 1000+ lines was for the unicode checker, which was adding a massive new feature. Here we basically have a style MR. Reviewing so many changes in detail is very hard. I have to trust that the code is moved without modifications, which would allow for underhanded changes if we're in an xz scenario. The git blame is also going to be harder to read and investigate.
I'd rather focus on bug fixes / features / simplifications that reduce complexity intrinsically, than on pure style refactor without necessity / consensus.
+1 I agree with Pierre: I had to round up to get to -0 on the staticmethod refactors, so I'm really more of a -0.5 (but -1 on the ternaries). I think the staticmethods are not what makes the checker hard to work on: it's the fact that the overall strategy is opaque. There is no documentation anywhere about what the overall strategy is. It's like solving an escape room where the themes of the clues keep changing.
I heard all this the last time I tried making changes. Too hard to review, confuses the Git history, too much churn, need consensus, just a style change, etc. So I stopped trying to make changes. What has happened since then? Pretty much nothing. There are still a bunch of … I will tell you what I think the problem is: the code is way too complicated. Nobody knows how it works and everybody is afraid to touch it. "Small, targeted changes" only serve to exacerbate this problem, because they just mean adding a little bit here and there without clarifying the overall picture. Look at what lints are disabled in this file:
These are flashing red alarms warning that the complexity is out of control. By the way, in the course of working on this MR, I found an undocumented side effect that affects a warning from a totally different checker. That's not good! Anyway, in the short term, I would like to get #9964 finished. And that in turn depends on #8893 (at least). So is there a plan to fix #8893 in a timely fashion? And more broadly, is there a plan / timeline to get Pylint and Astroid correctly typed and compiled with Mypyc?
I don't think anyone is defending the complexity of the variables checker. I'm hearing skepticism that moving methods out of a closure meaningfully reduces that complexity.
If you want help fixing #8893, you are welcome to ask for advice, but suggesting that we are not doing things in a timely fashion doesn't promote volunteerism. And no, there are no fixed timelines, but as of today both astroid and pylint are cut over to 4.0-dev, so we can start considering breaking API changes that might be necessary to make them type-correct. We have a number of new contributors who are taking an interest in astroid's internals, and I find that welcome and invigorating. If you have ideas about further variables checker refactors, you're welcome to open issues and sound them out.
Thanks for finding this. Can you call this out, ideally by opening an issue, to increase its visibility? Thanks!
It's not impossible to merge a refactor in pylint. But it should be consensual, or discussed beforehand, or reasonable and small. Here's an example of one that took 2 minutes to be reviewed and 10 minutes to be merged: #4872 (there are many such examples in the history). If you open a PR that removes 200 lines of crap and the tests are still green, it's going to be merged before dawn. Your intentions are laudable, but I feel you're pushing for opinionated changes that do not make the situation better, and I feel like we're wasting time bikeshedding over ternaries, class functions that could be static methods or independent functions, or typing import guards. Actually making the variables checker easier to develop on is going to be HARD. No one feels productive reading code for 2 hours straight in a style they don't like and then discussing it asynchronously with the stakeholders for a month, but this is the way. Moving code around and opening style MRs is not going to cut it. Jacob is the one with the most experience in this part of the code; he's going to know what the pain points are, the possible ways to make this better, and where to investigate. Did you try to discuss the design with him beforehand?
There are a thousand and one changes that need to be made to simplify things. It's easy to point to each individual change and say that it doesn't improve anything. But in aggregate the small changes add up. And further, each change facilitates further changes. For example, if a variable is defined in one place instead of four, it's much easier to move that definition to a different place.
The usual pattern in open source is that some pushy user comes in demanding changes. "When can we expect such and such to be addressed?" The extremely reasonable maintainer response is "Nobody has worked on it yet, but you are welcome to." That is very much not what is happening here. Instead, I have been repeatedly told no no no to all sorts of different changes. So given that my volunteer efforts have been turned down over and over for years, what choice do I have but to switch to demanding-user mode? I have a plan to fix things. But my plan is not accepted. Okay, fine, so what is the alternative? And why is it taking so long? It's not like I'm looking for problems where there are none. If these issues were fixed already, we wouldn't be having this conversation. I'm more than happy to do the work myself. And if someone else wants to do it, that's great too. But the work isn't getting done, and that makes me not so happy.
Yes, I certainly agree with that. Let's talk about these static methods. Why are they static methods instead of free functions? Other than status quo bias, what is the advantage of using static methods? Here are the downsides: it adds extra indentation and it requires more text to make calls. What are the upsides?
He has described the checker as "forbidding". That is not a good sign.
Okay, is there a different approach you would prefer? I've been using atomic commits that make clear, specific changes. Would you prefer more dramatic changes done all at once? I would be happy to do that too.
What exactly is not a good sign? The status quo of the checker, or my comments?
I gave a version of that response on #8893 earlier today.
Sorry, I meant the status quo is not good. Your assessment of the checker as "forbidding" and "opaque" is totally accurate and well-informed! And for exactly that reason, I think the cautious conservative approach is not going to be successful. A big overhaul will be required, and the first step to that is clearing out years of cruft. Once that is done, it will be a lot easier to have a fruitful design discussion, because it will be possible to see what's currently going on.
I'm sorry, can you tell us more about your plan? Because all I see in this MR is a change of style that is absolutely impossible to review in a reasonable time in its current format. I don't think it's necessary at all to fix anything. It's also not something that makes the code better in a meaningful way, and it's not a bug fix for any of the numerous existing false positives / false negatives. To quote Jacob:
So what would your new, improved strategy look like? Feel free to start the checker from scratch and provide a better implementation; it's going to be WAY WAY more efficient than changing the coding style of the whole checker to your taste first.
At the risk of veering a little off-topic, my prescription for the variables checker would be for somebody to write a killer docstring that summarizes (in a bulleted list of about half a dozen lines of pseudocode) what it actually does. Right now when PRs come in for the variables checker, I do my best to say "please don't add this special case for walrus/if/else/except over here, please add it over there where we happen to handle very similar cases", but as we all know, that risks adding bloat on top of bloat. If we had that six-line summary of the strategy, we would probably notice "ugh, we are repeating O(n) scans here, here, and here" or "we're rechecking type guarding blocks here, here, and here", etc., and then we'd be able to get people to start rowing in the same direction in terms of agreeing on refactors.
On the topic of refactoring, I agree that the variables checker could benefit from refactoring to improve code quality / readability. I've made a few contributions to this checker in the past, but I am still not confident saying that I have a complete understanding of its internals. @nickdrozd's recent work on trying to kick things off and make changes in this area is laudable and a step in the right direction imo. But to go even further and make things clean and easier to contribute to, one would need to deep dive and have a good understanding of how things are working. That means going through and reviewing years of development, patches on top of patches, and finally rewriting it in a way that makes better sense. From my limited experience in working on this checker, I vaguely recall seeing
Type of Changes
Description
Continuing #9978
Variables checker code is difficult to read because the lines are too long. One easy way to shorten lines is to turn static methods into free functions. This saves a lot of indentation and makes the functions easier to call. Also shorten the name of VariableVisitConsumerAction, because that is 27 characters long.
Other problems that make the code hard to read are that there are too many statements and variables are defined in too many places. Things can be consolidated by using ternary expressions and the walrus operator. Some variables can be done away with entirely.
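As an illustration of the walrus consolidation described above, here is a minimal sketch with two hypothetical helpers that look a name up in a plain dict of line numbers. Nothing here is pylint code; it only shows how an assignment and its test can collapse into one statement so that the variable is defined in exactly one place.

```python
from __future__ import annotations


def describe_before(definitions: dict[str, int], name: str) -> str:
    # The variable is initialized in one place and reassigned in another.
    lineno = None
    if name in definitions:
        lineno = definitions[name]
    if lineno is not None:
        return f"{name} defined at line {lineno}"
    return f"{name} is undefined"


def describe_after(definitions: dict[str, int], name: str) -> str:
    # The walrus operator binds and tests the variable in a single statement.
    if (lineno := definitions.get(name)) is not None:
        return f"{name} defined at line {lineno}"
    return f"{name} is undefined"
```

The explicit `is not None` check matters here: a bare `if lineno := ...` would treat a value of 0 as missing.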
This MR should make no functional changes, except that in a few cases some variables are defined later. This saves a tiny bit of time.