-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
W391: spurious warnings with python 3.12 beta #1142
Comments
yeah there's a quite annoying regression with the tokenizer in the latests version -- cc @pablogsal this is going to impact every user of pycodestyle, flake8, etc. |
If this is related to the fact that the last dedent now is reported on existing lines, this is now documented behaviour so this needs to be fixed in tools. Check https://docs.python.org/3.12/whatsnew/3.12.html#changes-in-the-python-api:
|
@pablogsal "needs to be fixed in tools" -- it would be nice if this didn't change without a good reason. the impact to tools is pretty widespread and this is going to be a pain in my ass for years of people not upgrading and misreporting this. same thing happened the last time cpython changed the end-of-file reporting in 3.7 and I still get pings about it the beta period is a good time to report regressions and ideally they'd be taken seriously |
Sorry Anthony, I am very sorry to hear that you are upset about this. Let me answer your points.
There is a good reason: we are using the C tokenizer now under the hood (the Python tokenizer based on regular expressions is gone now) to support PEP 701. Changing this in the C tokenizer is very very tricky because it is also used by the regular compilation pipeline and we certainly cannot alter that or the parser. We also think this is more correct behavior (but that's not the real reason we changed it). In any case, I don't want to enter a discussion around this here, I am just telling you that this is not a random decision.
Yes, but this is not an unexpected regression: this is documented behaviour now in the "porting to Python 3.12". Please, don't assume we are not taking things seriously because we don't agree. So unfortunately this needs to be fixed in tools as I mentioned. I am sorry that this is causing you a lot of trouble. |
Nevertheless, I will discuss this with the team in case we can devise an easy solution, just in case. |
CC: @lysnikolaou thoughts? |
@asottile opened python/cpython#104976 to track this |
Ok we fixed this upstream. Can someone confirm that this solves the problem? |
@pablogsal, the issue with W391 doesn't appear to be fixed :/ Running the flake8-pyi test suite with a fresh build of CPython 3.13 (python/cpython@949f0f5) currently results in every test failing due to spurious W391 warnings being emitted from pycodestyle on all of our test data. (We have instructions on how to run our tests here if you want to try to reproduce -- it's a pretty simple setup.) |
Then I don't know what's going on because after my fix the tokenizer emits exactly the same output as in 3.11 except for PEP701 related changes. If this is something in our side we would need a reproducer that doesn't use any 3rd party code just using the tokenize module. |
Check this out:
So we get the
So the output of the tokenizer is the same in that file for 3.11 and 3.12 so not sure where is the problem then. |
@pablogsal I'm seeing pretty different results running On Python 3.11:
On Python 3.13a0:
So that's a lot of NEWLINE tokens where the value was Possibly relevant: I'm running my tests on a Windows machine! |
Oh, that may be making all the difference! |
Well, then if there is anything wrong here in our side (which still is unclear) this may be something harder to fix as this is deep into the C tokenizer and certainly, we probably cannot change it because everything else relies on this :( I may try to give this a go, but meanwhile maybe @lysnikolaou has some spare cycles. |
This is what I get in Linux for 3.11:
|
Which is I think the same as you are getting for 3.12 in Windows. I would argue that the old windows version may be wrong because is emitting:
but this file has 8 lines! So that |
(FYI I just edited my comment above -- I managed to get the 3.11 and 3.13 outputs switched around 🤦♂️.) To clarify: on 3.11, the NEWLINE tokesn have So yes, looks like the output on Windows py312 now matches the output on Linux py311. Which is different to what the output was on Windows py311. |
Not really, no? We get some weird end Well, I bet the problem is this:
while in Linux we get only
|
Can you open a bug against CPython with this? |
Right, sorry, I should have checked more thoroughly; I just glanced at it. Thanks for the correction.
Sure. (By the way, if you hadn't noticed by now, I know next to nothing about the tokenizer! flake8-pyi works entirely by analyzing the AST rather than using the |
I've opened python/cpython#105017. |
If this is related to the DEDENT tokens (and some of the code and comments within In that case, a Python-version-agnostic fix was possible (see sphinx-doc/sphinx#11440), although that has been closed following the change-of-plan to retain the previous end-of-file dedent line numbering behaviour. |
The thing is that now we emit exactly the same tokens as in 3.11 in exactly the same situations when valid code is provided so I don't understand where the problem may be happening now as I can also see the failures on Linux |
Ok, yep, that's a puzzler. Does the The reason I ask: the smallest standalone test case I've found so far is this While narrowing in on that, I found that running the py311
py312
(in particular it seems odd that the Certainly not proof of anything yet, but it could be a clue; identical output from |
But that is basically what we give out so if the output is identical then the bug cannot be in the tokenizer. |
Possibly? I see two (in fact, three) scenarios:
OR
OR
Edit: update after Pablo found the true cause: these theories were off-base, although the second theory might seem similar to the true reason. Basically the tokens and their order in the stream was unchanged, but in Py3.11, the stream was produced from a chunked input, one line at a time, whereas in Py3.12, the stream was produced across the entire input. This caused some line-related logic in the |
(The new W391 errors definitely bisect to python/cpython@6715f91, so we know something changed in that commit to cause the new errors.) |
Yeah that I can believe but I don't understand what can it be if the tokens themselves are the same and come out in the same order. I mean these files where we get the errors are quite simple and we can analyse them easily so is not like there is unknown constructs that cause failures. |
I'm currently seeing how much of pycodestyle I can delete locally while still demonstrating a behaviour change between py311 and CPython |
Although the
The results of that are different between Py3.11 and Py3.12-dev. (for each Python version, they're the same on both Linux and Windows).
|
That's not possible or at least I cannot see how that can be possible. The output of python -m tokenize is shown in the order the tokens are emitted and I have been comparing the output with 3.11 using diff |
Nope, I should have checked more carefully: the ordering seems the same, but the |
The token-value changes seems like they should be safe since |
That's expected: you should use the constants in the tokenize module, not the numerical values themselves |
Ok I know what's going on after investigating the situation. The problem is that we now consume the whole buffer before calling the C tokenizer but Line 1968 in f6500e2
This incorrectly changes Line 250 in f6500e2
for the first NEWLINE token. This is very very unlikely to be fixed upstream because there is no guarantees over how we are going to consume the buffer and also we would need to change deeply the C tokenizer APIs, so I fear is something that needs to be changed in |
@pablogsal thank you so much for taking the time to look into this! |
@pablogsal nice find - that also explains why the grouping of the warnings output by This might be pedantic/annoying to mention, but I think it's better that I do and then am corrected than not mention it: the Py3.12-dev documentation does currently indicate that the |
Is not changing because it doesn't say how they are consumed. The contract is: you need to pass down a callable that emits one line at a time (at a time meaning every time is called) but note that it doesn't say that we will consume it one line at a time every time we emit a new token. There is absolutely no mention over how or when that callable will be called internally because that's an implementation detail. The only specification is that every time is called you need to give us a new line of input. So the contract is not changing so we don't need to change the documentation |
Ok, thank you - I had a sense that I might have misunderstood, and that makes things clearer 👍 (my understanding: the file-producer-side must emit a single line of code per read-call, but cannot assume that the tokenizer has completed emitting tokens from previous lines) |
Correct. I understand that the assumption is very natural because it makes it more efficient but note that that was never in the contract. |
Can someone try |
Very happy to confirm that the flake8-pyi test fully suite passes with python/cpython#105070! (Aka, all the pycodestyle errors I was seeing in our test suite are gone!) |
As I was about to report the issue, I'm happy to see that it is already reported, analyzed and fixed. Congratulations to everyone involved! What I do not quite understand though is why this did not get caught by the CI earlier. Would it be worth adding a:
in .github/workflows/main.yml ? |
we run a weekly build which is already enough CO2 |
Thanks for the information @asottile , I did not find this in the yml file. |
Ok, I think we can close this now after I merged python/cpython#105070 Someone somewhere owes me a metaphorical beer 😉 |
I just hit pycodestyle 2.10.0 hitting this with python 3.12.7. |
well of course if you're using a version that was released years before 3.12 |
It appears there's an incompatibility with pycodestyle and python 3.12.0b1 (latest beta version).
You can check by running this in a docker container, this creates a minimal example showing this warning:
Output is:
Same with the git version of pycodestyle. This does not happen with python 3.11.
The text was updated successfully, but these errors were encountered: