-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
gh-63161: Fix tokenize detect_encoding() for non-ASCII coding #139235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for the coded cookie on the second line (and non-ascii first line).
Also add a test with specified ASCII encoding, but non-ASCII content that can still be decoded as UTF-8. E.g. '#coding=ascii €'.encode('utf-8')
and corresponding for two lines.
@serhiy-storchaka: I added more tests, please review the updated PR. Is it what you wanted? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update. In two-line cases please use non-ASCII data in the first line, before the codec cookie. Test that the tokenizer uses correct encoding to decode comments in first lines.
It may be already tested elsewhere, but I would also add tests for non-ASCII data in the first and in the second comment lines, when no codec cookie is present (so UTF-8 should be used). For valid and invalid UTF-8.
I expect that the tokenizer correctly decodes files that match the explicit or implicit encoding, and reject files that do not match. And the interpreter should work the same.
Ok, I added more tests. Please review the updated PR. |
@serhiy-storchaka: It seems like you're working on the same area these days and you have more advanced fix. I can abandon this PR, no? |
Agree. Sorry, but I already had tests for the core interpreter and the model of how it should work. I only needed to beat the code until it started to pass the tests. |
Uh oh!
There was an error while loading. Please reload this page.