gh-94823: Improve coverage in tokenizer.c:valid_utf8 #94856

Merged (4 commits) on Aug 16, 2022

Contributor

@mdboom mdboom commented Jul 14, 2022

When loading a source file from disk, the tokenizer uses a separate UTF-8
validator (`valid_utf8` in `tokenizer.c`), distinct from the one in
`unicode_decode_utf8`. This PR exercises that code path with the same set of
invalid inputs used to test the other UTF-8 decoder.
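
For illustration only, here is a minimal sketch of how such a code path could be exercised. It is not the test added by this PR. The helper names (`run_source_bytes`, `check_all`) and the short list of malformed sequences are assumptions made for the example. The idea is that the validator is only reached when the interpreter reads a real file, so each malformed sequence is written to disk inside an otherwise valid string literal and the file is run in a subprocess.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical sample of malformed UTF-8 byte sequences (an assumption for
# this sketch; the list used by the PR is not reproduced here).
INVALID_SEQUENCES = [
    b"\x80",              # lone continuation byte
    b"\xc0\x80",          # overlong two-byte encoding
    b"\xe0\x80\x80",      # overlong three-byte encoding
    b"\xed\xa0\x80",      # encoded UTF-16 surrogate
    b"\xf4\x90\x80\x80",  # code point above U+10FFFF
    b"\xff",              # byte that never appears in UTF-8
]


def run_source_bytes(source: bytes) -> subprocess.CompletedProcess:
    """Write raw bytes to a temporary .py file and run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bad_utf8.py")
        with open(path, "wb") as f:
            f.write(source)
        return subprocess.run([sys.executable, path], capture_output=True)


def check_all() -> dict:
    """Return the interpreter's exit code for each malformed sequence."""
    results = {}
    for seq in INVALID_SEQUENCES:
        # Wrap the bytes in a string literal so the file is valid Python
        # apart from the encoding problem.
        proc = run_source_bytes(b'\n\n\n"' + seq + b'"\n')
        results[seq] = proc.returncode
    return results


if __name__ == "__main__":
    for seq, returncode in check_all().items():
        print(seq, "-> exit code", returncode)
```

Running the files in a subprocess, rather than passing the bytes to `compile()` or `bytes.decode()`, is deliberate in this sketch: those would go through the in-memory decoder instead of the file-reading path described above.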

Automerge-Triggered-By: GH:ericsnowcurrently

@bedevere-bot bedevere-bot added the tests (Tests in the Lib/test dir) and awaiting review labels Jul 14, 2022
@brandtbucher brandtbucher added needs backport to 3.11 only security fixes labels Jul 15, 2022
@mdboom mdboom force-pushed the tokenizer-valid-utf8-coverage branch from 6db7d84 to e52f328 Compare July 15, 2022 18:11
Member

@ericsnowcurrently ericsnowcurrently left a comment

Thanks for working on this! It mostly looks good. I noticed a couple of small typos.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@mdboom
Contributor Author

mdboom commented Aug 16, 2022

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@ericsnowcurrently: please review the changes made to this pull request.

Member

@ericsnowcurrently ericsnowcurrently left a comment

LGTM

@miss-islington
Contributor

Thanks @mdboom for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Aug 16, 2022
…94856)

(cherry picked from commit f215d7c)

Co-authored-by: Michael Droettboom <mdboom@gmail.com>
@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Aug 16, 2022
@bedevere-bot

GH-96029 is a backport of this pull request to the 3.11 branch.

@ericsnowcurrently
Member

Thanks for the test!

Labels
skip news tests Tests in the Lib/test dir