-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-126807: pygettext: Do not attempt to extract messages from function definitions. #126808
Conversation
Fixes a bug where pygettext would attempt to extract a message from a code like this: def _(x): pass This is because pygettext only looks at one token at a time and '_(x)' looks like a function call. However, since 'x' is not a string literal, it would erroneously issue a warning. This commit fixes that by keeping track of the previous token and checking if it's 'def' or 'class'.
Tools/i18n/pygettext.py
Outdated
@@ -1,11 +1,10 @@ | |||
#! /usr/bin/env python3 | |||
# -*- coding: iso-8859-1 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no other file that uses this encoding, I think it's safe (and more practical) to use utf-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related change, so please keep the coding cookie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! I'll revert :) Would you accept a separate (perhaps not backported) PR that removes the coding and the commented-out code or do you think it's not worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll accept it if there are pygettext tests for files with non-UTF-8 encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'll add it to my todo list :)
Tools/i18n/pygettext.py
Outdated
if ( | ||
ttype == tokenize.NAME and tstring in opts.keywords | ||
and (not self.__prev_token or not _is_def_or_class_keyword(self.__prev_token)) | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new logic is, if we see one of the gettext keywords and the previous token is not def
or class
, only then we transition to __keywordseen
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that no warnings are emitted if option --docstrings
is used. I think that we can use a similar approach. We can add
if ttype == tokenize.NAME and tstring in ('class', 'def'):
self.__state = self.__ignorenext
return
where __ignorenext
simply sets self.__state = self.__waiting
.
Tools/i18n/pygettext.py
Outdated
@@ -1,11 +1,10 @@ | |||
#! /usr/bin/env python3 | |||
# -*- coding: iso-8859-1 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related change, so please keep the coding cookie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
Tools/i18n/pygettext.py
Outdated
@@ -1,11 +1,10 @@ | |||
#! /usr/bin/env python3 | |||
# -*- coding: iso-8859-1 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll accept it if there are pygettext tests for files with non-UTF-8 encoding.
Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
…unction definitions. (pythonGH-126808) Fixes a bug where pygettext would attempt to extract a message from a code like this: def _(x): pass This is because pygettext only looks at one token at a time and '_(x)' looks like a function call. However, since 'x' is not a string literal, it would erroneously issue a warning. (cherry picked from commit 9a45638) Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
GH-126846 is a backport of this pull request to the 3.13 branch. |
…unction definitions. (pythonGH-126808) Fixes a bug where pygettext would attempt to extract a message from a code like this: def _(x): pass This is because pygettext only looks at one token at a time and '_(x)' looks like a function call. However, since 'x' is not a string literal, it would erroneously issue a warning. (cherry picked from commit 9a45638) Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
GH-126847 is a backport of this pull request to the 3.12 branch. |
…function definitions. (GH-126808) (GH-126847) Fixes a bug where pygettext would attempt to extract a message from a code like this: def _(x): pass This is because pygettext only looks at one token at a time and '_(x)' looks like a function call. However, since 'x' is not a string literal, it would erroneously issue a warning. (cherry picked from commit 9a45638) Co-authored-by: Tomas R <tomas.roun8@gmail.com>
…function definitions. (GH-126808) (GH-126846) Fixes a bug where pygettext would attempt to extract a message from a code like this: def _(x): pass This is because pygettext only looks at one token at a time and '_(x)' looks like a function call. However, since 'x' is not a string literal, it would erroneously issue a warning. (cherry picked from commit 9a45638) Co-authored-by: Tomas R <tomas.roun8@gmail.com>
Fixes a bug where pygettext would attempt to extract a message from a code like this:
This is because pygettext only looks at one token at a time and
_(x)
looks like a function call.However, since
x
is not a string literal, it would erroneously issue a warning.This PR fixes that by keeping track of the previous token and checking if it's
def
orclass
.