gh-126807: pygettext: Do not attempt to extract messages from function definitions. #126808

tomasr8 · 2024-11-13T21:35:56Z

Fixes a bug where pygettext would attempt to extract a message from code like this:

def _(x): pass

This is because pygettext only looks at one token at a time and _(x) looks like a function call.

However, since x is not a string literal, pygettext would erroneously issue a warning.

This PR fixes that by keeping track of the previous token and checking if it's def or class.
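To make the ambiguity concrete, here is a small sketch using the standard tokenize module. It is not taken from pygettext; the helper name keyword_sightings is hypothetical. It reports what sits immediately before and after each _ token, showing that only the preceding token separates a definition from a call.

```python
import io
import tokenize

# Token types that carry no information for this illustration.
_SKIPPED = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
            tokenize.INDENT, tokenize.DEDENT)


def keyword_sightings(source, keyword="_"):
    """Yield (previous, next) token strings around each NAME token equal to keyword.

    Hypothetical helper for illustration only; pygettext does not define it.
    """
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in _SKIPPED]
    for i, tok in enumerate(toks):
        if tok.type == tokenize.NAME and tok.string == keyword:
            prev = toks[i - 1].string if i > 0 else None
            nxt = toks[i + 1].string if i + 1 < len(toks) else None
            yield prev, nxt


# In both cases the token after "_" is "(", so a scanner that looks only
# forward cannot tell them apart; the token before "_" is what differs.
print(list(keyword_sightings("def _(x): pass\n")))
print(list(keyword_sightings("_('hello')\n")))
```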

Issue: pygettext: false positives when extracting messages #126807

Lib/test/test_tools/test_i18n.py

tomasr8 · 2024-11-13T21:38:18Z

Tools/i18n/pygettext.py

@@ -1,11 +1,10 @@
 #! /usr/bin/env python3
-# -*- coding: iso-8859-1 -*-


No other file uses this encoding, so I think it's safe (and more practical) to use UTF-8.

This is not a related change, so please keep the coding cookie.

Got it! I'll revert :) Would you accept a separate (perhaps not backported) PR that removes the coding cookie and the commented-out code, or do you think it's not worth it?

I'll accept it if there are pygettext tests for files with non-UTF-8 encoding.

Fair enough, I'll add it to my todo list :)

tomasr8 · 2024-11-13T21:39:42Z

Tools/i18n/pygettext.py

+        if (
+            ttype == tokenize.NAME and tstring in opts.keywords
+            and (not self.__prev_token or not _is_def_or_class_keyword(self.__prev_token))
+        ):


The new logic: we transition to __keywordseen only if we see one of the gettext keywords and the previous token is not def or class.
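The condition in the hunk above can be exercised in isolation. The sketch below is an assumption-laden illustration: the thread names _is_def_or_class_keyword but does not show its body, so the implementation here is a guess, and representing the previous token as a (type, string) tuple is also an assumption. The function starts_keyword_call is hypothetical.

```python
import tokenize


def _is_def_or_class_keyword(token):
    # Assumed implementation: the PR thread only shows the helper's name.
    ttype, tstring = token
    return ttype == tokenize.NAME and tstring in ("def", "class")


def starts_keyword_call(prev_token, ttype, tstring, keywords):
    """Return True if this token should move the scanner to 'keyword seen'.

    prev_token is None at the start of input, else a (type, string) tuple.
    """
    return (
        ttype == tokenize.NAME and tstring in keywords
        and (not prev_token or not _is_def_or_class_keyword(prev_token))
    )


keywords = {"_"}
# "_" at the start of input, or after an ordinary token, is a candidate call.
print(starts_keyword_call(None, tokenize.NAME, "_", keywords))
# "_" right after "def" is a definition and is skipped.
print(starts_keyword_call((tokenize.NAME, "def"), tokenize.NAME, "_", keywords))
```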

serhiy-storchaka

Note that no warnings are emitted if the --docstrings option is used. I think that we can use a similar approach. We can add:

            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__ignorenext
                return

where __ignorenext simply sets self.__state = self.__waiting.
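A toy version of this state-based approach might look like the following. It is a simplified sketch, not the real pygettext TokenEater: the class MiniEater, the scan function, and the ignore_definitions flag are hypothetical, the state methods drop the double-underscore prefix, and the handling after the opening parenthesis is reduced to a single step.

```python
import ast
import io
import tokenize


class MiniEater:
    """Toy token-driven state machine (illustration only)."""

    def __init__(self, keywords=("_",), ignore_definitions=True):
        self.keywords = set(keywords)
        self.ignore_definitions = ignore_definitions
        self.messages = []
        self.warnings = []
        self.state = self.waiting

    def __call__(self, ttype, tstring):
        self.state(ttype, tstring)

    def waiting(self, ttype, tstring):
        if (self.ignore_definitions
                and ttype == tokenize.NAME and tstring in ("class", "def")):
            # The next token is the name being defined, never a call.
            self.state = self.ignorenext
            return
        if ttype == tokenize.NAME and tstring in self.keywords:
            self.state = self.keywordseen

    def ignorenext(self, ttype, tstring):
        # Swallow exactly one token, then resume normal scanning.
        self.state = self.waiting

    def keywordseen(self, ttype, tstring):
        is_open = ttype == tokenize.OP and tstring == "("
        self.state = self.openseen if is_open else self.waiting

    def openseen(self, ttype, tstring):
        if ttype == tokenize.STRING:
            self.messages.append(ast.literal_eval(tstring))
        else:
            # Stand-in for the "unexpected token" warning.
            self.warnings.append(tstring)
        self.state = self.waiting


def scan(source, ignore_definitions=True):
    eater = MiniEater(ignore_definitions=ignore_definitions)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        eater(tok.type, tok.string)
    return eater.messages, eater.warnings


source = "def _(x): pass\n_('hello')\n"
print(scan(source))                            # definition skipped
print(scan(source, ignore_definitions=False))  # 'x' triggers a warning
```

Compared with tracking the previous token, the extra state keeps the scanner strictly one-token-at-a-time, which matches how the rest of the state machine is written.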

serhiy-storchaka · 2024-11-13T23:02:00Z

Tools/i18n/pygettext.py

This is not a related change, so please keep the coding cookie.

Tools/i18n/pygettext.py

Lib/test/test_tools/test_i18n.py

serhiy-storchaka

LGTM. 👍

serhiy-storchaka · 2024-11-14T21:44:18Z

Tools/i18n/pygettext.py

I'll accept it if there are pygettext tests for files with non-UTF-8 encoding.

miss-islington-app · 2024-11-14T22:17:45Z

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

…unction definitions. (pythonGH-126808) Fixes a bug where pygettext would attempt to extract a message from a code like this: def _(x): pass This is because pygettext only looks at one token at a time and '_(x)' looks like a function call. However, since 'x' is not a string literal, it would erroneously issue a warning. (cherry picked from commit 9a45638) Co-authored-by: Tomas R. <tomas.roun8@gmail.com>

bedevere-app · 2024-11-14T22:17:53Z

GH-126846 is a backport of this pull request to the 3.13 branch.

bedevere-app · 2024-11-14T22:17:59Z

GH-126847 is a backport of this pull request to the 3.12 branch.

tomasr8 added 3 commits November 13, 2024 22:27

Add news entry

c8d1538

Remove 'coding:' directive

6cc0833

bedevere-app bot added the awaiting review label Nov 13, 2024

bedevere-app bot mentioned this pull request Nov 13, 2024

pygettext: false positives when extracting messages #126807

Closed

tomasr8 commented Nov 13, 2024

tomasr8 requested a review from serhiy-storchaka November 13, 2024 21:40

tomasr8 added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Nov 13, 2024

serhiy-storchaka reviewed Nov 13, 2024

tomasr8 added 5 commits November 14, 2024 21:54

Simplify test

6e0cd50

Revert unrelated changes

249db28

Use an extra state instead of prev_token

e563331

Remove unused function

7b37aa2

Fix character encoding

fa0772e

serhiy-storchaka approved these changes Nov 14, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Nov 14, 2024

serhiy-storchaka enabled auto-merge (squash) November 14, 2024 21:47

serhiy-storchaka merged commit 9a45638 into python:main Nov 14, 2024
40 checks passed

bedevere-app bot removed the awaiting merge label Nov 14, 2024

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 14, 2024

bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Nov 14, 2024

tomasr8 deleted the pygettext-126807 branch November 14, 2024 22:19
