extract: Wrong lineno when arguments are spread over multiple lines and contain a function call #1123

oomsveta · 2024-09-06T15:44:15Z

Overview

Consider this piece of code:

_(
    "hello",
    dummy_fn_call()
)

Extracting it yields the following result:

(3, 'hello', [], None)

As you may have noticed, the lineno (the first component of the tuple) is wrong. It should be either 1 or 2 (depending on whether you expect the lineno of gettext or the lineno of the value), but not 3.

This is because the code setting message_lineno is executed every time an opening parenthesis is encountered and funcname is set:

babel/babel/messages/extract.py

Lines 526 to 533 in f91754b

    
           elif tok == OP and value == '(': 
        
               if in_def: 
        
                   # Avoid false positives for declarations such as: 
        
                   # def gettext(arg='message'): 
        
                   in_def = False 
        
                   continue 
        
               if funcname: 
        
                   message_lineno = lineno

while funcname is only changed if the name is in keywords (i.e., if the name is one of the gettext names):

babel/babel/messages/extract.py

Lines 615 to 616 in f91754b

    
           elif tok == NAME and value in keywords: 
        
               funcname = value

So, in the case where you have a gettext call -setting funcname- with another function call inside its arguments -resulting in an opening parenthesis-, you meet the two conditions to change the value of lineno for something that might be wrong, for example, if you spread the arguments to gettext over multiple lines

Steps to Reproduce

Copy-paste this to your python REPL:

from io import BytesIO
from babel.messages.extract import extract

file = b"""_(
    "hello",
    dummy_fn_call()
)"""

list(extract("python", BytesIO(file)))

Actual Results

[(3, 'hello', [], None)]

Expected Results

[(1, 'hello', [], None)]

or maybe

[(2, 'hello', [], None)]

depending on whether you expect the lineno of gettext or the lineno of the value

Additional Information

The first and last test cases of test_comments_with_calls_that_spawn_multiple_lines should fail, assuming you decide to consider the lineno to be the lineno of the gettext call. It currently fails if you replace the call to len in the test cases with something that isn't a function call

The text was updated successfully, but these errors were encountered:

oomsveta · 2024-09-06T15:55:25Z

I also wrote a quick-and-dirty fix:

diff --git a/babel/messages/extract.py b/babel/messages/extract.py
index 48573ee..182a74b 100644
--- a/babel/messages/extract.py
+++ b/babel/messages/extract.py
@@ -502,6 +502,7 @@ def extract_python(
     :param options: a dictionary of additional options (optional)
     :rtype: ``iterator``
     """
+    last_name_token = ""
     funcname = lineno = message_lineno = None
     stack_depth = -1
     buf = []
@@ -521,6 +522,8 @@ def extract_python(
     current_fstring_start = None
 
     for tok, value, (lineno, _), _, _ in tokens:
+        if tok == NAME:
+            last_name_token = value
         if stack_depth == -1 and tok == NAME and value in ('def', 'class'):
             in_def = True
         elif tok == OP and value == '(':
@@ -530,7 +533,8 @@ def extract_python(
                 in_def = False
                 continue
             if funcname:
-                message_lineno = lineno
+                if last_name_token in keywords:
+                    message_lineno = lineno
                 stack_depth += 1
         elif in_def and tok == OP and value == ':':
             # End of a class definition without parens
diff --git a/tests/messages/test_extract.py b/tests/messages/test_extract.py
index 7d3a05a..852cdea 100644
--- a/tests/messages/test_extract.py
+++ b/tests/messages/test_extract.py
@@ -97,10 +97,10 @@ add_notice(req, ngettext("Bar deleted.",
         messages = list(extract.extract_python(buf, ('ngettext', '_'), ['NOTE:'],
 
                                                {'strip_comment_tags': False}))
-        assert messages[0] == (3, 'ngettext', ('Catalog deleted.', 'Catalogs deleted.', None), ['NOTE: This Comment SHOULD Be Extracted'])
+        assert messages[0] == (2, 'ngettext', ('Catalog deleted.', 'Catalogs deleted.', None), ['NOTE: This Comment SHOULD Be Extracted'])
         assert messages[1] == (6, '_', 'Locale deleted.', ['NOTE: This Comment SHOULD Be Extracted'])
         assert messages[2] == (10, 'ngettext', ('Foo deleted.', 'Foos deleted.', None), ['NOTE: This Comment SHOULD Be Extracted'])
-        assert messages[3] == (15, 'ngettext', ('Bar deleted.', 'Bars deleted.', None), ['NOTE: This Comment SHOULD Be Extracted', 'NOTE: And This One Too'])
+        assert messages[3] == (14, 'ngettext', ('Bar deleted.', 'Bars deleted.', None), ['NOTE: This Comment SHOULD Be Extracted', 'NOTE: And This One Too'])
 
     def test_declarations(self):
         buf = BytesIO(b"""\

tomasr8 · 2024-09-06T21:24:18Z

Thanks for the investigation!

Just for reference, GNU's gettext sets the line number to 2:

> xgettext hello.py
> cat messages.po 
...

#: line.py:2
msgid "hello"
msgstr ""

I think it makes sense to try to be consistent with xgettext here, i.e. use lineno=2. Would you like to open a PR?

oomsveta · 2024-09-12T11:27:17Z

I think it makes sense to try to be consistent with xgettext here, i.e. use lineno=2.
@tomasr8

Agreed, and in addition to being consistent with xgettext, it makes more sense to use the number of the line with the actual message.
It might be a bit tricky to implement, though: I imagine it'll require pairing each message with its lineno, which would change the current API and break the custom extract methods. What do you think?

Would you like to open a PR?

I don't mind giving it a try, but I'm not sure I'll be able to free up enough time to look into it at the moment

tomasr8 · 2024-09-13T14:26:20Z

I imagine it'll require pairing each message with its lineno, which would change the current API and break the custom extract methods.

How would this break custom extractors?

When a nested function call was inside a gettext call, but on a separate line, the message got extracted with the line number of that function call. We updated the line number whenever we encountered an opening parenthesis after a gettext function name. This commit fixes that by only updating the line number on the first argument inside a gettext call. Compared with `xgettext`, some test cases were thus wrong and corrected in this commit. Fixes python-babel#1123

When a gettext call had a nested function call on a new line, the extract function would use that nested call's line number when extracting the terms for the gettext call. The reason is that we set the line number on any encounter of an opening parenthesis after a gettext keyword. This does not work if either we have a nested call, or our first term starts on a new line. This commit fixes that by only setting the line number when we encounter the first argument inside a gettext call. Existing tests were adapted to work according to `xgettext` with regards to the line numbers. Fixes python-babel#1123

Currently the Python extractor does not support deeply nested gettext calls (deeper than as a direct argument to the top-level gettext call). e.g. ```py _("Hello %s", _("Person")) _("Hello %s", random_function(", ".join([_("Person 1"), _("Person 2")]))) ``` The extraction code was refactored quite a bit to simplify the flow and support this use-case. Fixes python-babel#1125 (meanwhile also fixes python-babel#1123)

When a gettext call had a nested function call on a new line, the extract function would use that nested call's line number when extracting the terms for the gettext call. The reason is that we set the line number on any encounter of an opening parenthesis after a gettext keyword. This does not work if either we have a nested call, or our first term starts on a new line. This commit fixes that by only setting the line number when we encounter the first argument inside a gettext call. Existing tests were adapted to work according to `xgettext` with regards to the line numbers. Fixes python-babel#1123

oomsveta mentioned this issue Sep 12, 2024

extract: "Deeply nested" gettext calls aren't extracted #1125

Open

dylankiss mentioned this issue Sep 18, 2024

Fix extracted lineno with nested calls #1126

Merged

dylankiss mentioned this issue Sep 19, 2024

Allow extracting (deeply) nested calls in Python and Javascript #1127

Open

akx closed this as completed in #1126 Oct 19, 2024

akx closed this as completed in bef3a73 Oct 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extract: Wrong lineno when arguments are spread over multiple lines and contain a function call #1123

extract: Wrong lineno when arguments are spread over multiple lines and contain a function call #1123

oomsveta commented Sep 6, 2024 •

edited

Loading

oomsveta commented Sep 6, 2024

tomasr8 commented Sep 6, 2024 •

edited

Loading

oomsveta commented Sep 12, 2024

tomasr8 commented Sep 13, 2024

extract: Wrong lineno when arguments are spread over multiple lines and contain a function call #1123

extract: Wrong lineno when arguments are spread over multiple lines and contain a function call #1123

Comments

oomsveta commented Sep 6, 2024 • edited Loading

Overview

Steps to Reproduce

Actual Results

Expected Results

Additional Information

oomsveta commented Sep 6, 2024

tomasr8 commented Sep 6, 2024 • edited Loading

oomsveta commented Sep 12, 2024

tomasr8 commented Sep 13, 2024

oomsveta commented Sep 6, 2024 •

edited

Loading

tomasr8 commented Sep 6, 2024 •

edited

Loading