pygettext --docstrings doesn't actually extract module docstring due to tokenize returning ENCODING token #95731

Closed
Jackenmen opened this issue Aug 5, 2022 · 1 comment
Labels
type-bug An unexpected behavior, bug, or error

Comments

@Jackenmen
Contributor

Bug report

When running pygettext --docstrings file.py on Python 3.7 and above, the module docstring does not get extracted.

Reproduction steps:

  1. Create repro.py with the following contents (actually you can omit everything but the first three lines):
"""
Module docstring
"""

class X:
    """class docstring"""

    def method(self):
        """method docstring"""


def function():
    """function docstring"""
  2. Try running: python pygettext.py --docstrings repro.py
  3. Look at the messages.pot that was created and see that it doesn't contain the module docstring:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2022-08-06 00:54+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"


#: repro.py:6
#, docstring
msgid "class docstring"
msgstr ""

#: repro.py:9
#, docstring
msgid "method docstring"
msgstr ""

#: repro.py:13
#, docstring
msgid "function docstring"
msgstr ""

The reason for this appears to be that pygettext doesn't account for token.ENCODING, which was added in Python 3.7.
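The extra token is easy to observe directly. A minimal sketch, assuming the source is supplied as bytes and fed to tokenize.tokenize(); the helper name leading_token_names is made up for illustration and is not part of pygettext:

```python
import io
import token
import tokenize


def leading_token_names(source: bytes, count: int = 3) -> list:
    """Return the names of the first `count` tokens produced for `source`."""
    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    # zip() stops after `count` items, so the token generator is not exhausted.
    return [token.tok_name[tok.type] for _, tok in zip(range(count), tokens)]


# The docstring is only the second token; ENCODING comes first.
print(leading_token_names(b'"""Module docstring"""\n'))
# -> ['ENCODING', 'STRING', 'NEWLINE']
```

Any check that expects the docstring to be the very first token it sees will therefore give up before reaching it.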

A simple solution for this would be to skip tokenize.ENCODING here:

elif ttype not in (tokenize.COMMENT, tokenize.NL):
    self.__freshmodule = 0
return
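To illustrate the idea outside pygettext, here is a simplified, standalone sketch of a module docstring lookup that skips ENCODING along with comments and blank lines. The function name module_docstring and the SKIPPED tuple are assumptions for this example, and ast.literal_eval stands in for pygettext's own string handling:

```python
import ast
import io
import tokenize

# Tokens that may precede a module docstring without ruling one out.
SKIPPED = (tokenize.ENCODING, tokenize.COMMENT, tokenize.NL)


def module_docstring(source: bytes):
    """Return the module docstring of `source`, or None if there is none."""
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type in SKIPPED:
            continue  # still at the top of the module, keep looking
        if tok.type == tokenize.STRING:
            value = ast.literal_eval(tok.string)
            # A bytes literal in this position is not a docstring.
            return value if isinstance(value, str) else None
        return None  # any other token means there is no module docstring
    return None


print(repr(module_docstring(b'# comment\n\n"""doc"""\n')))  # -> 'doc'
```

Without tokenize.ENCODING in SKIPPED, the loop would hit the final `return None` on the very first token, which mirrors the behavior described in this report.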

This actually reveals another bug, caused by the return on line 340: detecting the module docstring makes pygettext swallow one token without handling it. This means that for code like this:

class X:
    """class docstring"""

pygettext will not extract the docstring of class X once the fix above is applied, unless proper care is taken. I'm mentioning it so that the fix is tested with both of these cases.
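Both cases can be exercised with a small standalone state machine. This is a rough sketch and not the pygettext implementation: the function name extract_docstrings, the state names, and the use of ast.literal_eval are all assumptions. The point it demonstrates is that the first significant token must fall through to the normal handling when it is not a docstring, instead of being dropped by an early return:

```python
import ast
import io
import tokenize

SKIPPED = (tokenize.ENCODING, tokenize.COMMENT, tokenize.NL)
LAYOUT = (tokenize.NEWLINE, tokenize.INDENT, tokenize.COMMENT, tokenize.NL)


def _starts_definition(tok) -> bool:
    return tok.type == tokenize.NAME and tok.string in ("class", "def")


def extract_docstrings(source: bytes) -> list:
    """Collect module, class, and function docstrings from `source`."""
    found = []
    fresh_module = True  # no significant token seen yet
    state = "waiting"    # waiting -> header -> body
    depth = 0            # bracket nesting inside a class/def header
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if fresh_module:
            if tok.type in SKIPPED:
                continue
            fresh_module = False
            if tok.type == tokenize.STRING:
                found.append(ast.literal_eval(tok.string))
                continue
            # Not a docstring: fall through so this token is still handled.
        if state == "waiting":
            if _starts_definition(tok):
                state, depth = "header", 0
        elif state == "header":
            if tok.type == tokenize.OP and tok.string in ("(", "[", "{"):
                depth += 1
            elif tok.type == tokenize.OP and tok.string in (")", "]", "}"):
                depth -= 1
            elif tok.type == tokenize.OP and tok.string == ":" and depth == 0:
                state = "body"
        elif state == "body":
            if tok.type == tokenize.STRING:
                found.append(ast.literal_eval(tok.string))
                state = "waiting"
            elif tok.type not in LAYOUT:
                # No docstring here; do not swallow a nested class/def keyword.
                state, depth = ("header", 0) if _starts_definition(tok) else ("waiting", 0)
    return found


print(extract_docstrings(b'class X:\n    """class docstring"""\n'))
# -> ['class docstring']
```

Replacing the fall-through with `continue` after `fresh_module = False` reproduces the second bug: the `class` keyword is consumed and the docstring of X is never found.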

Your environment

  • CPython versions tested on: 3.7.13 (installed from deadsnakes ppa), 3.10.4 (default Python on my system)
  • Operating system and architecture: Ubuntu 22.04 LTS
    The pygettext.py script was taken directly from this repository; I'm not sure that my distro even has a package that ships it.
@Jackenmen Jackenmen added the type-bug An unexpected behavior, bug, or error label Aug 5, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 15, 2022
…H-95732)

(cherry picked from commit 120b4ab)

Co-authored-by: Jakub Kuczys <me@jacken.men>
miss-islington added a commit that referenced this issue Oct 15, 2022
(cherry picked from commit 120b4ab)

Co-authored-by: Jakub Kuczys <me@jacken.men>
JelleZijlstra pushed a commit that referenced this issue Oct 16, 2022
…) (#98281)

gh-95731: Fix module docstring extraction in pygettext (GH-95732)
(cherry picked from commit 120b4ab)

Co-authored-by: Jakub Kuczys <me@jacken.men>
@Jackenmen
Contributor Author

Now that backports are merged, I think this can be closed.
