Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pygettext: use an AST parser instead of a tokenizer #104400

Closed
tomasr8 opened this issue May 11, 2023 · 7 comments
Closed

pygettext: use an AST parser instead of a tokenizer #104400

tomasr8 opened this issue May 11, 2023 · 7 comments
Labels
triaged The issue has been accepted as valid by a triager. type-feature A feature request or enhancement

Comments

@tomasr8
Copy link
Member

tomasr8 commented May 11, 2023

Follow up on this forum discussion

This is a part 1/X of improving pygettext. Replacing the tokenizer that powers the message extraction with a parser will simplify the code (no more counting brackets and f-string madness) and make it much easier to extend it with new features later down the road.

This change should also come with a healthy dose of new tests to verify the implementation.

PR coming shortly ;)

Linked PRs

@terryjreedy
Copy link
Member

We replaced the custom parser in pyclbr with an ast visitor a couple of years ago. It was shorter and clearer and agreed to be a definite improvement. For this type of application, any possible slowdown is irrelevant.

@iritkatriel iritkatriel added the extension-modules C modules in the Modules dir label Dec 1, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 3, 2024
(cherry picked from commit dcae5cd)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 3, 2024
(cherry picked from commit dcae5cd)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
serhiy-storchaka pushed a commit that referenced this issue Nov 3, 2024
(cherry picked from commit dcae5cd)

Co-authored-by: Tomas R <tomas.roun8@gmail.com>
serhiy-storchaka pushed a commit that referenced this issue Nov 3, 2024
(cherry picked from commit dcae5cd)

Co-authored-by: Tomas R <tomas.roun8@gmail.com>
picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
AA-Turner added a commit that referenced this issue Feb 2, 2025
The ``fintl`` module is never installed or tested, meaning that the
fallback identity function is unconditionally used for ``_()``.
This means we can simplify, converting the docstring to a real
docstring, and converting some other strings to f-strings.

We also convert the module to UTF-8, sort imports,
and remove the history comment, which was last updated in 2002.
Consult the git history for a more accurate summary of changes.
AA-Turner pushed a commit that referenced this issue Feb 4, 2025
#129672)

* Update the module docstring
* Move ``key_for`` inside the class
* Move ``write_pot_file`` outside the class
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Feb 7, 2025
The ``fintl`` module is never installed or tested, meaning that the
fallback identity function is unconditionally used for ``_()``.
This means we can simplify, converting the docstring to a real
docstring, and converting some other strings to f-strings.

We also convert the module to UTF-8, sort imports,
and remove the history comment, which was last updated in 2002.
Consult the git history for a more accurate summary of changes.
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Feb 7, 2025
…Visitor (python#129672)

* Update the module docstring
* Move ``key_for`` inside the class
* Move ``write_pot_file`` outside the class
cmaloney pushed a commit to cmaloney/cpython that referenced this issue Feb 8, 2025
…Visitor (python#129672)

* Update the module docstring
* Move ``key_for`` inside the class
* Move ``write_pot_file`` outside the class
@picnixz picnixz added triaged The issue has been accepted as valid by a triager. and removed type-feature A feature request or enhancement extension-modules C modules in the Modules dir triaged The issue has been accepted as valid by a triager. labels Feb 10, 2025
@picnixz picnixz added the type-feature A feature request or enhancement label Feb 10, 2025
serhiy-storchaka pushed a commit that referenced this issue Feb 11, 2025
…4402)

This greatly simplifies the code and fixes many corner cases.
@serhiy-storchaka
Copy link
Member

Thanks, @tomasr8. The AST based implementation is much clearer.

The drawback is that it prevents implementing the --add-comments option. I think that the right solution of this is to add optional support of comments in AST -- it could also be useful in other applications. But this is a different issue.

Is it all, or you planned other work on this issue?

@AA-Turner
Copy link
Member

Having comments in the AST would also be useful for applications like Sphinx -- is there an open issue to track this, or should we create one

A

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 11, 2025

The drawback is that it prevents implementing the --add-comments option. I think that the right solution of this is to add optional support of comments in AST -- it could also be useful in other applications. But this is a different issue.

Yes, having the comments in the AST would be great - I can look into that and see if there are any issues open about it!

There is actually a way to add translator comments even with the AST. You can run the tokenizer separately and collect all comments and their corresponding line numbers. You can then match the comments to gettext calls. I had a working proof of concept implementation of this for babel but I can repurpose it for pygettext!

Is it all, or you planned other work on this issue?

Not specifically in this issue, I think we can close it. I have lots of other things I'd like to improve but I think it's better to open separate issues for that (translator comments being one of those)

@tomasr8
Copy link
Member Author

tomasr8 commented Feb 11, 2025

is there an open issue to track this, or should we create one

There's a similar issue about type comments: #101494

@sergey-miryanov
Copy link
Contributor

Yes, having the comments in the AST would be great - I can look into that and see if there are any issues open about it!

We didn't fill an issue, 'cause we stick to too old Python version, but in our applications, we need to support comments in AST and now have to do tricky re.subs to preserve them.

@serhiy-storchaka
Copy link
Member

There is actually a way to add translator comments even with the AST. You can run the tokenizer separately and collect all comments and their corresponding line numbers. You can then match the comments to gettext calls. I had a working proof of concept implementation of this for babel but I can repurpose it for pygettext!

This is even more general and flexible way, but it may be hard. With comments in AST we have an issue of representing comments in a way that will not break existing code that uses AST if the tree contains comments. There is a problem of representing comments between string literals which will be concatenated into a single AST node -- it can be ignored for pygettext, but not for other applications. There is a problem of distinguishing comments that precede a node from a comment that follows it on the same line from comments that follow it on the following lines. Different applications may need to distinguish different types of comments and may have different preferences of the representation.

But there is no urge to add the --add-comments option right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triaged The issue has been accepted as valid by a triager. type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

7 participants