pygettext: use an AST parser instead of a tokenizer #104400
Comments
We replaced the custom parser in pyclbr with an ast visitor a couple of years ago. The result was shorter and clearer, and it was agreed to be a definite improvement. For this type of application, any possible slowdown is irrelevant.
(cherry picked from commit dcae5cd) Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
The ``fintl`` module is never installed or tested, meaning that the fallback identity function is unconditionally used for ``_()``. This means we can simplify, converting the docstring to a real docstring, and converting some other strings to f-strings. We also convert the module to UTF-8, sort imports, and remove the history comment, which was last updated in 2002. Consult the git history for a more accurate summary of changes.
#129672) * Update the module docstring * Move ``key_for`` inside the class * Move ``write_pot_file`` outside the class
…Visitor (python#129672) * Update the module docstring * Move ``key_for`` inside the class * Move ``write_pot_file`` outside the class
…4402) This greatly simplifies the code and fixes many corner cases.
Thanks, @tomasr8. The AST-based implementation is much clearer. The drawback is that it prevents implementing the Is that all, or did you plan other work on this issue?
Having comments in the AST would also be useful for applications like Sphinx -- is there an open issue to track this, or should we create one?
Yes, having the comments in the AST would be great - I can look into that and see if there are any issues open about it! There is actually a way to add translator comments even with the AST. You can run the tokenizer separately and collect all comments and their corresponding line numbers. You can then match the comments to gettext calls. I had a working proof of concept implementation of this for babel but I can repurpose it for pygettext!
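The approach described above (tokenize separately, then match comments to gettext calls by line number) could look roughly like the sketch below. This is not the babel proof of concept or the pygettext implementation; the function names `collect_comments` and `gettext_calls_with_comments` are hypothetical, and the rule that a translator comment sits on the line directly above the call is an assumption made for illustration.

```python
import ast
import io
import tokenize


def collect_comments(source):
    """Map line number -> comment text for every comment in the source."""
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string.lstrip("#").strip()
    return comments


def gettext_calls_with_comments(source, funcname="_"):
    """Yield (lineno, msgid, comment) for each funcname("literal") call.

    Hypothetical helper: only handles a plain name called with a string
    literal as its first positional argument.
    """
    comments = collect_comments(source)
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == funcname
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            # Assumption: the translator comment is on the line just above the call.
            yield node.lineno, node.args[0].value, comments.get(node.lineno - 1)
```

A trailing comment on the same line as the call is deliberately not matched here; a real implementation would need to decide which comment positions and prefixes count as translator comments.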
Not specifically in this issue, I think we can close it. I have lots of other things I'd like to improve, but I think it's better to open separate issues for those (translator comments being one of them).
There's a similar issue about type comments: #101494
We didn't file an issue because we are stuck on too old a Python version, but our applications need comments in the AST, and for now we have to use tricky re.sub calls to preserve them.
This is an even more general and flexible approach, but it may be hard. With comments in the AST, we have the problem of representing them in a way that will not break existing code that uses the AST when the tree contains comments. There is also the problem of representing comments between string literals that are concatenated into a single AST node -- it can be ignored for But there is no urgency to add the
Follow up on this forum discussion
This is part 1/X of improving pygettext. Replacing the tokenizer that powers the message extraction with a parser will simplify the code (no more counting brackets and f-string madness) and make it much easier to extend with new features later down the road.
This change should also come with a healthy dose of new tests to verify the implementation.
PR coming shortly ;)
Linked PRs
fintl.gettext from pygettext #129580