GH-121970: Extract pydoc_topics into a new extension #129116

Merged
AA-Turner merged 8 commits into python:main on Jan 21, 2025

Conversation

AA-Turner
Member

@AA-Turner AA-Turner commented Jan 21, 2025

This also simplifies the pydoc-topics builder. Grouping the topic labels by docname improves the speed of topics generation from ~68s to ~57s (19% faster) from a cold state, and from ~13s to ~3.4s (3.8x faster) when re-using the pickled documents.
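
To illustrate why grouping helps, here is a minimal sketch (not the code from this PR). It assumes that loading a document is the expensive step, so labels sharing a docname should share a single load. The names `group_labels_by_docname`, `build_topics`, `labels`, and `load_doctree` are hypothetical and do not come from the builder itself.

```python
from collections import defaultdict


def group_labels_by_docname(labels):
    """Map each docname to the list of topic labels found in it.

    `labels` is a hypothetical mapping of label -> docname.
    """
    grouped = defaultdict(list)
    for label, docname in labels.items():
        grouped[docname].append(label)
    return dict(grouped)


def build_topics(labels, load_doctree):
    """Load each document once and collect every label that lives in it.

    `load_doctree(docname)` stands in for an expensive loader; in this
    sketch it is assumed to return a mapping of label -> rendered text.
    """
    topics = {}
    for docname, doc_labels in sorted(group_labels_by_docname(labels).items()):
        doctree = load_doctree(docname)  # one load per document, not per label
        for label in doc_labels:
            topics[label] = doctree[label]
    # Return the topics in sorted key order.
    return dict(sorted(topics.items()))
```

With N labels spread over M documents, this performs M loads instead of N, which is where the saving would come from if many labels share a document.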

The representation of topics.py also changes from the default pprint.pformat output of:

topics = {'assert': 'The "assert" statement\n'
           '**********************\n'
           '\n'
           'Assert statements are a convenient way to insert debugging '
           'assertions\n'
           'into a program:\n'

to a simpler representation using raw, triple-single-quoted strings (save for when ''' appears in the body):

topics = {
    'assert': r'''The "assert" statement
**********************

Assert statements are a convenient way to insert debugging assertions
into a program:
'''

This representation is both nicer to read and 63% of the size of the current topics.py (518 KB vs 830 KB). The line count also decreases from 17,486 to 12,782.
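
As a rough sketch of how such a file could be written (this is not the implementation in this PR), the helper below tries the raw triple-quoted form and falls back to `repr()` whenever that form would not evaluate back to the same string. The round-trip check via `ast.literal_eval` is an assumption chosen for illustration; it covers the ''' case as well as edge cases such as a trailing backslash. The function names are hypothetical.

```python
import ast


def format_topic_literal(text):
    """Return Python source for a string literal that evaluates to `text`."""
    candidate = f"r'''{text}'''"
    try:
        # Only accept the raw form if it parses back to exactly `text`.
        if ast.literal_eval(candidate) == text:
            return candidate
    except (SyntaxError, ValueError):
        pass
    # Fall back to the ordinary escaped representation.
    return repr(text)


def format_topics(topics):
    """Render a `topics = {...}` module body with one entry per topic."""
    lines = ["topics = {"]
    for name in sorted(topics):
        lines.append(f"    {name!r}: {format_topic_literal(topics[name])},")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Executing the rendered source and comparing the resulting dict with the input is a cheap way to confirm that no topic text was altered by the change of representation.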

Tested by running:

>>> import runpy
>>> topics_old = runpy.run_path("Doc/build/topics_old.py")['topics']
>>> topics_new = runpy.run_path("Doc/build/topics_new.py")['topics']
>>> assert list(topics_old) == list(topics_new) # check order
>>> [k for k in topics_old if topics_old[k] != topics_new[k]]
['debugger', 'formatstrings']

The 'formatstrings' change is trailing whitespace on the >>> for num in range(5,12): line:

>>> fs_old_stripped = '\n'.join(map(str.rstrip, topics_old['formatstrings'].splitlines()))
>>> fs_new_stripped = '\n'.join(map(str.rstrip, topics_new['formatstrings'].splitlines()))
>>> assert fs_old_stripped == fs_new_stripped
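
The same check can be generalised to every topic. The helper below is a small sketch (the name `classify_differences` is hypothetical) that separates keys whose text differs only in trailing whitespace from keys with substantive changes, using the same `rstrip`-per-line comparison as above.

```python
def classify_differences(old, new):
    """Split differing keys into whitespace-only and substantive changes."""

    def strip_trailing(text):
        # Same normalisation as the session above: rstrip every line.
        return "\n".join(line.rstrip() for line in text.splitlines())

    whitespace_only, substantive = [], []
    for key in old:
        if key not in new or old[key] == new[key]:
            continue
        if strip_trailing(old[key]) == strip_trailing(new[key]):
            whitespace_only.append(key)
        else:
            substantive.append(key)
    return whitespace_only, substantive
```

It can be called on the two dicts loaded with `runpy.run_path` earlier, for example `classify_differences(topics_old, topics_new)`.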

The 'debugger' change is "Ctrl-C" to "Ctrl"-"C", but I'm not sure what caused this:

>>> import difflib
>>> print('\n'.join(difflib.unified_diff(topics_old['debugger'].splitlines(), topics_new['debugger'].splitlines())))
--- 
+++ 
@@ -186,9 +186,9 @@
    originate in a module that matches one of these patterns. [1]
 
    By default, Pdb sets a handler for the SIGINT signal (which is sent
-   when the user presses "Ctrl-C" on the console) when you give a
+   when the user presses "Ctrl"-"C" on the console) when you give a
    "continue" command. This allows you to break into the debugger
-   again by pressing "Ctrl-C".  If you want Pdb not to touch the
+   again by pressing "Ctrl"-"C".  If you want Pdb not to touch the
    SIGINT handler, set *nosigint* to true.

cc @hugovk as 3.14 release manager, as this changes the format of Lib/pydoc_data/topics.py. When doing backports, I'm happy to preserve the current format (pformat) if release managers would prefer.

A


📚 Documentation preview 📚: https://cpython-previews--129116.org.readthedocs.build/

@AA-Turner AA-Turner requested a review from hugovk as a code owner January 21, 2025 03:52
@AA-Turner AA-Turner added the docs (Documentation in the Doc dir), skip news, needs backport to 3.12 (only security fixes), and needs backport to 3.13 (bugs and security fixes) labels Jan 21, 2025
@hugovk
Member

hugovk commented Jan 21, 2025

On macOS, make -C Doc pydoc-topics goes from 14s (cold) and 7s (warm) to 9s and 2s.

The file also reduced from 830 KB to 519 KB.

However, I don't have any ['debugger', 'formatstrings'] differences:

Python 3.13.1 (v3.13.1:06714517797, Dec  3 2024, 14:00:22) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import runpy
>>> topics_old = runpy.run_path("topics_old.py")['topics']
>>> topics_new = runpy.run_path("topics_new.py")['topics']
>>> assert list(topics_old) == list(topics_new) # check order
>>> [k for k in topics_old if topics_old[k] != topics_new[k]]
[]
>>>

topics_old_and_new.zip

@AA-Turner
Member Author

Wonderful! I think I may have been testing with Sphinx 8.1 vs 8.2 (unreleased), as I made a change to how the :kbd: role is implemented, which would explain the Ctrl-C change.

Member

@hugovk hugovk left a comment

cc @hugovk as 3.14 release manager, as this changes the format of Lib/pydoc_data/topics.py. When doing backports, I'm happy to preserve the current format (pformat) if release managers would prefer.

I'm okay with it, but then there are no backports to 3.14 🙃

@AA-Turner
Member Author

OK, let's merge this and then ask @Yhg1s what he thinks about backporting to 3.13 and 3.12. I'm assuming the backports will fail due to the change to Lib/pydoc_data/topics.py.

A

@AA-Turner AA-Turner merged commit 01bcf13 into python:main Jan 21, 2025
53 checks passed
@miss-islington-app

Thanks @AA-Turner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @AA-Turner, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 01bcf13a1c5bfca5124cf2e0679c9d1b25b04708 3.13

@miss-islington-app

Sorry, @AA-Turner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 01bcf13a1c5bfca5124cf2e0679c9d1b25b04708 3.12

@terryjreedy
Member

terryjreedy commented Feb 18, 2025

Did you decide not to backport? If so, please remove the labels.

@AA-Turner
Member Author

AA-Turner commented Feb 18, 2025

I would like to backport; waiting on RM approval (see above).

A

AA-Turner added a commit to AA-Turner/cpython that referenced this pull request Feb 22, 2025
…pythonGH-129116)

(cherry picked from commit 01bcf13)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Feb 22, 2025

GH-130441 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Feb 22, 2025
AA-Turner added a commit to AA-Turner/cpython that referenced this pull request Feb 22, 2025
…pythonGH-129116)

(cherry picked from commit 01bcf13)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Feb 22, 2025

GH-130443 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Feb 22, 2025
AA-Turner added a commit that referenced this pull request Feb 27, 2025
… (#130441)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
AA-Turner added a commit that referenced this pull request Feb 27, 2025
… (#130443)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
hugovk added a commit to hugovk/cpython that referenced this pull request Mar 14, 2025
hugovk added a commit that referenced this pull request Mar 14, 2025
…ension (#129116)" (#131245)

Revert "GH-121970: Extract ``pydoc_topics`` into a new extension (#129116)"

This reverts commit 01bcf13.
AA-Turner added a commit to AA-Turner/cpython that referenced this pull request Mar 14, 2025
plashchynski pushed a commit to plashchynski/cpython that referenced this pull request Mar 17, 2025
…to a new extension (python#129116)" (python#131245)

Revert "pythonGH-121970: Extract ``pydoc_topics`` into a new extension (python#129116)"

This reverts commit 01bcf13.
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…to a new extension (python#129116)" (python#131245)

Revert "pythonGH-121970: Extract ``pydoc_topics`` into a new extension (python#129116)"

This reverts commit 01bcf13.
Labels
docs Documentation in the Doc dir skip news
Projects
Status: Done
3 participants