bpo-42955: Add sys.modules_names #24238

Merged
merged 2 commits into from
Jan 25, 2021



@vstinner vstinner commented Jan 18, 2021

Add sys.module_names attribute: the list of the standard library
module names.

https://bugs.python.org/issue42955

@vstinner
Member Author

The most important part of the PR is the Tools/scripts/generate_module_names.py script which generates the list.

It ignores the following modules. I'm not sure if we should ignore them or not.

IGNORE = {
    '__pycache__',
    'site-packages',

    # Helper modules for public module
    # (ex: _osx_support is used by sysconfig)
    '_aix_support',
    '_collections_abc',
    '_compat_pickle',
    '_compression',
    '_markupbase',
    '_osx_support',
    '_sitebuiltins',
    '_strptime',
    '_threading_local',
    '_weakrefset',

    # Used to bootstrap setup.py
    '_bootsubprocess',

    # test modules
    'test',
    '__phello__.foo',

    # pure Python implementation
    '_py_abc',
    '_pydecimal',
    '_pyio',
}

I chose to ignore these modules to make the list look nicer, but that makes the list "not correct".
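
As a rough illustration of how such an ignore set could be applied (this is not the actual code of Tools/scripts/generate_module_names.py; the helper `list_top_level_modules` and the sample directory entries are hypothetical), a filter over the entries of Lib/ might look like this:

```python
import os.path

# A few entries from the IGNORE set quoted above (shortened for the example).
IGNORE = {"__pycache__", "site-packages", "_strptime", "test", "_pyio"}


def list_top_level_modules(entries, ignore=IGNORE):
    """Return sorted module names from directory entries, skipping ignored ones.

    `entries` is a hypothetical listing of Lib/: file names such as
    "os.py" and package directory names such as "asyncio".
    """
    names = set()
    for entry in entries:
        name, ext = os.path.splitext(entry)
        if ext not in ("", ".py"):
            continue  # neither a .py module nor a directory name
        if name in ignore:
            continue  # explicitly excluded from the list
        names.add(name)
    return sorted(names)


sample = ["os.py", "asyncio", "_strptime.py", "test", "__pycache__", "README.txt"]
print(list_top_level_modules(sample))  # ['asyncio', 'os']
```

The trade-off is the one described above: every name added to the ignore set makes the output shorter but less complete.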

For Windows, I was lazy and hardcoded the list, since it's short and is not updated often:

WINDOWS_MODULES = (
    "_msi",
    "_testconsole",
    "msvcrt",
    "winreg",
    "winsound"
)

@vstinner
Member Author

With this PR, sys.module_names contains 295 names. It should contain the exact same number on any platform. A module is listed even if it's disabled explicitly at build time.

@vstinner
Member Author

I updated the documentation:

Some special stdlib modules are excluded from this list, like "test" and private helper modules of public modules.

I also fixed the code to list sub-packages: the module count is now 313.

Should we also list package sub-modules like asyncio.base_events?

@vstinner
Member Author

ensurepip._bundled is not included since it doesn't contain any .py file, only .whl files.

@vstinner
Member Author

> ensurepip._bundled is not included since it doesn't contain any .py file, only .whl files.

My bad, it's listed: there is Lib/ensurepip/_bundled/__init__.py.

@vstinner
Member Author

@pablogsal @serhiy-storchaka: Ok, this PR is now ready for your review :-) I updated the documentation to make explicit which modules are included and which are excluded:

A tuple of strings giving the names of standard library modules.

All module kinds are listed: pure Python, built-in, frozen and extension
modules. Modules which are not available on some platforms and modules
disabled at Python build are also listed.

For packages, only sub-packages are listed, not sub-modules. For example,
``concurrent.futures`` is listed, but not ``concurrent.futures.base``.

Some special stdlib modules are excluded, like test and private modules.

It is a superset of the :attr:`sys.builtin_module_names` list.
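
The superset claim in the documentation draft above can be checked with a set comparison. This is only a sketch: `is_superset_of_builtins` is a hypothetical helper, and the lookup of the attribute is an assumption. The PR calls it `sys.module_names`; released Python versions may expose the list under a different name (`sys.stdlib_module_names` is assumed here as a fallback), and older versions have neither.

```python
import sys

# Assumption: try the name used in this PR first, then a fallback name.
names = getattr(sys, "module_names", None) or getattr(
    sys, "stdlib_module_names", None
)


def is_superset_of_builtins(stdlib_names, builtin_names=sys.builtin_module_names):
    """Return True if every built-in module name is in `stdlib_names`."""
    return set(builtin_names) <= set(stdlib_names)


if names is None:
    print("this Python version has no stdlib module name list")
else:
    # The result depends on the Python version and build options.
    print(is_superset_of_builtins(names))
```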

@vstinner
Member Author

Here is the dump of sys.module_names, 296 modules: http://paste.alacon.org/47013

@vstinner
Member Author

I modified the script to not ignore any module: with this hack, sys.module_names contains 1869 names. List of the 1575 ignored modules: http://paste.alacon.org/47014

I don't think that we should include these test modules and sub-modules. IMO only listing parent packages is enough. It's easy to detect that "asyncio" is a stdlib module from the "asyncio.base_events" name.
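
A minimal sketch of that detection, assuming the list only contains parent packages: the check is done on the part of the name before the first dot. The helper `is_stdlib_name` and the excerpt `listed` are hypothetical.

```python
def is_stdlib_name(qualified_name, stdlib_names):
    """Return True if the top-level package of `qualified_name` is listed."""
    top_level = qualified_name.partition(".")[0]
    return top_level in stdlib_names


# Hypothetical excerpt of the list; the real one has about 300 entries.
listed = frozenset({"asyncio", "os", "concurrent"})
print(is_stdlib_name("asyncio.base_events", listed))  # True
print(is_stdlib_name("numpy.linalg", listed))         # False
```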

@vstinner
Member Author

I rebased my PR, squashed commits, and fixed a few comments in Tools/scripts/generate_module_names.py.

@vstinner
Member Author

"make regen-module-names" should be tested on macOS and FreeBSD: I'm not sure that setup.py properly reports missing modules in all cases.

@vstinner
Member Author

On FreeBSD, make regen-module-names does not change Python/module_names.h (sys.module_names also contains 296 modules).

I wrote a short script to check which modules can be imported. Only 9 modules cannot be imported on my FreeBSD VM:

_msi: False
_tkinter: False
msilib: False
msvcrt: False
spwd: False
tkinter: False
turtle: False
winreg: False
winsound: False

There are 5 modules specific to Windows (_msi, msilib, msvcrt, winreg, winsound), spwd doesn't exist on FreeBSD, _tkinter probably needs a missing build dependency (and turtle needs it).

import_all.py script:

import sys
import io

def import_ok(name):
    try:
        __import__(name)
    except ImportError:
        return False
    else:
        return True

stdout = sys.stdout
stderr = sys.stderr

for name in sys.module_names:
    sys.stdout = io.StringIO()
    sys.stderr = io.StringIO()
    ok = import_ok(name)
    sys.stdout = stdout
    sys.stderr = stderr
    print(f"{name}: {ok}")
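
A possible variant of the script above, sketched with contextlib.redirect_stdout() and contextlib.redirect_stderr() so that the streams are restored even if an import raises something other than ImportError. The helper `try_import` and the two sample names are hypothetical; the original script iterates over sys.module_names instead.

```python
import contextlib
import importlib
import io


def try_import(name):
    """Return True if `name` can be imported, silencing anything it prints."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        try:
            importlib.import_module(name)
        except ImportError:
            return False
    return True


for name in ("json", "no_such_module_for_this_example"):
    print(f"{name}: {try_import(name)}")
```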

@vstinner
Member Author

On Linux (on my Fedora 33 laptop), only 5 modules of sys.module_names cannot be imported, the 5 Windows-specific modules:

_msi: False
msilib: False
msvcrt: False
winreg: False
winsound: False

@vstinner
Member Author

On Windows, 22 modules cannot be imported:

_crypt: False
_curses: False
_curses_panel: False
_dbm: False
_gdbm: False
_posixshmem: False
_posixsubprocess: False
crypt: False
curses: False
fcntl: False
grp: False
nis: False
ossaudiodev: False
posix: False
pty: False
pwd: False
readline: False
resource: False
spwd: False
syslog: False
termios: False
tty: False

Oh, there are 3 built-in modules on Windows which are not listed on Linux:

_winapi
_xxsubinterpreters
nt

I chose to exclude _xxsubinterpreters in Tools/scripts/generate_module_names.py:

    # Experimental module
    '_xxsubinterpreters',

I would prefer to have the same list on Linux and Windows.
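
The comparison between platforms boils down to set differences, and a platform-independent list is the union of the per-platform lists. A small sketch with hypothetical excerpts (the real lists are much longer):

```python
# Hypothetical excerpts of the names found on each platform.
linux_names = {"_abc", "_thread", "posix", "sys"}
windows_names = {"_abc", "_thread", "_winapi", "_xxsubinterpreters", "nt", "sys"}

only_on_windows = sorted(windows_names - linux_names)
only_on_linux = sorted(linux_names - windows_names)
unified = sorted(linux_names | windows_names)

print(only_on_windows)  # ['_winapi', '_xxsubinterpreters', 'nt']
print(only_on_linux)    # ['posix']
print(len(unified))     # 7
```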

@vstinner
Member Author

I updated the PR to add 3 modules (_winapi, _xxsubinterpreters, nt). sys.module_names now contains 299 modules on all platforms.

Note: I checked that all built-in modules listed in Modules/config.c on Linux and PC/config.c on Windows are listed by Python/module_names.h.

@vstinner
Member Author

vstinner commented Jan 18, 2021

I created PR #24254 for bpo-42923 to only dump third party extensions on a Python fatal error (Py_FatalError(), faulthandler fatal signal). The PR is based on this PR.
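
The idea of reporting only third-party modules can be sketched in Python as follows. This is not the code of PR #24254 (which works at the C level and concerns extension modules); the helper `third_party_modules` and the sample data are hypothetical.

```python
def third_party_modules(loaded_names, stdlib_names):
    """Return sorted top-level names of loaded modules not in `stdlib_names`."""
    top_levels = {name.partition(".")[0] for name in loaded_names}
    return sorted(top_levels - set(stdlib_names))


# Hypothetical data: names as they could appear as keys of sys.modules.
loaded = ["os", "os.path", "numpy", "numpy.core", "json"]
stdlib = frozenset({"os", "json", "sys"})
print(third_party_modules(loaded, stdlib))  # ['numpy']
```

In a real program, `loaded` could be the keys of sys.modules and `stdlib` the list added by this PR.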

@vstinner
Member Author

cc @ronaldoussoren

@vstinner
Member Author

@ronaldoussoren: The important part of this PR is the documentation. Do you think that it clearly describes what can be found in the list and its limitations? Or do you disagree with including modules which are not available?

@vstinner
Member Author

vstinner commented Jan 19, 2021

I rebased this PR, which made it much simpler: it now only focuses on adding the sys.module_names list.

I already merged the uncontroversial part as a private API. I started with a private list _Py_module_names in a new Python/module_names.h file: cad8020. It unblocked bpo-42923 to dump third party extension modules on a fatal error.

@vstinner
Member Author

I enhanced Tools/scripts/generate_module_names.py to reorder the list and to avoid duplicates. sysmodule.c no longer has to remove duplicates at runtime.

@vstinner
Member Author

> I enhanced Tools/scripts/generate_module_names.py to reorder the list and to avoid duplicates. sysmodule.c no longer has to remove duplicates at runtime.

Hum, the generated list can be sorted as well. I simplified the runtime construction of sys.module_names even more.
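
Moving the sorting and de-duplication into the generator could look like the sketch below. The array name _Py_module_names comes from this discussion; the helper `render_module_names` and the exact layout of the generated C code are assumptions, not the real output of Tools/scripts/generate_module_names.py.

```python
def render_module_names(names):
    """Render a C array of sorted, unique module names as a string."""
    lines = ["static const char* _Py_module_names[] = {"]
    # set() removes duplicates, sorted() gives a stable order, so the
    # runtime code does not have to do either.
    for name in sorted(set(names)):
        lines.append(f'"{name}",')
    lines.append("};")
    return "\n".join(lines)


print(render_module_names(["os", "abc", "os", "_thread"]))
```

Because the output is deterministic, regenerating the file on an unchanged source tree produces no diff.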

Add sys.module_names, containing the list of the standard library
module names.
@vstinner
Member Author

I updated the PR to take @pablogsal's review into account:

  • Change sys.module_names type to frozenset.
  • No longer ignore private modules (only ignore test modules).
  • Avoid unsafe PySequence_Contains().
  • Rephrase the documentation.
  • Remove the confusing note in the doc about sys.path and import.
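
To illustrate the first point of the list above: a frozenset gives hash-based membership tests and cannot be modified by callers. The excerpt `module_names` below is hypothetical; the real attribute would be provided by the sys module.

```python
# Hypothetical excerpt of the attribute's value.
module_names = frozenset({"asyncio", "os", "sys"})

print("os" in module_names)        # True
print("requests" in module_names)  # False

try:
    module_names.add("requests")   # frozenset has no add() method
except AttributeError as exc:
    print("immutable:", type(exc).__name__)
```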

Sorry, I amended my commit to be able to modify the commit message.

@vstinner
Member Author

@ronaldoussoren @serhiy-storchaka @pablogsal: I plan to merge this PR next Monday. Please tell me if you want to review it before that.

When I created https://bugs.python.org/issue42955 I wasn't sure if sys.module_names would be useful, but then I found tons of use cases, and multiple people told me that they need it for their projects (see the issue, I listed all of them).

Maybe we could also add a way to get the paths of the stdlib, but I suggest doing that separately. Multiple use cases cannot import modules, but need to check the module name.

@vstinner vstinner merged commit db584bd into python:master Jan 25, 2021
@vstinner vstinner deleted the module_names branch January 25, 2021 12:24
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
Add sys.module_names, containing the list of the standard library
module names.