Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

submodule of c-extension module is quirky #87533

Open
mattip opened this issue Mar 2, 2021 · 6 comments
Open

submodule of c-extension module is quirky #87533

mattip opened this issue Mar 2, 2021 · 6 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@mattip
Copy link
Contributor

mattip commented Mar 2, 2021

BPO 43367
Nosy @mattip, @skoslowski, @YannickJadoul
Files
  • test.c: Complete C module to reproduce the issue
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-03-02.11:41:23.575>
    labels = []
    title = 'submodule of c-extension module is quirky'
    updated_at = <Date 2021-03-02.14:54:05.759>
    user = 'https://github.com/mattip'

    bugs.python.org fields:

    activity = <Date 2021-03-02.14:54:05.759>
    actor = 'skoslowski'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2021-03-02.11:41:23.575>
    creator = 'mattip'
    dependencies = []
    files = ['49844']
    hgrepos = []
    issue_num = 43367
    keywords = []
    message_count = 2.0
    messages = ['387917', '387935']
    nosy_count = 3.0
    nosy_names = ['mattip', 'skoslowski', 'YannickJadoul']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue43367'
    versions = []

    Linked PRs

    @mattip
    Copy link
    Contributor Author

    mattip commented Mar 2, 2021

    If I have a module "parent", and I add another module "child" with a method "f" to it:

    child = PyModule_Create(...);
    PyModule_AddObject(parent, "child", child);

    then I can call

    import parent
    parent.child.f()

    but importing like this

    from parent.child import f

    raises a ModuleNotFoundError: ... 'parent' is not a package

    This came up in PyTorch pytorch/pytorch#38137
    and in pybind11 pybind/pybind11#2639,
    and in various other places like stackoverflow https://stackoverflow.com/questions/38454852/importerror-with-error-is-not-a-package

    A complete example is attached

    If this is intentional, it might be nice to emit a warning when calling PyModule_AddObject with a module.

    @skoslowski
    Copy link
    Mannequin

    skoslowski mannequin commented Mar 2, 2021

    >> import parent.child

    first imports "parent" (successfully) but then fails, because the import code has no knowledge of were to find ".child". This is because
    a) the module "parent" is not marked as a package (hence the error message)
    b) even if it were a package, there is no (ext) module file to locate and load.

    If you instead run

    > from parent import child

    only "parent" is imported, and "child" is retrieved as an attribute - which it actually is. The import code itself will add such an attribute, too [1]. However, that is after the submodule was located and loaded. Attribute lookup on the parent is not part of the submodule import itself.

    A (hacky) work-around would be to add an entry for "parent.child" in sys.modules. This prevents the import machinery from running.

    A (more) proper solution would be to mark "parent" as a package and install some importlib hooks. See [2] for an example from Cython-land.

    Finally there is PyImport_AppendInittab() [3], which could possibly be used to register "parent.child". I have never tried that. Even if this worked, it would be unsupported and probably not without side-effects.

    [1] https://docs.python.org/3/reference/import.html#submodules
    [2] https://stackoverflow.com/questions/30157363/collapse-multiple-submodules-to-one-cython-extension
    [3] https://docs.python.org/3/c-api/import.html?highlight=inittab#c.PyImport_AppendInittab

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @LostTime76
    Copy link

    LostTime76 commented Oct 26, 2023

    Bump

    Can somebody please explain the proper usage of the C API such that we can create actual packages and submodules from C code that are lazy loaded (Py_AppendInittab)?

    Edit:

    After some spelunking in the cpython code to find where the "is not a package" error was being thrown, it appears it was not happy with the fact that the parent package did not have the path dunder attribute set. As a test on the parent, I put a module exec slot on the parent and set the path attribute in the exec function to None. Then I created another module definition for a few sub modules of the parent package and appended them using the PyImport_AppendInittab function and dot notation.

    PyImport_AppendInittab(parent)
    PyImport_AppendInittab(parent.sub)
    PyImport_AppendInittab(parent.sub.sub)

    The imports now work in a test python script -> import parent.sub.sub. No python, pure C code embedding python. Neat. Now I can lazy load with sub module support. Not sure if this is intended.. but unless somebody comes in here and says otherwise, I will run with this.

    I guess the recommendation here would be to set an exec slot on every module and set the path dunder to None. Additionally, set other attributes like @mattip's OP to make sure the python accesses work as expected. This can all be done lazily as a python script requests the modules instead of brute force creating all the modules up front. After all, lazy loading (PyImport_AppendInittab) was my goal here.

    @ringohoffman
    Copy link

    ringohoffman commented May 16, 2024

    @mattip if I understand the original source of this correctly:

    The issue is really about pickle not being able to serialize attributes of these C-modules, not so much that you can't import these C-modules directly. I am experiencing exactly the same situation, again coming from an issue serializing attributes of PyTorch C-modules.

    Consider a roughly analogous situation:

    import pickle
    import sys
    
    assert pickle.sys.settrace is sys.settrace
    
    import pickle.sys  # ModuleNotFoundError: No module named 'pickle.sys'; 'pickle' is not a package

    I think it is perfectly reasonable for import pickle.sys to raise an error (even if the error message is wrong).

    I think a pure-Python example of what we are seeing with the C-modules is this:

    my_module.py:

    import pickle
    import types
    
    eval_frame = types.ModuleType("eval_frame")
    
    def reset_code():
        ...
    
    reset_code.__module__ = "__main__.eval_frame"
    
    eval_frame.reset_code = reset_code
    
    with open("path.pt", "wb") as f:
        pickle._Pickler(f).dump(pickle.sys.settrace)  # OK
    
    with open("reset_code.pt", "wb") as f:
        pickle._Pickler(f).dump(eval_frame.reset_code)  # _pickle.PicklingError: Can't pickle <function reset_code at 0x7f2383e651f0>: import of module '__main__.eval_frame' failed

    Like pickle.sys, eval_frame is a module but not a package, and so is not importable with import my_module.eval_frame. The main difference here is that pickle doesn't try to import pickle.sys when serializing pickle.sys.settrace because pickle.sys.settrace.__module__ == "sys", and pickle has no trouble importing sys from the global scope.

    For eval_frame (and our C-modules), this is not the case. It is a module with no corresponding globally importable package. my_module.eval_frame.reset_code.__module__ == "my_module.eval_frame", and you can only access eval_frame thru my_module. This would equate to from my_module import eval_frame.

    By tweaking pickle._Pickler.save_global, and pickle._Unpickler.find_class we can achieve this from ... import ... behavior, maintaining backward compatibility while also supporting the above example and serializing C-module attributes:

    class _Pickler:
        ...
    
        def save_global(self, obj, name=None):
            ...
    
            module_name = whichmodule(obj, name)
            try:
                _module_name, *fromlist = module_name.rsplit(".", 1)
                module = __import__(_module_name, fromlist=fromlist, level=0)
                if fromlist:
                    module = getattr(module, *fromlist)
                obj2, parent = _getattribute(module, name)
            except (ImportError, KeyError, AttributeError):
                ...
            ...
    
    
    class _Unpickler:
        ...
    
        def find_class(self, module, name):
            ...
            module_name, *fromlist = module.rsplit(".", 1)
            module = __import__(module_name, fromlist=fromlist, level=0)
            if fromlist:
                module = getattr(module, fromlist[0])
            if self.proto >= 4:
                return _getattribute(module, name)[0]
            else:
                return getattr(module, name)

    I am eager to solve this problem, and so I welcome feedback on my understanding and proposed solution. Please let me know.

    ringohoffman added a commit to ringohoffman/cpython that referenced this issue May 18, 2024
    …age C-modules
    
    There have been recurring issue with PyModule_Create modules in PyTorch; when trying to serialize attributes of these C-modules, pickle fails to import the C-module because it is not a package
    
    This is the current issue that brought this to my attention: pytorch/pytorch#126154
    
    The existing hack to this issue has been to insert the C-module into sys.modules in order to enable pickle to find them: https://github.com/pytorch/pytorch/pull/38136/files#diff-d7e90d0f94b43db763b44fba679a5c1b4cabe3668aaf34f2aee07de8e2d1b2faR524-R528
    
    Instead of relying on this hack, we can change `pickle`'s  approach to loading, which is currently equivalent to `import package.c_module`; instead, we could do `from package import c_module`, which 1) does not care if `c_module` is a package or not 2) is fully backward compatible with the previous approach and 3) slots in nicely to the `fromlist` parameter of `__import__`, which we are already using to load modules in `pickle`
    ringohoffman added a commit to ringohoffman/cpython that referenced this issue May 18, 2024
    There have been recurring issue with PyModule_Create modules in PyTorch; when trying to serialize attributes of these C-modules, pickle fails to import the C-module because it is not a package
    
    This is the current issue that brought this to my attention: pytorch/pytorch#126154
    
    The existing hack to this issue has been to insert the C-module into sys.modules in order to enable pickle to find them: https://github.com/pytorch/pytorch/pull/38136/files#diff-d7e90d0f94b43db763b44fba679a5c1b4cabe3668aaf34f2aee07de8e2d1b2faR524-R528
    
    Instead of relying on this hack, we can change `pickle`'s  approach to loading, which is currently equivalent to `import package.c_module`; instead, we could do `from package import c_module`, which 1) does not care if `c_module` is a package or not 2) is fully backward compatible with the previous approach and 3) slots in nicely to the `fromlist` parameter of `__import__`, which we are already using to load modules in `pickle`
    ringohoffman added a commit to ringohoffman/cpython that referenced this issue May 18, 2024
    There have been recurring issue with PyModule_Create modules in PyTorch; when trying to serialize attributes of these C-modules, pickle fails to import the C-module because it is not a package
    
    This is the current issue that brought this to my attention: pytorch/pytorch#126154
    
    The existing hack to this issue has been to insert the C-module into sys.modules in order to enable pickle to find them: https://github.com/pytorch/pytorch/pull/38136/files#diff-d7e90d0f94b43db763b44fba679a5c1b4cabe3668aaf34f2aee07de8e2d1b2faR524-R528
    
    Instead of relying on this hack, we can change `pickle`'s  approach to loading, which is currently equivalent to `import package.c_module`; instead, we could do `from package import c_module`, which 1) does not care if `c_module` is a package or not 2) is fully backward compatible with the previous approach and 3) slots in nicely to the `fromlist` parameter of `__import__`, which we are already using to load modules in `pickle`
    @serhiy-storchaka
    Copy link
    Member

    Try to set sys.modules['parent.child'] = child after creating the child module. Would it help?

    BTW, it is not recommended to use PyModule_AddObject(), which is broken by design. Use PyModule_AddObjectRef() or PyModule_Add(), depending whether you want to keep a reference to the added object.

    @serhiy-storchaka
    Copy link
    Member

    I guess the PR was closed due to conflicts with recent changes in the pickle module (there was some refactoring), and due to a lack of response from my side.

    Thoughts about this idea:

    • It works for package.c_module, but not for package.c_module1.c_module2. We could try this recursively, but this may complicate the code.
    • Globals are saved as a pair of the module name and the qualified name. If in save_global() we convert a pair package.c_module and qualname to the pair package and c_module.qualname, we may save some time for unpickling (perhaps).

    On other hand, if setting sys.modules works, we can leave this. There may be the same issue in other serializers -- yaml, xmlrpc.

    I have no strong preference. We may try different approaches and see what will happen.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants