Add context manager functionality using hooks #121

vnmabus · 2024-07-03T07:47:06Z

This is an alternative (simpler and safer) implementation of #70 using a meta hook. By using this approach, typing should work without the need of repeating code or using typing stubs (it would allow to remove most of the code in this package, unless there is some problem that I did not notice).

The syntax used to define the lazy imports would be the same as #70:

import lazy_loader as lazy

with lazy.lazy_imports():
    from .some_func import some_func
    from . import some_mod, nested_pkg

This is currently a draft PR because I wanted to receive feedback and do more tests. It should work with any kind of import.

The code should be quite easy to understand and extend if necessary.

One think to take into account: I did NOT implement error_on_import=False with this syntax. I think that it would complicate the implementation and I see no point in delaying to report the ImportError to the user.

@tlambert03, @stefanv what are your thoughts on this?

A different (easier/safer) approach to lazy importing using a meta import hook.

stefanv · 2024-07-03T20:56:28Z

Wow, this almost looks too good to be true 👀 What's the catch!

Is https://github.com/scientific-python/lazy-loader/pull/121/files#diff-0f92aed2087ad50adf43e527efcd07237adacc5031b7296ae1ece2693462986bR320 a recommended practice for taking over imports?

tlambert03 · 2024-07-03T21:36:28Z

certainly does look elegant! I don't have much experience with the practical details of using importlib.util.LazyLoader, so I can't provide much experience here. The official docs do make it sound like A) it might indeed be a good choice for very large libraries that are trying to avoid up-front greedy import times... but also that B) it's not without caveats. From the docs:

This class only works with loaders that define exec_module() as control over what module type is used for the module is required. For those same reasons, the loader’s create_module() method must return None or a type for which its __class__ attribute can be mutated along with not using slots. Finally, modules which substitute the object placed into sys.modules will not work as there is no way to properly replace the module references throughout the interpreter safely; ValueError is raised if such a substitution is detected.

Note

Note For projects where startup time is critical, this class allows for potentially minimizing the cost of loading a module if it is never used. For projects where startup time is not essential then use of this class is heavily discouraged due to error messages created during loading being postponed and thus occurring out of context.

Certainly, none of that is a deal-breaker (and indeed, we're already deep down in the magic here simply by trying to support lazy imports in a type safe way), but perhaps those warnings should be passed along in the readme here along with docs on how to use this feature

vnmabus · 2024-07-04T05:59:50Z

Is https://github.com/scientific-python/lazy-loader/pull/121/files#diff-0f92aed2087ad50adf43e527efcd07237adacc5031b7296ae1ece2693462986bR320 a recommended practice for taking over imports?

So, adding the finder to sys.meta_path seems to be the way. AFAIK, the finders are invoked in order, so it make sense to put ours at the beginning, in order to be called before the others. Of course, nothing prevents other finder to do the same afterwards and run before ours, but then it should have a good reason to do so.

certainly does look elegant! I don't have much experience with the practical details of using importlib.util.LazyLoader, so I can't provide much experience here. The official docs do make it sound like A) it might indeed be a good choice for very large libraries that are trying to avoid up-front greedy import times... but also that B) it's not without caveats.

I didn't either. I just saw it inside the load function in this same package, and my solution was inspired by that. However, I am not proposing here the one and only way to implement this functionality, but just a way in which lazy imports can be implemented in a safe way and with a nice syntax (and one that tricks type-checkers into thinking that they are normal imports). What I want is some discussion around this functionality to see the best way to proceed forward. Should we use LazyLoader or a different approach? For example, it would be possible, albeit a bit more cumbersome, to record each import in the finder, remove the loaded modules after exiting the context manager, and use a mechanism similar to what attach does, if we feel that approach is better.

vyasr · 2024-07-13T18:11:59Z

@stefanv asked me to take a look at this. Planning to review soon, but messaging in advance so you can ping me if I drop the ball.

vyasr

Overall this approach seems pretty sound. I've done a decent bit of hacking around importlib, but I haven't used the LazyLoader specifically before so I did a bit of testing. I was mainly concerned about whether extension modules or built-ins might use custom loaders, but built-ins do not and the extension modules I tested (built either directly with the Python C API or using pybind11) seemed to be fine. On the restrictions:

This class only works with loaders that define exec_module() as control over what module type is used for the module is required.

The restriction that finders must use exec_module is basically a statement that this doesn't work with finders written prior to Python 3.4 because exec_module and create_module were the replacements introduced then and by default load_module simply calls those two functions in modern versions of Python. That seems fine.

For those same reasons, the loader’s create_module() method must return None or a type for which its class attribute can be mutated along with not using slots.

Similarly, I don't think any modern loaders will return something with an immutable __class__.

Finally, modules which substitute the object placed into sys.modules will not work as there is no way to properly replace the module references throughout the interpreter safely; ValueError is raised if such a substitution is detected.

I could see this restriction impacting one of the more common use cases of custom loaders, which is to intercept imports of a specific package. In some of those cases it is reasonable to imagine sys.modules being rewritten as a way to alias or mask a package's import. However, I still think that is a special enough case that expecting lazy_imports to work seamlessly with those custom loaders is out of scope.

vyasr · 2024-07-14T16:21:30Z

lazy_loader/__init__.py

+        return spec
+
+
+sys.meta_path.insert(0, LazyFinder())


Most meta path finders will be targeted at specific set of modules. In those cases it is probably fine to insert the finder like this because in general if there are multiple finders/loaders affecting the same set of modules and injecting specialized behavior there is no way to guarantee that they can both do so safely anyway. In this case, however, we need the LazyFinder to respect other finders. If a finder is inserted into the meta path after this finder then it could preempt the lazy find. To address that, instead of inserting the finder here we should insert it inside the __enter__ of the context manager and then remove it in __exit__ with something like

with threading.Lock(): sys.meta_path = [f for f in sys.meta_path if not isinstance(finder, LazyFinder)]

We will need to lock meta_path modification (both insertion and deletion) for thread safety. Since the lazy_imports context is by construction not reentrant it should be fine to not include any logic to handle the object already being in the list.

Sorry for not answering earlier, as I was on vacation.

I know that it is possible for another hook to be inserted before this one after this is imported. However, that would only happen if it is explicitly inserted at the beginning of the list, which is not very common.

Although ensuring that we are at the first position on __enter__ seems sensible at first glance, in a multithread context this simply cannot be done safely, AFAIK, because hooks are global and not per-thread, so you would be messing with the meta path for other threads too (and other threads could change it too!).

Your code seems broken. Not only it assumes that all threads would synchronize the meta path using the same lock (impracticable), but if I am not mistaken you are creating a new lock every time, so you are not even protecting anything (see pylint-dev/pylint#5208).

No problem!

Apologies, my example wasn't intended to be taken verbatim. Concretely, I would suggest storing a lock either at module scope or as a class member of the LazyFinder (e.g. upon first import of the lazy_loader itself). Then you would have a single instance that is shared across all threads and all import calls. I am confused as to why you think having all threads synchronizing the meta path via the same lock would be impractical. In general, thread-safe code requires that all threads would use the same lock, otherwise the lock serves no purpose. Perhaps I am misunderstanding what you mean? Are you concerned that imports might be occurring in a performance-sensitive context? My assumption is that even if you have dozens/hundreds/more threads, the import shouldn't be on the critical path for performance.

No, what I mean is that you cannot control what other threads do. In a multithreading context some threads may be created by other libraries or the application. These could in principle access the list while you are checking/changing it, and they won't use your lock. You can only protect by locking the APIs that you define, not the APIs defined elsewhere, unless you are in complete control of all threads of the process.

Ah yes, if any other library includes modification of sys.meta_path beyond simply usage of this context manager in different threads then I agree that we have no control over that. There is nothing that we can do in that case. The question to me is really which case has more risk; allowing the lazy finder to be preempted by different insertion order, or allowing the lazy finder to have possible nondeterministic behavior in a multithreaded scenario where multiple distinct code paths modify sys.meta_path. I was assuming the latter would be far less likely than the former (granting that the former is also quite unlikely) and was largely operating under the assumption that the lazy loader would be the only code modifying sys.meta_path in a multithreaded context. OTOH nondeterministic code is a far greater evil and I wasn't really thinking much about it. I'm fine with leaving the implementation as is and just accepting the limitation.

What I think we could do mostly safely is to warn if we are not in first position, possibly even listing which other hooks are before ours, so that users are aware of that and do not spend hours to debug it.

Yeah, worst case with the warning is that we report a false positive if some other libraries is inserted and then removed with bad timing in another thread.

vyasr · 2024-07-14T16:27:26Z

lazy_loader/__init__.py

+        if spec is None:
+            raise ModuleNotFoundError(f"No module named '{fullname}'")
+
+        spec.loader = importlib.util.LazyLoader(spec.loader)


Since the lazy loader produces a delayed evaluation such that any import errors will show up later than expected, and since the user of the lazy_imports context may not realize what is happening under the hood, it might make sense to subclass the LazyLoader and do something like

class MyLazyLoader(LazyLoader): def exec_module(self, module): try: super().exec_module(module) except BaseException as e: raise ImportError(f"Lazy load of module {module} failed") from e

I in principle agree with improving the user reporting of errors, so this sounds fine to me.

stefanv · 2024-10-30T18:38:42Z

After @vyasr's review (thank you!) that this approach is viable, so should we try to get this out there for people to try?

DavidCEllis · 2024-11-05T19:11:47Z

Hi, @vnmabus mentioned this on a python discuss thread about deferred (originally async) imports so I thought I'd comment here as I ran into some issues trying it out.

It seems like you need to import threading somewhere before the context manager is used. If threading isn't already imported, importlib.util.LazyLoader will try to import it within its own exec_module method. As you're in the with block, inside_context_manager is True and so LazyFinder tries to lazily load threading and fails.

Code:

import lazy_loader as lazy

with lazy.lazy_imports():
    import numpy

Error:

Traceback (most recent call last):
  File "C:\Users\ducks\Source\scratch\lazy_demo.py", line 4, in <module>
    import numpy
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib.util>", line 257, in exec_module
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib.util>", line 267, in exec_module
AttributeError: partially initialized module 'threading' has no attribute 'RLock' (most likely due to a circular import)

Code for LazyLoader.exec_module

https://github.com/python/cpython/blob/ffb44cd08c625dc4eda7eea058e1049d1f630491/Lib/importlib/util.py#L257-L274

I also see some examples that use from imports, but as far as I can tell these are not lazily loaded with this method, however any modules imported by the module you are importing from are turned into _LazyModules. I'm not sure this is the desired behaviour.

(This is with the threading import added to avoid the error)

my_module.py

import numpy

print(type(numpy))

def my_function():
    pass

lazy_demo.py

import lazy_loader as lazy

with lazy.lazy_imports():
    import my_module

python -Ximporttime lazy_demo.py

import time: self [us] | cumulative | imported package
...
import time:      1145 |      38170 | lazy_loader
import time:       113 |        113 | my_module

lazy_demo2.py

import lazy_loader as lazy

with lazy.lazy_imports():
    from my_module import my_function

python -Ximporttime lazy_demo2.py

import time: self [us] | cumulative | imported package
...
import time:      1511 |      39843 | lazy_loader
import time:       193 |        193 | my_module
import time:       470 |        470 | numpy
<class 'importlib.util._LazyModule'>

Edit: Noticed the initial error was from a different run than the script from the traceback, must have copied from the wrong run.

stefanv · 2024-11-05T19:26:38Z

@DavidCEllis Thanks very much for taking a look.

More generally speaking, given your depth of experience, if you run into any issues with lazy_loader, or have recommendations on how to make it safer or easier to use, your input would be much appreciated.

mikeshardmind · 2024-11-05T20:09:50Z

There's some other existing prior/parallel art here with the context manager approach. Something that's done in prior/parallel art is declaring which namespaces should be affected.

Taken from the readme of another project working on this problem:

with defer_imports.install_import_hook(module_names=("discord",), recursive=True):
    ...

This avoids your above issue with needing to be careful about importing something (threading) merely for effect before using this context manager.

vnmabus · 2024-11-06T08:57:08Z

Thanks @DavidCEllis and @mikeshardmind for giving feedback!

Apparently the issue with threading is new, due to a recent commit (python/cpython#117983). For solving that, and the "from" issue maybe we should use our own class, to have more control of what is loaded and when.

In #70 I had another draft implementation that relied on replacing temporarily the __import__ function. However, at the moment that felt to me more brittle than using hooks, and I don't know about other options. However if I am not mistaken that took into account also "from" statements, and it could be possible to do the same here by using our own loader.

To be honest, I am not an expert on importing internals, so I hoped that one of these PRs was inspirational enough to "nerd-snip" someone with more knowledge that can help finish the implementation successfully.

DavidCEllis · 2024-11-06T12:21:30Z

I'm not sure you can cleanly handle from imports this way. You might be able to get close but I don't know if you wouldn't run into a great deal of edge cases and headaches.

The _LazyModule objects returned by LazyLoader imports do some __class__ switching as part of their mechanics.

When a module is imported the partially initialised module has its __class__ switched for _LazyModule in LazyLoader.exec_module and then when __getattribute__ is called it does the exec_module call from the original loader and switches the __class__ back.

(Note that even with __getattribute__ overridden you can still call type(module) and see that it's a _LazyModule if you do so before accessing anything.)

By contrast the objects retrieved with a from import can be anything and there's no loading knowledge you can have before the object is accessed to swap out its class, and even if you did the object that existed will still be observable and may cause issues (for example if type(obj) is <type> is used before obj is accessed some other way). PEP 690 indicates there's not really a good way to do this that can be completely transparent without having to modify how module __dict__ works.

The solution I use myself for lazy imports uses roughly the same mechanics as your attach function but covers what I need for the equivalent of attach and load.

Instead of building a function that returns a __getattr__ function for the module I create an instance of a LazyImporter class with a similar __getattr__ method¹. This means I can use the attributes of the instance within the module, or export them via getattr(importer, name) to the module __getattr__ (with a helper function).

Does make a noticeable performance improvement in the CLI tools I use it for. Still doesn't play nice with static analysis sadly.

In my case this class doesn't actually do any importing and delegates that to another class which handles the differences in module, from and try/except imports. ↩

vnmabus · 2024-11-06T12:33:10Z

Wouldn't an hybrid solution be possible, in which the with statement does not import anything (but records the things that should be imported, and fools the type checkers) and then an explicit attach function is called to modify __getattr__?

DavidCEllis · 2024-11-06T17:53:56Z

I think if you want to prevent any import machinery from happening you might be better served by replacing __import__. I believe the import hooks are really for if you want to modify how some part of the process happens, rather than preventing it all together.

Trying to match things up does get a bit hairy when you're trying to handle from x import y as z as 'z' isn't captured in the import call. I have half a solution sketched out but I'm not happy with how I'm matching things up and I'm not sure it won't fall apart with some import type I haven't anticipated.

DavidCEllis · 2024-11-06T19:00:38Z

Something along these lines might work the way you're suggesting? I'm not sure I like it though.

https://gist.github.com/DavidCEllis/29c5e77262f522c26b9f810fcb031a43

vnmabus · 2024-11-06T19:15:56Z

What I worried with replacing __import__ (and the reason I tried with hooks and ContextVar) is about what happens with concurrency (either multithreading or coroutines). I do not think it is safe to change __import__ and change it back in the presence of multiple threads (I think coroutines are safe, as they need to be yield explicitly, if I am not mistaken, and you won't do that inside the with block).

DavidCEllis · 2024-11-06T20:49:23Z

Yes, that's one of the reasons I didn't really like it. In the context of multiple threads trying to import simultaneously you'll run into issues. Even with path hooks, something else could insert a path hook and break the process though.

Defining the imports separately either in a stub or in a dict for attach is probably still the safest way, but I'll have another look to see if there's a way to use path hooks.

Edit: I think it takes some of the tools you'd need to do this out of your hands, but I'm not certain yet.

DavidCEllis · 2024-11-06T22:46:36Z

The docs for the import system state (emphasis mine):

If it is acceptable to only alter the behaviour of import statements without affecting other APIs that access the import system, then replacing the builtin __import__() function may be sufficient. This technique may also be employed at the module level to only alter the behaviour of import statements within that module.

However I haven't actually found a way to make it so the import statement is only replaced in one module, maybe I'm missing something or that's not what this sentence is supposed to imply.

I did make a handful of changes to mitigate some of the issues anyway.

Imports that occur in other modules while the replacement __import__ is in place are now passed through to whatever __import__ was before replacement (__name__ must be explicitly provided). I've also put a threading lock around the replacement, so at least this context manager shouldn't be able to replace itself.

If builtins.__import__ is replaced with something else while within the context manager this will now error, I don't see this as significantly different to finding that something has put a meta path hook earlier in sys.meta_path while within a block that used that mechanism.

vnmabus · 2025-03-22T22:28:01Z

So, I have taken another look at this. It seems that in CPython, there is a __builtins__ global variable per-module containing the builtins (including __import__). Thus, it would be possible to temporarily replace the __builtins__ variable with a custom dict inside the with block (using a bit of hackery to modify the __globals__ of the parent frame).

This would allow to use an implementation that is almost as simple as the one in #70, but that has no concurrency problems (as it would only change temporarily the builtins in the current module, and two threads cannot import the same module concurrently - there is a lock preventing that).

Caveats:

There would be some hackery involved to modify the __globals__ of the parent frame.
The __builtins__ variable is a (documented) implementation detail of CPython. Other interpretersmay not have it, or it may not work exactly the same. PyPy seems to have it, but apparently it ignores writes to it (exec ignores custom __builtins__ in globals pypy/pypy#2653), so this would not work.

vnmabus added 2 commits July 3, 2024 09:07

Lazy imports in context manager.

a17d89e

A different (easier/safer) approach to lazy importing using a meta import hook.

Improve tests.

9a47f79

vyasr suggested changes Jul 14, 2024

View reviewed changes

Sachaa-Thanasius mentioned this pull request Nov 21, 2024

Circular import error for threading when using importlib.util.LazyLoader with a custom finder python/cpython#127116

Open

Add context manager functionality using hooks #121

Are you sure you want to change the base?

Add context manager functionality using hooks #121

Uh oh!

Conversation

vnmabus commented Jul 3, 2024

Uh oh!

stefanv commented Jul 3, 2024

Uh oh!

tlambert03 commented Jul 3, 2024

Uh oh!

vnmabus commented Jul 4, 2024

Uh oh!

vyasr commented Jul 13, 2024

Uh oh!

vyasr left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vyasr Aug 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stefanv commented Oct 30, 2024

Uh oh!

DavidCEllis commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stefanv commented Nov 5, 2024

Uh oh!

mikeshardmind commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vnmabus commented Nov 6, 2024

Uh oh!

DavidCEllis commented Nov 6, 2024

Footnotes

Uh oh!

vnmabus commented Nov 6, 2024

Uh oh!

DavidCEllis commented Nov 6, 2024

Uh oh!

DavidCEllis commented Nov 6, 2024

Uh oh!

vnmabus commented Nov 6, 2024

Uh oh!

DavidCEllis commented Nov 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DavidCEllis commented Nov 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vnmabus commented Mar 22, 2025

Uh oh!

Uh oh!

vyasr left a comment •

edited

Loading

vyasr Aug 22, 2024 •

edited

Loading

DavidCEllis commented Nov 5, 2024 •

edited

Loading

mikeshardmind commented Nov 5, 2024 •

edited

Loading

DavidCEllis commented Nov 6, 2024 •

edited

Loading

DavidCEllis commented Nov 6, 2024 •

edited

Loading