Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable lazy importing for external dependencies #1059

Closed
ericmjl opened this issue Apr 5, 2022 · 6 comments · Fixed by #1180
Closed

Enable lazy importing for external dependencies #1059

ericmjl opened this issue Apr 5, 2022 · 6 comments · Fixed by #1180

Comments

@ericmjl
Copy link
Member

ericmjl commented Apr 5, 2022

I'd like to propose that we use lazy imports for our external dependencies.

Currently, importing pyjanitor results in about ~1-2 seconds of delay. While insignificant for an interactive data exploration setting, it becomes a significant latency burden if there are projects that depend on pyjanitor.

There is a SPEC for lazy importing, which means this idea is getting community traction in the SciPy world: https://scientific-python.org/specs/spec-0001/

There is also a tool for making lazy loading easy: https://pypi.org/project/lazy_loader/, thereby giving us a path towards lazy loading.

Would love to get thoughts from the dev team!

@thatlittleboy
Copy link
Contributor

Really cool idea, TIL @ericmjl !
I'ld just like to play devil's advocate here for the sake of discussion. Hope you don't take this the wrong way!

Lazy loading of imports, to me, would seem to be most beneficial if we are importing many optional submodules when importing a top-level package.
Case in point is the example in the SPEC: import scipy would also import the linalg submodule and all of its dependencies even though the end user might just want to use a non-linalg function from scipy.
While still maintaining the flexibility for the user to import linalg once they actually need it.


In pyjanitor:

For optional imports such as the scipy.special etc., I think it is a net positive to include lazy loading. I'm not quite convinced yet that lazy loading pandas / pandas_flavour is relevant, since most of the general use of janitor already requires these 2 (i.e., they are not really optional).

Also, is the lazy loading not meant to be implemented as top-level as possible? E.g. in the scenario that some function located within janitor/math.py necessitates a very large import that is not used elsewhere in the library, we should lazy import math in the top-level pyjanitor init rather than lazy-importing that large import inside math.py itself.
I bring this up, because I noticed in the PR, you've opted to lazy import some modules within the "child" files rather than lazy importing the submodules from the top-level files.

These are just my speculation and thoughts,though. Love to be corrected / educated on this! 😄

@ericmjl
Copy link
Member Author

ericmjl commented Apr 17, 2022

Great comments, @thatlittleboy! Thanks for thinking through the matter. I think you've got the right intuition w.r.t. what to lazy-import and in which files to do so. The only complication is the magical nature of pyjanitor where pandas-flavor automatically attaches functions as class methods -- that gets disabled when we lazy-import the submodules. As such, I had to do a workaround where we lazy-load pandas-flavor, which in turns loads pandas (which is the biggest blocker for imports).

Now that I've typed that all out, I think I will go in and modify pandas-flavor (for which I'm one of the maintainers) and make it do selective and lazy imports too. That will help with import times.

@asmirnov69
Copy link
Contributor

Looks like it was never merged to dev? Any reasons not to do that now?

@ericmjl
Copy link
Member Author

ericmjl commented Oct 17, 2022

@asmirnov69 it was mostly my fault for letting the PR go stale, therefore we didn't merge. Is this something you'd like to pick up?

@asmirnov69
Copy link
Contributor

asmirnov69 commented Oct 17, 2022

@ericmjl sure, if you still think it is important to have feature.
it should also help me with deeper understanding how to contribute to pyjanitor.
My plan:

  • fork
  • create new branch feature/conditional-imports-revive out of feature/conditional-imports for better tracking of changes
  • merge dev to feature/conditional-imports-revive
  • test using available pyjanitor tests and more
  • open PR at this point - this is something I have never done but should be possible to ask around

@asmirnov69
Copy link
Contributor

PR is open, original plan was changed to:

  • fork
  • to have feature/conditional-imports2 branch of dev
  • all changes were manual
  • tests looks good
  • PR open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants