-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Add lazy imports to speed up the time taken to load pyjanitor (part 2) #1180
[ENH] Add lazy imports to speed up the time taken to load pyjanitor (part 2) #1180
Conversation
…onditional-imports) to latest dev
Codecov Report
@@ Coverage Diff @@
## dev #1180 +/- ##
==========================================
- Coverage 97.99% 97.58% -0.41%
==========================================
Files 76 76
Lines 3387 3397 +10
==========================================
- Hits 3319 3315 -4
- Misses 68 82 +14 |
Hi @asmirnov69, thanks for taking the initiative on this PR! Can I ask, when you benchmark import times on your own machine, do you observe a speedup in import times? I'm asking b/c on a fresh Codespace, running the import benchmark gets me: However, a subsequent run gets me: As you probably can tell, the absolute first clean import on a machine will take ~6 seconds, and subsequent imports will be fast. I'm not sure whether this will be desired behaviour or not, though. What are your thoughts here? |
Hi @ericmjl Looks like first run is busy caching bytecode into Few immediate observations:
30% improvement is smaller gain than you've seen (~50%). But looks like it make sense to use lazy_import in pyjanitor since gain is here and if anything goes wrong with future development it should be easy to change import statements. In general slow startup time of some script due to huge package dependencies could become the problem elsewhere too. E.g. usual executables or shared objects sometimes loaded before program start using LD_PRELOAD (also used if you have circular dependencies). Maybe python3 team will add something like this in python itself in future. But the real solution could be proper split of pyjanitor to pluggable modules (as mentioned in #826). pyjanitor list of mandatory dependencies is too long. In long term I am for packing of pyjanitor using optional dependencies which I think is proper answer on how to implement the ask of #826:
BTW, I can explore what kind of visualization support can be added to my proposal #1176 to start addressing package dependencies problem. The idea could be to collect import dependencies into rdf triple form and then maybe have some useful diagram out of that - e.g. pyjanitor submodules dependencies graph. |
What should be done about codecov checks failures? Is it mandatory precondition to have all checks green to have contribution accepted? |
@asmirnov69 thanks for the detailed response, and for putting in the time and effort to do profiling on your end. Considering what you've written, I think we're good with the PR. With respect to codecov, the big ones to look out for are that all tests pass (primarily), and that code coverage on the project doesn't decrease significantly. In this case, a decrease by 0.41% is acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the PR. Let's see how this works out on dev
(i.e. make sure all tests still pass).
@ericmjl do i need to do anything at this point? who is going to complete this PR with merge commit to dev? |
We'll need one more review from the team. I just asked for @samukweku, @thatlittleboy, and @Zeroto521. As a rule of thumb, we try to go for 2 reviews (but time-bound to ~1-1.5 weeks of having a PR open.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small patches.
Since lazy_loader
is imported in janitor/__init__.py
. It's a necessary package.
So It need to add into .requirements/base.in
.
import lazy_loader as lazy | ||
import numpy as np | ||
import pandas_flavor as pf | ||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw all packages in janitor/math.py
were wrapped by lazy.load
.
But janitor/functions/impute.py
only wrapped scipy.stats
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't try to go beyond original @ericmjl code changes.
it could be true that some newer additions on dev would require changes in imports. i assume we can address it later in subsequent PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, @Zeroto521. We can continue to improve coverage of lazy loading, so no worries here. Plus we might also benefit from Python having lazy loading by default! (I'm thinking of PEP690 here.) Let's stick with what we have for now.
Three approvers! I am going to hit the merge button 😄. |
PR Description
This PR resolves #1059
Code changes are taken from original PR #1060 and placed to current dev branch head.
PR Checklist
Please ensure that you have done the following:
AUTHORS.md
.CHANGELOG.md
under the latest version header (i.e. the one that is "on deck") describing the contribution.Automatic checks
There will be automatic checks run on the PR. These include:
Relevant Reviewers
@ericmjl