3x import-time performance regression between 2.x and 3.x #362
Comments
Excellent notes, thanks! I did some similar lazy init in the unicode submodule, but was not aware that Regex had similar issues.
Probably the lowest-risk fix is to move the regex compile for `Regex` out of its init method and into `parse_impl`. Most of these regular expressions probably don't need to be compiled at all by many parsers, so having `Regex` defer `re.compile` until the first time it has to parse will turn these compiles into plain string assignments. Can you give me an idea of how urgent this fix is?
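A minimal sketch of that idea, using a hypothetical `LazyRegex` class rather than pyparsing's actual `Regex`: the constructor only stores the pattern string and flags, and `re.compile` runs the first time the element parses, through a cached `regex` property. The `parse_impl` signature here is simplified for illustration and is not pyparsing's real one.

```python
import re
from typing import Optional, Tuple


class LazyRegex:
    """Hypothetical stand-in for a pyparsing-style Regex element.

    Only the pattern string is stored at construction (cheap at import
    time); re.compile runs on the first parse and the result is cached.
    """

    def __init__(self, pattern: str, flags: int = 0) -> None:
        self.pattern = pattern
        self.flags = flags
        self._compiled: Optional[re.Pattern] = None

    @property
    def regex(self) -> re.Pattern:
        # Compile on first access, then reuse the cached pattern object.
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def parse_impl(self, instring: str, loc: int) -> Tuple[int, str]:
        """Match at `loc`; return (end_loc, matched_text) or raise ValueError."""
        match = self.regex.match(instring, loc)
        if match is None:
            raise ValueError(f"expected {self.pattern!r} at position {loc}")
        return match.end(), match.group()
```

With this approach a malformed pattern raises `re.error` on the first parse rather than at construction time, so code that relies on construction-time validation would see the error later.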
I don't have any personal stake in this, but I've noticed in the past that it does, for instance, make pip pretty sluggish to use (though that's not really much of a regression there).
Deferring the `re` compiles until parse time reduces import time, as measured with your script, from 0.187-0.203 s to 0.125-0.140 s, shaving ~60 ms on my slow Windows laptop. I wonder how much of the 2.4.7 vs 3.x increase is due to the breakup of pyparsing across multiple files?
I profiled every mainline commit between 2.4.7 and 3.0.0 and there are basically just those two bumps; the reorg didn't really seem to make a difference as far as I could tell.
I took a stab at it in #363. I could probably do some more perf analysis to shave off more time, but this gets it back to ~66 ms on my machine.
Fixed a bad unit test that turned up a bug in raising `ValueError`; made minor fixes in a follow-up commit.
Any other optimizations would be very welcome!
👋 I'm a bit down a rabbit hole, so I'll explain my journey and how I got here.
I was taking a look at the newest version of pip today and noticed it was visibly slower than before: a pretty noticeable half-second pause on startup (450-500 ms).
I did a little bit of profiling and noticed that almost half of that time (minus the base interpreter startup) was spent importing `pyparsing`, and it felt slower than before. I poked around a little and noticed that the performance regressed pretty significantly between 2.4.7 and 3.0.0.
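The issue doesn't say which tool produced this breakdown. One stdlib option is CPython's `-X importtime` flag (Python 3.7+), which prints a per-module self/cumulative table to stderr. The helper below is a minimal sketch (the function name is illustrative) that runs it in a fresh interpreter and extracts the cumulative figure for one module.

```python
import subprocess
import sys


def cumulative_import_us(module: str) -> int:
    """Return the cumulative import time in microseconds that
    `python -X importtime` reports for `module` in a fresh interpreter."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Data lines on stderr look like:
    # "import time:      <self> |  <cumulative> |   <indented package name>"
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        _self_us, cumulative_us, name = fields
        try:
            cumulative = int(cumulative_us)
        except ValueError:
            continue  # header row: "self [us] | cumulative | imported package"
        if name.strip() == module:
            return cumulative
    raise LookupError(f"{module!r} not found in -X importtime output")
```

For example, `cumulative_import_us("pyparsing")` would report the total time attributable to that import. A module already imported during interpreter startup won't appear in the table, which is why the helper raises `LookupError` in that case.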
Here's a comparison of some of the import times (note that these are roughly best-of-5, which isn't necessarily a super useful representation of speed, but it's enough to show significance here).

Base interpreter startup (essentially `import site`); I'm using Python 3.8.10 on Ubuntu 20.04:

2.4.7:

3.0.0:

And just to confirm it's still present on the latest version, this is 938f59d:
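The comparison above uses best-of-5 wall-clock timings of fresh interpreters. The measurement script itself isn't included in the issue, so this is only a hypothetical stand-in (`best_of` is an illustrative name): it times `python -c ...` in subprocesses and keeps the minimum, which filters out some scheduler and disk-cache noise.

```python
import subprocess
import sys
import time


def best_of(code: str, runs: int = 5) -> float:
    """Run `python -c code` `runs` times, each in a fresh interpreter,
    and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

`best_of("pass")` approximates the base interpreter startup (which still runs `import site`), and `best_of("import pyparsing") - best_of("pass")` gives a rough figure for the import itself.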
profiling
I did a bit of profiling; it looks like it spends ~75% of the time compiling regular expressions, most of it coming from `pyparsing.core:Regex.__init__`. I'll attach an SVG of the profile below.

regressions (?)
The nice thing about this script is that it lets me set thresholds and find each regression here.
Here are some thresholds along with their commits:
These seem to be the two most impactful changes to startup time.
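The thresholding script mentioned above isn't included in the issue. Assuming per-commit timings have already been collected (oldest commit first), finding the first commit that crosses each threshold is a linear scan. This is a hypothetical sketch, and the SHAs in the usage example are made up.

```python
from typing import Dict, Optional, Sequence, Tuple


def first_commits_over(
    timings: Sequence[Tuple[str, float]], thresholds: Sequence[float]
) -> Dict[float, Optional[str]]:
    """For each threshold, return the first commit (in history order) whose
    measured import time is at or above it, or None if no commit is.

    `timings` is a sequence of (commit_sha, seconds) pairs, oldest first.
    """
    result: Dict[float, Optional[str]] = {}
    for threshold in thresholds:
        result[threshold] = next(
            (sha for sha, seconds in timings if seconds >= threshold), None
        )
    return result
```

For example, `first_commits_over([("a1", 0.050), ("c3", 0.090), ("e5", 0.150)], [0.08, 0.12])` returns `{0.08: "c3", 0.12: "e5"}`. A linear scan over measured data is used rather than a `git bisect`-style binary search because import time is not guaranteed to rise monotonically from commit to commit.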
ideas
Optimizing this is potentially a little annoying, but I think the best thing that could be done here is to lazily compile those regular expressions, perhaps by moving the regex object to a property, or by lazily constructing the `Regex` instance. If Python 3.7+ is targeted, a module-level `__getattr__` could be used to get some further wins.