
Added caching to improve pipenv-resolver performance #3667

Closed
wolph wants to merge 8 commits

Conversation


@wolph wolph commented Apr 3, 2019

The issue

The pipenv-resolver performance was bugging me a bit, so I did a few rounds with a profiler and added caching to a few of the heaviest calls. In my case these were being made over 100k times, so even a small per-call saving made a noticeable difference.

I suppose it's related to several of these: https://github.com/pypa/pipenv/search?q=performance&type=Issues but specifically this one: #356

For what it's worth, this change made quite a lot of difference for me: my initial ~30 second pipenv lock time dropped to about ~17 seconds. There might be more low-hanging fruit, but this takes a stab at it :)

The fix

It adds a few minor caches at a few key spots in the code which, according to some cProfile runs, are called very often.
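The PR diff itself isn't shown here, but the technique is plain memoization of hot, pure functions. A minimal sketch using functools.lru_cache (normalize_name is a hypothetical stand-in for one of the frequently called helpers, not pipenv's actual code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def normalize_name(name):
    # Hypothetical stand-in for a cheap helper that the resolver calls for
    # every candidate; at ~100k calls even a tiny per-call saving adds up.
    return name.strip().lower().replace("_", "-")

normalize_name("Django_Rest_Framework")  # first call: computed and cached
normalize_name("Django_Rest_Framework")  # second call: served from the cache
```

lru_cache only applies when the arguments are hashable and the function has no side effects, which is why only a few call sites qualify for this treatment.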

The checklist

  • Associated issue
  • A news fragment in the news/ directory to describe this fix, with the extension .bugfix, .feature, .behavior, .doc, .vendor, or .trivial (this will appear in the release changelog). Use semantic line breaks and name the file after the issue number or the PR #.

If this is a patch to the vendor directory…

While these could be submitted to the upstream projects, I think they're mostly very specific to pipenv but feel free to disagree and I'll take care of it.

The Version() modification could instead be made at every call site, but those are still within the vendored packages, so that wouldn't help. It's a rather minor change here, while a clean upstream version would be pretty invasive.
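One way such a change can live inside Version() itself, rather than at every call site, is to intern instances. A hedged sketch with a toy stand-in class (the real vendored packaging Version class does far more parsing work than this):

```python
class Version:
    """Toy stand-in for the vendored Version class; parsing here is trivial,
    but interning still shows why one in-class change beats touching every
    call site."""

    _cache = {}

    def __new__(cls, text):
        # Reuse a previously parsed instance for the same version string
        # instead of re-parsing it on every call.
        try:
            return cls._cache[text]
        except KeyError:
            self = super().__new__(cls)
            self.parts = tuple(int(part) for part in text.split("."))
            cls._cache[text] = self
            return self

v1 = Version("1.11.20")
v2 = Version("1.11.20")  # same object, no second parse
```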

@frostming
Contributor

Thanks for the patch. It's great work, but I'm not willing to merge such big changes to the vendored libraries (notpip is really just pip). This would add extra workload when updating the vendored dependencies. Thanks for understanding.

@frostming frostming added the Type: Vendored Dependencies This issue affects vendored dependencies within pipenv. label Apr 3, 2019
@wolph
Author

wolph commented Apr 3, 2019

I understand the reasoning, but I do think a similar solution would be beneficial for pipenv. It nearly halves the time needed for a fairly standard Django project for me.

While pipenv is really great, it's too slow for comfort. I'll see if I can patch something closer to pipenv itself instead, but I couldn't find anything quickly accessible that would have a meaningful effect.

@wolph
Author

wolph commented Apr 3, 2019

In lieu of changes upstream, would a monkeypatch for these two small sections be an option?

@frostming
Contributor

@wolph Yes, if you have an elegant solution that doesn't hack into the vendored code, I'll be glad to review it.

@wolph wolph changed the title Added caching to improve pienv-resolver performance Added caching to improve pipenv-resolver performance Apr 3, 2019
@wolph
Author

wolph commented Apr 4, 2019

@frostming I believe this is about as elegant as monkey patching can get :)
It definitely helps, depending on the case. I think there's more to be done here, but it's a start; if this is accepted I can do a bit more profiling to see how far we can get.
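The monkeypatch approach can be sketched generically: wrap a hot function with a cache from outside the vendored tree, so the vendored files stay untouched. The module and attribute names below are hypothetical, not pipenv's real layout:

```python
from functools import lru_cache

def install_cache(module, attr, maxsize=None):
    """Replace module.attr with an lru_cache-wrapped version of itself,
    leaving the vendored source untouched."""
    original = getattr(module, attr)
    patched = lru_cache(maxsize=maxsize)(original)
    setattr(module, attr, patched)
    return patched

# Hypothetical usage; the real target module/function names differ:
# from pipenv.patched import some_vendored_module
# install_cache(some_vendored_module, "expensive_lookup")
```

As with any memoization, the patched function must be pure and take hashable arguments, or the cache will silently return stale or wrong results.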

(to clarify, the last 5 runs were with cache enabled)
[screenshot: pipenv lock timing runs]

If we want really decent performance, though, we'll have to implement multithreaded and/or async downloading. Or perhaps even skip downloading the release list entirely and simply check whether there have been updates.

@wolph
Author

wolph commented Apr 5, 2019

Ok, that last test confirms it. It's not my code, the tests are broken ;)

So please review regardless of the test status :)

@frostming frostming added the Status: Awaiting Review This item is currently awaiting review. label Apr 5, 2019
@wolph
Author

wolph commented Apr 6, 2019

@frostming since a large portion of the time is now spent downloading, an async or multithreaded downloading approach could be very beneficial. That would require a similar edit in notpip, but it doesn't have to be all that invasive, I think.

Simply replace the _get_pages method and we're done:

```python
def _get_pages(self, locations, project_name):
    # type: (Iterable[Link], str) -> Iterable[HTMLPage]
    """
    Yields (page, page_url) from the given locations, skipping
    locations that have errors.
    """
    seen = set()  # type: Set[Link]
    for location in locations:
        if location in seen:
            continue
        seen.add(location)
        page = _get_html_page(location, session=self.session)
        if page is None:
            continue
        yield page
```

Would a multithreaded version of that code be acceptable?
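As a rough illustration of what a threaded variant could look like (hedged: fetch_page stands in for notpip's _get_html_page, and a real patch would keep the session handling and type comments):

```python
from concurrent.futures import ThreadPoolExecutor

def get_pages_threaded(locations, fetch_page, max_workers=8):
    """Fetch unique locations concurrently, skipping pages that fail (None).

    pool.map preserves input order, so results come back deterministically
    even though the downloads overlap in time.
    """
    unique = list(dict.fromkeys(locations))  # dedupe, keep first-seen order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for page in pool.map(fetch_page, unique):
            if page is not None:
                yield page
```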

@mungojam
Contributor

a large portion of the time is now spent in downloading

We use a custom internal PyPI hosted on GitHub Pages, so this really hits us, since GitHub Pages has a fixed cache length of 10 minutes. It would help us if the offline cache length for the .whl files could be overridden to be infinite; otherwise we end up downloading big things like numpy over and over.

@wolph
Author

wolph commented Jun 1, 2019

@techalchemy do you have time to review this by any chance?

@techalchemy
Member

I will be able to have a look this week; sorry for the very long delay while all the moving pieces were being pulled together.

@VonAncken

Almost a year since the last activity, what is the current status?

@wolph
Author

wolph commented Apr 13, 2020

I've personally mostly stopped using pipenv. While I like the idea, it's far too slow to be usable for a decently sized project.

Even with all of these improvements it's only about twice as fast, but it would need to be about 10 times as fast to be feasible for me, which would require a substantial rewrite.

@matteius
Member

This would have to be revisited for modern Python 3 and modern pip. Perhaps there is something we can learn from this PR to provide some level of caching in the resolver, but it would need to be redone. I have a different performance PR out that greatly reduces the system load needed to perform a batch install in similar or less wall time than the current version, but it doesn't affect the resolver: #5301

The biggest improvement I can see for the resolver, without trying to monkeypatch pip internals, would be better support for fetching package hashes from private PyPI indexes without having to download all of the wheels or other artifacts to hash them.
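For context on why missing index hashes are expensive: when an index doesn't publish hashes, the entire artifact must be downloaded just to compute one locally. A minimal sketch of that local hashing step (pip-style "sha256:<hexdigest>" format):

```python
import hashlib

def hash_artifact(data, algorithm="sha256"):
    """Return a pip-style "algorithm:hexdigest" string for raw artifact
    bytes; for a large wheel like numpy, `data` is the whole download."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return "%s:%s" % (algorithm, digest)
```

An index that serves the digest directly (as the PyPI JSON API does) lets the resolver skip this download entirely.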

@matteius matteius closed this Aug 31, 2022