Download wheels in batch at the end of .prepare_linked_requirements_more() #8896

Merged

Conversation

cosmicexplorer
Contributor

@cosmicexplorer cosmicexplorer commented Sep 22, 2020

This PR corresponds to the second work section described in #7819 (comment) from the much-too-large #8448:

move out "download entire file" out of prepare (and hence, Resolver.resolve)

With some modifications to that prompt, it now consists of:

  1. Create a method RequirementPreparer._complete_partial_requirements() to fully download and prepare lazily-downloaded wheels.
  2. Call that new method at the bottom of .prepare_linked_requirements_more(), separating the batch download from the rest of that method.
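The two steps above can be sketched roughly as follows. This is a minimal illustration and not pip's actual code: `Requirement`, `Preparer`, the `batch_download` callable, and the `_prepare` helper are hypothetical stand-ins, and only the two method names are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Requirement:
    """Hypothetical stand-in for a requirement object."""
    name: str
    url: str
    needs_more_preparation: bool = False  # True if only metadata was fetched
    local_path: str = ""


@dataclass
class Preparer:
    """Hypothetical stand-in for RequirementPreparer."""
    # Assumed interface: takes URLs, returns a {url: local_path} mapping.
    batch_download: Callable[[Iterable[str]], Dict[str, str]]
    prepared: List[str] = field(default_factory=list)

    def _prepare(self, req: Requirement) -> None:
        # Placeholder for the real preparation work (unpacking, checks, ...).
        self.prepared.append(req.name)

    def _complete_partial_requirements(self, partial: List[Requirement]) -> None:
        if not partial:
            return
        # One batch call for every lazily-fetched wheel, then finish preparing.
        paths = self.batch_download([req.url for req in partial])
        for req in partial:
            req.local_path = paths[req.url]
            req.needs_more_preparation = False
            self._prepare(req)

    def prepare_linked_requirements_more(self, reqs: Iterable[Requirement]) -> None:
        partial: List[Requirement] = []
        for req in reqs:
            if req.needs_more_preparation:
                partial.append(req)  # collect now, download later
            else:
                self._prepare(req)
        # The batch download is the last step, separate from the loop above.
        self._complete_partial_requirements(partial)
```

Keeping the download in its own method means the "collect" loop never touches the network, so the download strategy can be swapped without changing the loop.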

@cosmicexplorer cosmicexplorer force-pushed the download-entire-file-outside-of-resolver branch from de18d0d to 4d02725 Compare September 22, 2020 09:35
@cosmicexplorer
Contributor Author

cosmicexplorer commented Sep 23, 2020

cc @pradyunsg @McSinyx: I wrote a tweet about how much I loved your work on the lazy wheel fetching: https://twitter.com/hipsterelectron/status/1308306011409661954?s=20

@McSinyx
Contributor

McSinyx commented Sep 23, 2020

I'm sorry to disappoint you, but the lazy wheel isn't making downloads any faster at the moment. I suppose the other use-case (pip resolve) is possible with extra steps and this is what you're trying to do here.

Regarding the news file, since pip's internal API is internal, I think this should be flagged as trivial. Concerning the approach, I recall discussions with @pradyunsg and @chrahunt that resulted in keeping the late download in the resolver (at least until the legacy resolver is removed, to keep the output of the two resolvers consistent), in particular within RequirementPreparer (see GH-8685). IIRC @pradyunsg told me that it's preferable to pass a dry_run option to the resolver instead. The reason pip resolve has not been implemented is rather that we don't know exactly what output format would be most helpful.

Side comment: PartialRequirementDownloadCompleter is only ever initialized and then has its single method called, so I would love to see it changed into a function if it turns out that my memory of the general approach described above isn't correct.

pradyunsg
pradyunsg previously approved these changes Sep 23, 2020
Member

@pradyunsg pradyunsg left a comment

Me likey!

news/8896.feature (outdated, resolved)
src/pip/_internal/commands/download.py (outdated, resolved)
@pradyunsg pradyunsg dismissed their stale review September 23, 2020 19:11

That was the wrong shortcut, on the wrong window. :)

Member

@pradyunsg pradyunsg left a comment

Me likey the general clarity that factoring out the batch download logic brings.

As @McSinyx mentioned, until the older resolver dies, we'd want to keep the "download everything else" inside the resolver -- maintaining the abstraction of "everything is ready". While the "prepare more" logic is indeed a no-op with the old resolver, it'll be nice to keep it inside to be able to reason about these things more easily until that old one goes away.

Let's keep the download logic inside the resolver.resolve call, and add a flag to tell the resolver whether to fully-prepare partially prepared items. :)

@pradyunsg
Member

I made a mess of the review here -- but that's what I get for reviewing at 1am. :)

@cosmicexplorer
Contributor Author

cosmicexplorer commented Sep 24, 2020

I'm sorry to disappoint you, but the lazy wheel isn't making downloads any faster at the moment.

You couldn't possibly disappoint me!!! Sounds like it's time for some profiling 🔍 . Sorry for disappearing off the face of the earth for so long! Your adaptation was much much cleaner than what I had and that is a much better place to start off than the other way around (at least, in my opinion, in this case).

I suppose the other use-case (pip resolve) is possible with extra steps and this is what you're trying to do here.

So my thought process was actually just that parallel downloading would be the key to actually improving the download performance!! And if not, I was thinking a further optimization would be to keep alive the connections used to download the metadata, in case requests wasn't doing that already. This is why I'm sorry I was so quiet for a few months 😅. This is definitely on me for providing an unclear description of the goal and not handing off my work correctly!
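As a rough illustration of the parallel-download idea, here is a minimal sketch using only the standard library. The `download_in_parallel` helper and its `fetch` parameter are hypothetical names for this example; nothing here is pip's API, and the real downloader may look quite different.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_in_parallel(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Fetch every URL with a thread pool and return {url: body}."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its body.
        bodies = pool.map(fetch, url_list)
        return dict(zip(url_list, bodies))
```

Passing `fetch` in as a callable leaves the choice of HTTP client open. If it wraps a single shared session, the connections opened for the metadata requests could in principle be reused for the full downloads, though whether that helps would need profiling.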

Regarding the news file, since pip's internal API is internal, I think this should be flagged as trivial. Concerning the approach, I recall discussions with @pradyunsg and @chrahunt that resulted in keeping the late download in the resolver (at least until the legacy resolver is removed, to keep the output of the two resolvers consistent), in particular within RequirementPreparer (see GH-8685).

This is super super helpful especially with the link to the PR I missed!! Hardly gave @pradyunsg much else to review ^_^

IIRC @pradyunsg told me that it's preferable to pass a dry_run option to the resolver instead. The reason pip resolve has not been implemented is rather that we don't know exactly what output format would be most helpful.

Yes! dry_run seems much closer to the option I'd expect. Off the top of my head I was thinking something like a constraints.txt with pinned (==) requirements would be most interesting (I think whatever pip freeze produces would likely be super neat) -- but will follow up in #53!
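To make the "pinned constraints" idea concrete, here is a small sketch of what such output could look like. The `format_pinned` function is a hypothetical helper written for this example, and the package versions in the usage note are illustrative only.

```python
from typing import Iterable, Tuple


def format_pinned(resolved: Iterable[Tuple[str, str]]) -> str:
    """Render (name, version) pairs as sorted `name==version` lines."""
    lines = sorted(f"{name}=={version}" for name, version in resolved)
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

For instance, `format_pinned([("requests", "2.24.0"), ("idna", "2.10")])` gives two lines, `idna==2.10` then `requests==2.24.0`, which is the same shape as a constraints file.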

Side comment: PartialRequirementDownloadCompleter is only ever initialized and then has its single method called, so I would love to see it changed into a function if it turns out that my memory of the general approach described above isn't correct.

Will do!

Let's keep the download logic inside the resolver.resolve call, and add a flag to tell the resolver whether to fully-prepare partially prepared items. :)

This makes perfect sense! It's also less code ^_^

I made a mess of the review here -- but that's what I get for reviewing at 1am. :)

You reviewed at approximately the same time I posted it, then! We'll call it even.

@cosmicexplorer
Contributor Author

Instead of doing this:

add a flag to tell the resolver whether to fully-prepare partially prepared items. :)

I just added an extra method .complete_partial_requirements() in operations/prepare.py (and updated the description), because it seemed that piping a PipSession instance into the resolver would risk it being used for other things that were previously nicely separated. So I made use of the existing fields of the RequirementPreparer to ensure that resolver.resolve() could clearly mark which part did the batch downloading. I hope that seems reasonable; I'm totally willing to refactor if either of you had something else in mind!

Member

@pradyunsg pradyunsg left a comment

Hmm... keeping things on RequirementPreparer itself... should get us nearly the same benefits (separation of "collect" vs "download" in prepare_more). I think we can get away without introducing changes on the resolver side?

We'd skip the entire "prepare_more" stage from the resolver during a dry run anyway, so... :)
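A rough sketch of how a dry run could skip that stage. The `resolve` signature, the `pin` callable, and the `dry_run` flag below are assumptions for illustration and do not reflect pip's actual resolver interface.

```python
from typing import Callable, Iterable, List


def resolve(
    root_reqs: Iterable[str],
    pin: Callable[[str], str],
    prepare_more: Callable[[List[str]], None],
    dry_run: bool = False,
) -> List[str]:
    """Hypothetical resolver entry point; not pip's real signature."""
    # "Collect": decide which pinned candidate satisfies each requirement.
    result = [pin(req) for req in root_reqs]
    # "Download": only fully prepare the result when this is not a dry run.
    if not dry_run:
        prepare_more(result)
    return result
```

Because the download lives entirely behind `prepare_more`, the dry-run path needs no knowledge of sessions or download strategy.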

src/pip/_internal/resolution/resolvelib/resolver.py (outdated, resolved)
src/pip/_internal/operations/prepare.py (outdated, resolved)
@cosmicexplorer
Contributor Author

should get us nearly the same benefits (separation of "collect" vs "download" in prepare_more).

Yes! This is a much nicer way to think about it. Will do.

We'd skip the entire "prepare_more" stage from the resolver during a dry run anyway, so... :)

Great point!! (along with the rest)

@cosmicexplorer cosmicexplorer force-pushed the download-entire-file-outside-of-resolver branch 4 times, most recently from 34d27a5 to 50c5e51 Compare September 24, 2020 11:31
@cosmicexplorer
Contributor Author

Done, I think! I think this gets us the separation you noted above, paving the way for other methods of downloading requirements, as well as not downloading them at all!

@cosmicexplorer cosmicexplorer changed the title from "Download wheels in batch outside of the resolver when --use-feature=fast-deps is on" to "Download wheels in batch at the end of .prepare_linked_requirements_more()" Sep 25, 2020
@cosmicexplorer
Contributor Author

ping!

Member

@xavfernandez xavfernandez left a comment

This seems clean enough but would need rebasing :)

create PartialRequirementDownloadCompleter, and use in wheel, install, and download

add NEWS entry

rename NEWS entry

rename NEWS entry

respond to review comments

move the partial requirement download completion to the bottom of the prepare_more method
@cosmicexplorer cosmicexplorer force-pushed the download-entire-file-outside-of-resolver branch from e9ed0b5 to 22406d4 Compare October 9, 2020 07:20
@cosmicexplorer
Contributor Author

Rebased!

@cosmicexplorer
Contributor Author

Perhaps @xavfernandez is the right person to ping here?

@cosmicexplorer
Contributor Author

Ping once more! ^_^

@cosmicexplorer
Contributor Author

Let me know if there's someone specific I should contact!

@pradyunsg
Member

Nah -- I'll merge this once #8936 is done with (later this week). :)

@cosmicexplorer
Contributor Author

ping @pradyunsg! ❤️

@cosmicexplorer
Contributor Author

ping!

@pradyunsg
Member

Pong! pip 20.3 hasn't been released yet, because, well, 2020.

I will come back to this, once that's done.

@cosmicexplorer
Contributor Author

Not a problem!!! Thanks for the update. I will follow along with the 20.3 updates then!

@uranusjr uranusjr closed this Feb 18, 2021
@uranusjr
Member

uranusjr commented Feb 18, 2021

Come on, report your status, Azure.

@uranusjr uranusjr reopened this Feb 18, 2021
@uranusjr uranusjr merged commit f03d71e into pypa:master Feb 18, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 2, 2021

5 participants