Cache VCS repositories locally when installing #11126

sp-ricard-valverde · 2022-05-18T15:07:10Z

What's the problem this feature will solve?

When installing a package from a VCS source for which there is no locally cached wheel yet, pip has to download the repository every time to get its metadata.
For big repositories and/or many VCS dependencies, this can increase installation time by a lot.

Describe the solution you'd like

I would like pip to also cache the VCS repositories whenever they are downloaded, and to work on that local copy (fetching the latest changes, etc.).
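As a minimal sketch of how such a cache could locate its local copies, assuming one clone per repository shared by all refs (the helper names and the `vcs/` directory layout are hypothetical, not part of pip), the cache directory can be derived from the repository URL with its `@ref` suffix and `#fragment` stripped:

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def repo_url_without_ref(vcs_url: str) -> str:
    """Drop the '@ref' suffix and '#fragment' so every ref of a repo shares one clone."""
    parts = urlsplit(vcs_url)
    path = parts.path
    # The ref lives in the path ('/org/repo.git@v1.2'); any 'user@' is in netloc instead.
    if "@" in path:
        path = path.rpartition("@")[0]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def repo_cache_dir(cache_root: Path, vcs_url: str) -> Path:
    """Hypothetical layout: <cache_root>/vcs/<2 hex chars>/<rest of sha256 of the URL>."""
    digest = hashlib.sha256(repo_url_without_ref(vcs_url).encode("utf-8")).hexdigest()
    return cache_root / "vcs" / digest[:2] / digest[2:]
```

With this keying, `git+https://github.com/org/repo.git@v1.0` and `git+https://github.com/org/repo.git@main` map to the same directory, so switching refs would only need a fetch rather than a fresh clone.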

Alternative Solutions

Changing the URLs of all VCS packages in the direct requirements to local file repositories would be a partial workaround.
It could be used during local development to speed up solving dependency issues, but it would not work for transitive VCS dependencies.
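A minimal sketch of that workaround, assuming you maintain local clones yourself and keep a hypothetical mapping from each remote URL to its clone directory: each `name @ git+URL@ref` line is rewritten to a `git+file://` URL that keeps the original ref. It only touches requirement lines you control, so transitive VCS dependencies would still be downloaded.

```python
from pathlib import Path
from typing import Dict


def rewrite_requirement(line: str, local_clones: Dict[str, Path]) -> str:
    """Point a 'name @ git+URL@ref' line (already stripped) at a local clone, keeping the ref."""
    name, sep, url = line.partition(" @ ")
    if not sep or not url.startswith("git+"):
        return line  # not a direct Git reference; leave it untouched
    remote_part = url[len("git+"):]
    for remote, clone in local_clones.items():
        if remote_part.startswith(remote):
            rest = remote_part[len(remote):]  # e.g. "@v1.2" or "@main#subdirectory=pkg"
            return f"{name} @ git+{clone.resolve().as_uri()}{rest}"
    return line  # no local clone configured for this repository
```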

Additional context

I came up with this feature while iterating on resolving dependency hell in a project's requirements.txt file. Because the packages in the requirements are not built until all dependency issues are solved (one at a time), the packages from VCS are never actually cached, and every time I try to fix a dependency, pip has to download all the repositories for the VCS packages yet again.

Code of Conduct

I agree to follow the PSF Code of Conduct.

uranusjr · 2022-05-19T02:17:27Z

Personally, I don’t particularly like this. A VCS repository cache is too easy to corrupt, and such corruption is not as simple to detect as in other kinds of caches (which are validated by hash). I would prefer to keep things simple and leave such optimisation to the user.

pfmoore · 2022-05-19T09:03:48Z

I agree. Doing the fetch could fail for all sorts of reasons - what if there was a rebase in the source repo, which couldn't be merged without manual intervention? It would take a lot of care in pip to handle those cases cleanly, and IMO the gain isn't worth the complexity.

As the OP noted, changing all VCS URLs to local directories and manually managing the refreshes is a workaround here. I don't think we should try to make pip handle all possible workflows "out of the box", and in this case there's an alternative workflow that doesn't need special support from pip.

sp-ricard-valverde · 2022-05-19T10:52:50Z

It's a cache, so it could be a best-effort, opt-in solution. A fetch with -f (for Git) would solve that particular problem of history rewrites (you don't care about local changes at all).
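A minimal sketch of that best-effort behaviour, assuming a bare clone per repository and a hypothetical `refresh_cached_repo` helper (not a pip API): the update uses `git fetch --force` so rewritten branches and moved tags simply overwrite the cached refs, and if the fetch fails for any other reason the cache entry is discarded and cloned again. The `run` parameter only exists so the Git invocation can be swapped out, for example in tests.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run_git(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


def refresh_cached_repo(repo_dir: Path, url: str, run: Runner = run_git) -> str:
    """Best-effort refresh of a bare cached clone; returns "fetched" or "cloned"."""
    if repo_dir.exists():
        try:
            # --force lets branch and tag refs move even when upstream history was rewritten.
            run(["git", "-C", str(repo_dir), "fetch", "--force", "--prune", url,
                 "refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"])
            return "fetched"
        except subprocess.CalledProcessError:
            # The cache is disposable: if the update fails, drop it and start over.
            shutil.rmtree(repo_dir)
    run(["git", "clone", "--bare", url, str(repo_dir)])
    return "cloned"
```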

As I said, changing the VCS URLs is only useful for local development (you need to remember to change the URLs in the requirements back again!) and does not work for transitive dependencies.

RonnyPfannschmidt · 2022-06-04T10:54:56Z

For all content-hash-based VCSs, an optimization can be done (e.g. checking whether the hash on the remote is still the same, or, if it's a tag, even erroring out if it has changed).

I believe this applies to Mercurial and Git.
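One way to read that suggestion, sketched for Git only: ask the remote which commit a ref points to (for example with `git ls-remote <url> <ref>`, which prints one tab-separated `<hash> <refname>` pair per line) and compare it with the commit recorded for the cached copy. The helper names and the exception type below are hypothetical.

```python
from typing import Dict


class TagMovedError(Exception):
    """Raised when a tag now points at a different commit than the cached one."""


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Parse `git ls-remote` output (tab-separated hash and ref name) into {ref: hash}."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, _, ref = line.partition("\t")
            refs[ref] = sha
    return refs


def cache_action(ref: str, cached_sha: str, remote_refs: Dict[str, str]) -> str:
    """Decide what to do with a cached checkout; cached_sha is assumed to be a commit hash."""
    # Annotated tags also get a peeled '^{}' entry naming the commit itself; prefer it.
    remote_sha = remote_refs.get(ref + "^{}", remote_refs.get(ref))
    if remote_sha is None:
        raise KeyError(f"{ref} not found on remote")
    if remote_sha == cached_sha:
        return "reuse"  # same content hash, so the cached copy is identical
    if ref.startswith("refs/tags/"):
        raise TagMovedError(f"{ref} moved from {cached_sha} to {remote_sha}")
    return "fetch"  # a branch moved (forward or via a rewrite)
```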

sbidoul · 2022-06-04T11:40:32Z

I also think pip is not the place to implement VCS-level caching. One approach that helped my group with large repos is a caching git wrapper such as git-autoshare.

pip already caches wheels built from VCS references to Git commits (it probably also works for Mercurial and Bazaar).
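What makes this safe is that a commit reference is immutable while a branch or tag is not. A simplified sketch of such a check for Git, assuming only a full commit hash counts as pinned (pip has its own internal logic for this and may differ; the function below is illustrative, not its code):

```python
import re

# Full SHA-1 (40 hex chars) or SHA-256 (64 hex chars) object names. Abbreviated
# hashes are rejected because they can become ambiguous as a repository grows.
_FULL_HASH = re.compile(r"[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?")


def is_cacheable_git_ref(ref: str) -> bool:
    """True if the ref pins an exact commit, so a wheel built from it can be reused."""
    return _FULL_HASH.fullmatch(ref) is not None
```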

But as the OP mentions, wheels are not built until resolution has succeeded, so the current wheel cache does not help in that case, and I agree the described scenario can be painful.

There are two optimizations I have in the back of my mind that could help:

- caching prepared metadata (which would help with sdists too)
- also caching VCS references to branches and tags (based on the resolved commit)
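The second idea can be sketched as rewriting the requested URL so that the branch or tag is replaced by the commit it currently resolves to; the pinned URL then names immutable content and could be looked up in a cache keyed by commit. Resolving the ref (for instance via `git ls-remote`) still needs a network round trip and is not shown; `pin_vcs_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_vcs_url(vcs_url: str, resolved_sha: str) -> str:
    """Replace the branch or tag in 'git+URL@ref#fragment' with its resolved commit."""
    parts = urlsplit(vcs_url)
    path = parts.path
    if "@" in path:  # the ref suffix lives in the path, not in a 'user@host' netloc
        path = path.rpartition("@")[0]
    return urlunsplit((parts.scheme, parts.netloc, f"{path}@{resolved_sha}",
                       parts.query, parts.fragment))
```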

These are not very high on my priority list, though.

pfmoore · 2022-06-05T02:56:30Z

> caching prepared metadata (which would help with sdists too)

This is potentially dangerous in any case. There's no guarantee that a source tree will generate the same prepared metadata when rebuilt at a later time. Consider a Hatch plugin similar to hatch-vcs that generates a CalVer version based on the date...

We can make some assumptions about metadata for a sdist (name and version are static), but not for a source tree. (Of course, we could declare such edge cases as unsupported, but we can't even know to explicitly reject them without doing the build, so we'd risk silently using incorrect data).
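To make the concern concrete, here is a hypothetical version hook of the kind described (not hatch-vcs or any real plugin): the same source tree yields different metadata depending on the day it is built, so metadata cached from one build would be wrong for the next.

```python
from datetime import date
from typing import Optional


def calver_version(today: Optional[date] = None) -> str:
    """Hypothetical build hook: derive the version from the build date (YYYY.M.D)."""
    d = today or date.today()
    return f"{d.year}.{d.month}.{d.day}"
```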

sbidoul · 2022-06-05T08:29:59Z

> There's no guarantee that a source tree will generate the same prepared metadata when rebuilt at a later time.

This applies to our current wheel cache too. There are indeed many ways two builds of the same source tree can yield different results, due to environmental parameters that our wheel cache can't possibly know about.

If we use the same criteria as the wheel cache to decide whether to cache or not, I think the situation would be exactly the same with a metadata cache?

And in such situations, users can use different caches or disable caching entirely, as they already need to do today with the wheel cache.

pfmoore · 2022-06-05T12:36:00Z

Good point. But I thought the wheel cache was keyed by project name (and used as essentially an extra source of potential wheels). We can’t do the same with metadata for a source tree as we can’t know the name without getting the metadata. So what would be the key for the metadata cache?

You did say this wasn’t a high priority for you though, so I’m fine if you want to park the discussion for now.

sbidoul · 2022-06-05T12:51:35Z

Wheel cache entries are keyed by source artifact URL, and then the name and supported tags are necessary to look up a wheel in a cache entry. So to benefit from the wheel cache with direct urls, one has to use the "name @ url" syntax. For sdists obtained via an index, the name is known before looking up the url.
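A rough sketch of the two-step lookup described above, under stated assumptions: the entry directory is a hash of a dictionary of key parts (here just the URL), and a wheel inside it is selected by project name and supported tag. The hash function, directory fan-out, and helper names are illustrative rather than taken from pip's source, and the filename parsing skips edge cases that real tooling (such as the `packaging` library) handles.

```python
import hashlib
import json
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional, Set, Tuple

Tag = Tuple[str, str, str]  # (python tag, abi tag, platform tag)


def cache_entry_dir(root: PurePosixPath, key_parts: Dict[str, str]) -> PurePosixPath:
    """Step 1: hash the key parts (at least the source URL) into a nested directory."""
    blob = json.dumps(key_parts, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha224(blob.encode("utf-8")).hexdigest()
    # Fan out over a few directory levels to keep each directory small.
    return root / digest[:2] / digest[2:4] / digest[4:6] / digest[6:]


def parse_wheel_filename(filename: str) -> Tuple[str, Set[Tag]]:
    """Return (project name, expanded tag set) from 'name-ver[-build]-py-abi-plat.whl'."""
    parts = filename[: -len(".whl")].split("-")
    py, abi, plat = parts[-3:]
    tags = {(p, a, pl) for p in py.split(".") for a in abi.split(".") for pl in plat.split(".")}
    return parts[0].lower(), tags


def find_cached_wheel(filenames: Iterable[str], project: str,
                      supported: List[Tag]) -> Optional[str]:
    """Step 2: pick a wheel in the entry matching the project name and a supported tag."""
    wanted = re.sub(r"[-_.]+", "_", project).lower()
    candidates = [(fn, *parse_wheel_filename(fn)) for fn in filenames]
    for tag in supported:  # assumed ordered from most to least preferred
        for fn, name, tags in candidates:
            if name == wanted and tag in tags:
                return fn
    return None
```

The second step is why a bare URL is not enough in this scheme: without the project name, there is nothing to match the cached wheels against.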

My current intuition is that these mechanisms would work for a metadata cache too, but being sure of that will require investigation and that will be for other times indeed.

sbidoul · 2022-06-05T12:55:48Z

As a side note, this question of environmental parameters makes me think that we could consider taking into account --config-settings in addition to the URL in wheel cache keys. That may also mean we should encourage users to use config settings over environment variables to pass options to build backends.
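A sketch of what folding config settings into a URL-based key could look like; the function name and key layout are assumptions for illustration. Settings are only added when present, so keys for plain builds stay the same as a URL-only key, and `sort_keys` makes the result independent of the order in which settings were given.

```python
import hashlib
import json
from typing import Dict, List, Optional, Union

ConfigSettings = Dict[str, Union[str, List[str]]]


def wheel_cache_key(url: str, config_settings: Optional[ConfigSettings] = None) -> str:
    """Hypothetical cache key: sha224 over the URL plus any non-empty config settings."""
    key_parts: Dict[str, object] = {"url": url}
    if config_settings:
        # Only include the field when settings are present, so existing
        # URL-only keys are unchanged for builds without config settings.
        key_parts["config_settings"] = config_settings
    blob = json.dumps(key_parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha224(blob.encode("utf-8")).hexdigest()
```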

pfmoore · 2022-06-05T13:30:20Z

That might be worthwhile but I’d be reluctant to include the extra complexity until there’s more sign that backends will actually use the config settings.

sbidoul · 2022-06-05T13:35:39Z

I've created 3 separate issues to discuss and track possible optimizations that could help in the OP scenario.

sbidoul · 2022-09-25T10:53:36Z

Closing this as the conclusions are tracked in separate issues.

sp-ricard-valverde added the labels "S: needs triage" and "type: feature request" on May 18, 2022

This was referenced on Jun 5, 2022:

- Consider --config-settings in wheel cache keys #11164 (Open)
- Cache prepared metadata #11165 (Open)

sbidoul mentioned this issue on Jun 5, 2022:

- Cache wheels built from VCS branches and tags #11166 (Open)

sbidoul added the labels "S: awaiting response" and "resolution: out of scope" and removed the label "S: needs triage" on Jun 5, 2022

sbidoul closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 25, 2022

github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2022

pradyunsg removed the label "S: awaiting response" on Mar 17, 2023
