Cache VCS repositories locally when installing #11126
Comments
Personally I don't particularly like this. A VCS repository cache is too easy to corrupt, and compared to other kinds of caches (which are validated by hash), such corruption is not as simple to detect. I would prefer to keep things simple and leave this optimisation to the user.
I agree. Doing the fetch could fail for all sorts of reasons - what if there was a rebase in the source repo, which couldn't be merged without manual intervention? It would take a lot of care in pip to handle those cases cleanly, and IMO the gain isn't worth the complexity. As the OP noted, changing all VCS urls to local directories, and manually managing the refreshes, is a workaround here. I don't think we should try to make pip handle all possible workflows "out of the box", and in this case there's an alternative workflow that doesn't need special support from pip.
It's a cache, so it could be a best-effort, opt-in solution. A fetch with -f (for Git) will solve that particular problem of history rewrites (you don't care about local changes at all). As I said, changing the VCS URLs will only be useful for local development (you will need to remember to change the URLs in requirements back again!) and will not work on transitive dependencies.
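A minimal sketch of what such an opt-in, best-effort cache refresh could look like, assuming one bare mirror clone per repository URL (the cache location and helper name are hypothetical, not pip internals):

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "pip-vcs"  # hypothetical location

def cached_repo(url: str) -> Path:
    """Return a local mirror of `url`, cloning it or force-fetching updates."""
    repo = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
    if not repo.exists():
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", "--mirror", url, str(repo)], check=True)
    else:
        # A forced, pruning fetch keeps the mirror in sync with upstream even
        # after rebases or force-pushes; local state is discarded, not merged.
        subprocess.run(
            ["git", "-C", str(repo), "fetch", "--force", "--prune", "origin"],
            check=True,
        )
    return repo
```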
For all content-hash-based VCSs, an optimization can be done (e.g. checking whether the hash of the remote is still the same); I believe this applies to Mercurial and Git.
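For illustration, that check can be done cheaply for Git with `git ls-remote`, which asks the remote which commit a ref points to without fetching any objects (a sketch; `cached_sha` stands in for a hypothetical stored value). Mercurial can do something similar with `hg identify <url>`.

```python
import subprocess

def remote_head(url: str, ref: str = "HEAD") -> str:
    """Ask the remote which commit `ref` points to, without downloading objects."""
    out = subprocess.run(
        ["git", "ls-remote", url, ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[0]  # output is "<sha>\t<ref>"

# Skip the expensive fetch when nothing changed upstream:
# if remote_head(url) == cached_sha:
#     ... reuse the cached copy ...
```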
I also think pip is not the place to implement VCS-level caching. One approach that helped my group with large repos is a caching git wrapper such as git-autoshare. pip already caches wheels built from VCS references that pin git commits (it probably also works for Mercurial and Bazaar). But as the OP mentions, wheels are not built until resolution succeeds, so the current wheel cache does not help in that case, and I agree the described scenario can be painful. There are two optimizations I have in the back of my mind that could help:
These are not very high on my priority list, though.
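For context on the wrapper approach: this is not git-autoshare's actual interface, but the underlying git mechanism that such caching wrappers typically build on is `--reference` clones, which borrow objects from a local cache repository instead of re-downloading them:

```python
import subprocess

def clone_with_cache(url: str, dest: str, cache_repo: str) -> None:
    """Clone `url` into `dest`, borrowing objects from `cache_repo` if present."""
    subprocess.run(
        [
            "git", "clone",
            "--reference-if-able", cache_repo,  # use the cache when it exists
            "--dissociate",  # copy borrowed objects so dest survives cache deletion
            url, dest,
        ],
        check=True,
    )
```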
This is potentially dangerous in any case. There's no guarantee that a source tree will generate the same prepared metadata when rebuilt at a later time. Consider a hatch plugin similar to hatch-vcs that generates a CalVer version based on the date... We can make some assumptions about metadata for an sdist (name and version are static), but not for a source tree. (Of course, we could declare such edge cases as unsupported, but we can't even know to explicitly reject them without doing the build, so we'd risk silently using incorrect data.)
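To make the edge case concrete, here is a deliberately pathological setup.py whose version depends on the build date (plain setuptools stands in for a hatch plugin, purely for brevity); any cached metadata for this source tree would silently go stale at midnight:

```python
# setup.py
from datetime import date
from setuptools import setup

setup(
    name="example-pkg",
    # CalVer derived from the build date: two builds of the *same* source
    # tree on different days produce different metadata.
    version=date.today().strftime("%Y.%m.%d"),
)
```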
This applies to our current wheel cache too. There are indeed many ways two builds of the same source tree can yield different results, due to environmental parameters that our wheel cache can't possibly know about. If we use the same criteria as the wheel cache to decide whether to cache, I think the situation would be exactly the same with a metadata cache? And in such situations, users can use a different cache or disable caching entirely, as they already need to do today with the wheel cache.
Good point. But I thought the wheel cache was keyed by project name (and used essentially as an extra source of potential wheels). We can't do the same with metadata for a source tree, as we can't know the name without getting the metadata. So what would be the key for the metadata cache? You did say this wasn't a high priority for you, though, so I'm fine if you want to park the discussion for now.
Wheel cache entries are keyed by source artifact URL, and the name and supported tags are then needed to look up a wheel in a cache entry. So to benefit from the wheel cache with direct URLs, one has to use the "name @ url" syntax. For sdists obtained via an index, the name is known before looking up the URL. My current intuition is that these mechanisms would work for a metadata cache too, but being sure of that will require investigation, and that will indeed be for another time.
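As an illustration of the "name @ url" syntax mentioned above, parsed here with the `packaging` library (the project name and URL are made up):

```python
from packaging.requirements import Requirement

req = Requirement("mypkg @ git+https://github.com/example/mypkg@abc1234")
print(req.name)  # "mypkg" -- known up front, so a cache entry can be looked up
print(req.url)   # "git+https://github.com/example/mypkg@abc1234"
```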
As a side note, this question of environmental parameters makes me think that we could consider taking config settings (--config-settings) into account in the cache key.
That might be worthwhile but I’d be reluctant to include the extra complexity until there’s more sign that backends will actually use the config settings. |
I've created 3 separate issues to discuss and track possible optimizations that could help in the OP's scenario.
Closing this as the conclusions are tracked in separate issues. |
What's the problem this feature will solve?
When installing a package from a VCS source for which there is no locally cached wheel yet, pip has to download the repository every time to get its metadata.
For big repositories and/or many VCS dependencies, this can increase installation time significantly.
Describe the solution you'd like
I would like pip to also cache the VCS repositories whenever they are downloaded, and work on that local copy (fetching the latest changes, etc.).
Alternative Solutions
Changing all VCS package URLs in direct requirements to local file repositories would be a partial workaround.
It could be used during local development to speed up solving dependency issues, but would not work for transitive VCS dependencies.
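Concretely, the workaround would look something like this in requirements.txt (paths and project names are examples), with the local clone refreshed manually via git fetch:

```
# before: fetched from the network on every resolve
mypkg @ git+https://github.com/example/mypkg@main

# after: points at a manually managed local clone
mypkg @ git+file:///home/me/repos/mypkg@main
```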
Additional context
I came to think of this feature while iterating on a dependency hell in a project's requirements.txt file. Because the packages in requirements are not built until all dependency issues are solved (one at a time), the packages from VCS sources are never actually cached, and every time I try to fix a dependency, pip has to download all the repositories for the VCS packages yet again.