Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better workaround for cache poisoning (see #3025) #7319

Merged
merged 9 commits into from
Dec 12, 2019

Conversation

sbidoul
Copy link
Member

@sbidoul sbidoul commented Nov 9, 2019

Make sure pip wheel never outputs pure python wheels with a
python implementation tag. Better fix/workaround for #3025 by
using a per-implementation wheel cache instead of caching pure python
wheels with an implementation tag in their name.

Fixes #7296

Copy link
Member

@chrahunt chrahunt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems cleaner than setting the python tag explicitly. In both setups we're caching a wheel per interpreter, but here we preserve all the information that a well-behaved wheel build might want to communicate and it will let us remove some workarounds in our current implementation.

src/pip/_internal/cache.py Outdated Show resolved Hide resolved
@pradyunsg
Copy link
Member

pradyunsg commented Nov 10, 2019

My big concern here is that this is basically invalidating all existing caches, without clearing them out.

@chrahunt
Copy link
Member

I would consider that as covered by #6956.

@pradyunsg
Copy link
Member

This change makes it so that all existing cached wheels would no longer be considered when installing/building a wheelhouse etc. Providing functionality to trim the size of your cached wheels (#6956) is different and would not do anything toward addressing that.

I'd rather that we stop spewing out more the specific tags in our 'pip install' wheels, since we're at a point where it's OK to say something along the lines of:

You're doing dynamic things in your setup.py, which isn't a robust approach --things can and will break. Declare your environment-specific details using static metadata that pip can use when making decisions. If that's not workable and you're having issues with our aggressive caching, please disable it with --no-cache-dir.

Basically (1) from #7296 (comment).

@sbidoul
Copy link
Member Author

sbidoul commented Nov 10, 2019

Requested changes done. Unfortunately sys.implementation is not available on python 2 so I copied a function from packaging.tags.

@pradyunsg a possible issue with the more aggressive approach is that #3025 reveals itself for downstream users of broken sdists. Since such broken sdists are probably unmaintained nowadays, the only actionable option for users will be to disable caching or adapt their, say, tox configurations, to use a different cache per python implementation. So I tend to think some lost disk space and a cache rebuild will generate less complaints. I've no strong opinion on this though.

@sbidoul
Copy link
Member Author

sbidoul commented Nov 10, 2019

I added two commits to remove unused code related to python_tag and implementation_tag.

@xavfernandez
Copy link
Member

This PR would mean that pip 20 would have a completely incompatible cache with previous versions.

I think _get_candidates could call a self.get_legacy_path_for_link(link) (with pip 19 computation) if self.get_path_for_link(link) directory does not exist.
It would mean that pip 20 could still use pip<20 cache directories.

@sbidoul
Copy link
Member Author

sbidoul commented Nov 16, 2019

I rebased and added a commit with the new key construction and hashing algorithm.

@sbidoul
Copy link
Member Author

sbidoul commented Nov 17, 2019

I added support for legacy cache entries, following @xavfernandez's suggestion.

The diff is becoming somewhat big, but individual commits should remain easy to review.

src/pip/_internal/cache.py Outdated Show resolved Hide resolved
@xavfernandez
Copy link
Member

And I think a .removal newsfile should be added to advertise the fact that pip>=20 cache is not retro-compatible with previous version (but that pip>=20 should still be able to take advantage of previous cache).
We might also want to plan for the removal of this fallback ? 6 months later ? Or maybe a full year ?

@sbidoul sbidoul force-pushed the cache-per-impl-sbi branch 2 times, most recently from 600778a to 69cdf75 Compare November 18, 2019 14:12
Copy link
Member

@xavfernandez xavfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a last comment. Otherwise, it looks really good 👍

src/pip/_internal/cache.py Outdated Show resolved Hide resolved
tests/unit/test_cache.py Show resolved Hide resolved
Copy link
Member

@chrahunt chrahunt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor followup comments (only one actionable right now), otherwise this looks good to me.

src/pip/_internal/cache.py Outdated Show resolved Hide resolved
path = self.get_path_for_link(link)
if os.path.isdir(path):
candidates.extend(os.listdir(path))
# TODO remove legacy path lookup in pip>=21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make an issue to add a deprecation warning around here when this is merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of deprecation warning do you have in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering the same as I think pip._internal.utils.deprecation.deprecated would be too noisy for such a thing...
Maybe something like

if gone_in is not None and parse(current_version) >= parse(gone_in):
raise PipDeprecationWarning(message)
to make sure we don't forget about it ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not forget about things to do in the future, you could maybe use GitHub milestones and assign such TODO issues to future milestones. Now that pip has a regular release cadence it might be easy enough to manage.

src/pip/_internal/utils/misc.py Outdated Show resolved Hide resolved
Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to take a look at this tomorrow.

news/7296.removal Outdated Show resolved Hide resolved
@pypa-bot
Copy link

pypa-bot commented Dec 2, 2019

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will eligible for code review and hopefully merging!

@pypa-bot pypa-bot added the needs rebase or merge PR has conflicts with current master label Dec 2, 2019
Make sure ``pip wheel`` never outputs pure python wheels with a
python implementation tag. Better fix/workaround for
`pypa#3025 <https://github.com/pypa/pip/issues/3025>`_ by
using a per-implementation wheel cache instead of caching pure python
wheels with an implementation tag in their name.

Fixes pypa#7296
sbidoul and others added 6 commits December 2, 2019 12:07
Instead of building an URL-ish string that could be
complex to describe and reproduce, generate a dictionary that is
hashed with a simple algorithm.
Pip 20 changes the cache key format to include the
interpreter name. To avoid invalidating all existing caches,
we continue using existing cache entries that were computed
with the legacy algorithm. This should not regress issue pypa#3025
because wheel cached in such legacy entries should have
the python implementation tag set.
Co-Authored-By: Pradyun Gedam <pradyunsg@gmail.com>
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Dec 2, 2019
@sbidoul
Copy link
Member Author

sbidoul commented Dec 2, 2019

I rebased to resolve a merge conflict.

@sbidoul
Copy link
Member Author

sbidoul commented Dec 5, 2019

@xavfernandez I think all your comments have been handled. Can you update your review?

@xavfernandez
Copy link
Member

@sbidoul Sorry, this PR was approved in my head ^^

@chrahunt
Copy link
Member

Ping @pradyunsg, any issues with this approach?

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things, that I think we should consider tackling in follow ups prior to 20.0:

  • add the deprecation notice, for the legacy path usage -- I'm not 100% sure whether we should be printing a not-actionable notice for something like the cache which pip's generating.
  • switch to ask-for-forgiveness-not-permission paradigm for file system access here.

Neither of these are blocking concerns though, so green tick it is. :)

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, thanks GitHub.

@pradyunsg
Copy link
Member

Thanks for all your work here @sbidoul! Thanks for the reviews @chrahunt and @xavfernandez!

Thanks again for the ping @chrahunt. :)

@pradyunsg pradyunsg changed the title Better workaround for cache poisoning #3025 Better workaround for cache poisoning (#3025, #7319) Dec 12, 2019
@pradyunsg pradyunsg merged commit ce5edd4 into pypa:master Dec 12, 2019
@pradyunsg pradyunsg changed the title Better workaround for cache poisoning (#3025, #7319) Better workaround for cache poisoning (see #3025) Dec 12, 2019
@sbidoul sbidoul deleted the cache-per-impl-sbi branch December 12, 2019 11:30
@sbidoul
Copy link
Member Author

sbidoul commented Dec 12, 2019

Thanks all!

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Dec 21, 2019
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jan 11, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jan 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

python tag issue with pip wheel / pip install / caching
5 participants