Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix slowness on Python 3.11 when updating an existing large environment. #12080

Merged
merged 4 commits into from
Jun 16, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions news/12079.bugfix.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix slowness when using ``importlib.metadata`` (the default way for pip to read metadata in Python 3.11+) and there is a large overlap between already installed and to-be-installed packages.
5 changes: 4 additions & 1 deletion src/pip/_internal/resolution/resolvelib/candidates.py
Original file line number Diff line number Diff line change
Expand Up @@ -341,6 +341,7 @@ def __init__(
self.dist = dist
self._ireq = _make_install_req_from_dist(dist, template)
self._factory = factory
self._version = None

# This is just logging some messages, so we can do it eagerly.
# The returned dist would be exactly the same as self.dist because we
Expand Down Expand Up @@ -376,7 +377,9 @@ def name(self) -> str:

@property
def version(self) -> CandidateVersion:
return self.dist.version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why the self.dist.version line is slow? And how is it related to Python 3.11?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reiterate for clarity: it is not Python 3.11 per se, but it is because 3.11 uses importlib.metadata by default, instead of pkg_resources. You see the same on Python 3.10 with _PIP_USE_IMPORTLIB_METADATA=1, and 3.11 is fast with _PIP_USE_IMPORTLIB_METADATA=0.

Getting one version is also not slow, but when doing bin/pip install Plone -c https://dist.plone.org/release/6.0.5/constraints.txt it gets called 440,106 times, so even milliseconds add up. I see that with the code from this PR, self.dist.version only gets called one time per package, or 236 times total. For setuptools alone, self.version gets called 48,891 times.

The difference is in the class of the distribution, and how it gets the version. It is either pip._internal.metadata.pkg_resources.Distribution or pip._internal.metadata.importlib._dists.Distribution.

I get confused following the pkg_resources case, but for importlib you get into the standard lib:

> /Users/maurits/.pyenv/versions/3.11.2/lib/python3.11/importlib/metadata/__init__.py(629)version()
-> @property
(Pdb) l
629  ->	    @property
630  	    def version(self):
631  	        """Return the 'Version' metadata for the distribution package."""
632  	        return self.metadata['Version']

That line 632 actually reads the METADATA file, making it slow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the main difference is probably pkg_resources caches this value in-memory, while importlib.metadata needs to re-parse the file every time. The fix makes sense then.

It would be best to modify the news fragment to reflect that this is about importlib.metadata (and not 3.11 per se), and when pip repeatedly accesses an already-installed distribution (and not all “large environments” are affected, only if that environment has a lot of overlaps with the to-be-installed requirement set).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (and rebased on main).

if self._version is None:
self._version = self.dist.version
return self._version
mauritsvanrees marked this conversation as resolved.
Show resolved Hide resolved

@property
def is_editable(self) -> bool:
Expand Down