Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata backend with importlib.metadata #10709

Merged
merged 26 commits into from
Apr 15, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
f13fa41
News
uranusjr Dec 5, 2021
d72356d
Hoist distrubution factory methods to base class
uranusjr Dec 4, 2021
60a7ad3
An importlib.metadata-based backend
uranusjr Dec 5, 2021
a9cf547
Add workflow to run tests against the new backend
uranusjr Dec 5, 2021
e3952f8
Fix unit test to check against the correct backend
uranusjr Dec 5, 2021
846d8e5
Try to cover Path interface differences
uranusjr Dec 5, 2021
e371bf2
Catch wheel content error to emit appropriate exception
uranusjr Dec 5, 2021
a640a24
Fix CI bzr installation
uranusjr Dec 5, 2021
675de98
Better test setup and teardown with fixture
uranusjr Dec 5, 2021
4fd0e09
Rename iter_distribution() with 'all' for clarity
uranusjr Dec 5, 2021
b0ec2c0
Only return one distribution under a name
uranusjr Dec 5, 2021
1d22560
Evaluate dependencies for package without extras
uranusjr Dec 5, 2021
1544a31
Implement our own requires.txt parser
uranusjr Dec 5, 2021
17355f8
Unnormalized metadata version works now!
uranusjr Dec 5, 2021
4c096b7
Refactor importlib metadata backend to subpackage
uranusjr Dec 12, 2021
5e9c5ad
Implement egg-link support
uranusjr Dec 12, 2021
0f085e1
Better ZIP support
uranusjr Dec 12, 2021
41de887
Satisfy mypy
uranusjr Dec 12, 2021
173ef62
Fix constructor arguments
uranusjr Dec 12, 2021
321c967
Add compat shim to find eggs in importlib backend
uranusjr Dec 12, 2021
eea84b3
Clean up egg compatibility notes
uranusjr Dec 12, 2021
8073f65
Properly normalize and handle linked dist
uranusjr Dec 12, 2021
a55f0dd
Wording
uranusjr Feb 12, 2022
b894081
Add note on distutils expecting pathlib.Path
uranusjr Feb 12, 2022
f9e554c
Fix outdated comments
uranusjr Feb 12, 2022
ee81f71
Make version hack more reliable
uranusjr Apr 8, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -218,3 +218,38 @@ jobs:
--verbose --numprocesses auto --showlocals
env:
TEMP: "R:\\Temp"

tests-importlib-metadata:
name: tests for importlib.metadata backend
runs-on: ubuntu-latest
env:
_PIP_METADATA_BACKEND_IMPORTLIB: egg-compat

needs: [pre-commit, packaging, determine-changes]
if: >-
needs.determine-changes.outputs.tests == 'true' ||
github.event_name != 'pull_request'

steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: '3.10'

- name: Install Ubuntu dependencies
run: sudo apt-get install bzr

- run: pip install nox 'virtualenv<20'

# Main check
- name: Run unit tests
run: >-
nox -s test-3.10 --
-m unit
--verbose --numprocesses auto --showlocals
- name: Run integration tests
run: >-
nox -s test-3.10 --
-m integration
--verbose --numprocesses auto --showlocals
--durations=5
3 changes: 3 additions & 0 deletions news/10709.process.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Start migration of distribution metadata implementation from ``pkg_resources``
to ``importlib.metadata``. The new implementation is currently not exposed in
any user-facing way, but included in the code base for easier development.
2 changes: 1 addition & 1 deletion src/pip/_internal/commands/show.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ def search_packages_info(query: List[str]) -> Generator[_PackageInfo, None, None
"""
env = get_default_environment()

installed = {dist.canonical_name: dist for dist in env.iter_distributions()}
installed = {dist.canonical_name: dist for dist in env.iter_all_distributions()}
query_names = [canonicalize_name(name) for name in query]
missing = sorted(
[name for name, pkg in zip(query, query_names) if pkg not in installed]
Expand Down
42 changes: 29 additions & 13 deletions src/pip/_internal/metadata/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,14 @@
from typing import List, Optional
import functools
import os
from typing import TYPE_CHECKING, List, Optional, Type, cast

from .base import BaseDistribution, BaseEnvironment, FilesystemWheel, MemoryWheel, Wheel

if TYPE_CHECKING:
from typing import Protocol
else:
Protocol = object

__all__ = [
"BaseDistribution",
"BaseEnvironment",
Expand All @@ -11,19 +18,34 @@
"get_default_environment",
"get_environment",
"get_wheel_distribution",
"select_backend",
]


class Backend(Protocol):
Distribution: Type[BaseDistribution]
Environment: Type[BaseEnvironment]


@functools.lru_cache(maxsize=None)
def select_backend() -> Backend:
if os.environ.get("_PIP_METADATA_BACKEND_IMPORTLIB"):
from . import importlib

return cast(Backend, importlib)
from . import pkg_resources

return cast(Backend, pkg_resources)
Comment on lines +30 to +38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note for the future -- this is intended to be a temporary measure. We'll swap this out for a more "sane" selection logic in the future.



def get_default_environment() -> BaseEnvironment:
"""Get the default representation for the current environment.

This returns an Environment instance from the chosen backend. The default
Environment instance should be built from ``sys.path`` and may use caching
to share instance state accorss calls.
"""
from .pkg_resources import Environment

return Environment.default()
return select_backend().Environment.default()


def get_environment(paths: Optional[List[str]]) -> BaseEnvironment:
Expand All @@ -33,9 +55,7 @@ def get_environment(paths: Optional[List[str]]) -> BaseEnvironment:
given import paths. The backend must build a fresh instance representing
the state of installed distributions when this function is called.
"""
from .pkg_resources import Environment

return Environment.from_paths(paths)
return select_backend().Environment.from_paths(paths)


def get_directory_distribution(directory: str) -> BaseDistribution:
Expand All @@ -44,9 +64,7 @@ def get_directory_distribution(directory: str) -> BaseDistribution:
This returns a Distribution instance from the chosen backend based on
the given on-disk ``.dist-info`` directory.
"""
from .pkg_resources import Distribution

return Distribution.from_directory(directory)
return select_backend().Distribution.from_directory(directory)


def get_wheel_distribution(wheel: Wheel, canonical_name: str) -> BaseDistribution:
Expand All @@ -57,6 +75,4 @@ def get_wheel_distribution(wheel: Wheel, canonical_name: str) -> BaseDistributio

:param canonical_name: Normalized project name of the given wheel.
"""
from .pkg_resources import Distribution

return Distribution.from_wheel(wheel, canonical_name)
return select_backend().Distribution.from_wheel(wheel, canonical_name)
64 changes: 39 additions & 25 deletions src/pip/_internal/metadata/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
TYPE_CHECKING,
Collection,
Container,
Generator,
Iterable,
Iterator,
List,
Expand All @@ -32,10 +31,7 @@
DirectUrlValidationError,
)
from pip._internal.utils.compat import stdlib_pkgs # TODO: Move definition here.
from pip._internal.utils.egg_link import (
egg_link_path_from_location,
egg_link_path_from_sys_path,
)
from pip._internal.utils.egg_link import egg_link_path_from_sys_path
from pip._internal.utils.misc import is_local, normalize_path
from pip._internal.utils.urls import url_to_path

Expand All @@ -46,7 +42,7 @@

DistributionVersion = Union[LegacyVersion, Version]

InfoPath = Union[str, pathlib.PurePosixPath]
InfoPath = Union[str, pathlib.PurePath]

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -96,6 +92,28 @@ def _convert_installed_files_path(


class BaseDistribution(Protocol):
@classmethod
def from_directory(cls, directory: str) -> "BaseDistribution":
"""Load the distribution from a metadata directory.

:param directory: Path to a metadata directory, e.g. ``.dist-info``.
"""
raise NotImplementedError()

@classmethod
def from_wheel(cls, wheel: "Wheel", name: str) -> "BaseDistribution":
"""Load the distribution from a given wheel.

:param wheel: A concrete wheel definition.
:param name: File name of the wheel.

:raises InvalidWheel: Whenever loading of the wheel causes a
:py:exc:`zipfile.BadZipFile` exception to be thrown.
:raises UnsupportedWheel: If the wheel is a valid zip, but malformed
internally.
"""
raise NotImplementedError()

def __repr__(self) -> str:
return f"{self.raw_name} {self.version} ({self.location})"

Expand Down Expand Up @@ -149,14 +167,7 @@ def installed_location(self) -> Optional[str]:

The returned location is normalized (in particular, with symlinks removed).
"""
egg_link = egg_link_path_from_location(self.raw_name)
if egg_link:
location = egg_link
elif self.location:
location = self.location
else:
return None
return normalize_path(location)
raise NotImplementedError()

@property
def info_location(self) -> Optional[str]:
Expand Down Expand Up @@ -317,21 +328,19 @@ def is_file(self, path: InfoPath) -> bool:
"""Check whether an entry in the info directory is a file."""
raise NotImplementedError()

def iterdir(self, path: InfoPath) -> Iterator[pathlib.PurePosixPath]:
"""Iterate through a directory in the info directory.
def iter_distutils_script_names(self) -> Iterator[str]:
"""Find distutils 'scripts' entries metadata.

Each item yielded would be a path relative to the info directory.

:raise FileNotFoundError: If ``name`` does not exist in the directory.
:raise NotADirectoryError: If ``name`` does not point to a directory.
If 'scripts' is supplied in ``setup.py``, distutils records those in the
installed distribution's ``scripts`` directory, a file for each script.
"""
raise NotImplementedError()

def read_text(self, path: InfoPath) -> str:
"""Read a file in the info directory.

:raise FileNotFoundError: If ``name`` does not exist in the directory.
:raise NoneMetadataError: If ``name`` exists in the info directory, but
:raise FileNotFoundError: If ``path`` does not exist in the directory.
:raise NoneMetadataError: If ``path`` exists in the info directory, but
cannot be read.
"""
raise NotImplementedError()
Expand Down Expand Up @@ -471,8 +480,8 @@ def _iter_distributions(self) -> Iterator["BaseDistribution"]:
"""
raise NotImplementedError()

def iter_distributions(self) -> Generator["BaseDistribution", None, None]:
"""Iterate through installed distributions."""
def iter_all_distributions(self) -> Iterator[BaseDistribution]:
"""Iterate through all installed distributions without any filtering."""
for dist in self._iter_distributions():
# Make sure the distribution actually comes from a valid Python
# packaging distribution. Pip's AdjacentTempDirectory leaves folders
Expand Down Expand Up @@ -502,6 +511,11 @@ def iter_installed_distributions(
) -> Iterator[BaseDistribution]:
"""Return a list of installed distributions.

This is based on ``iter_all_distributions()`` with additional filtering
options. Note that ``iter_installed_distributions()`` without arguments
is *not* equal to ``iter_all_distributions()``, since some of the
configurations exclude packages by default.

:param local_only: If True (default), only return installations
local to the current virtualenv, if in a virtualenv.
:param skip: An iterable of canonicalized project names to ignore;
Expand All @@ -511,7 +525,7 @@ def iter_installed_distributions(
:param user_only: If True, only report installations in the user
site directory.
"""
it = self.iter_distributions()
it = self.iter_all_distributions()
if local_only:
it = (d for d in it if d.local)
if not include_editables:
Expand Down
4 changes: 4 additions & 0 deletions src/pip/_internal/metadata/importlib/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
from ._dists import Distribution
from ._envs import Environment

__all__ = ["Distribution", "Environment"]
41 changes: 41 additions & 0 deletions src/pip/_internal/metadata/importlib/_compat.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
import importlib.metadata
from typing import Any, Optional, Protocol, cast


class BasePath(Protocol):
"""A protocol that various path objects conform.

This exists because importlib.metadata uses both ``pathlib.Path`` and
``zipfile.Path``, and we need a common base for type hints (Union does not
work well since ``zipfile.Path`` is too new for our linter setup).

This does not mean to be exhaustive, but only contains things that present
in both classes *that we need*.
"""

name: str

@property
def parent(self) -> "BasePath":
raise NotImplementedError()


def get_info_location(d: importlib.metadata.Distribution) -> Optional[BasePath]:
"""Find the path to the distribution's metadata directory.

HACK: This relies on importlib.metadata's private ``_path`` attribute. Not
all distributions exist on disk, so importlib.metadata is correct to not
expose the attribute as public. But pip's code base is old and not as clean,
so we do this to avoid having to rewrite too many things. Hopefully we can
eliminate this some day.
"""
return getattr(d, "_path", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a warning message here, for when this doesn't get a PathDistribution object? Keep a note to remove this, once we decide to expose this as the default, as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does expect non-PathDistribution objects though, and simply returns None for them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @jaraco @warsaw @brettcannon for awareness about this hack.

Personally, I'm fine with this -- I don't imagine there being too many code changes in importlib-metadata, in the 3.11+ standard library. If there are, hopefully, we'll be able to deal with them in an expedient manner on pip's side. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that pip determine why it cares about the "location" of the metadata and ask what interfaces would be appropriate for a package not on the file system to satisfy those needs. But for now, this hack is probably acceptable. Out of curiousity, what breaks if a Distribution doesn't have a _path?

Copy link
Member Author

@uranusjr uranusjr Apr 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distributions without _path (in this implementation) won’t be able to be uninstalled (or upgraded, because upgrade in pip is just uninstall + install). That’s the only thing pip needs from the metadata location (aside from error messages in some edge cases).



def get_dist_name(dist: importlib.metadata.Distribution) -> str:
"""Get the distribution's project name.

The ``name`` attribute is only available in Python 3.10 or later. We are
targeting exactly that, but Mypy does not know this.
"""
return cast(Any, dist).name
Loading