-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PEP 508 rules when setting deps from extras #724
Conversation
This commit addresses issue bazelbuild#720. [PEP 426](https://peps.python.org/pep-0426/#name) states that distribution names are case-insensitive and "-" is interchangeable with "_". The `pip-compile` command creates a lockfile where all package names are lowercase. The tool may also modify interchangeable characters. The following examples are all valid `requirements.txt` or `requirements_lock.txt` entries: ``` SQLAlchemy[postgresql_psycopg2binary]==1.4.36 sqlalchemy[postgresql_psycopg2binary]==1.4.36 sentry_sdk[flask]==1.5.8 sentry-sdk[flask]==1.5.8 ``` A distribution's `METADATA` file contains the stylization chosen by the publisher. By applying a "sanitise" function when building the `extras` dict and when performing lookups we can eliminate this difference as a concern.
def sanitise_requirement(requirement: str) -> str: | ||
"""Given a requirement string, returns a normalized version of the requirement name. | ||
The case and use of "_" vs "-" in a requirements file may vary. | ||
https://peps.python.org/pep-0426/#name | ||
""" | ||
return requirement.replace("-", "_").lower() | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more correct to run both sides (whl.name
and the entries in extras
) through the "official" packaging.utils.canonicalize_name
.
Or, if we don't have that dependency and don't want to take it on, use the code from PEP 503:
import re
def normalize(name):
return re.sub(r"[-_.]+", "-", name).lower()
More specifically, this code also normalizes periods and replaces all contiguous runs of [-_.]
with a single -
. So My-_._-Package
and my-package
are equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging that up @jvolkman.
Taking the regex from PEP 503 seems like a good fit.
I should probably update the name of the PR to reflect the more accurate PEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, PEP-426 is Withdrawn
. PEP-508 or PEP-503 is what we should be referencing here.
We are in full control of the code. So normalizing when creating the mapping between binary distributions (wheels)-> extras requested
should be done when the requirementst.txt is parsed, and when installing such a wheel and setting up the relevant dependencies for the extras.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked through pypa/installer
and didn't find a method for normalizing the distribution name while parsing wheel metadata:
However, I did find that pypa/pkg_resources
provides an interface for normalization:
I applied this to the .name
property on Wheel
.
I had to apply the normalization in _parse_requirement_for_extra
because that's the closest spot where the distribution name is isolated from the extras
.
pkg_resources.Requirement
can also parse the extras
section but that applies safe_extra
to each entry.
That may be a suitable replacement for the current regex in _parse_requirement_for_extra
, as long as "fixing" the extras doesn't break their inclusion.
Edit: It looks like pkg_resources.Requirement
can do the job, but I'll have to change some tests that currently assert non-normalized extras. (The code that swaps the regex for the code from pkg_resources
is currently committed). I tried to find the context around the initial implementation to make sure this road hasn't been explored in the past but GitHub links to a different PR and the commit message is brief.
That seem viable or should I revert the latest commit and leave the extras-parsing as-is?
Edit 2: Oh, I see now. That method doesn't really handle anything outside of index packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a look at what piptools was doing. They use the vendored version of packaging
to normalize the name https://github.com/jazzband/pip-tools/blob/9fb2d425d6d7abb9d9b9c51ebb711af420da25f4/piptools/utils.py#L30
`pypa/installer` is used to parse Wheel metadata, but does not currently provide a method for normalizing distribution names: - pypa/installer#97 `pypa/pkg_resources` provides `Requirement.parse` which returns an instance of `Requirement` where `.key` is the canonical distribution name per PEP 503. The `Requirement` class can also parse `extras`, but it returns a normalized form that I believe could break the installation of the extras.
@@ -1,6 +1,8 @@ | |||
import re | |||
from typing import Dict, Optional, Set, Tuple | |||
|
|||
import pkg_resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment doesn't intend to block this PR (I think we need this fix).
Mostly an fyi for future travellers: pkg_resources
(we use it elsewhere, so shouldn't block this PR anyway) is on the slow path towards deprecation:
- https://github.com/pypa/setuptools/milestone/4
- https://setuptools.pypa.io/en/latest/pkg_resources.html
There are probably parts of the stdlib we could use instead: https://github.com/python/cpython/blob/9cc0afcb8782c22575fa91756c54029c7e774dfb/Lib/importlib/metadata/__init__.py#L867
Or just create our own function if that's easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we follow what's shown in pip-tools here –> #724 (comment)?
We already access pip._internal
for a bunch of stuff.
This replaces `pkg_resources.Requirement.parse` with `packaging.utils.canonicalize_name`. Doing this pulls in a vendored requirement from `pip`, which may be undesirable. The code we want is just: ``` re.sub(r"[-_.]+", "-", name).lower() ``` This commit also leaves a reference to `pkg_resources` in `wheel.py` which does not canonicalize the name.
I'm happy to push up either this approach or just write the function here. _canonicalize_regex = re.compile(r"[-_.]+")
_canonicalize_regex.sub("-", name).lower() I pushed up a commit which demonstrates the It looks like Pulling in the vendored Do we need to / should we canonicalize the name in the rules_python/python/pip_install/extract_wheels/lib/bazel.py Lines 389 to 402 in 79509b4
|
@mattoberle, I keep forgetting that authors don't have perms to merge these even after approval, do they? |
They do not, looks like merge is locked down to repository owners or another group. |
* Use PEP 426 rules when setting deps from extras This commit addresses issue bazelbuild#720. [PEP 426](https://peps.python.org/pep-0426/#name) states that distribution names are case-insensitive and "-" is interchangeable with "_". The `pip-compile` command creates a lockfile where all package names are lowercase. The tool may also modify interchangeable characters. The following examples are all valid `requirements.txt` or `requirements_lock.txt` entries: ``` SQLAlchemy[postgresql_psycopg2binary]==1.4.36 sqlalchemy[postgresql_psycopg2binary]==1.4.36 sentry_sdk[flask]==1.5.8 sentry-sdk[flask]==1.5.8 ``` A distribution's `METADATA` file contains the stylization chosen by the publisher. By applying a "sanitise" function when building the `extras` dict and when performing lookups we can eliminate this difference as a concern. * Use PEP 503 rules when sanitising extras * Normalize distribution name with pkg_resources `pypa/installer` is used to parse Wheel metadata, but does not currently provide a method for normalizing distribution names: - pypa/installer#97 `pypa/pkg_resources` provides `Requirement.parse` which returns an instance of `Requirement` where `.key` is the canonical distribution name per PEP 503. The `Requirement` class can also parse `extras`, but it returns a normalized form that I believe could break the installation of the extras. * Use Requirement.parse to populate extra reqs * Revert "Use Requirement.parse to populate extra reqs" This reverts commit f0faa97. * Test for distribution name normalization in extras * Replace pkg_resources with packaging.utils This replaces `pkg_resources.Requirement.parse` with `packaging.utils.canonicalize_name`. Doing this pulls in a vendored requirement from `pip`, which may be undesirable. The code we want is just: ``` re.sub(r"[-_.]+", "-", name).lower() ``` This commit also leaves a reference to `pkg_resources` in `wheel.py` which does not canonicalize the name. Co-authored-by: Jonathon Belotti <jonathon@canva.com> Co-authored-by: Alex Eagle <alex@aspect.dev>
PR Checklist
Please check if your PR fulfills the following requirements:
PR Type
What kind of change does this PR introduce?
What is the current behavior?
Issue Number: #720
What is the new behavior?
PEP 426 states that distribution
names are case-insensitive and "-" is interchangeable with "_".
The
pip-compile
command creates a lockfile where all package names are lowercase.The tool may also modify interchangeable characters.
The following examples are all valid
requirements.txt
orrequirements_lock.txt
entries:A distribution's
METADATA
file contains the case and delimiters chosen by the package publisher.By applying a "sanitise" function when building the
extras
dict and when performing lookups we can eliminate this difference as a concern.Does this PR introduce a breaking change?
Other information
While I believe this PR covers the spots required to fix the issue, it's pretty clunky.
One has to remember to sanitize keys before looking them up in the
extras
dict.We could get around that by sanitizing the wheel name parsed from the metadata but I didn't look into what consequences that might have.
Additionally, replacing
"-"
with"_"
was an arbitrary decision.It could easily be
"_"
to"-"
.