Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalisation and Requirement's str() #644

Open
henryiii opened this issue Dec 13, 2022 · 18 comments · Fixed by #696
Open

Normalisation and Requirement's str() #644

henryiii opened this issue Dec 13, 2022 · 18 comments · Fixed by #696

Comments

@henryiii
Copy link
Contributor

If packaging doesn't normalise dependency names when stringifying dependencies, why does it normalise their extras?

Originally posted by @layday in pypa/build#550 (comment)

Build's test suite is broken because packaging 22 is normalizing the extras when producing strings. But it doesn't do this with dependency names. It seems like it should be consistent.

@brettcannon
Copy link
Member

This is actually consistent in the face of https://peps.python.org/pep-0685/ requiring we normalize extras, but nothing requiring we normalize project names. We could potentially normalize project names as well, but that might break code that wants the original string.

@pradyunsg opinions?

@henryiii
Copy link
Contributor Author

henryiii commented Dec 13, 2022

The PEP requires the names to be normalized when comparing, it doesn't say they have to be changed to a normalized string by packaging, though, as far as I can tell.

Maybe this:

For tools writing core metadata, they MUST write out extra names in their normalized form.

This seems problematic in the short term, though, as previous versions don't consider these extra matching. I think it would be better to provide a transition period where extras can be written as specified, and comparison is normalized, before also normalizing the writing. Projects that historically have a a_b extra will suddenly break if the extra gets rewritten to a-b (which I've seen happen when projects were moving to hatchling, by the way - see wntrblm/nox#659 (comment) , where tox_to_nox was being rewritten to tox-to-nox, breaking all workflows with nox[tox_to_nox]` until comparisons are supported in pip and older pips are not in use. Hatchling has an emergency "allow-ambiguous-features" switch that we needed to use to make sure users were not broken).

@layday
Copy link
Member

layday commented Dec 13, 2022

The latest version of Requirement implements __eq__ but this isn't very useful if the components (the name and extras) aren't normalised for comparison:

In [1]: from packaging.requirements import Requirement

In [2]: Requirement('foo') == Requirement('foo')
Out[2]: True

In [3]: Requirement('foo') == Requirement('Foo')
Out[3]: False

In [4]: Requirement('foo[foo_bar]') == Requirement('foo[foo_bar]')
Out[4]: True

In [5]: Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
Out[5]: False

Normalising the marker does appear to be required by the PEP. I think it'd be beneficial to review how requirements are represented and compared, e.g.:

  • __str__ should retain the original representation;
  • __eq__ should normalise all components;
  • a new property normalized_str could produce a normalised representation of the requirement - pipe it back into the constructor to get a requirement with all constituent members normalised.

@henryiii
Copy link
Contributor Author

henryiii commented Dec 14, 2022

I believe this change might be related to pypa/pip#11649 ? Specifically, the latest jupyterlab-server from six days ago is now requesting a normalized extra name jsonschema[format-nongpl] while packages built with older packaging are still requesting jsonschema[format_nongpl], causing a collision.

@pradyunsg
Copy link
Member

pradyunsg commented Dec 17, 2022

@pradyunsg opinions?

I think we should preserve them when we load them and represent them as-is; but compare them with normalisation. Basically, what @layday said. :)

@pradyunsg
Copy link
Member

I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)

@brettcannon
Copy link
Member

I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)

I was thinking about this today and I think there's a risk with the change, but one I think we an live with (i.e. I would be fine with a PR that changes the string representation to preserve the original string). Essentially it sucks when comparisons are non-obvious; the string you see will not be the string that comparisons are being made against. So we just need to be prepared to deal with that usability concern.

Hopefully as PEP 685 gets adopted this will become less of an issue.

@pradyunsg
Copy link
Member

We could also add a method that gives you a normalised form of the requirement. Unlike versions, this isn’t defined directly by the PEPs but we have clearly defined semantics for this?

@uranusjr
Copy link
Member

but we have clearly defined semantics for this

Does this mean to normalise the name, extras, versions in specifiers? Maybe additionally sort the extras and specifiers?

@pradyunsg
Copy link
Member

Yes, pretty much. Get the requirement into a string that can be compared with the same semantics as a requirement equality.

@pradyunsg pradyunsg changed the title Stringifying change in 22.0 Normalisation and Reqirement's str() Feb 13, 2023
@astrojuanlu
Copy link
Contributor

Upon reading https://packaging.python.org/en/latest/specifications/core-metadata/#name:

For comparison purposes, the names should be normalized before comparing.

However,

>>> Requirement("A.B-C-D") == Requirement("A.B-C_D")
False

whereas

>>> pkg_resources.Requirement.parse("A.B-C-D") == pkg_resources.Requirement.parse("A.B-C_D")
True

Is this intended?

astrojuanlu added a commit to kedro-org/kedro that referenced this issue May 30, 2023
See pypa/packaging#644 (comment)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@brettcannon brettcannon changed the title Normalisation and Reqirement's str() Normalisation and Requirement's str() May 31, 2023
@brettcannon
Copy link
Member

Is this intended?

I think this issue is about trying to come with a good solution to this to rectify that kind of problem.

@uranusjr
Copy link
Member

uranusjr commented Jun 1, 2023

I’d say preserving the original format for str() but normalising for hash() and == makes the most sense, similar to how case-preserving filesystems compare paths. And it probably wouldn’t be too disruptive since I’m not sure it makes too much practice sense for non-normalised forms to compare differently.

@brettcannon
Copy link
Member

I'm personally happy with making equality handle the normalization and not care about the string representation. And if we construct a tuple for everything in the end then we can reuse that tuple for hashing.

astrojuanlu added a commit to astrojuanlu/packaging that referenced this issue Jun 2, 2023
Fix pypagh-644.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu
Copy link
Contributor

Opened a pull request #696 let me know what you think.

astrojuanlu added a commit to kedro-org/kedro that referenced this issue Jun 5, 2023
See pypa/packaging#644 (comment)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit to kedro-org/kedro that referenced this issue Jun 5, 2023
See pypa/packaging#644 (comment)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit to kedro-org/kedro that referenced this issue Jun 5, 2023
See pypa/packaging#644 (comment)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit to kedro-org/kedro that referenced this issue Jun 13, 2023
See pypa/packaging#644 (comment)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit to kedro-org/kedro that referenced this issue Jun 13, 2023
* Make `kedro micropkg package` accept `--verbose`

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Improve error when `micropkg pull` does not find sdist
Fix gh-2542.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Stop using pkg_resources

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Do not rely on setup.py to generate sdist

See gh-2414.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Stop relying on .egg-info directories

Fix gh-2567.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Note change from pkg_requirements

See pypa/packaging#644 (comment)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Improve code comments

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Fix equality checks of equivalent requirements

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add micropackaging improvements to release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Revert sdist check to make it more testable

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Fix micropkg pull error handling

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add tests for new micropkg pull branches

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove untested path of private code

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Fix micropkg tests

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add more detailed explanation of Requirement custom subclass

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit to astrojuanlu/packaging that referenced this issue Jun 19, 2023
Fix pypagh-644.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit to astrojuanlu/packaging that referenced this issue Jun 26, 2023
Fix pypagh-644.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@layday
Copy link
Member

layday commented Jul 7, 2023

IIUC #696 only canonicalises the name - shouldn't extras be canonicalised as well for comparison?

>>> Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
False

The originally reported issue hasn't been addressed either - extra markers are canonicalised when the requirement is stringified but names aren't (nor are their extras):

>>> str(Requirement('scikit_learn; extra == "foo_bar"'))
'scikit_learn; extra == "foo-bar"'

Can we reopen this?

@brettcannon brettcannon reopened this Jul 7, 2023
@brettcannon
Copy link
Member

@layday reopened.

@abravalheri
Copy link
Contributor

Would it be possible to have at least a separated method that implements this normalisation (not necessarily str)?

That would facilitate a lot when writing METADATA files...

For example nowadays, we can use str(...), but it does not help when trying to write the requirement with the normalised extra (as required by PEP 685):

$ pip-run 'packaging==24.1'
>>> from packaging.requirements import Requirement
>>> req = Requirement('mariadb>=1.0.1; python_version >= "3" and extra == "mariadb_connector"')
>>> str(req)
'mariadb>=1.0.1; python_version >= "3" and extra == "mariadb_connector"'

(Optionally, this hypothetical method could also decide to normalise the requirement name, but that is not mandated by any PEP yet I believe).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants