Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poetry builds sdist against common packaging practices #6915

Closed
4 tasks done
brendan-morin opened this issue Oct 28, 2022 · 9 comments
Closed
4 tasks done

Poetry builds sdist against common packaging practices #6915

brendan-morin opened this issue Oct 28, 2022 · 9 comments
Labels
kind/question User questions (candidates for conversion to discussion)

Comments

@brendan-morin
Copy link

  • I am on the latest stable Poetry version, installed using a recommended method.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have consulted the FAQ and blog for any relevant entries or release notes.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

Poetry 1.2.2 introduced this change (#6621) with the intention to normalize sdist naming conventions, which appears to put the generated sdist files in muddy waters regarding PEP625 and historical convention for sdist naming. The specification states:

The name of an sdist should be {distribution}-{version}.tar.gz.

Unless I'm missing something, it does not specifically state how to handle underscores in the distribution names themselves, but convention has been to have the package name contain dashes rather than underscores for sdist, and underscores for the wheel. E.g., for the example pyproject.toml, the previous behavior (v1.2.1) was:

# poetry build
Building my-package (0.0.1)
  - Building sdist
  - Built my-package-0.0.1.tar.gz
  - Building wheel
  - Built my_package-0.0.1-py3-none-any.whl

The introduced changes appear to violate this for packages which have an underscore/dash in them. In the latest version, it builds an sdist package with underscores rather than dashes:

# poetry build
Building my-package (0.0.1)
 - Building sdist
 - Built my_package-0.0.1.tar.gz
 - Building wheel
 - Built my_package-0.0.1-py3-none-any.whl

While as far as I can tell this does not technically run awry of the PEP, we are seeing in practice that our internal PyPI infrastructure in place seems to reject sdist packages with an error along the lines of:

Exception: cannot publish unexpected package name my_package-0.0.1.tar.gz (expected my-package-0.0.1.tar.gz)

I cannot pretend to understand all of the configuration or customizations that exist in our internal PyPI infrastucture, as the exact nature of this is obfuscated from me. I also understand that it would be easy to close this problem as WONTFIX, under the idea that "this does not sound like a poetry problem". But I think the changes we are seeing were not intended to be breaking, especially as a patch fix. But this change appears to have inadvertently injected a fairly opinionated change to how sdist packages are named that breaks implied backwards compatibility, without offering much in the way of clear upside. If we have issues on our system, I imagine others who rely on this historical convention may also see issues.

This thread in pypa/build offers some good insights here: pypa/build#369. Specifically, this helps contextualize this:

For example, the wheel always use _ because using - in the filename would make parsing ambiguous. An sdist can technically use either (the rightmost dash separates the name and version aprts); underscore is preferred for consistency with wheels, but setuptools use dash for backward compatibility reasons (its behaviour outdates wheels).

I think ultimately the intent of name normalization names sense, and yes, it is probably the fault of other systems for relying on convention over explicit specification. But in this case it seems as though there is a high potential this violates assumptions that other systems hold that poetry would ideally integrate nicely with. And in doing so, it's not clear that any specific value has been added in terms of coercing dashes to underscores as the means of normalization.

In thinking about how to resolve this, my recommendation would be the following:

  1. Revert the breaking behavior in a follow-on patch release.
  2. Add a flag that allows users to opt in to this normalized naming convention that is by default off
  3. Provide warnings that this behavior will be switched to default in a follow-on release (if we actually care - though TBH it seems like it's not really an issue, so I'm not sure what benefit there is to fixing it)
  4. In a subsequent minor release, flip this to default, allow users to opt in to legacy behavior (if the new behavior really is the desired default)

I'd be interested in hearing thoughts on what the intent and preferred way forward would be here.

@brendan-morin brendan-morin added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Oct 28, 2022
@Secrus
Copy link
Member

Secrus commented Oct 28, 2022

Unless I'm missing something, it does not specifically state how to handle underscores in the distribution names themselves

It actually states exactly how the distribution name should be normalized. PEP 625 links to wheel spec for distribution name normalization and there it states:

In distribution names, any run of -_. characters (HYPHEN-MINUS, LOW LINE and FULL STOP)
should be replaced with _ (LOW LINE), and uppercase characters should be replaced with
corresponding lowercase ones. This is equivalent to PEP 503 normalisation followed by replacing - with _. Tools consuming wheels must be prepared to accept . (FULL STOP) and uppercase letters, however, as these were allowed by an earlier version of this specification.

So in my eyes, this is pretty clearly stated.

PEP: https://peps.python.org/pep-0625/#specification
Wheel spec: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode

@brendan-morin
Copy link
Author

I see your point, and that's fair. At a minimum this issue can stand as a reference for other users who find that 1.2.2 was a breaking upgrade for them as well.

@dimbleby
Copy link
Contributor

not a bug, should be closed

@neersighted neersighted added kind/question User questions (candidates for conversion to discussion) and removed kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Oct 29, 2022
@neersighted
Copy link
Member

This is unfortunately just the state of the ecosystem due to past ambiguity. Poetry was broken in one way, and we have chosen instead to align with the new PEP that tries to remove this ambiguity, which may break users moderately in another way. That being said, Poetry is now well-behaved and it is only other tools with latent bugs like Poetry's previous versions that should present problems -- as they become compliant, this will solve itself.

@marcaurele
Copy link

@neersighted do you have some links to when Poetry became compliant to this naming ambiguity (PR / issue ...)?

@Secrus
Copy link
Member

Secrus commented Oct 31, 2022

@neersighted do you have some links to when Poetry became compliant to this naming ambiguity (PR / issue ...)?

Here: #6621

@hadrien
Copy link

hadrien commented Nov 30, 2022

From Normalized Names in PEP-503:

This PEP references the concept of a “normalized” project name. As per PEP 426 the only valid characters in a name are the ASCII alphabet, ASCII numbers, ., -, and _. The name should be lowercased with all runs of the characters ., -, or _ replaced with a single - character.

If I understand correctly, a package can be named my.package, but it has to be published and installed under my-package instead of (how it has been working for years) my.package?

Sorry if it is not the good forum, feel comfortable redirecting me in any case. Thanks

@neersighted
Copy link
Member

This is not the best forum, but hopefully the conversation is short:

What is being referenced here is sdist names, which are now normalized like wheel names. Technically a registry can still host a package using a pretty name (with dots or capitalization, etc) while serving distfiles with normalized names. However, as PyPI derives the project name from the name of the first artifact uploaded, the only way to name a PyPI project with a pretty name is to upload using a renamed sdist with Twine, so that PyPI uses the pretty name of the sdist.

After that point, you can use Poetry to upload and PyPI will correctly match the files to the project. But Poetry will only ever output normalized names as required by the spec, so unless you rename the files yourself you cannot manipulate PyPI this way.

Other indexes that don't infer a project name from an attempt to upload a non-existent name will be less obnoxious here. There's also the fact that the naming rules still create ambiguity and PyPI misbehaves here (but won't fix it under the under-specification is solved) e.g. https://discuss.python.org/t/amending-pep-427-and-pep-625-on-package-normalization-rules/17226.

tueda added a commit to tueda/donuts-python that referenced this issue May 10, 2023
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/question User questions (candidates for conversion to discussion)
Projects
None yet
Development

No branches or pull requests

6 participants