Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The metadata "homepage" link rendered twice on PyPI #11220

Closed
wimglenn opened this issue Apr 21, 2022 · 18 comments · Fixed by #11273
Closed

The metadata "homepage" link rendered twice on PyPI #11220

wimglenn opened this issue Apr 21, 2022 · 18 comments · Fixed by #11273
Labels
needs discussion a product management/policy issue maintainers and users should discuss

Comments

@wimglenn
Copy link
Contributor

wimglenn commented Apr 21, 2022

With the most current tools, and following the recommendations of PEP 621 – Storing project metadata in pyproject.toml, package publishers end up with duplicate homepage links rendered on the UI:

Screen Shot 2022-04-20 at 10 52 45 PM

https://test.pypi.org/project/issue11220/

This is probably actually the fault of setuptools and/or the spec, but perhaps warehouse can workaround by ignoring duplicates? What do you think? Otherwise we might start seeing it everywhere as people adopt PEP 621 recommendations.. :-\

Note: The package was produced from running python -m build on this source, a stripped down version of the example in the PEP:

[build-system]
requires = ["setuptools >= 40.6.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "issue11220"
version = "0.1"

[project.urls]
homepage = "https://github.com/pypa/warehouse/issues/11220"
@di
Copy link
Member

di commented Apr 21, 2022

The issue is that we're getting both of these metadata fields from the distributions:

Home-page: https://github.com/pypa/warehouse/issues/11220
Project-URL: homepage, https://github.com/pypa/warehouse/issues/11220

PEP 621 doesn't seem to explain what tools should do with the old Home-page field when homepage is present, but looks like setuptools has decided to duplicate the value into it. I'm not sure that's really necessary, as I think the long-term plan should be to deprecate this field and stop using it entirely.

It's also not clear to me how PyPI should expect to de-duplicate this. A pyproject.toml like this:

[project.urls]
homepage = "https://example.com/homepage"
Homepage = "https://example.com/Homepage"
Home-page = "https://example.com/Home-page"
Home_page = "https://example.com/Home_page"
Home__page = "https://example.com/Home__page"

Produces this metadata:

Metadata-Version: 2.1
Name: issue11220
Version: 0.1
Summary: foo
Home-page: https://example.com/Home__page
License: UNKNOWN
Project-URL: homepage, https://example.com/homepage
Project-URL: Homepage, https://example.com/Homepage
Project-URL: Home-page, https://example.com/Home-page
Project-URL: Home_page, https://example.com/Home_page
Project-URL: Home__page, https://example.com/Home__page
Platform: UNKNOWN

UNKNOWN

So it looks like it's just choosing the last one, which seems confusing.

FWIW, flit doesn't have this problem, and produces the following metadata:

Metadata-Version: 2.1
Name: issue11220
Version: 0.1
Summary: foo
Project-URL: Home-page, https://example.com/Home-page
Project-URL: Home__page, https://example.com/Home__page
Project-URL: Home_page, https://example.com/Home_page
Project-URL: Homepage, https://example.com/Homepage
Project-URL: homepage, https://example.com/homepage

Which IMO is the correct behavior here and probably what setuptools should be doing, instead of PyPI introducing workarounds for its behavior.

CC @abravalheri for your thoughts.

@di di added needs discussion a product management/policy issue maintainers and users should discuss and removed bug 🐛 labels Apr 21, 2022
@abravalheri
Copy link

abravalheri commented Apr 21, 2022

There is a discussion regarding this in:

The path I choose is the maximum backward compatibility, so I purposefully decided to backfill the Home-page field using the values in Project-URL. This is motivated by the fact that the Home-page field is not currently considered deprecated.

I agree with Dustin that in the case of multiple candidates choosing the last option might be confusing. Would it be better if instead we stick with the first?

@abravalheri
Copy link

Regarding the deduplication, if you write:

[project.urls]
Homepage = "https://github.com/pypa/warehouse/issues/11220"

PyPI will display only one URL.


@di in the case of

Project-URL: Homepage, https://example.com/Homepage
Project-URL: homepage, https://example.com/homepage

If the approach chosen by PyPI is to not do any de-duplucation, does it also make sense to not automatically change the case of the word?

@di
Copy link
Member

di commented Apr 21, 2022

I agree with Dustin that in the case of multiple candidates choosing the last option might be confusing. Would it be better if instead we stick with the first?

I think the best thing would be for setuptools to just stop duplicating the field. I'm not sure what would be depending on maintaining this backwards compatibility, but it's not PyPI and nothing else comes to mind.

If the approach chosen by PyPI is to not do any de-duplucation, does it also make sense to not automatically change the case of the word?

I'm not sure I understand your question! I don't think PyPI should do any modification to what the user has put in this field. I don't think tools should be either.

@dstufft
Copy link
Member

dstufft commented Apr 21, 2022

I agree that PyPI should not be doing modifications to what has been provided to us. I think that as the core metadata for Project-URL is specified we also probably shouldn't change the casing either under the same rationale, since as the metadata si currently defined Homepage and homepage are distinct entries.

I think that there's a good argument to be made that Project-URL keys should be case insensitive, and if we made that change then I think changing case could be fine, but unless that happens I think not changing makes sense.

I also think there is a good argument that the existing url fields that aren't Project-URL should just be deprecated

As far as what is valid for setuptools to do here, tools are generally free to generate the metadata using their inputs as they see fit. PEP 621 defines a standard for how to generate some of those fields, from a specific shared input (in this case, 1:1 mapping with no transformation) but anything not covered by PEP 621.. isn't covered by PEP 621.

Like if we remove PEP 621 from the equation, and someone wrote:

from setuptools import setup

setup(
    ...,
    project_urls={
        "Homepage": "...",
    },
)

Would it be valid for setuptools to backfill the Homepage metadata using that? Which I think is unequivocally yes.

Does it make sense for setuptools to do this? Eh, personally I don't think that field matters and having it duplicated probably confuses some people somewhere so I wouldn't bother if I were setuptools. I don't feel strongly about it personally though.

I think it probably does make sense to deprecate the two old URL fields and in that case PyPI could just stop showing that field and setuptools can easily justify no longer emitting it.

@abravalheri
Copy link

I think the best thing would be for setuptools to just stop duplicating the field.

I understand and sympathise with the point of view you guys are exposing here. Probably in the future we will remove the duplication, but there are a few other changes that might need to happen first, before we reach that point.

For example, distutils expects the field to be defined. It will complain loudly with a warning, which will make setuptools users very confused, and may break CI workflows configured with a more strict PYTHONWARNINGS1.

If Home-page gets deprecated as core metadata, then I would have some more leverage to push this kind of change.

Footnotes

  1. To workaround that I have to propose a PR to drop the warning in pypa/distutils. But the maintainers there may (rightfully) indicate that the field is not deprecated and therefore there is no obvious problem with the behaviour implemented in distutils. There is some precedence with other fields that makes me believe that would be the case.

@abravalheri
Copy link

abravalheri commented Apr 21, 2022

I'm not sure I understand your question! I don't think PyPI should do any modification to what the user has put in this field. I don't think tools should be either.

What is the current approach taken by PyPI for deduplication?
What happens if the user has 2 Project-URL with (exactly) the same label?

Based on previous experiments, it seems that PyPI will deduplicate Home-page: ... and Project-URL: Homepage, ... but not Project-URL: homepage, ....

@abravalheri
Copy link

abravalheri commented Apr 21, 2022

Regardless, this is not a real issue with this project, and warehouse should be free to decide how to display Project-URL however they see fit.

In terms of the setuptools implementation, the answer would be: it's complicated.

Right now, since core metadata is not technically incorrect when duplicating Home-page and Project-url: Homepage, ..., there is a high chance this issue would be treated as a wontfix or at least linger in the issue tracker for a while before we are in a position that we can tackle it properly.

@dstufft
Copy link
Member

dstufft commented Apr 21, 2022

What is the current approach taken by PyPI for deduplication?
What happens if the user has 2 Project-URL with (exactly) the same label?

From what I can tell, in the database it will store the duplicated URLs since it's storing it as a row per entry, with a singular string column for "Key, Value".

When we load that data, our data model makes it available in two forms:

  1. A list of strings in the "Key, Value" form.
  2. A Mapping of Key -> Value, and we "prefill" the mapping with a "Homepage" key taken from Home-page and Download from Download-URL.
    • This means that if someone has a "Homepage" project-url or a "Download" project URL (explicitly that, case sensitive) AND the older Home-page or Download-URL data, that the project-url will overwrite and take precedence. It also means that PyPI is already treating those fields as somewhat deprecated already.
    • It also means if someone has duplicates for the same Project URL Key twice, the latter will overwrite and "win". The core metadata spec is silent in this case, but I think we should probably change this case to be an error on upload, and just require unique keys for every Project-URL entry.

For actually sending that data to the end user, for the HTML pages and the JSON API we use the mapping in (2). In the XMLRPC API we return the list from (1).

  1. To workaround that I have to propose a PR to drop the warning in pypa/distutils. But the maintainers there may (rightfully) indicate that the field is not deprecated and therefore there is no obvious problem with the behaviour implemented in distutils. There is some precedence with other fields that makes me believe that would be the case.

It being deprecated or not isn't really important. It's not a required field, they're of course free to be stricter on what they require than the metadata spec requires, but the spec itself its' perfectly valid to emit metadata without that field regardless of this issue. The only required fields are Name, Version, and Metadata-Version.

@abravalheri
Copy link

Thank you very much for the explanation @dstufft, now it does make sense the behaviour I was observing with Project-url: Homepage, ....

they're of course free to be stricter on what they require than the metadata spec requires, but the spec itself its' perfectly valid to emit metadata without that field regardless of this issue.

This is more or less what is happening right now. Metadata will be emitted, but distutils will be stricter and use warnings to encourage users to fill some metadata it considers important.

@dstufft
Copy link
Member

dstufft commented Apr 21, 2022

I was mostly pointing it out because I'm pretty sure those warnings date back to when at least Download-URL was a mandatory value, I don't recall if Home-page was mandatory or not. It being made optional was a (relatively speaking) recent change (8? years 10? years ago).

@wimglenn
Copy link
Contributor Author

wimglenn commented Apr 21, 2022

Here is what I think setuptools should do:

  • Stop auto-populating a Project-URL into the Home-page.

Here is what I think distutils should do:

  • Nothing. Getting changes into distutils is useless, it's already on death row.

Here is what I think warehouse should do:

  • If there is any Project-URL with the key "homepage" or "home-page" (case insensitive), then use that link and render it as the Homepage link. Ignore whatever is in Home-page. This is just pragmatic, and makes sure PyPI does the right thing for people using the latest build tools without changing anything for people that are using old build tools.

Maybe setuptools could patch their vendored distutils to not warn on missing Home-page, if it's easy enough, but that's not really particularly important (distutils sdist builds already spam ignorable warnings about a bunch of other stuff)

Here is what I do in my build backend, if Home-page is duplicated in a Project-URL, then the Home-page gets kicked out.

@dstufft
Copy link
Member

dstufft commented Apr 21, 2022

Note:

In the Web UI PyPI does not have the concept of a "Homepage" link or a "Download" link. It has a Project URL mapping that gets rendered. PyPI will inject the Home-page and Download-URL metadata into that Project URL mapping at render time (but if the Project URLs contain "Homepage" or "Download" the injected values get over written.

The JSON/XMLRPC API differentiate between project URLs and Home-page/Download-URL and should not munge data between them. Those APIs are to get access to the underlying data, and should faithfully reproduce the data as given.

@wimglenn
Copy link
Contributor Author

wimglenn commented Apr 21, 2022

Yes, I understand. So my suggestion is just to extend the "injected values get overwritten" part to also overwrite with a project url of homepage or home-page or hoMepAGE (any one of them, first, last, doesn't matter as long as it deterministic). It should not also render the "homepage" project url separately if that got used to overwrite.

Happy to prepare a PR if we have some consensus on that 3rd bullet point.

Would it be valid for setuptools to backfill the Homepage metadata using that? Which I think is unequivocally yes.

Sorry to be contrary, but I think no, actually. Because the core metadata spec field is Home-page not Homepage. I think it should not backfill at all, but I could be convinced for it to backfill a Project-URL also called the Home-page into the metadata field Home-page.

@abravalheri where does this backfill happen in setuptools? I was spelunking through the code but could not find it..

@abravalheri
Copy link

abravalheri commented Apr 22, 2022

Hi @wimglenn.

When I was collecting feedback about PEP 621 implementation, cryptic distutils warnings were constantly mentioned by the users. Right now , we avoid messages about the url corresponding to Home-page by backfilling with the value in Project-URL, and therefore effectively promoting backward compatibility with distutils.

Please note that I am not against the changes proposed. I am just saying that the process is a bit more indirect and goes through either changing distutils first or creating the patches you propose. (pypa/distutils has been receiving a lot of attention recently and setuptools regularly merges it into the bundled copy).

It would be also helpful if there is consensus in the community and the Home-page field in the core metadata is officially deprecated.

My personal plan for the moment in terms of contributions to setuptools is to focus in other changes that I judge more urgent/important (unfortunately we all have to prioritize with the volunteering time we have available).

So I invite anyone that is interested in pushing this issue forward in the setuptools/distutils side to propose the changes via code contributions.

@abravalheri
Copy link

where does this backfill happen in setuptools? I was spelunking through the code but could not find it..

This is it: https://github.com/pypa/setuptools/blob/372652d6dadf8de6c29c83d0f2107fdd7b6e6571/setuptools/config/_apply_pyprojecttoml.py#L173

@wimglenn
Copy link
Contributor Author

@abravalheri Thanks - was not aware of pypa/distutils, and just assumed that setuptools was vendoring the stdlib code. Good to know! I'll prepare PR when I find the time, totally agree that it's not an urgent/important issue.

wimglenn added a commit to wimglenn/warehouse that referenced this issue Apr 28, 2022
…cified in the pkginfo twice (in the Home-page or Download-URL field and again in one of the Project-URL fields). closes pypi#11220
@wimglenn
Copy link
Contributor Author

the first two PR should fix the root cause
the third one alleviates the symptoms for pkgs made from existing releases of setuptools

@di di closed this as completed in #11273 Apr 28, 2022
di pushed a commit that referenced this issue Apr 28, 2022
…cified in the pkginfo twice (in the Home-page or Download-URL field and again in one of the Project-URL fields). closes #11220 (#11273)
domdfcoding pushed a commit to domdfcoding/warehouse that referenced this issue Jun 7, 2022
…cified in the pkginfo twice (in the Home-page or Download-URL field and again in one of the Project-URL fields). closes pypi#11220 (pypi#11273)
wimglenn added a commit to wimglenn/setuptools-ext that referenced this issue Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion a product management/policy issue maintainers and users should discuss
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants