Odd rendering of author when using PEP 621 metadata. #9400

Open
domdfcoding opened this issue Apr 19, 2021 · 35 comments
Labels: needs discussion (a product management/policy issue maintainers and users should discuss)

Comments

@domdfcoding
Contributor

Describe the bug
PEP 621 allows project metadata to be defined in pyproject.toml. This uses a list of dictionaries to represent the project's authors. Each dictionary contains two keys, "name" and "email".

To map these fields to core metadata, PEP 621 says:

  1. If only name is provided, the value goes in Author.
  2. If only email is provided, the value goes in Author-email.
  3. If both email and name are provided, the value goes in Author-email, with the format {name} <{email}> (with appropriate quoting, e.g. using email.headerregistry.Address).
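The three mapping rules can be sketched in Python with `email.headerregistry.Address`, which the PEP itself suggests for quoting. This is a minimal sketch, not whey's or any other backend's implementation: the helper name `people_to_core_metadata` is hypothetical, and joining multiple entries with commas follows the later wording of the spec rather than the three rules quoted here.

```python
from email.headerregistry import Address


def people_to_core_metadata(people, field="Author"):
    """Map a PEP 621 authors/maintainers array to core metadata headers.

    Hypothetical helper; ``field`` is "Author" or "Maintainer".
    """
    names_only = []  # rule 1: a name without an email goes in Author
    with_email = []  # rules 2 and 3: anything with an email goes in Author-email
    for person in people:
        name = person.get("name", "")
        email = person.get("email")
        if email:
            # Address quotes the display name only when RFC 822 requires it,
            # e.g. "C. Titus Brown" gets quoted because of the "."
            with_email.append(str(Address(display_name=name, addr_spec=email)))
        elif name:
            names_only.append(name)
    headers = {}
    if names_only:
        headers[field] = ", ".join(names_only)
    if with_email:
        headers[f"{field}-email"] = ", ".join(with_email)
    return headers


print(people_to_core_metadata(
    [{"name": "Dominic Davis-Foster", "email": "dominic@davis-foster.co.uk"}]
))
# {'Author-email': 'Dominic Davis-Foster <dominic@davis-foster.co.uk>'}
```

With a single author who has both a name and an email, only Author-email is emitted and Author stays empty.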

Based on that, in my build backend whey I am generating metadata that looks like:

Metadata-Version: 2.1
Name: tox-envlist
Version: 0.3.0
Summary: Allows selection of a different tox envlist.
Author-email: Dominic Davis-Foster <dominic@davis-foster.co.uk>

However, on PyPI this renders in the sidebar as:

[Screenshot: PyPI sidebar for tox-envlist]

(this example from https://pypi.org/project/tox-envlist/)

It's also wrong in the JSON API:

{
   "info": {
      "author": "",
      "author_email": "Dominic Davis-Foster <dominic@davis-foster.co.uk>"
   }
}

This causes further issues with tools using the API, such as https://pypistats.org, which leaves the author field blank:

[Screenshot: pypistats.org with the author field blank]

Expected behavior

Compare this with another project created using setuptools:

[Screenshot: PyPI sidebar for domdf-python-tools]

where the metadata is:

Metadata-Version: 2.1
Name: domdf-python-tools
Version: 2.9.0
Summary: Helpful functions for Python 🐍 🛠️
Home-page: https://github.com/domdfcoding/domdf_python_tools
Author: Dominic Davis-Foster
Author-email: dominic@davis-foster.co.uk

and the response from the JSON API:

{
   "info":{
      "author":"Dominic Davis-Foster",
      "author_email":"dominic@davis-foster.co.uk"
   }
}

(this example from https://pypi.org/project/domdf-python-tools)

I would have expected Warehouse to parse the Author-email field into the name and email address, and treat them the same as if they had been defined separately in Author and Author-email.
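A minimal sketch of that parsing step, assuming it happens at upload time and only when Author is empty and Author-email holds exactly one named address. `split_author_fields` is a hypothetical name; `email.utils.getaddresses` is the standard-library RFC 822 address parser.

```python
from email.utils import getaddresses


def split_author_fields(author, author_email):
    """Return (author, author_email) as they might be stored.

    If Author is empty and Author-email holds exactly one "Name <address>",
    move the name into Author; otherwise leave both fields untouched.
    """
    if author or not author_email:
        return author, author_email
    pairs = getaddresses([author_email])
    if len(pairs) == 1:
        name, address = pairs[0]
        if name and "@" in address:
            return name, address
    return author, author_email


print(split_author_fields("", "Dominic Davis-Foster <dominic@davis-foster.co.uk>"))
# ('Dominic Davis-Foster', 'dominic@davis-foster.co.uk')
```

Setuptools-style metadata, where Author is already filled, passes through unchanged.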

To Reproduce
Visible at https://pypi.org/project/tox-envlist/

See also https://pypi.org/project/flit/3.2.0/, which uses PEP 621 metadata and has the same problem but uses a different build backend.

My Platform

N/A

Additional context

@ewjoachim
Contributor

I could be wrong, but I wonder whether this is on PyPI or on the tool you used for uploading (twine, or poetry, or...). I'm trying to investigate, but I can't say for sure as of now.

@ewjoachim
Contributor

Hm, the definition of the metadata value in PEP 345 et al. seems to indicate that this should be supported. I can't find the PEP that defines the upload format, but I think you're right.

@ewjoachim
Contributor

ewjoachim commented Apr 20, 2021

I've tried looking at what it would mean on the code side. I should have known, really, but the author/author-email situation is a mess and the whole thing is probably a can of worms :D

I can make it so that the base case of PEP 621 is handled, but there are quite a few cases for which I have no idea what should be returned:

Author = A B, Author-Email = C D <e@f.gh>
Author = A B <a@b.cd>, C D <c@d.ef>, No Author-Email
Author = A B <a@b.cd>, Author-Email = E F <e@f.gh>

This is what we do today:

  {% if release.author_email %}
    <p><strong>{% trans %}Author:{% endtrans %}</strong> <a href="mailto:{{ release.author_email }}">{{ release.author or release.author_email }}</a></p>
  {% elif release.author %}
    <p><strong>{% trans %}Author:{% endtrans %}</strong> {{ release.author }}</p>
  {% endif %}
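To see why tox-envlist renders oddly, here is a rough Python equivalent of the template logic above. It is a hypothetical helper, and `html.escape` only approximates Jinja's autoescaping. With Author empty, the whole Author-email string becomes both the link target and the visible label.

```python
from html import escape


def render_author_today(author, author_email):
    """Rough Python equivalent of the Jinja snippet (hypothetical helper).

    html.escape stands in for Jinja's autoescaping.
    """
    if author_email:
        label = author or author_email  # falls back to the raw Author-email
        return (
            "<p><strong>Author:</strong> "
            f'<a href="mailto:{escape(author_email)}">{escape(label)}</a></p>'
        )
    if author:
        return f"<p><strong>Author:</strong> {escape(author)}</p>"
    return ""


# tox-envlist: Author is empty, so "Name <address>" is used for both
# the mailto: target and the label.
print(render_author_today("", "Dominic Davis-Foster <dominic@davis-foster.co.uk>"))
```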

This is what the PEP says:

Author (optional):  A string containing the author's name at a minimum; additional contact information may be provided.

Example:
Author: C. Schultz, Universal Features Syndicate,
        Los Angeles, CA <cschultz@peanuts.example.com>

Author-email (optional): A string containing the author's e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.

Example:

Author-email: "C. Schultz" <cschultz@example.com>

It's hinted here and there that multiple comma-separated authors are fine in the Author field.

This really makes me want to not try to parse anything smarter than the bare minimum :D

@di
Member

di commented Apr 21, 2021

I think the PEP is wrong. If both Author and Author-Email are provided, it's much simpler to just keep them as two fields, otherwise our existing logic needs to become a lot more complex:

https://github.com/pypa/warehouse/blob/7fc3ce5bd7ecc93ef54c1652787fb5e7757fe6f2/warehouse/templates/includes/packaging/project-data.html#L78-L82

@brettcannon I think you might have written this? Any thoughts here?

@di added the needs discussion label Apr 21, 2021
@ewjoachim
Contributor

Rereading the whole thing I guess we could do the following:

  • If only Author-Email is defined, parse as RFC-822 From: header
    • If it works, we have our name and address
    • If it fails, display as-is
  • If only Author is defined, display as-is. Even if there might be email addresses in there, too bad
  • If both are defined:
    • If Author-Email parses as RFC-822 From: header, concatenate the Author field and the "name" part of the Author-Email field
    • If it doesn't parse, display as 2 distinct fields

Would that work?
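A minimal sketch of these rules, returning `(label, mailto_target)` pairs for a template to render. The helper names are hypothetical, and "parses" is interpreted here as `getaddresses` yielding only entries whose address contains an `@`; a real implementation would likely want a stricter check.

```python
from email.utils import getaddresses


def parse_from_header(value):
    """Parse an RFC 822 From:-style value into (name, address) pairs, or None."""
    pairs = getaddresses([value])
    if not pairs or any("@" not in address for _, address in pairs):
        return None  # treat anything without a usable address as unparseable
    return pairs


def author_display(author, author_email):
    """Hypothetical helper returning (label, mailto_target_or_None) entries."""
    parsed = parse_from_header(author_email) if author_email else None
    if author_email and not author:
        if parsed is None:
            return [(author_email, None)]  # parse failed: display as-is
        label = ", ".join(name or address for name, address in parsed)
        return [(label, ", ".join(address for _, address in parsed))]
    if author and not author_email:
        return [(author, None)]  # free-form Author: display as-is
    if author and author_email:
        if parsed is None:
            return [(author, None), (author_email, None)]  # two distinct fields
        # Concatenate Author with the name part(s) of Author-email
        names = [author] + [name for name, _ in parsed if name]
        return [(", ".join(names), ", ".join(address for _, address in parsed))]
    return []
```

Both the classic setuptools layout and the PEP 621 layout for a single author end up as one linked entry, `("Dominic Davis-Foster", "dominic@davis-foster.co.uk")`.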

@di
Member

di commented Apr 22, 2021

It would probably work, but why should we be mangling two separate fields into one just to have to un-mangle it somewhere else? I don't see any advantage to it, and think it would be simpler to just change the PEP and the few tools (single tool?) that have already implemented it instead.

@ewjoachim
Contributor

Ah, but then we'll never be able to have proper mailto: links? I don't think I've understood what you'd want to do.

@di
Member

di commented Apr 22, 2021

I'm not sure I follow; we have proper mailto: links now for non-PEP 621 metadata.

@brettcannon
Contributor

@brettcannon I think you might have written this? Any thoughts here?

If you mean what's in the metadata spec, that's how it's always been, i.e. I didn't do it 😉 . PEP 621 just went with what was there and purposefully didn't touch the metadata spec (I tried to clean it up and got push-back from trying to do too much).

As for why PEP 621 uses Author-Email to its fullest extent based on the spec definition, I believe it was to avoid having to try and correlate Author and Author-Email when they were comma-separated fields, since the data is inherently tied together.

@ewjoachim
Contributor

ewjoachim commented Apr 22, 2021

I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email. It says RFC-822 From: header, so I believe only a single email address should be sent. In Author, though, multiple values can be sent, it's free-form.

@brettcannon
Contributor

I'm not sure the original metadata spec allows multiple comma-separated values to be in Author-Email.

It does. From https://packaging.python.org/specifications/core-metadata/#author-email:

A string containing the author’s e-mail address. It can contain a name and e-mail address in the legal forms for a RFC-822 From: header.

Example:

Author-email: "C. Schultz" <cschultz@example.com>

Per RFC-822, this field may contain multiple comma-separated e-mail addresses:

Author-email: cschultz@example.com, snoopy@peanuts.com

So my reading of "a string containing the author's e-mail address ... can contain a name and e-mail address" combined with "this field may contain multiple comma-separated e-mail addresses" is what led me to do what I did for PEP 621.

To be clear, I personally don't care whether a change is made here; I'm not arguing that the way PEP 621 does things is how they should continue to be done, I'm just trying to explain the logic of how it ended up the way it did. But it seems any change will require an update to the metadata spec and PEP 621 if you want to restrict what's valid for the author- and maintainer-related metadata fields.

@ewjoachim
Contributor

That was the missing piece of the puzzle for me.
I was looking at the PEP text, where I should have been looking at the packaging doc.
The part on multiple email addresses was added by @di 3 years ago following an update of Warehouse where corresponding processing was added.

@ewjoachim
Contributor

There is already one moment during release submission where we have assigned a variable containing the "name" part of the multi-email RFC-822 encoded string. So without a lot of additional complexity, just assigning this to the "Author" field of the release in case it's not already filled would probably be enough.

@ofek
Contributor

ofek commented Oct 2, 2022

Any update on this?

@lwasser
Contributor

lwasser commented Jan 2, 2023

hi all - just a note that I'm having this issue too with TestPyPI for my package stravalib, and I also see the same issue with sourmash on PyPI. I don't think it's the build backend in this case.

The METADATA from my wheel (thanks to @pradyunsg for telling me how to check this) is:

Maintainer: Jonatan Smoocha, Yihong
Maintainer-email: Leah Wasser <leah@pyopensci.org>, Hans Lellelid <hans@xmpl.org>

and on pypi i see
[Screenshot: maintainer field as rendered on PyPI]

It seems like it's being parsed incorrectly by PyPI?? Many thanks for your work on PyPI btw!

@pradyunsg
Contributor

FWIW, the TOML in pyproject.toml relevant to the above was (built with setuptools):

maintainers = [
     {name = "Leah Wasser", email = "leah@pyopensci.org"},
     {name = "Hans Lellelid", email = "hans@xmpl.org"},
     {name = "Jonatan Smoocha"},
     {name = "Yihong"},
]

x-ref stravalib/stravalib#304

@lwasser
Contributor

lwasser commented Jan 2, 2023

oh yes - i'll reference this issue in my pr as well. for now i've removed emails.

@di
Member

di commented Jan 2, 2023

So do we think the conversion from TOML -> metadata is wrong, or is PyPI's interpretation of the metadata wrong? What were you expecting to happen here?

@pradyunsg
Contributor

From https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers:

Using the data to fill in core metadata is as follows:

  1. If only name is provided, the value goes in Author or Maintainer as appropriate.

  2. If only email is provided, the value goes in Author-email or Maintainer-email as appropriate.

  3. If both email and name are provided, the value goes in Author-email or Maintainer-email as appropriate, with the format {name} <{email}>.

  4. Multiple values should be separated by commas.

I think it's on PyPI's end -- wherein it's only presenting the Maintainer key with Maintainer-Email as the link, even if the latter contains names and doesn't match the Maintainer key.

I think the pyproject.toml's author/maintainer -> METADATA mapping (as it stands) operates on the assumption that both the "{type}" and "{type}-email" would be used/presented; whereas PyPI tries to present only one entry (Author / Maintainer) and tries to use the "{type}-email" as a link for "{type}" if they're both present.

@pradyunsg
Contributor

pradyunsg commented Jan 2, 2023

What were you expecting to happen here?

That's an excellent question -- I'd like to ask @lwasser to provide her thoughts on this. How would you have expected PyPI to present the information you added to pyproject.toml? :)

maintainers = [
     {name = "Leah Wasser", email = "leah@pyopensci.org"},
     {name = "Hans Lellelid", email = "hans@xmpl.org"},
     {name = "Jonatan Smoocha"},
     {name = "Yihong"},
]

One approach that I can think of is to not provide a single link to write an email to all authors/maintainers, and to instead split the keys on , and present the names individually (with those that have emails being linked, on a per-person basis). For backwards compatibility, we could keep the current linking behaviour (Author with Author-Email as the mailto:) if there's a single email with no name and a single name.
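A sketch of this suggestion, assuming Author/Maintainer is split naively on commas and Author-email/Maintainer-email is parsed with `email.utils.getaddresses`. `render_people` is a hypothetical helper, and placing name-only people last is an assumption, since core metadata does not preserve the pyproject.toml order.

```python
from email.utils import getaddresses
from html import escape


def render_people(names_field, emails_field):
    """Hypothetical per-person rendering of Author/Author-email (or Maintainer*)."""
    names = [n.strip() for n in names_field.split(",") if n.strip()] if names_field else []
    pairs = getaddresses([emails_field]) if emails_field else []
    # Backwards compatibility: one name plus one bare address -> a single link
    if len(names) == 1 and len(pairs) == 1 and not pairs[0][0]:
        return f'<a href="mailto:{escape(pairs[0][1])}">{escape(names[0])}</a>'
    # Otherwise link each person to their own address...
    parts = [
        f'<a href="mailto:{escape(address)}">{escape(name or address)}</a>'
        for name, address in pairs
    ]
    # ...and list name-only people without a link, after the linked ones.
    parts += [escape(name) for name in names]
    return ", ".join(parts)


print(render_people("Jonatan Smoocha, Yihong",
                    "Leah Wasser <leah@pyopensci.org>, Hans Lellelid <hans@xmpl.org>"))
```

A value such as `Author: Google, Inc.` with one bare address is split into two names, so the backwards-compatible branch does not fire; naive comma splitting is fragile for free-form Author values.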

@di
Member

di commented Jan 2, 2023

From https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers:

Given that maintainers rarely follow that guidance 😉, I think we still need to maintain some backwards compatibility with the expectation that Author/Maintainer is a string, Author-Email and Maintainer-Email is an email, and together they become a link.

@pradyunsg
Contributor

Hence the suggestion of keeping the current behaviour when there's only one email + one name. 😉

@lwasser
Contributor

lwasser commented Jan 2, 2023

absolutely @di @pradyunsg
My understanding of how this works (I'd expect authors to operate the same!):

in my table here:

maintainers = [
     {name = "Name One", email = "nameone@email.org"},
     {name = "Name Two", email = "nametwo@email.org"},
     {name = "Name Three"},
     {name = "Name Four"},
]

I'm specifying 4 maintainers, so on PyPI I'd expect it to render as follows:

<a href="mailto:nameone@email.org">Name One</a>, <a href="mailto:nametwo@email.org">Name Two</a>, Name Three, Name Four

But instead it seems to do this:

<a href="mailto:name one <nameone@email.org>, name one <nameone@email.org>">Name Three</a>, <a href="name one <nameone@email.org>, name one <nameone@email.org>">Name Four</a>

I guess I would expect it to:

  1. first list the maintainers in the order that they appear in the pyproject.toml and
  2. add the email link just to the items with an email?

@di
Member

di commented Jan 2, 2023

Hence the suggestion of keeping the current behaviour when there's only one email + one name. 😉

Sorry, missed this in the edit I think. So what should happen with:

Author: Google, Inc.
Author-email: something@google.com

I don't think that suggestion maps well onto maintaining existing behavior.

@ofek
Contributor

ofek commented Jan 2, 2023

So what should happen with:

Author: Google, Inc.
Author-email: something@google.com

Just fyi if those are in the same entry/table then that wouldn't occur per PEP 621 pypa/packaging.python.org#1134 (comment)

@lwasser
Contributor

lwasser commented Jan 2, 2023

If you parse 2 entries represented like this (I'm using setuptools to build):

maintainers = [
     {name= "Google human"},
     {email = "another-human@email.com"},
    ]

you get this (2 unique humans are maintainers):

Maintainer: Google human
Maintainer-email: another-human@email.com

if you do this:

maintainers = [
     {name = "Google human",  email = "google-human@email.com"},
     {email = "another-human@email.com"},
    ]

you get this:

Maintainer-email: Google human <google-human@email.com>, another-human@email.com

Two with name + email, one name only, one email only:

maintainers = [
     {name= "Google human", email = "google-human@email.com"},
     {name = "Hans Lellelid", email = "test@test.org"},
     {name = "Human three"},
     {email = "another-human@email.com"},
    ]

Results in this:

Maintainer: Human three
Maintainer-email: Google human <google-human@email.com>, Hans Lellelid <test@test.org>, another-human@email.com

I suspect two things are happening:

If you have:

  1. Two maintainers, each with an email (example: sourmash), the HTML output looks like the following, where the entire string for both maintainers is turned into a single mailto: link. Here I'd expect PyPI to parse each name as a unique name and to associate each email with the name in the same element of the maintainers list.
<p><strong>Maintainer:</strong> <a href="mailto:Luiz Irber <luiz@sourmash.bio>, &quot;C. Titus Brown&quot; <titus@idyll.org>">Luiz Irber &lt;luiz@sourmash.bio&gt;, "C. Titus Brown" &lt;titus@idyll.org&gt;</a></p>
  2. Multiple maintainers where some have emails and others don't, like this:
maintainers = [
     {name = "Leah Wasser", email = "testemail@testemail.org"},
     {name = "Hans Lellelid", email = "hans@test.org"},
     {name = "Jonatan Samoocha"},
     {name = "Yihong"},
    ]

You end up with a PyPI entry like this:
Notice that two of the maintainers are not listed here, and BOTH listed names have an email link that is a mixture of emails and maintainer names, similar to what you see with sourmash. I just fixed this by removing emails altogether, and now TestPyPI lists all 4 of our names.

[Screenshot: TestPyPI maintainer display]

I hope that is helpful. It just seems to me that things are parsed differently depending on which combination of information is provided.

@matthewfeickert

Coming from Issue #12877 (sorry for the duplicate Issue):


Paste of Issue 12877 content if useful for quick reference:

👋 Hi. Our project pyhf just switched (c.f. scikit-hep/pyhf#2095) from having our PyPI metadata in setup.cfg to pyproject.toml. In doing so, we also changed from having our author metadata for the 3 authors be across author and author_email to having it be contained in authors following PEP 621's requirements of

These fields accept an array of tables with 2 keys: name and email. Both values must be strings. The name value MUST be a valid email name (i.e. whatever can be put as a name, before an email, in RFC 822) and not contain commas. The email value MUST be a valid email address. Both keys are optional.

pip is recognizing all the metadata as we would expect

$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page: 
Author: 
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:

However, when we published this to TestPyPI to check how things looked after switching over we noticed that TestPyPI is displaying only the first author and linking their email

[Screenshot: TestPyPI showing only the first author]

Previously when we shoved all our names and emails into author and author_email we could at least have all our names be displayed (no surprise there as we were abusing the field)

[Screenshot: PyPI page for pyhf 0.7.0]

I assume that this behavior with authors is because warehouse uses only the core metadata here (?) following PEP 621's instructions of:

Using the data to fill in core metadata is as follows:

  1. If only name is provided, the value goes in Author/Maintainer as appropriate.
  2. If only email is provided, the value goes in Author-email/Maintainer-email as appropriate.
  3. If both email and name are provided, the value goes in Author-email/Maintainer-email as appropriate, with the format {name} <{email}> (with appropriate quoting, e.g. using email.headerregistry.Address).
  4. Multiple values should be separated by commas.

Would it be possible for warehouse to display all authors information if it exists? Or is that something that is outside the scope of how warehouse interacts with metadata?

Describe the solution you'd like

Have warehouse be able to parse the existence of PEP 621 authors and display all names and associated emails of authors on the package webpage.


We (pyhf) are seeing a similar problem with our authors and maintainers fields in our PEP 621 compliant pyproject.toml.

Metadata from relevant wheel

$ python -m pip download --index-url https://test.pypi.org/simple/ --no-deps 'pyhf==0.7.1.dev35'
$ unzip pyhf-0.7.1.dev35-py3-none-any.whl
$ head -n 12 pyhf-0.7.1.dev35.dist-info/METADATA
Metadata-Version: 2.1
Name: pyhf
Version: 0.7.1.dev35
Summary: pure-Python HistFactory implementation with tensors and autodiff
Project-URL: Documentation, https://pyhf.readthedocs.io/
Project-URL: Homepage, https://github.com/scikit-hep/pyhf
Project-URL: Issue Tracker, https://github.com/scikit-hep/pyhf/issues
Project-URL: Release Notes, https://pyhf.readthedocs.io/en/stable/release-notes.html
Project-URL: Source Code, https://github.com/scikit-hep/pyhf
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
Maintainer-email: The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>
License: Apache-2.0

Authors

Our authors field is

authors = [
    { name = "Lukas Heinrich", email = "lukas.heinrich@cern.ch" },
    { name = "Matthew Feickert", email = "matthew.feickert@cern.ch" },
    { name = "Giordon Stark", email = "gstark@cern.ch" },
]

and pip is recognizing all the metadata as we would expect

$ python -m pip show pyhf
Name: pyhf
Version: 0.7.1.dev43
Summary: pure-Python HistFactory implementation with tensors and autodiff
Home-page: 
Author: 
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>
License: Apache-2.0
Location: /home/feickert/.pyenv/versions/3.10.6/envs/pyhf-dev-CPU/lib/python3.10/site-packages
Requires: click, jsonpatch, jsonschema, numpy, pyyaml, scipy, tqdm
Required-by:

though for our render check upload to TestPyPI we noticed that TestPyPI is displaying only the first author and linking their email

[Screenshot: TestPyPI showing only the first author]

with the generated HTML of

<p><strong>Author:</strong> <a href="mailto:lukas.heinrich@cern.ch">Lukas Heinrich</a></p>

Expectation / Desired Result

Have all of the authors' names and emails listed in a comma-separated list, in the order they appear in the wheel metadata

$ grep "Author-email" pyhf-0.7.1.dev35.dist-info/METADATA 
Author-email: Lukas Heinrich <lukas.heinrich@cern.ch>, Matthew Feickert <matthew.feickert@cern.ch>, Giordon Stark <gstark@cern.ch>

with generated html of

<p><strong>Author:</strong> <a href="mailto:lukas.heinrich@cern.ch">Lukas Heinrich</a>, <a href="mailto:matthew.feickert@cern.ch">Matthew Feickert</a>, <a href="mailto:gstark@cern.ch">Giordon Stark</a></p>

Maintainers

Our maintainers field is

maintainers = [ {name = "The Scikit-HEP admins", email = "scikit-hep-admins@googlegroups.com"} ]

and the TestPyPI render is

[Screenshot: TestPyPI maintainer display]

with the generated HTML of

<p><strong>Maintainer:</strong> <a href="mailto:The Scikit-HEP admins &lt;scikit-hep-admins@googlegroups.com&gt;">The Scikit-HEP admins &lt;scikit-hep-admins@googlegroups.com&gt;</a></p>

Expectation / Desired Result

Have the maintainer name match the metadata of the wheel

$ grep "Maintainer-email" pyhf-0.7.1.dev35.dist-info/METADATA 
Maintainer-email: The Scikit-HEP admins <scikit-hep-admins@googlegroups.com>

and be a hyperlink to the mailto

<p><strong>Maintainer:</strong> <a href="mailto:scikit-hep-admins@googlegroups.com">The Scikit-HEP admins</a></p>

AustinT added a commit to dockstring/dockstring that referenced this issue Sep 2, 2023
Contains a few fixes/changes related to PyPI:

1. In the project README, mention the installation method from PyPI (`pip install dockstring`)

2. Fix authors list. First, I added Miguel's name without the accents. Second, on PyPI the release listed
  Gregor as the author but gave Miguel's email. It looks like this is a
  [known issue](pypi/warehouse#9400).
  To fix this I removed our emails. People can figure out how to contact us if they really need it.

3. Added missing entries in the CHANGELOG.

4. Added some extra metadata to pyproject.toml, including the github homepage.
@tobinus

tobinus commented Sep 28, 2023

I encountered this bug today. We define four authors, where we don't have an email address for one of them. Pypi.org decided to only show one of them, specifically the author without an email address, and used the email address of a different author as the mailto:-link 😲

It seems to me like the core metadata specification is incompatible with the degree of freedom that PEP 621 promises.

For instance, how would you separate the following two cases?
# A PEP 621 project
[project]
# ...
authors = [
 { name = "Alice" },
 { email = "bob@example.com"},
]

which would become:

Author: Alice
Author-email: bob@example.com

and

# A "classic" project
setup(
  # ...
  author="Bob Bobbity",
  author_email="bob@example.com",
)

which would become:

Author: Bob Bobbity
Author-email: bob@example.com

In the first case, you would expect the name in Author to be listed separately from the email in Author-email, meanwhile you would want the name in the second case to be combined with the email in Author-email. But there is no way to tell the two cases apart based on the core metadata alone.
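The ambiguity can be checked mechanically. In this sketch (hypothetical helper names, with the PEP 621 mapping reimplemented only for `authors`), both projects produce a non-empty Author plus one bare address in Author-email, so everything a consumer can observe about their structure is identical.

```python
from email.headerregistry import Address
from email.utils import getaddresses


def pep621_authors_to_headers(authors):
    """Hypothetical sketch of the PEP 621 mapping for ``authors``."""
    names = [a["name"] for a in authors if "name" in a and "email" not in a]
    emails = [
        str(Address(display_name=a.get("name", ""), addr_spec=a["email"]))
        for a in authors
        if "email" in a
    ]
    headers = {}
    if names:
        headers["Author"] = ", ".join(names)
    if emails:
        headers["Author-email"] = ", ".join(emails)
    return headers


def observable_shape(headers):
    """All a consumer can see: is Author set, and which addresses carry a name."""
    pairs = getaddresses([headers["Author-email"]])
    return bool(headers.get("Author")), [(bool(name), bool(addr)) for name, addr in pairs]


pep621 = pep621_authors_to_headers([{"name": "Alice"}, {"email": "bob@example.com"}])
classic = {"Author": "Bob Bobbity", "Author-email": "bob@example.com"}  # setup() kwargs
print(observable_shape(pep621) == observable_shape(classic))  # True
```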


The gap between PEP 621 and the core metadata specification can be closed in two ways:

  • Restrict the freedom of the authors field in PEP 621 (bringing it in line with the core metadata spec)
  • Address the shortcomings of the core metadata specification (bringing it in line with PEP 621) and update the mapping from PEP 621 to core metadata
Some thoughts on how you could add new fields Authors and Maintainers to core metadata to support the data model of PEP 621

EDIT 2: I no longer think this is the best solution.

Possible solutions

If I were to come up with a "dream" solution, I would try to expand the core metadata specification with new fields, Authors and Maintainers. Note that they are plural, while the existing fields are singular. They would work exactly like Author-email and Maintainer-email, except you would be permitted to specify a name with no email address by using the same form as an email address with a name, but with the email address specified as an empty string. For instance: Alice <>.

To keep backwards compatibility with tools that don't know about the new fields, I would keep the algorithm described in PEP 621. However, tools that do know about the new fields should always disregard the old fields (Author and Author-email, or Maintainer and Maintainer-email) if the corresponding new field is present (Authors, or Maintainers). So the information in the authors and maintainers fields of pyproject.toml would be repeated twice in the core metadata: Once in the new field, and once in one of the old ones.

Here's what the example in PEP 621 would look like:

The following definition in pyproject.toml:

[project]
authors = [
  {name = "Pradyun Gedam", email = "pradyun@example.com"},
  {name = "Tzu-Ping Chung", email = "tzu-ping@example.com"},
  {name = "Another person"},
  {email = "different.person@example.com"},
]
maintainers = [
  {name = "Brett Cannon", email = "brett@python.org"}
]

would be converted to the following core metadata:

Authors: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, "Another person" <>, different.person@example.com
Maintainers: "Brett Cannon" <brett@python.org>

# For backwards compatibility
Author: Another person
Author-email: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, different.person@example.com
Maintainer-email: "Brett Cannon" <brett@python.org>

I see that Maintainers was redundant in this example, since there is no confusion with Maintainer-email when everyone has an email address. So it's possible that the new field should only be used when there is at least one author/maintainer without an email address? But there's something to be said for being consistent.


The advantage of this approach is that you get the freedom to mix between authors with only a name, only an email, and both a name and an email address, in a way that is straight-forward to parse on the other end.

The downsides of this approach are that the new fields are easy to confuse with the old ones (since there's only a trailing s separating the two), and that information is repeated twice in the core metadata.

Alternatively, you could modify the definition of Author-email and Maintainer-email so that they may accept authors/maintainers without an email address, and use them for every author and maintainer when converting from PEP 621 (leaving out Author and Maintainer). But it feels a bit silly to put authors and maintainers without an email address inside Author-email or Maintainer-email. And tools out there may crash or behave weirdly if they were served Author-email: Alice <>?


EDIT: Put the "possible solutions" behind an accordion

EDIT 2: I no longer think the solution above would be the best one, there are simpler solutions.

@dstufft
Member

dstufft commented Sep 28, 2023

I don't have a particular solution other than I think it would be great for someone to write a PEP that made this bit of metadata better :) There was even a recent thread on discuss.python.org where someone else had a related issue.

@tobinus

tobinus commented Sep 28, 2023

I just realised that this GitHub issue should probably be split into multiple ones.

The original issue description from @domdfcoding, and the use case from @matthewfeickert, are about the case where only Author-email (and/or Maintainer-email) is supplied. There is no confusion about what name goes with what email address in that case. According to the issue reporters, Pypi.org does not handle this properly. I would think it is possible to fix this so that all listed authors or maintainers are shown, using their names as the label and falling back to displaying their email address when no name is given. This would only require changes in warehouse.

The case where you are specifying multiple authors and mixing between Author and Author-email would be left unsupported and broken by design – just like today, in other words. If we wish to guide users towards the supported use case, we can add some guidance to the description in PEP 621 so that it recommends either including an email address for every author, or including no email addresses at all.


The issue of supporting a mix between email and non-email authors should be a different issue, I think. It would include the use cases reported by @lwasser and @pradyunsg, and me, (EDIT: and backwards compatibility with the existing usage brought up by @di) and would likely involve changes to the core metadata spec, PEP 621, warehouse and the build module.

I imagine this would take a while, so it makes sense to fix the simpler issue first and handle this more complex issue separately.

@tobinus

tobinus commented Sep 29, 2023

Solving the complex case by always including names in Author-email

Preferably, those thoughts should go in a new issue which is separate from this, per my previous comment. But I'm putting them here for the time being.

Warehouse should support multiple authors, as described in PEP 621. This proposed solution involves a small change to the algorithm used to convert pyproject.toml into core metadata. Additionally, an algorithm for parsing the core metadata back into multiple authors and maintainers should be added to the core metadata specification. Warehouse should be updated to use this new algorithm.

The following discussion of Author and Author-email also applies to Maintainer and Maintainer-email.

How do we know whether the author in Author is the same person or a different person from the email address in Author-email? The idea is to ensure that the core metadata produced by the algorithm in PEP 621 can be consistently detected as such, and handled by using the same algorithm backwards (but with the caveat that the Author field is unstructured).

The proposed change to PEP 621 is to always include a name for email addresses in Author-email. If no real name was provided, the email address should be repeated as the name. So authors = [{email = "hi@example.com"}] would result in Author-email: "hi@example.com" <hi@example.com>.
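A minimal sketch of that rule, using `email.headerregistry.Address` (the quoting helper PEP 621 itself suggests). `format_author_email` is a hypothetical helper name. Note that `Address` only adds quotes when the name needs them, so a name like "Pradyun Gedam" would come out unquoted, which RFC 5322 treats as equivalent:

```python
from __future__ import annotations

from email.headerregistry import Address


def format_author_email(email: str, name: str | None = None) -> str:
    """Format one Author-email entry so that it always carries a name."""
    # Fall back to the address itself when no real name was provided.
    # Address quotes the display name only when it contains RFC 5322
    # special characters such as "@" or ".".
    return str(Address(display_name=name or email, addr_spec=email))


print(format_author_email("hi@example.com"))
```

For the example above, this prints `"hi@example.com" <hi@example.com>`.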

The PEP 621 example

The following pyproject.toml:

[project]
authors = [
  {name = "Pradyun Gedam", email = "pradyun@example.com"},
  {name = "Tzu-Ping Chung", email = "tzu-ping@example.com"},
  {name = "Another person"},
  {email = "different.person@example.com"},
]
maintainers = [
  {name = "Brett Cannon", email = "brett@python.org"}
]

would produce the following core metadata:

Author: Another person
Author-email: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, "different.person@example.com" <different.person@example.com>

Maintainer-email: "Brett Cannon" <brett@python.org>
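The conversion in this example can be sketched as follows. This is a minimal sketch that reuses `email.headerregistry.Address` for quoting. It assumes that name-only entries are joined into Author with ", " (the PEP 621 example only has one, so that separator is an assumption here). `people_to_core_metadata` is a hypothetical name, and the same function handles maintainers via its `field` parameter:

```python
from __future__ import annotations

from email.headerregistry import Address


def people_to_core_metadata(people: list[dict[str, str]], field: str = "Author") -> dict[str, str]:
    """Map a PEP 621 authors/maintainers list onto the two core metadata fields."""
    names_only: list[str] = []
    addresses: list[str] = []
    for person in people:
        name = person.get("name")
        email = person.get("email")
        if email:
            # Proposed rule: every address gets a display name, reusing the
            # address itself when no real name was provided.
            addresses.append(str(Address(display_name=name or email, addr_spec=email)))
        elif name:
            names_only.append(name)

    fields: dict[str, str] = {}
    if names_only:
        # Assumption: name-only entries are joined with ", ".
        fields[field] = ", ".join(names_only)
    if addresses:
        fields[f"{field}-email"] = ", ".join(addresses)
    return fields


authors = [
    {"name": "Pradyun Gedam", "email": "pradyun@example.com"},
    {"name": "Tzu-Ping Chung", "email": "tzu-ping@example.com"},
    {"name": "Another person"},
    {"email": "different.person@example.com"},
]
maintainers = [{"name": "Brett Cannon", "email": "brett@python.org"}]

combined = {**people_to_core_metadata(authors), **people_to_core_metadata(maintainers, "Maintainer")}
for key, value in combined.items():
    print(f"{key}: {value}")
```

Running it prints the same three fields as the example above. The only difference is that names without special characters, such as Pradyun Gedam, come out unquoted.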

Consumers of core metadata, such as Warehouse, should distinguish between two cases:

  1. The combined case: When both Author and Author-email are provided, and there is only one email address in Author-email, and it has no name. This should be handled like today, with Author being used as the label and Author-email being used as the mailto: target.
  2. The separated case: When there are multiple email addresses, or there is only one and it has a name, or only one of Author and Author-email is provided.
    • The value of the Author field, if present, should be assumed to contain information about authors that don't have any email address, but tools should not make any assumptions about its internal structure. So it should be displayed as it is written, but without linking anywhere.
    • The value of the Author-email field, if present, should be parsed into an additional list of authors where every author has an email address and a label/name. If no name was provided, the email address should be used as the label.
    • If both fields are present, their displayed values should be joined with a comma and a space (", ").
Examples of the combined case

Example of an author with an email address:

Author: John Doe
Author-email: john.doe@example.com

should be displayed as:

Author: John Doe

Example from the core metadata specification, but with an email address added:

Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>
Author-email: cschultz@peanuts.example.com

should be displayed as:

Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>


Examples of the separated case

Example with two authors, where one has only a name and another has only an email address:

Author: John Doe
Author-email: "jane.doe@example.com" <jane.doe@example.com>

should be displayed as:

Author: John Doe, jane.doe@example.com

Example of Author field from the core metadata specification:

Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>

should be displayed as:

Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>
Note: the email address embedded in Author doesn't necessarily have to be rendered as a link.

Example from PEP 621 (as converted to core metadata in the example above):

Author: Another person
Author-email: "Pradyun Gedam" <pradyun@example.com>, "Tzu-Ping Chung" <tzu-ping@example.com>, "different.person@example.com" <different.person@example.com>

Maintainer-email: "Brett Cannon" <brett@python.org>

should be displayed as:

Author: Another person, Pradyun Gedam, Tzu-Ping Chung, different.person@example.com
Maintainer: Brett Cannon
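The consumer-side rules and the examples above can be sketched with the standard library's `email.utils.getaddresses`. This is a minimal sketch of the proposal, not Warehouse's actual code. `display_people` and `display_label` are hypothetical names, and turning the mailto: targets into HTML links is left out:

```python
from __future__ import annotations

from email.utils import getaddresses


def display_people(plain: str | None, with_email: str | None) -> list[tuple[str, str | None]]:
    """Return (label, mailto target or None) pairs for Author/Author-email.

    The same logic applies to the Maintainer/Maintainer-email pair.
    """
    # Skip parsing when the field is absent: on some Python versions,
    # getaddresses([""]) returns a placeholder ('', '') entry.
    parsed = getaddresses([with_email]) if with_email else []

    # Combined case: both fields present, exactly one address, with no name.
    if plain and len(parsed) == 1 and not parsed[0][0]:
        return [(plain, "mailto:" + parsed[0][1])]

    # Separated case: Author is shown verbatim and unlinked, followed by one
    # linked entry per address, using the address as the label if unnamed.
    people: list[tuple[str, str | None]] = []
    if plain:
        people.append((plain, None))
    people.extend((name or addr, "mailto:" + addr) for name, addr in parsed)
    return people


def display_label(plain: str | None, with_email: str | None) -> str:
    """Join the labels with ", " as the separated case requires."""
    return ", ".join(label for label, _ in display_people(plain, with_email))
```

For the first combined example, this yields `John Doe` linked to john.doe@example.com. For the PEP 621 example, it yields `Another person, Pradyun Gedam, Tzu-Ping Chung, different.person@example.com`.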


Advantages to this approach include:

  • Compatibility with packages written using the existing rules and Warehouse's current behaviour
  • Compatibility with multiple authors specified using the format in PEP 621
  • No changes to the fields of PEP 621 or the core metadata specification. The changes are limited to clarifications of how the existing fields and their capabilities can be used to ensure unambiguous parsing by tools such as Warehouse
  • Users of setup.py may also follow the same rules to get the same effect as users of PEP 621

Limitations of this approach include:

  • Authors may not be listed using line breaks or bullet points, since we are unable to make any assumptions about the internal format of the Author field. Do the commas separate different authors, or are they used to separate the author name from their street address or organisation?
    • We could perhaps say that commas in the Author field should be assumed to separate different authors in the separated case only?
  • Changes must be made to the build module, and users must update it to get consistently working results
    • That said, many cases will work out of the box. The only case that doesn't work with the current implementation is the one where you combine authors with only a name and a single author with only an email address.
  • The display logic in Warehouse must be expanded. But this is inevitable if multiple authors should be supported
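To make the comma question concrete, here is a hypothetical heuristic that treats every comma in Author as a separator between authors. Applied to the core metadata specification's own example, it finds four "authors" where there is only one person:

```python
def naive_split_authors(author_field: str) -> list[str]:
    """Hypothetical heuristic: treat every comma in Author as a separator."""
    return [part.strip() for part in author_field.split(",") if part.strip()]


schultz = "C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>"
print(naive_split_authors(schultz))
# ['C. Schultz', 'Universal Features Syndicate', 'Los Angeles', 'CA <cschultz@peanuts.example.com>']
```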

laurence-myers added a commit to Malvineous/pyopl that referenced this issue Jan 25, 2024
- Use SPDX license identifier. (Otherwise, PyPI tries to list the license file body)
- Change license classifier to specifically reference GPLv3 (only).
- Add an email for Adam Biser in `authors`, to avoid bugs in PyPI using a mix of authors with/without email.

@see pypi/warehouse#9400
ubernostrum added a commit to ubernostrum/akismet that referenced this issue May 12, 2024
PyPI can misattribute email addresses when there are multiple authors
but not all of them have emails listed.

pypi/warehouse#9400 (comment)
dgw added a commit to sopel-irc/sopel that referenced this issue May 27, 2024
Since pypi/warehouse#9400 and pypi/warehouse#14813 (perhaps others too)
remain unresolved, let's keep our metadata simple. Incomplete metadata—
i.e. the email addresses this commit removes—is the only thing worse
than straight-up *incorrect* metadata. We don't want any author or
maintainer names to be attached to the wrong email addresses on the
published package page.
michele-riva added a commit to viperleed/viperleed that referenced this issue Jul 8, 2024
Seems that having multiple authors with emails is not supported correctly on PyPI: only the first author is rendered, and it may even have the wrong email address. See pypi/warehouse#9400

Remove it also from the maintainers just to be sure not to get the wrong assignment.

Better to have less metadata than wrong metadata...
jdeschamps added a commit to CAREamics/careamics that referenced this issue Aug 23, 2024
### Description

There are problems with how TOML metadata is converted into PyPI
metadata (e.g. pypi/warehouse#9400), so in our case only the first
author in the author list is shown, instead of all the authors...

Therefore, in this PR, I add the CAREamics team as first in the list
(with the rse email), and then alphabetically the people who have
contributed code.

I also update the list to account for the recent LVAE push!
@0cjs

0cjs commented Aug 29, 2024

> I just realised that this GitHub issue should probably be split into multiple ones....
> I imagine this would take a while, so it makes sense to fix the simpler issue first and handle this more complex issue separately.

This seems to make sense to me; my only concern at the moment is that PyPI is displaying only the first author, rather than all the authors, in the simple case of having only an Author-email line looking like this:

Author-email: "Curt J. Sampson" <cjs@cynic.net>, Nishant Rodrigues <nishantjr@gmail.com>

This is exactly what issue #12877 is about, but that's been closed in favour of this issue. So does it make sense to re-open that one to cover just the "doesn't display all authors in Author-email line," then? That seems like something that could be fixed without getting into many of these other issues. (And this issue doesn't seem to be coming towards any kind of resolution or fix any time soon.)
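For that narrower case, the standard library can already split such a line into every name/address pair. This minimal sketch uses `email.utils.getaddresses` and shows only the parsing step, not how Warehouse actually renders the sidebar:

```python
from email.utils import getaddresses

author_email = '"Curt J. Sampson" <cjs@cynic.net>, Nishant Rodrigues <nishantjr@gmail.com>'

# getaddresses() splits on top-level commas (not those inside quoted names)
# and unquotes display names, so every author is recovered, not just the first.
authors = getaddresses([author_email])
for name, addr in authors:
    print(f"{name} <{addr}>")
```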

@MikeBusuttil

Agreed, @0cjs: re-open #12877.

@mdimopoulos

Will #12877 be re-opened or has the issue been declared as a non-issue and abandoned completely?

@di
Member

di commented Jan 15, 2025

I re-opened it.
