Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing of author information #1040

Closed
wants to merge 5 commits into from

Conversation

yggi49
Copy link
Contributor

@yggi49 yggi49 commented Apr 16, 2019

Instead of relying on regular expressions, this patch leverages Python’s
builtin email.utils.parseaddr() functionality to parse an RFC-822-compliant
email address string into its name and address parts.

This should also resolve issues with special characters in the name part; see
issues #370 and #798.

Pull Request Check List

  • Added tests for changed code.
  • Updated documentation for changed code.

Instead of relying on regular expressions, this patch leverages Python’s
builtin `email.utils.parseaddr()` functionality to parse an RFC-822-compliant
email address string into its name and address parts.

This should also resolve issues with special characters in the name part; see
issues python-poetry#370 and python-poetry#798.

python-poetry#370
python-poetry#798
Copy link

@drunkwcodes drunkwcodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is something I want to do but I was still keeping finding an elegant solution.
And you just made it.

poetry/utils/helpers.py Show resolved Hide resolved
tests/utils/test_helpers.py Show resolved Hide resolved
Copy link

@drunkwcodes drunkwcodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread is beyond the poetry doc thoroughly.

tests/utils/test_helpers.py Show resolved Hide resolved
poetry/utils/helpers.py Show resolved Hide resolved
Copy link

@drunkwcodes drunkwcodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I messed with re a while and found another solution.
It handles Non-RFC-conform cases as well.

# Non-RFC-conform cases with unquoted commas
name, email = parse_author("asf,dfu@t.b")
assert name == "asf"
assert email is None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert email is None
assert email == "dfu@t.b"


name, email = parse_author("asf,<dfu@t.b>")
assert name == "asf"
assert email is None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert email is None
assert email == "dfu@t.b"


name, email = parse_author("asf, dfu@t.b")
assert name == "asf"
assert email is None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert email is None
assert email == "dfu@t.b"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now @ID is supported. IPython somewhat needs '@' completion.

In [5]: parse_author("@asf")
Out[5]: ("@asf", None)

In [6]: parse_author("@asf <asf@t.b>")
Out[6]: ("@asf", "asf@t.b")

In [7]: parse_author("@asf,<asf@t.b>")
Out[7]: ("@asf", "asf@t.b")

In [8]: parse_author("@asf,asf@t.b")
Out[8]: ("@asf", "asf@t.b")

In [9]: parse_author("@asf, asf@t.b")
Out[9]: ("@asf", "asf@t.b")

In [10]: parse_author("@asf,")
Out[10]: ("@asf", None)

In [11]: parse_author("@asf <>")
Out[11]: ("@asf", None)

@drunkwcodes
Copy link

Here is the latest revision in my gist with runnable tests including in @ID, multi-commas, white-spaces cases and empty email cases:

https://gist.github.com/drunkwcodes/d2551bc41f54a1120434ad8f24dc49af

@yggi49
Copy link
Contributor Author

yggi49 commented Jun 8, 2019

Is it really necessary to attempt to make something out of even the most obscure construed use cases? For example, I don’t quite get why

parse_author("@asf,asf@t.b")

should result in

("@asf", "asf@t.b")

To me, that seems like a totally arbitrary, non-obvious choice.

—First and foremost, this PR attempts to resolve (in a simple way) the two issues that are linked at the top, which it does. Everything beyond it might be something for a separate feature request.

@drunkwcodes
Copy link

drunkwcodes commented Jun 8, 2019

@yggi49 The cases need to be polished of course.

The problem was the unclear exception. But I want to eliminate it.
And I think it is obvious if there isn't any comma (or certain ascii code punctuations) in the email address.

After all, the change will enable arbitrary input in pyproject.toml author field without any failure and the information will be reserved even when the function failed to parse.

@yggi49
Copy link
Contributor Author

yggi49 commented Jun 14, 2019

After all, the change will enable arbitrary input in pyproject.toml author field without any failure and the information will be reserved even when the function failed to parse.

I beg to disagree. Just as licenses are validated, I think it makes totally sense to validate the “author” field’s input as well, and make users aware and actively inform/educate them if their input does not correspond to a given, very common and widespread format.

That said, this PR is still lacking a second reviewer—maybe there is a third opinion here?

@drunkwcodes
Copy link

drunkwcodes commented Jun 14, 2019

@yggi49 The field may be referred to Author-email. https://www.python.org/dev/peps/pep-0345/#author-optional

The author field doesn't have the constraint by itself. And both fields are optional.
The required email field is a high demand for some users.

Maybe we can put the spell checking in poetry init, letting those pyproject.toml crafted by hands pass if the Author field has minimun information at least?

@stale
Copy link

stale bot commented Nov 13, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 13, 2019
@sdispater sdispater added stale and removed wontfix labels Nov 14, 2019
@stale stale bot removed the stale label Nov 14, 2019
@neersighted neersighted self-assigned this Nov 11, 2021
@Secrus
Copy link
Member

Secrus commented May 17, 2022

Is there still interest in bringing this to Poetry? PR seems stale.

@yggi49
Copy link
Contributor Author

yggi49 commented May 18, 2022

I am definitely interested.

My main motivation was to support special characters in author names, e.g. my·fancy·company <development@my-fancy-company.example.com>; the initially suggested solution did just that.

However, as this conversation evolved, it seems that Poetry shall also support various other ways of data input for the author/email field. If someone could provide a concrete specification of which use cases shall be covered in which ways, I will gladly continue working on the implementation.

@Secrus Secrus added the status/needs-consensus Consensus among maintainers required label May 18, 2022
@Secrus
Copy link
Member

Secrus commented May 18, 2022

@python-poetry/core can I please get some eyes on that? seems consensus is needed before this could progress.

@neersighted neersighted deleted the branch python-poetry:develop August 23, 2022 01:19
@yggi49
Copy link
Contributor Author

yggi49 commented Aug 23, 2022

@neersighted, why was this PR closed without further feedback? As stated, I am willing to contribute on improving author information parsing; however, I will need direction as my initial solution was not deemed sufficient.

@neersighted
Copy link
Member

@neersighted, why was this PR closed without further feedback? As stated, I am willing to contribute on improving author information parsing; however, I will need direction as my initial solution was not deemed sufficient.

Hi, I have been doing some branch cleanup and it looks like this was an unintended casualty as it was opened against an ancient, stale branch. It looks like Github does not allow me to retarget this PR against master, so a new PR should be opened with a link to this one for the previous discussion.

@yggi49
Copy link
Contributor Author

yggi49 commented Aug 23, 2022

For the new PR: Does it still make sense to have the functionality work with Poetry 1.1, with Poetry 1.2 looming on the horizon? Or should I just target the 1.2 line?

@Secrus
Copy link
Member

Secrus commented Aug 23, 2022

For the new PR: Does it still make sense to have the functionality work with Poetry 1.1, with Poetry 1.2 looming on the horizon? Or should I just target the 1.2 line?

Target master branch, we will handle potential backports to 1.2 if needed.

Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/feature Feature requests/implementations status/needs-consensus Consensus among maintainers required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants