Fix parsing of author information #1040
Instead of relying on regular expressions, this patch uses Python’s built-in `email.utils.parseaddr()` to parse an RFC 822-compliant email address string into its name and address parts. This should also resolve issues with special characters in the name part; see issues python-poetry#370 and python-poetry#798.
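For readers unfamiliar with `email.utils.parseaddr()`, here is a minimal sketch of how an author string could be split with it. This is not the code from the patch: the function body, the fallback for strings without an address, and the example name are assumptions made for illustration only.

```python
from email.utils import parseaddr


def parse_author(author):
    """Split 'Name <email>' into (name, email); illustrative sketch only."""
    name, address = parseaddr(author)
    # parseaddr() puts a bare word into the address slot when no
    # "<...>" part is present, so treat anything without "@" as "no email".
    # (Assumed fallback: keep the whole input as the name.)
    if "@" not in address:
        return author.strip(), None
    return name, address


# Hypothetical example values, not taken from the linked issues.
print(parse_author("Jane O'Neil-Smith <jane@example.com>"))
print(parse_author("Jane Doe"))
```

The standard-library parser takes care of quoting and the `<...>` delimiters, which is what removes the need for a hand-written regular expression for the name part.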
This is something I have wanted to do, but I was still trying to find an elegant solution.
And you just made it.
This thread is beyond the poetry doc thoroughly.
I messed with `re` for a while and found another solution.
It handles non-RFC-conformant cases as well.
# Non-RFC-conform cases with unquoted commas
name, email = parse_author("asf,dfu@t.b")
assert name == "asf"
assert email is None
Suggested change:
- assert email is None
+ assert email == "dfu@t.b"
name, email = parse_author("asf,<dfu@t.b>")
assert name == "asf"
assert email is None
Suggested change:
- assert email is None
+ assert email == "dfu@t.b"
name, email = parse_author("asf, dfu@t.b")
assert name == "asf"
assert email is None
Suggested change:
- assert email is None
+ assert email == "dfu@t.b"
Now `@ID` is supported. IPython somewhat needs '@' completion.
In [5]: parse_author("@asf")
Out[5]: ("@asf", None)
In [6]: parse_author("@asf <asf@t.b>")
Out[6]: ("@asf", "asf@t.b")
In [7]: parse_author("@asf,<asf@t.b>")
Out[7]: ("@asf", "asf@t.b")
In [8]: parse_author("@asf,asf@t.b")
Out[8]: ("@asf", "asf@t.b")
In [9]: parse_author("@asf, asf@t.b")
Out[9]: ("@asf", "asf@t.b")
In [10]: parse_author("@asf,")
Out[10]: ("@asf", None)
In [11]: parse_author("@asf <>")
Out[11]: ("@asf", None)
Here is the latest revision, with runnable tests included, in my gist: https://gist.github.com/drunkwcodes/d2551bc41f54a1120434ad8f24dc49af
Is it really necessary to attempt to make something out of even the most obscure, contrived use cases? For example, I don’t quite get why parse_author("@asf,asf@t.b") should result in ("@asf", "asf@t.b"). To me, that seems like a totally arbitrary, non-obvious choice.
First and foremost, this PR attempts to resolve (in a simple way) the two issues that are linked at the top, which it does. Everything beyond that might be something for a separate feature request.
@yggi49 The cases need to be polished, of course. The problem was the unclear exception. But I want to eliminate it. After all, the change will enable arbitrary input in
I beg to disagree. Just as licenses are validated, I think it makes total sense to validate the “author” field’s input as well, and to make users aware and actively inform/educate them if their input does not correspond to a given, very common and widespread format. That said, this PR is still lacking a second reviewer; maybe there is a third opinion here?
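To make the validation idea above concrete, here is a rough sketch of a check that rejects input outside the `Name <email>` form with an explicit message. The function name `validate_author`, the error wording, and the choice to require both a name and an address are assumptions for illustration; nothing here is taken from Poetry's code base.

```python
from email.utils import parseaddr


def validate_author(author):
    """Return (name, email) or raise ValueError; hypothetical helper."""
    name, address = parseaddr(author)
    # Assumed rule: both a display name and an address containing "@"
    # must be present, otherwise the user gets a descriptive error.
    if not name or "@" not in address:
        raise ValueError(
            f"Invalid author {author!r}: expected the form "
            "'Name <email@example.com>'"
        )
    return name, address


print(validate_author("Jane Doe <jane@example.com>"))
try:
    validate_author("asf,dfu@t.b")
except ValueError as exc:
    print(exc)
```

A stricter or looser rule (for example, allowing a name without an email) would only change the condition; the point is that the error tells the user which format is expected.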
@yggi49 The field may be referred to The author field doesn't have the constraint by itself. And both fields are optional. Maybe we can put the spell checking in
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is there still interest in bringing this to Poetry? PR seems stale.
I am definitely interested. My main motivation was to support special characters in author names, e.g. However, as this conversation evolved, it seems that Poetry shall also support various other ways of data input for the author/email field. If someone could provide a concrete specification of which use cases shall be covered in which ways, I will gladly continue working on the implementation.
@python-poetry/core can I please get some eyes on this? It seems consensus is needed before this can progress.
@neersighted, why was this PR closed without further feedback? As stated, I am willing to contribute to improving author information parsing; however, I will need direction, as my initial solution was not deemed sufficient.
Hi, I have been doing some branch cleanup, and it looks like this was an unintended casualty, as it was opened against an ancient, stale branch. It looks like GitHub does not allow me to retarget this PR against master, so a new PR should be opened with a link to this one for the previous discussion.
For the new PR: Does it still make sense to have the functionality work with Poetry 1.1, with Poetry 1.2 looming on the horizon? Or should I just target the 1.2 line?
Target
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.