Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict email address validation #17688

Merged
merged 15 commits into from
Mar 14, 2022
Merged

Conversation

lunny
Copy link
Member

@lunny lunny commented Nov 17, 2021

This didn't follow the RFC but it's a subset of that. I think we should narrow the allowed chars at first and discuss more possibility in future PRs.

Below are the rules to validate an email.

  1. An email address should only contain [0-9a-zA-Z.!#$%&'*+-/=?^_{|}~@]`.
  2. First char should only be [0-9a-zA-Z].
  3. Strings after @ should only contains [0-9a-zA-Z.] charactors.

@Gusted
Copy link
Contributor

Gusted commented Nov 17, 2021

What about a email address like postmaster+gitea@gusted.xyz?

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Nov 17, 2021
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Nov 17, 2021
@Gusted
Copy link
Contributor

Gusted commented Nov 17, 2021

Just a thought:

  • This could be seen as too restrictive as other characters like ! { } & etc. are allowed and even more is allowed by escaping characters like IHaveASpace\ Wow@gusted.xyz.
  • Existing emails didn't go trough this check.

@lunny lunny added the pr/breaking Merging this PR means builds will break. Needs a description what exactly breaks, and how to fix it! label Nov 17, 2021
@wxiaoguang
Copy link
Contributor

If we want to follow RFC, the rules are more complex.

https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference
The local-part of the email address may use any of these ASCII characters:

    uppercase and lowercase Latin letters A to Z and a to z;
    digits 0 to 9;
    special characters !#$%&'*+-/=?^_`{|}~;
    dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

@silverwind
Copy link
Member

silverwind commented Nov 17, 2021

How about using mail.ParseAddress? May need to pass Foo <$addr> to make it reject the long form with name.

Nevermind, I see the code does that, but it's probably deemed too lax.

@lunny lunny force-pushed the lunny/restrict_email_validate branch from 1270d9c to d7293bb Compare November 18, 2021 06:05
@silverwind
Copy link
Member

New test may be failing.

@lunny
Copy link
Member Author

lunny commented Nov 19, 2021

New test may be failing.

; is not accepted by mail.ParseAddress.

@lunny lunny force-pushed the lunny/restrict_email_validate branch from d7293bb to c4b9235 Compare November 19, 2021 11:38
@zeripath
Copy link
Contributor

Parsing email addresses with regexps seems like a bad idea.

If we want to follow RFC, the rules are more complex.

https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference
The local-part of the email address may use any of these ASCII characters:

    uppercase and lowercase Latin letters A to Z and a to z;
    digits 0 to 9;
    special characters !#$%&'*+-/=?^_`{|}~;
    dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

Since 2012 you can include characters above U+007f if they're encoded as UTF-8.


Why are we doing this? Why not just parse the email address with mail.ParseAddress? What's the aim?

@lunny
Copy link
Member Author

lunny commented Nov 23, 2021

Parsing email addresses with regexps seems like a bad idea.

If we want to follow RFC, the rules are more complex.
https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference
The local-part of the email address may use any of these ASCII characters:

    uppercase and lowercase Latin letters A to Z and a to z;
    digits 0 to 9;
    special characters !#$%&'*+-/=?^_`{|}~;
    dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

Since 2012 you can include characters above U+007f if they're encoded as UTF-8.

Why are we doing this? Why not just parse the email address with mail.ParseAddress? What's the aim?

The invisible unicode characters could pass mai.ParseAddress. Do we really want to allow them?

@adelowo
Copy link
Member

adelowo commented Jan 28, 2022

Parsing email addresses with regexps seems like a bad idea.

If we want to follow RFC, the rules are more complex.
https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference
The local-part of the email address may use any of these ASCII characters:

    uppercase and lowercase Latin letters A to Z and a to z;
    digits 0 to 9;
    special characters !#$%&'*+-/=?^_`{|}~;
    dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
    space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
    comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

Since 2012 you can include characters above U+007f if they're encoded as UTF-8.

Why are we doing this? Why not just parse the email address with mail.ParseAddress? What's the aim?

yes. Just using email verification links is a saner solution to be honest

@lunny lunny force-pushed the lunny/restrict_email_validate branch from c4b9235 to d9f335c Compare March 11, 2022 16:30
@lunny lunny added this to the 1.17.0 milestone Mar 11, 2022
@lunny lunny added the topic/security Something leaks user information or is otherwise vulnerable. Should be fixed! label Mar 11, 2022
@lunny lunny force-pushed the lunny/restrict_email_validate branch 3 times, most recently from 144a5e3 to 654acf3 Compare March 13, 2022 16:51
@lunny lunny force-pushed the lunny/restrict_email_validate branch from c847923 to 1210467 Compare March 14, 2022 13:01
@lunny lunny force-pushed the lunny/restrict_email_validate branch from 1210467 to b523619 Compare March 14, 2022 15:30
Copy link
Member

@6543 6543 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests passed localy

6543 pushed a commit to 6543-forks/gitea that referenced this pull request Mar 14, 2022
@6543
Copy link
Member

6543 commented Mar 14, 2022

-> #19085

@6543 6543 added the backport/done All backports for this PR have been created label Mar 14, 2022
@6543 6543 merged commit 18033f4 into go-gitea:main Mar 14, 2022
6543 added a commit that referenced this pull request Mar 14, 2022
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
@lunny lunny deleted the lunny/restrict_email_validate branch March 15, 2022 01:01
zjjhot added a commit to zjjhot/gitea that referenced this pull request Mar 15, 2022
* giteaofficial/main:
  Frontport Changelogs (go-gitea#19088)
  Restrict email address validation (go-gitea#17688)
  Fix lfs bug (go-gitea#19072)
6543 pushed a commit that referenced this pull request Mar 18, 2022
Even with #17688 email addresses that contain an initial `-` may still be present in the db and it may in future still be possible to imagine a situation whereby initial `-` are repermitted.

This PR simply updates the documentation to warn users to set their SENDMAIL_ARGS with a terminal `--` to prevent this possibility email addresses being interpreted as options.

Signed-off-by: Andrew Thornton <art27@cantab.net>
Chianina pushed a commit to Chianina/gitea that referenced this pull request Mar 28, 2022
This didn't follow the RFC but it's a subset of that. I think we should narrow the allowed chars at first and discuss more possibility in future PRs.
Chianina pushed a commit to Chianina/gitea that referenced this pull request Mar 28, 2022
Even with go-gitea#17688 email addresses that contain an initial `-` may still be present in the db and it may in future still be possible to imagine a situation whereby initial `-` are repermitted.

This PR simply updates the documentation to warn users to set their SENDMAIL_ARGS with a terminal `--` to prevent this possibility email addresses being interpreted as options.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. pr/breaking Merging this PR means builds will break. Needs a description what exactly breaks, and how to fix it! topic/security Something leaks user information or is otherwise vulnerable. Should be fixed!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants