Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standardize email pattern #3524

Merged

Conversation

kashifkhan0771
Copy link
Contributor

@kashifkhan0771 kashifkhan0771 commented Oct 29, 2024

Description:

Standardized the email pattern in all detectors using emailPat
Added pattern test cases for relevant detectors to ensure changes do not impact detection accuracy
Added email pattern test cases
Enhanced email detection logic where applicable

Jira Ticket:

OSS-47 (Update Detectors to use standard Email pattern)

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 requested a review from a team as a code owner October 29, 2024 11:02
@@ -7,7 +7,7 @@ import (
"strings"
)

const EmailPattern = `\b(?:[a-z0-9!#$%&'*+/=?^_\x60{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_\x60{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\b`
const EmailPattern = `\b((?:[a-z0-9!#$%&'*+/=?^_\x60{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_\x60{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]))\b`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because our detectors rely on FindAllStringSubmatch, which specifically searches for group matches. Open to any ideas you may have!

Copy link
Collaborator

@zricethezav zricethezav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check each and every detector integration but nothing here stands out to me as a red flag. We'll find out if this breaks anything soon enough ;)

nice work

@zricethezav zricethezav merged commit 84ec670 into trufflesecurity:main Oct 30, 2024
13 checks passed
@kashifkhan0771 kashifkhan0771 deleted the update/standard-email-pattern branch October 31, 2024 05:53
@bugbaba
Copy link

bugbaba commented Nov 15, 2024

Why are we accepting special characters for emails? Just got a match where aaa'a/xyz@abc.example.com was detected as email

image

@rgmz
Copy link
Contributor

rgmz commented Nov 15, 2024

The specification is technically very permissible.

I think it'd be best to match probable emails, not everything permitted by the specification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants