Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: limit match length of email regular expression #9

Closed
wants to merge 1 commit into from

Conversation

llimllib
Copy link

@llimllib llimllib commented Aug 14, 2024

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

This is a quick fix for the problem described in #8, where a long line causes the findEmail regular expression to exhibit pathological behavior.

The solution presented here is to replace every + in the regular expression with {1,255}; it will still be super-linear on long lines but the buffer is now small enough that the function will complete in an acceptable period of time.

The maximum length of a valid email address is 255 according to the IETF here, but I've left room in the regex for 64 before and 255 after the @.

This only improves matters, doesn't fix the problem entirely.

Before this PR, line lenghts of up to about 50,000 cause recursion failure:

image

With this PR, I can run the regex on strings of length up to 10 megabytes, which seems like a very comfortable line length, certainly a big improvement:

image

but it still fails with a recursion limit above that

closes #8

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Aug 14, 2024
@llimllib llimllib mentioned this pull request Aug 14, 2024
4 tasks
@llimllib llimllib force-pushed the fix-pathological-regex branch from 588c484 to e3c3d38 Compare August 14, 2024 20:52
@llimllib llimllib closed this Aug 14, 2024

This comment has been minimized.

wooorm added a commit that referenced this pull request Aug 19, 2024
@wooorm wooorm added the 🤷 no/invalid This cannot be acted upon label Aug 19, 2024
@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤷 no/invalid This cannot be acted upon 👎 phase/no Post cannot or will not be acted on
Development

Successfully merging this pull request may close these issues.

Performance issue with long lines
2 participants