-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OF-2628: Avoid polynomial regular expression used on uncontrolled data #2221
OF-2628: Avoid polynomial regular expression used on uncontrolled data #2221
Conversation
Prevent possibility of regular expression execution depending on user-provided values taking exceptionally long / many resources to complete This commit replaces the regular expression with a solution that doesn't use regular expressions. Arguably, the complexity of this code does not change much.
Is this a solution that covers all edge cases, and is roughly not much worse in performance as compared to the original? |
I wonder if we can test efficacy AND performance with a unit test pre- and post-change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and is better in that it allows hex char references.
I can't offer any opinion on performance, other than perhaps suggest it probably doesn't matter too much unless it's awful, which I doubt?
Added some unit tests and ran them on main and on this branch a bunch of times via |
I won't merge unit Guus gives my unit tests a look. They're really very brief... |
Might want to replace @test with @RepeatedTest(10_000) - or some other number, but they look pretty quick to run - to get a bigger sample for timing purposes. |
The original should also have allowed for hex-based character references - but the fact that that wasn't clear is probably a good indication that this change isn't the worst, from a readability perspective. As for the test:
|
This reverts commit dc11312.
Thanks Guus. I totally didn't spot those, and they're much better functional coverage. Reverted. I agree with not using unit tests for automated load comparison in the general case. It requires constrained and understood code paths and an environment where the performance variables are somewhat understood. Understanding the results are also important (e.g. one run with a 10% isn't interesting, but a consistent 10% execution time change very much is). |
Prevent possibility of regular expression execution depending on user-provided values taking exceptionally long / many resources to complete
This commit replaces the regular expression with a solution that doesn't use regular expressions. Arguably, the complexity of this code does not change much.