Simplify range data construction. #496
```diff
-    size_t usize = utf_caseless_extend(start_char, *ptr++, options, buffer);
-    if (buffer != NULL) buffer += usize;
-    total_size += usize;
+    size = utf_caseless_extend(start_char, *ptr++, options, buffer);
```
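The diff relies on a common C idiom. It suggests that `utf_caseless_extend` returns how many units it produces and writes into `buffer` only when `buffer` is non-NULL. The caller can therefore run the same loop once to measure and once to fill. The following is a minimal sketch of that idiom, not PCRE2 code: `emit_range`, `build_ranges`, `build_ranges_alloc` and the two-unit range encoding are all hypothetical. It also shows why reusing one outer `size` variable, instead of declaring a second `size_t` inside the loop, avoids a `-Wshadow` warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical emitter: encodes one inclusive [lo, hi] range as two
   32-bit units. Writes only when buffer is non-NULL, but always returns
   the number of units it would write. */
static size_t emit_range(uint32_t lo, uint32_t hi, uint32_t *buffer)
{
  if (buffer != NULL)
    {
    buffer[0] = lo;
    buffer[1] = hi;
    }
  return 2;
}

/* Runs the emitter over `count` ranges stored as lo/hi pairs.
   With buffer == NULL it only measures; otherwise it also fills.
   A single `size` variable declared in this scope is reused, so no
   inner declaration shadows it (which -Wshadow would report). */
static size_t build_ranges(const uint32_t *ranges, size_t count, uint32_t *buffer)
{
  size_t total_size = 0;
  size_t size;
  for (size_t i = 0; i < count; i++)
    {
    size = emit_range(ranges[2 * i], ranges[2 * i + 1], buffer);
    if (buffer != NULL) buffer += size;
    total_size += size;
    }
  return total_size;
}

/* Pass 1 measures, pass 2 fills a buffer of exactly that size.
   Returns NULL if allocation fails; the caller frees the result. */
static uint32_t *build_ranges_alloc(const uint32_t *ranges, size_t count, size_t *out_len)
{
  size_t len = build_ranges(ranges, count, NULL);
  uint32_t *buf = malloc((len > 0 ? len : 1) * sizeof(uint32_t));
  if (buf == NULL) return NULL;
  size_t written = build_ranges(ranges, count, buf);
  assert(written == len);   /* both passes must agree */
  *out_len = len;
  return buf;
}
```

Keeping a single code path for measuring and filling means the computed length cannot drift from what is actually written.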
FYI, this will revert Philip's last fix for -Wshadow; is that intended?
Yes. I wanted to remove the `size_t` but forgot it.
Never mind, I should have pulled first before commenting; I guess that is why it was using `size` in the original to begin with ;). Nice work. And yes, GitHub is acting weird today with comments: yours didn't refresh after I posted mine.
Is this the last of your fixes to close #469?
Oh no, I found another issue with negated ASCII classes. And this is still just the range merge; the logarithmic search is still a long way off.
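The "range merge" and "logarithmic search" mentioned here correspond to a familiar pair of techniques. First, normalize a class into sorted, non-overlapping ranges. Then test a character with a binary search instead of a linear scan. The following is a minimal sketch of both steps under stated assumptions: ranges are inclusive `[lo, hi]` pairs of code points, and `range_t`, `merge_ranges` and `range_contains` are hypothetical names, not PCRE2 functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t lo, hi; } range_t;   /* inclusive range */

static int cmp_range(const void *a, const void *b)
{
  const range_t *x = a, *y = b;
  if (x->lo != y->lo) return x->lo < y->lo ? -1 : 1;
  if (x->hi != y->hi) return x->hi < y->hi ? -1 : 1;
  return 0;
}

/* Sorts ranges in place and merges overlapping or adjacent ones,
   e.g. [a-c] and [d-f] become [a-f]. Returns the new count. */
static size_t merge_ranges(range_t *r, size_t n)
{
  assert(r != NULL || n == 0);
  if (n == 0) return 0;
  qsort(r, n, sizeof *r, cmp_range);
  size_t out = 0;
  for (size_t i = 1; i < n; i++)
    {
    /* Overlap or adjacency; the UINT32_MAX check avoids overflow of hi + 1. */
    if (r[out].hi == UINT32_MAX || r[i].lo <= r[out].hi + 1)
      {
      if (r[i].hi > r[out].hi) r[out].hi = r[i].hi;
      }
    else
      r[++out] = r[i];
    }
  return out + 1;
}

/* O(log n) membership test; requires merged (sorted, disjoint) ranges. */
static bool range_contains(const range_t *r, size_t n, uint32_t c)
{
  size_t lo = 0, hi = n;   /* search window is [lo, hi) */
  while (lo < hi)
    {
    size_t mid = lo + (hi - lo) / 2;
    if (c < r[mid].lo) hi = mid;
    else if (c > r[mid].hi) lo = mid + 1;
    else return true;
    }
  return false;
}
```

Merging first is what makes the binary search valid, because the search assumes the ranges are sorted and disjoint.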
d664f93 to 2c74c97
8a78a3a to 0713f09
0713f09 to 22d43ce
22d43ce to c64a84b
@PhilipHazel if you agree with my suggestion in #497, this patch is ready.
I think the following assertion is not correct:
In PCRE2 it can, and the reasons are historical and are described in #186. In summary, our "/u" Perl equivalent requires both …
I am not sure I understand that part; it talks about configuring the modifier. Normally if …
The point is that without …
Agree with you that they "shouldn't" match, and that is arguably a bug, but it is the currently expected behaviour when ONLY one of those options is set. The "ambiguity" is resolved at compile time by the redefinition of …
Indeed, I think this might have just introduced a regression:
At least this one works, but the …
That regression is fixed, and this is what I am talking about: \D matches anything not in [0-9], which includes all characters above 255.
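The reason \D reaches past 255 is that a negated class is the complement of its ranges over the whole code point space, not just over the low 256 characters. The following is a minimal sketch of computing that complement. It assumes sorted, disjoint inclusive ranges and a maximum code point of 0x10FFFF, the Unicode limit relevant in UTF mode. `range_t`, `negate_ranges` and `MAX_CODE_POINT` are hypothetical names, not PCRE2's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CODE_POINT 0x10ffffu   /* assumption: Unicode maximum (UTF mode) */

typedef struct { uint32_t lo, hi; } range_t;   /* inclusive range */

/* Writes the complement of the sorted, disjoint ranges `in` (all within
   [0, MAX_CODE_POINT]) into `out`, which needs room for n + 1 entries.
   Returns the number of ranges written. */
static size_t negate_ranges(const range_t *in, size_t n, range_t *out)
{
  size_t count = 0;
  uint32_t next = 0;   /* first code point not yet accounted for */
  for (size_t i = 0; i < n; i++)
    {
    assert(in[i].lo >= next && in[i].lo <= in[i].hi && in[i].hi <= MAX_CODE_POINT);
    if (in[i].lo > next)
      out[count++] = (range_t){ next, in[i].lo - 1 };   /* gap before this range */
    next = in[i].hi + 1;   /* cannot overflow: hi <= MAX_CODE_POINT */
    }
  if (next <= MAX_CODE_POINT)
    out[count++] = (range_t){ next, MAX_CODE_POINT };   /* tail up to the maximum */
  return count;
}
```

For [0-9] (0x30 to 0x39) this yields two ranges, 0x00 to 0x2F and 0x3A to 0x10FFFF. The second range is why every character above 255 matches \D.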
This patch uses the computed ranges to generate byte code rather than using `add_to_class`. It is a considerable simplification of the code.
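As a rough illustration of what "generate byte code from the computed ranges" can mean, the following sketch walks an already merged range list once. A one-character range becomes a "single" item and any wider range becomes a "range" item, with an end marker after the last item. A small interpreter then checks membership against the emitted code. The item codes `ITEM_END`, `ITEM_SINGLE` and `ITEM_RANGE`, the 32-bit code units, and the functions `emit_class` and `class_matches` are all hypothetical and are not PCRE2's actual opcodes or layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item codes; NOT PCRE2's real opcodes. */
enum { ITEM_END = 0, ITEM_SINGLE = 1, ITEM_RANGE = 2 };

typedef struct { uint32_t lo, hi; } range_t;   /* inclusive, sorted, disjoint */

/* Emits ITEM_SINGLE c for one-character ranges and ITEM_RANGE lo hi for
   wider ones, followed by ITEM_END. With code == NULL it only measures.
   Returns the number of 32-bit units (including the end marker). */
static size_t emit_class(const range_t *r, size_t n, uint32_t *code)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
    assert(r[i].lo <= r[i].hi);
    if (r[i].lo == r[i].hi)
      {
      if (code != NULL) { code[len] = ITEM_SINGLE; code[len + 1] = r[i].lo; }
      len += 2;
      }
    else
      {
      if (code != NULL)
        { code[len] = ITEM_RANGE; code[len + 1] = r[i].lo; code[len + 2] = r[i].hi; }
      len += 3;
      }
    }
  if (code != NULL) code[len] = ITEM_END;
  return len + 1;
}

/* Interprets the emitted items: returns true if c matches any item. */
static bool class_matches(const uint32_t *code, uint32_t c)
{
  for (;;)
    {
    switch (*code)
      {
      case ITEM_SINGLE:
        if (c == code[1]) return true;
        code += 2;
        break;
      case ITEM_RANGE:
        if (c >= code[1] && c <= code[2]) return true;
        code += 3;
        break;
      default:   /* ITEM_END */
        return false;
      }
    }
}
```

Because the ranges are computed (and merged) before emission, the emitter needs no per-character bookkeeping. This is one plausible reading of why driving code generation from the ranges simplifies things.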