Currently, when the user passes in a charset, it's only used for dot. I'm expanding this to use the intersection of the charset passed in with categories like \w, \s and \d as well, but I don't intend to apply it to literals.
Should it apply to character classes? I'm not sure.
For some, like [^\w], it's pretty clear it should (once Unicode support lands), but others, like [a-z_], are already fairly limited.
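To make the intersection idea concrete, here is a minimal sketch of what it could mean to restrict a category to the caller's charset. It is an illustration only, not the project's implementation: `restrict_category` is a hypothetical helper, and it leans on the standard `re` module to decide category membership.

```python
import re

def restrict_category(category_pattern, charset):
    """Return the characters of `charset` that a one-character pattern matches.

    Hypothetical helper: it keeps the order of `charset` and simply filters it,
    which is one way to read "intersection of the charset and the category".
    """
    matcher = re.compile(category_pattern)
    return [c for c in charset if matcher.fullmatch(c)]

charset = "abc123 _-"

digits = restrict_category(r"\d", charset)        # only the digits in charset
word = restrict_category(r"\w", charset)          # letters, digits, underscore
space = restrict_category(r"\s", charset)         # whitespace in charset
non_word = restrict_category(r"[^\w]", charset)   # negated class, same filter
```

Under this reading, a negated class like [^\w] becomes finite as soon as a charset is supplied, while [a-z_] could only shrink, which is the asymmetry the question above is about.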
#27 takes a stab at this by allowing the caller to provide a different subset for each category, which, if used sanely, allows control over the expansion of \w, \d and \s.
This helps a little towards re.UNICODE, as it brings charset= into the internals in a way that can be described and reasoned about when the categories have multiple possible values.
It doesn't address explicit subsets like [a-z_]. I think there is more pain than joy in using charset= as a way to reduce the result space in cases like this: it could only be used if the input regex is quite predictable to the developer.
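As a rough sketch of the per-category approach described above (not the code in #27), the caller could pass a mapping from category to the subset it should expand to. The names `DEFAULT_CATEGORIES` and `expand` and the `categories=` parameter are all hypothetical, and the "regex" here is a pre-split list of tokens rather than a parsed pattern.

```python
import itertools
import string

# Hypothetical ASCII-only defaults for each category.
DEFAULT_CATEGORIES = {
    r"\d": string.digits,
    r"\w": string.ascii_letters + string.digits + "_",
    r"\s": " \t\n\r\f\v",
}

def expand(tokens, categories=None):
    """Yield every string a sequence of tokens can produce.

    Each token is either a category escape (looked up in the mapping)
    or a literal, which is never affected by the caller's subsets.
    """
    merged = {**DEFAULT_CATEGORIES, **(categories or {})}
    choices = [merged.get(tok, [tok]) for tok in tokens]
    for combo in itertools.product(*choices):
        yield "".join(combo)

# Narrow \d to binary digits and \s to a plain space; "-" stays a literal.
results = list(expand([r"\d", "-", r"\s"], categories={r"\d": "01", r"\s": " "}))
```

Explicit classes like [a-z_] are deliberately left out of this sketch, matching the point above that narrowing them through charset= is of doubtful value.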