Why does the behavior differ between regexp and re2 (re2 seems to be more consistent)?
Why does "\xd1\xd1" match both "." and ".."? I could understand it matching one or the other, but not both; is it one character or two?
go version devel +b0532a9 Mon Jun 8 05:13:15 2015 +0000 linux/amd64
Here are other examples of disagreement between regexp and re2 for invalid utf-8:
re=".$" str="\xb1\x98" regexp=true re2=false
panic: regexp and re2 disagree on regexp match
re=".*(..b)." str="(.a|.b\xdb|" regexp=true re2=false
panic: regexp and re2 disagree on regexp match
re="\\Q\xb4\\Q" regexp=<nil> re2=false
panic: regexp and re2 disagree on regexp validity
re="\\QT\x82\\E\\QT\\E" str="c^|^\\QTt\\c" regexp=<nil> re2=false
panic: regexp and re2 disagree on regexp validity
re="^((?:.*)+?(?:.*)+?)$" str="\xff\xbf\x80\x80$^^.^^^^((?.^^^" regexp=true re2=false
panic: regexp and re2 disagree on regexp match
re="\\Q\x8a-" str="o\\Q" regexp=<nil> re2=false
panic: regexp and re2 disagree on regexp validity
re="." str="\xd6" regexp=true re2=false
panic: regexp and re2 disagree on regexp match
re="[^-9]+z" str="\xbfz)^(?:" regexp=true re2=false
panic: regexp and re2 disagree on regexp match
In Go, "." matches a single malformed UTF-8 sequence; in RE2 it does not. This is mainly due to implementation details of each library, and I wouldn't change either now.
As for the second question, "xx" matches both "." and ".." too: the match is unanchored, so "." only has to match one character somewhere in the string.
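To make the two answers above concrete, here is a minimal Go sketch. The `matches` helper is a hypothetical name introduced only for this example. It assumes Go's regexp package treats each invalid byte as a single `utf8.RuneError` rune of width 1, which agrees with the behavior described above but is not spelled out in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// matches is a hypothetical helper: it reports whether pattern matches
// anywhere in s. MatchString is unanchored unless the pattern itself
// uses ^ and $.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	s := "\xd1\xd1" // two bytes, neither starts a valid UTF-8 sequence

	// The utf8 package decodes an invalid byte as RuneError with width 1,
	// so the string is seen as two separate "characters".
	r, size := utf8.DecodeRuneInString(s)
	fmt.Println(r == utf8.RuneError, size)

	// Unanchored: "." only needs one character somewhere in s.
	fmt.Println(matches(".", s), matches("..", s))

	// Anchored: the pattern must account for the whole string.
	fmt.Println(matches("^.$", s), matches("^..$", s))

	// The same holds for plain ASCII input.
	fmt.Println(matches(".", "xx"), matches("^.$", "xx"))
}
```

With anchors, only "^..$" matches "\xd1\xd1", so Go counts the string as two characters. The unanchored "." matches simply because it finds the first of them.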
The following program:
prints:
While the following C++ program:
prints:
This raises 2 questions: