Getting "Invalid UTF-8 character for UTF control characters" #362
Comments
Hyperscan supports UTF-8 patterns in two ways: either the HS_FLAG_UTF8 flag is set, or the pattern begins with the (*UTF8) control verb. FYI, the following code only checks the UTF-8 validity of the expression body: hyperscan/src/compiler/compiler.cpp, line 171 in 64a995b
Thank you for the quick response!
if (hs_compile(test1, HS_FLAG_DOTALL|HS_FLAG_UTF8, HS_MODE_BLOCK, NULL, &database,
Can you please suggest how to proceed?
Can you provide us with the full test code? A .txt attachment would be best.
Hi, Thank you!
Hi
Hey, sorry for the late reply. Your code should be fine. The error comes from a bug in our UTF-8 validity function, which mistreats 0x7f as an invalid one-byte UTF-8 case: hyperscan/src/parser/utf8_validate.cpp, lines 74 to 76 in 64a995b
The check there should be "s[i] <= 0x7f". The first byte of your character string happens to fall into this corner case. We'll push the fix soon. In the meantime, you can apply the change manually if needed.
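For readers who want to see why the boundary matters, here is a minimal, standalone sketch of a UTF-8 well-formedness check. It is not Hyperscan's utf8_validate.cpp; the function name is_valid_utf8 and its structure are assumptions made for illustration. The one-byte branch uses the inclusive comparison mentioned above, so 0x7f (DEL) is accepted as a valid single-byte character.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not Hyperscan's implementation): returns true if the
 * first `len` bytes of `s` are well-formed UTF-8. */
static bool is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;      /* number of continuation bytes expected */
        unsigned int min; /* smallest code point allowed for this length */
        unsigned int cp;  /* decoded code point */

        if (c <= 0x7f) {
            /* One-byte case: 0x00..0x7f inclusive. Using "<" here would
             * wrongly reject 0x7f. */
            i++;
            continue;
        } else if ((c & 0xe0) == 0xc0) {
            need = 1; min = 0x80; cp = c & 0x1f;
        } else if ((c & 0xf0) == 0xe0) {
            need = 2; min = 0x800; cp = c & 0x0f;
        } else if ((c & 0xf8) == 0xf0) {
            need = 3; min = 0x10000; cp = c & 0x07;
        } else {
            return false; /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need) {
            return false; /* sequence truncated at end of input */
        }
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xc0) != 0x80) {
                return false; /* not a continuation byte */
            }
            cp = (cp << 6) | (cc & 0x3f);
        }
        if (cp < min || cp > 0x10ffff) {
            return false; /* overlong encoding or out of range */
        }
        if (cp >= 0xd800 && cp <= 0xdfff) {
            return false; /* UTF-16 surrogates are not valid in UTF-8 */
        }
        i += need + 1;
    }
    return true;
}
```

With this sketch, a pattern such as "bob logged in from \x7f" passes validation, while a lone 0x80 byte or the overlong pair 0xc0 0x80 is rejected.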
Please refer to the latest develop branch.
fix github issue intel#362
Hi,
While compiling an expression that contains UTF-8 characters, I get the error below. If I remove the HS_FLAG_UTF8 flag, it compiles fine. Is there any restriction on UTF-8 control characters?
"bob logged in from �"
Code snippet:
if (hs_compile(test1, HS_FLAG_DOTALL | HS_FLAG_UTF8, HS_MODE_BLOCK, NULL, &database,
               &compile_err) != HS_SUCCESS) {
    fprintf(stderr, "ERROR: Unable to compile pattern \"%s\": %s\n",
            test1, compile_err->message);
    hs_free_compile_error(compile_err);
}
ERROR: Unable to compile pattern "bob logged in from ": Expression is not valid UTF-8.