[Hyperscan] Broken UTF-8 support #1308
Comments
That just means your platform's default encoding isn't UTF-8. Do you need to use UTF-8? If your platform encoding does all you need, not forcing UTF-8 should work just fine. |
Can you clarify a bit? |
Your JVM is obviously not using UTF-8. Check the value of the file.encoding system property. |
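A quick way to run the check suggested above is to print the property next to the charset the JVM actually resolved. This is only a minimal sketch using the standard library; the class name `EncodingCheck` and the helper `defaultIsUtf8` are made up for illustration and are not part of the presets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JavaCPP or Hyperscan.
public class EncodingCheck {
    // True when the JVM's default charset is UTF-8.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The raw system property, as mentioned in the comment above.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset the JVM resolved as its default.
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("default is UTF-8: " + defaultIsUtf8());
    }
}
```

If this reports something other than UTF-8, starting the JVM with `-Dfile.encoding=UTF-8` is the usual way to override it.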
That's funny, I get the same error here, but if I remove HS_FLAG_UTF8 it works fine. Can you confirm? That sounds like a bug in Hyperscan... |
Sounds like issue intel/hyperscan#362 maybe? |
Yeah everything works ok without the flag. |
Looking through the code, it seems that's not the reason.
I have Cyrillic symbols, so that's not the case. Also note that turning the flag off disables caseless UTF-8 matching; for example, the following test fails now:
Again, the other bindings work as expected, so I tend to think there's something off with the bindings themselves. I have the following set of flags:
|
We can get UTF-8 bytes with String.getBytes() and call functions directly with a pointer to that and it still fails. So unless there's something wrong with how the JVM encodes in UTF-8, it's a problem with Hyperscan itself. Other bindings may offer some sort of workaround for that. Someone should investigate that. |
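For anyone who wants to rule out the JVM side of this, here is a rough sketch of how the UTF-8 bytes can be inspected before they are handed to native code. Note that `String.getBytes()` without arguments uses the platform default charset, so the sketch passes `StandardCharsets.UTF_8` explicitly. The class `Utf8Bytes`, its helpers, and the Cyrillic sample string are illustrative assumptions, not code from this issue, and nothing here calls Hyperscan.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JavaCPP or Hyperscan.
public class Utf8Bytes {
    // Encode explicitly as UTF-8, independent of the platform default charset.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as lowercase hex so the encoding can be checked by eye.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample Cyrillic word written with escapes so the source file encoding does not matter.
        String sample = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] bytes = utf8(sample);
        // Each Cyrillic letter in this sample takes two bytes in UTF-8.
        System.out.println(bytes.length + " bytes: " + hex(bytes));
    }
}
```

If the hex dump shows well-formed two-byte sequences for the Cyrillic letters, the input is valid UTF-8 and the failure has to come from further down the stack.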
Oh look at that. When building from the develop branch, the error doesn't show up anymore. So the bug has been fixed since 5.4.0. I guess we could upgrade the presets to the develop branch since they don't appear to make releases anymore... |
Looks like that's been fixed in 5.4.1. Please give it a try with the snapshots: http://bytedeco.org/builds/ |
Ah, no, this is still happening with 5.4.1, but only when it gets built on CentOS 7. When I build locally with Fedora 36, this isn't happening. So I guess this is a bug in the compiler... Well, CentOS 7 EOL is next year, and then I think I'll just switch all builds to Ubuntu, and that should take care of this bug. If you need something before that, either build from source with a compiler that works, or figure out how to make this work with compilers available for CentOS 7! |
The problem persists with 5.4.2-1.5.9 when using the Maven Central artifact. When I compiled the preset myself it worked fine, so it must be something in the way it was compiled.
outputs |
More explicitly, |
This should be fixed now. Please give it a try with the snapshots: http://bytedeco.org/builds/ |
Still fails. |
I wonder which version of the compiler fixes that... |
So far I've figured out that the bug occurs in Parser.cpp, which is generated by Ragel. CentOS 7 uses a development (!) version 7 of Ragel, while Ragel 6 seems to work fine. I haven't managed to install Ragel 6 on CentOS 7. |
OK, I found it. It is Ragel. Version 6 has to be used. I compiled it manually because no package for CentOS 7 exists. Also had to build cmake, bmake, libfsm and kelbt. What a fun day. EDIT: |
Yes, I see, thanks. I forgot to update the builds to Ubuntu. Now the snapshots work. 👍 |
Something is broken with the Hyperscan build. The following code doesn't work.
Output:
Meanwhile, https://github.com/gliwka/hyperscan-java binaries work as expected