Bug: sz_find incorrectly finds the substring with length=5 #154
Comments
OK, maybe the root cause is at a higher level, not exactly in this function. I couldn't figure out the root cause yet.
Interesting, which CPU model are you running on?
Oh, it must be a binding issue. When I avoid passing the extra arguments I get:
So should be coming from
I run on different Intel CPUs: Core i7-8700 or Core i7-12700H.
Also, I see a similar issue in the C code.
Wow, that's dangerous! Any chance there is an existing test you can extend in
Thank you for spotting this! I've fixed the bug and will merge soon 🤗
Hi, I noticed the first byte is equal where I used random strings of length
n[0]: 0xe0
n[1]: 0x87
n_length: 2
sz_find_neon: 1968. h[found_offset0]: 0xe0, h[found_offset0 + 1]: 0x87 <-- first and 2nd bytes good
sz_find_neon_too_smart: 780. h[found_offset1]: 0xe0, h[found_offset1 + 1]: 0x1c <-- first byte good, 2nd not
Thanks!
edit: tested with lengths 1-32, and only "2" has this issue
Hi @hillelme! That seems to be a different bug. Any chance you have a minimal string example I can use to reproduce this?
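One way to hunt for a minimal example like the one requested here is to cross-check a search routine against Python's built-in `bytes.find` on random inputs. The sketch below is only an illustration: `first_mismatch` and `find_fn` are hypothetical names, not part of StringZilla, and `find_fn` stands for any callable taking `(haystack, needle)` and returning an offset or -1. The default needle length of 2 mirrors the failing case reported above.

```python
import random


def first_mismatch(find_fn, needle_length=2, haystack_length=1024, trials=200, seed=0):
    """Return (haystack, needle, expected, actual) for the first disagreement
    between find_fn and bytes.find, or None if all trials agree.

    find_fn is a hypothetical callable: find_fn(haystack, needle) -> int.
    """
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        haystack = bytes(rng.randrange(256) for _ in range(haystack_length))
        # Cut the needle out of the haystack so a match is guaranteed to exist.
        start = rng.randrange(haystack_length - needle_length + 1)
        needle = haystack[start:start + needle_length]
        expected = haystack.find(needle)
        actual = find_fn(haystack, needle)
        if actual != expected:
            return haystack, needle, expected, actual
    return None


def first_byte_only(haystack, needle):
    """Deliberately broken finder that checks only the first needle byte."""
    return haystack.find(needle[:1])


if __name__ == "__main__":
    print(first_mismatch(bytes.find))  # the built-in trivially agrees with itself
    failure = first_mismatch(first_byte_only)
    if failure is not None:
        _, needle, expected, actual = failure
        print(needle.hex(), expected, actual)
```

The deliberately broken `first_byte_only` imitates the symptom described above (first byte matches, second does not). To test a real backend, one would pass a thin wrapper around its search call as `find_fn`.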
I have just released v3.9.0, which patches the original issue. @hillelme can you please create a new one if your issue persists? Thanks for reporting those 🤗
The fix is confirmed. Thanks!
Describe the bug
StringZilla incorrectly finds the substring in this case: `"Hello, world!".find("world", 0, 11)`, which must return -1.

The function `sz_equal_serial` doesn't correctly handle the suffix of the substring `"world"` (suffix == `"d"`). It goes all the way to the last statement, `return (sz_bool)(a_end == a);`, where `a_end == "!"` and `a == "!"`. This is very strange, because the first line of the function contains the assignment `sz_ptr_t const a_end = a + length;`, where `length == 1` in the debugger.

It looks like the while loop before the `return` statement is written incorrectly for this case. `b++` (`b` was the empty string `"\0"` before entering the loop) remains empty randomly because the subsequent memory is also clean (zeroes).

I'm aware of #72, though it is just a wish for optimization.
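For reference, the expected result can be checked against Python's built-in `str.find`, whose `end` argument is exclusive: the whole needle has to fit inside `haystack[start:end]`. The sketch below also includes a naive bounded search, `bounded_find`. That helper is a hypothetical illustration of the same semantics, assuming non-negative `start` and `end`, and is not StringZilla's implementation.

```python
from typing import Optional


def bounded_find(haystack: str, needle: str, start: int = 0, end: Optional[int] = None) -> int:
    """Naive reference search: the whole needle must lie inside haystack[start:end].

    Hypothetical helper for illustration; assumes non-negative start and end.
    """
    if end is None:
        end = len(haystack)
    end = min(end, len(haystack))
    # The last valid starting offset is end - len(needle);
    # starting any later would compare characters at or beyond `end`.
    for offset in range(start, end - len(needle) + 1):
        if haystack[offset:offset + len(needle)] == needle:
            return offset
    return -1


text = "Hello, world!"
print(text.find("world", 0, 11))           # -1: text[0:11] is "Hello, worl"
print(bounded_find(text, "world", 0, 11))  # -1
print(text.find("world", 0, 12))           # 7: text[0:12] is "Hello, world"
print(bounded_find(text, "world", 0, 12))  # 7
```

In the reported case the final `"d"` sits at index 11, which is exactly the excluded `end` position, so any match that includes it reads one character past the requested range.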
Steps to reproduce
Simple to test it in Python:
and
Expected behavior
StringZilla version
3.8.4
Operating System
Ubuntu 22.04 and Windows 10/11 64-bit
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
No response
Are you open to being tagged as a contributor?
.git history as a contributor
Is there an existing issue for this?
Code of Conduct