str::find(char) is slower than it ought to be #46693
Comments
Another benchmark, the intent being to eliminate just the cost of the unnecessary utf8 decoding (that I assume is happening). This gives a meager 60% reduction in time.
#[bench]
fn bench_byte_find(b: &mut Bencher) {
    b.iter(|| {
        let s = test::black_box(DEMO_STRING);
        s.bytes().position(|b| b == b'\n')
    });
}
|
I have fixes. We have two problems. One is that The other is that we reuse the same |
#46713 fixes |
I wonder if similar concerns apply for (and then as a bonus, we can get non-ASCII |
Yes, this is true. I need to think about that more. There is a question of how useful that will be -- for most multibyte utf8 text the first byte is quite common. But it should still be faster to do the dumb memchr-based thing instead of actually iterating chars. |
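To make the "dumb memchr-based thing" concrete for non-ASCII characters, here is a minimal sketch. It is not the code from the fix; `find_char_bytewise` is a hypothetical name, and the plain byte scan via `position` only stands in for a real `memchr`. The idea is to encode the needle as UTF-8, scan for its first byte, and check the remaining bytes at each hit.

```rust
/// Hypothetical helper (not part of std): finds `needle` in `haystack`
/// by scanning raw bytes instead of decoding chars.
fn find_char_bytewise(haystack: &str, needle: char) -> Option<usize> {
    // Encode the needle once; a char takes at most 4 bytes in UTF-8.
    let mut buf = [0u8; 4];
    let encoded = needle.encode_utf8(&mut buf).as_bytes();
    let first = encoded[0];

    let bytes = haystack.as_bytes();
    let mut start = 0;
    while start < bytes.len() {
        // Stand-in for memchr: offset of the next occurrence of the first byte.
        // Returns None from the function when no occurrence is left.
        let offset = bytes[start..].iter().position(|&b| b == first)?;
        let candidate = start + offset;
        // A UTF-8 leading byte never appears as a continuation byte, so a
        // full match here is always at a char boundary.
        if bytes[candidate..].starts_with(encoded) {
            return Some(candidate);
        }
        // False positive (another char sharing the same first byte): keep going.
        start = candidate + 1;
    }
    None
}
```

For ASCII needles the verification step is trivially true, so this reduces to a single byte scan. For multibyte text the first byte is often shared by many characters, which is the false-positive cost mentioned above.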
Comparison to GNU libc. Of course, this is not 100% apples to apples, since C strings are null-terminated.
#![feature(test)]
extern crate libc;
extern crate test;
use libc::c_char;
use std::ffi::{CString};
use test::{black_box, Bencher};
const NEEDLE: &str = "there";
const HAYSTACK: &str = "this is a string with a decent number of ascii characters and \n there is a new line in the middle which it should find";
fn strstr(haystack: *const c_char, needle: *const c_char) -> Option<usize> {
    let found = unsafe { libc::strstr(haystack, needle) };
    if found.is_null() { None }
    else { Some(found as usize - haystack as usize) }
}
#[bench]
fn bench_find(b: &mut Bencher) {
    b.iter(|| {
        let haystack = test::black_box(HAYSTACK);
        let needle = test::black_box(NEEDLE);
        haystack.find(needle)
    });
}
#[bench]
fn bench_strstr(b: &mut Bencher) {
    let haystack = CString::new(HAYSTACK).unwrap();
    let haystack = haystack.as_ptr();
    let needle = CString::new(NEEDLE).unwrap();
    let needle = needle.as_ptr();
    b.iter(|| {
        let haystack = test::black_box(haystack);
        let needle = test::black_box(needle);
        strstr(haystack, needle)
    });
}
|
Duplicate of #41993? |
I think this is probably related, but not exactly a duplicate. The code |
Use memchr for str::find(char)
This is a 10x improvement for searching for characters. This also contains the patches from #46713. Feel free to land both separately or together. cc @mystor @alexcrichton r? @bluss fixes #46693
If you try to find a specific character in a string, it performs significantly worse than the theoretical optimal implementation built on memchr. For example, consider the following benchmark. On my laptop the find method runs at about 90ns/iter, and the memchr method runs at 4ns/iter. I would imagine that we should optimize them such that they are the same.
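A comparison along those lines can be sketched on stable Rust as follows. This is not the benchmark from the report: `find_byte` and `time_per_iter` are hypothetical helpers, the byte scan is only a stand-in for a real `memchr`, and the haystack is borrowed from the libc comparison in this thread. Timings from a loop like this are rough and will differ between machines.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Haystack taken from the libc comparison in this thread.
const HAYSTACK: &str = "this is a string with a decent number of ascii characters and \n there is a new line in the middle which it should find";

/// Hypothetical baseline: plain byte scan, standing in for memchr.
fn find_byte(haystack: &str, needle: u8) -> Option<usize> {
    haystack.as_bytes().iter().position(|&b| b == needle)
}

/// Hypothetical timing helper: runs `f` `iters` times and returns the last
/// result together with the average wall-clock time per call.
/// `iters` must be non-zero.
fn time_per_iter<F>(iters: u32, mut f: F) -> (Option<usize>, Duration)
where
    F: FnMut() -> Option<usize>,
{
    assert!(iters > 0, "iters must be non-zero");
    let start = Instant::now();
    let mut last = None;
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the search.
        last = black_box(f());
    }
    (last, start.elapsed() / iters)
}
```

Calling `time_per_iter(1_000_000, || black_box(HAYSTACK).find('\n'))` and `time_per_iter(1_000_000, || find_byte(black_box(HAYSTACK), b'\n'))` should return the same index, and the two durations give a rough per-call comparison of the two approaches.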