-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add read_until_slice #6531
Add read_until_slice #6531
Conversation
I am reluctant to add this. It's not cancellation safe, so there is likely to be people who use it incorrectly. There's also no precedent for it in std. |
Totally fair, thats what I thought anyway. Maybe it could go in tokio-utils? Its pretty useful to lazy load files. |
Also it would be possible to make it cancel safe by checking for a partial needle at the end of the existing buffer. It adds some complexity and some edge cases, but doable. If that changes you decision. |
It can't be cancel safe. Back before the 1.0 release, we considered changing |
Right make sense, is there a trait in tokio-utils where this code could live? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but you can add a function similar to tokio_util::io::read_buf
.
/// Returns the first index matching the `needle` in the `haystack`. `match_len` specifies how | ||
/// many bytes from the needle were already matched during the previous lookup. | ||
/// If we reach the end of the `haystack` with a partial match, then this is a partial match, | ||
/// and we update the `match_len` value accordingly, even though we still return `None`. | ||
fn search(needle: &[u8], haystack: &[u8], match_len: &mut usize) -> Option<usize> { | ||
let haystack_len = haystack.len(); | ||
let needle_len = needle.len(); | ||
#[allow(clippy::needless_range_loop)] | ||
for i in 0..haystack_len { | ||
if haystack[i] == needle[*match_len] { | ||
*match_len += 1; | ||
if *match_len == needle_len { | ||
return Some(i + 1); | ||
} | ||
} else if *match_len > 0 { | ||
*match_len = 0; | ||
} | ||
} | ||
None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this logic is really tricky. It's going to need really good tests. Probably something that uses some loops to try a lot of different situations. For example, for a few different lengths of search strings, try to split some input in all possible places and ensure that you get the same result no matter how the input is split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed but I am testing it in our app already and I didnt catch a bug yet so I am confident it is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Famous last words. :)
I'm closing this due to inactivity, but you are welcome to reopen if you would like to continue working on this. |
Motivation
Right now there is no built-in way to read until a sequence of bytes is found from a
BufReader
.I first improved the algorithm in https://github.com/programingjd/read_until_slice but I think this should be supported natively.
Solution
The code is very similar to
read_until
, but it keeps track of amatch_len
in the needle. That allows us to match more than one character across multiple reads, but it does make the function not cancel safe. I find this solution elegant since it doesn't require us to keep another buffer.