Faster memchr, memchr2 and memchr3 in generic version
The current generic ("all") implementation checks whether a chunk
(`usize`) contains a zero byte and, if it does, iterates over the bytes
of that chunk to find the index of the zero byte. Instead, we can use
a few more bit operations to find the index without a loop.
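
As an illustration of the idea (a minimal sketch, not the code from this commit), the well-known "has zero byte" word trick can be extended to return the byte index directly. The names `first_zero_byte` and `first_byte_eq` are hypothetical. The index is counted from the least significant byte, so it matches the memory offset only when the word is loaded with `usize::from_le_bytes`.

```rust
/// 0x01 repeated in every byte of a `usize`.
const LO: usize = usize::MAX / 255;
/// 0x80 repeated in every byte of a `usize`.
const HI: usize = LO << 7;

/// Index (from the least significant byte) of the first zero byte in `word`.
/// Hypothetical helper for illustration only.
fn first_zero_byte(word: usize) -> Option<usize> {
    // The high bit of a byte is set in `mask` if that byte is zero. Bytes
    // above the first zero byte may be flagged falsely because of borrows,
    // but the lowest set bit is always exact.
    let mask = word.wrapping_sub(LO) & !word & HI;
    if mask == 0 {
        None
    } else {
        // The lowest set bit sits at position 8 * index + 7.
        Some((mask.trailing_zeros() / 8) as usize)
    }
}

/// Index of the first byte of `word` equal to `needle`.
/// Hypothetical helper for illustration only.
fn first_byte_eq(word: usize, needle: u8) -> Option<usize> {
    // XOR with the needle repeated in every byte turns matches into zeros.
    first_zero_byte(word ^ (LO * needle as usize))
}
```

Only the lowest flagged byte is trusted here, which is enough for a forward search. A reverse search (`memrchr`) would need an exact mask instead, since the highest flagged byte can be a false positive.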

Context: we use `memchr`, but many of our strings are short.
Currently the SIMD-optimized `memchr` processes bytes one by one when
the string is shorter than a SIMD register. I suspect it can be made
faster if we take a `usize`-sized chunk of the bytes that do not fit
into a SIMD register and process it with such a utility, similar to
how the AVX2 implementation falls back to SSE2. So I looked in the
generic implementation for such a utility to reuse in the
SIMD-optimized version, but there was none. So here it is.
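
The fallback described above could look roughly like the following sketch, which assumes a plain scalar setting with no SIMD. `find_short` is a hypothetical name, not part of the `memchr` crate. It scans whole `usize` chunks with the word trick and handles only the last few bytes one at a time.

```rust
use std::mem::size_of;

const WORD: usize = size_of::<usize>();
/// 0x01 repeated in every byte of a `usize`.
const LO: usize = usize::MAX / 255;
/// 0x80 repeated in every byte of a `usize`.
const HI: usize = LO << 7;

/// Finds the first occurrence of `needle` in a short haystack.
/// Hypothetical helper for illustration only.
fn find_short(haystack: &[u8], needle: u8) -> Option<usize> {
    let splat = LO * needle as usize;
    let mut chunks = haystack.chunks_exact(WORD);
    let mut offset = 0;
    for chunk in &mut chunks {
        // Little-endian load: byte i of the word is chunk[i].
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(chunk);
        let x = usize::from_le_bytes(buf) ^ splat;
        // The lowest set bit marks the first byte that equals the needle.
        let mask = x.wrapping_sub(LO) & !x & HI;
        if mask != 0 {
            return Some(offset + (mask.trailing_zeros() / 8) as usize);
        }
        offset += WORD;
    }
    // Fewer than WORD bytes remain; check them one by one.
    chunks
        .remainder()
        .iter()
        .position(|&b| b == needle)
        .map(|i| offset + i)
}
```

Whether this beats the byte-by-byte path for short strings would have to be measured. The sketch only shows the shape of the fallback.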
stepancheg committed Jun 13, 2024
1 parent e8bdf6b commit 885bbc0
Showing 1 changed file with 289 additions and 54 deletions.