Add unstable <[T]>::{position, rposition} functions #84058

thomcc · 2021-04-10T13:54:29Z

These are essentially analogous to what <[T]>::contains is to slice::Iter::any.

The APIs in question are (see the commit diff for the doc comments and such, ofc)

impl [T] {
    pub fn position(&self, x: &T) -> Option<usize> where T: PartialEq;

    pub fn rposition(&self, x: &T) -> Option<usize> where T: PartialEq;
}

The primary reason for adding these when the iterator methods already exist, is that for primitives they can be specialized to use significantly faster search operations. Right now, this is done for u8 and i8, since we already have a serviceable implementation of memchr right there (even if it's not the fastest, it will likely eventually be improved, and it's easily faster than iter().position()).

Note however, that i8/u8 are not the only reasons to add these. Eventually (such as when portable simd is more usable) a specialization for u16, i16, u32, i32 f32, char... (and really, all the primitives) could be provided with simd and give slices of those types a substantial performance boost.

Regarding the name, I'm not really that tied to this one, but it seems unambiguous given that it mirrors the name on iterators.

I'll file a tracking issue once I have a better idea if these will land or not.

rust-highfive · 2021-04-10T13:54:32Z

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

the8472 · 2021-04-10T13:56:30Z

The primary reason for adding these when the iterator methods already exist, is that for primitives they can be specialized to use significantly faster search operations.

We don't need separate methods for that though specialization on slice iterators can achieve the same.

thomcc · 2021-04-10T13:57:45Z

The slice iterator functions take closures. There's no way for the specialization to know the closure is just performing an equality check, so no, sadly that's not possible.

the8472 · 2021-04-10T13:59:29Z

Hrrm, granted, but do the loops not end up vectorizing to the same result?

thomcc · 2021-04-10T14:01:46Z

Nope, IME not even close! Sometimes it tries and does a... middling job at best. It's very hard for autovectorization to beat a hand-tuned method like this, and will also benefit from any optimizations we do to the libcore memchr implementation in the future.

leonardo-m · 2021-04-10T15:34:58Z

Could you show one difference (with iterator vs this function) for the generated Asm?

thomcc · 2021-04-10T20:12:53Z

Our memchr function isn't inlined, so you can't really see it. Also, I can't call it on godbolt. That said, you can look at it's source https://github.com/rust-lang/rust/blob/master/library/core/src/slice/memchr.rs and know that it processes things in 2x-unrolled size_of::<usize>() chunks.

The position loop turns into something like this https://rust.godbolt.org/z/xsPYPWc5E

example::foo:
        test    rsi, rsi
        je      .LBB0_1
        xor     eax, eax
        xor     edx, edx
.LBB0_3:
        cmp     byte ptr [rdi + rdx], 0
        je      .LBB0_4
        add     rdx, 1
        cmp     rsi, rdx
        jne     .LBB0_3
        mov     rdx, rsi
        ret
.LBB0_1:
        xor     edx, edx
        xor     eax, eax
        ret
.LBB0_4:
        mov     eax, 1
        ret

E.g. byte at a time processing. It's hard for it to do much better with this style of loop (which requires knowing the position of the match), and in practice, even if LLVM were to do a good job on it, it's unclear that it would be a good thing to inline, since it would lead to code bloat (this is often a problem with LLVM's autovectorization IME however).

In practice you can get at least a size_of::<usize>()-times speedup by using a memchr like this on reasonably sized inputs (Probably twice that).

We should be able to improve that further with SIMD, and improve the current code on smaller and less aligned slices too (which is something this memchr does okay for, but the memchr crate does worse on the larger the vector width you use gets). For example we can do similar things to what I did in <[u8]>::is_ascii with unaligned loads in order to avoid per-byte processing on the unaligned prefix/suffix (which contrary to popular belief, is a win on most modern arches, which for the most part have cheap unaligned loads, but even if they don't, we do fewer non-load operations that way).

That said, that's unrelated, and while I'd like to do it (it would benefit all uses of memchr inside libcore/libstd — personally notably is the \0 search done in the slices for std::ffi::{CStr,CString}), there are lower hanging fruits.

Anyway, I get that maybe we wish that Iterator::position could be optimal, but honestly I don't even think the existence of this function is unpalatable — it's the same idea as indexOf or lastIndexOf in other langs, just without an extra step.

That is, I don't have to remember to do |p| *p == o rather than |p| p == o, or use iter().copied()... I actually think this is a nice function from an ergonomics perspective, similar to how I don't thing that <[T]>::contains is a bad thing to have given the existence of Iterator::any.

SkiFire13 · 2021-04-11T11:53:04Z

Iterator::position and [T]::position would have the same name but with different APIs. Can't we find a better name?

The reason we can't specialize Iterator::position is because it takes a closure but this has nothing to do with Iterator itself. Can't we add a method with the same API on Iterator and specialize that?

thomcc · 2021-04-12T19:50:55Z

Iterator::position and [T]::position would have the same name but with different APIs. Can't we find a better name?

Sure. I'm not picky about names at all, and always open to suggestions. Perhaps index_of and last_index_of have enough precedent in other languages to be better?

The reason we can't specialize Iterator::position is because it takes a closure but this has nothing to do with Iterator itself. Can't we add a method with the same API on Iterator and specialize that?

This is viable, but I'm not a fan for several reasons:

I think it's way more out of place for this to be on Iterator, since the vast majority of iterators would be unable to specialize this — basically only iterators that boil down to operating on slices would.

On Iterator, it's more obviously a hack for optimization, whereas IMO on slice there's a slight ergonomics benefit as well (it's a direct answer to a new user's question of "where is the equivalent of indexOf", for example, and .position(&v) is meaningfully less to type and IMO to think about than .iter().position(|x| *x == v))

Despite that it doesn't apply to most iterators, "iterators that boil down to operating on slices" is a large number of iterators in the stdlib alone. For example, if a function for this is on Iterator, it plausibly should be specialized for:

slice::Iter
slice::IterMut
alloc::vec::IntoIter
iter::Copied<slice::Iter>
...

And several more. This is... tedious. Even if these are not done initially, over time we should expect to gain them as people file optimization PRs for their own specific cases. Putting it on Iterator for the purposes of being specializable, to me, implies we're open to this.

Note that this is especially true when it needs to be specialized for the iterated type as well, e.g. u8/i8 in this case, but we should eventually expect it to include other primitives as well.

Finally, putting it on slices already has the precedent of being where <[T]>::contains is as well.

the8472 · 2021-04-14T20:49:32Z

Since this is mostly about using memchr for performance reasons how about only putting them on [u8] and [i8]? For anything else the iterator methods should do. Ideally it would be for any [T] where {size_of::<T>() == 1} but last time I checked associated type bounds still weren't powerful enough to express something like that.

thomcc · 2021-04-14T20:50:57Z

As I mentioned a few times, it's not solely about i8/u8:

Note however, that i8/u8 are not the only reasons to add these. Eventually (such as when portable simd is more usable) a specialization for u16, i16, u32, i32 f32, char... (and really, all the primitives) could be provided with simd and give slices of those types a substantial performance boost.

jyn514 · 2021-04-14T20:55:07Z

Sure. I'm not picky about names at all, and always open to suggestions. Perhaps index_of and last_index_of have enough precedent in other languages to be better?

str::find is somewhat close:

pub fn find<'a, P>(&'a self, pat: P) -> Option<usize> where
    P: Pattern<'a>;

Maybe we could name this 'find'?

thomcc · 2021-04-14T21:06:31Z

Hm, I guess this is redundant with #56345. I guess this can be closed since presumably someday someone will implement that mammoth of an RFC.

Add unstable <[T]>::{position, rposition} functions

36379a8

rust-highfive assigned joshtriplett Apr 10, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 10, 2021

jyn514 added A-slice Area: `[T]` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 14, 2021

thomcc closed this Apr 14, 2021

SkiFire13 mentioned this pull request Mar 4, 2022

Poor codegen for string search (i.e., memchr) with iterators #94573

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add unstable <[T]>::{position, rposition} functions #84058

Add unstable <[T]>::{position, rposition} functions #84058

Uh oh!

thomcc commented Apr 10, 2021

Uh oh!

rust-highfive commented Apr 10, 2021

Uh oh!

the8472 commented Apr 10, 2021

Uh oh!

thomcc commented Apr 10, 2021

Uh oh!

the8472 commented Apr 10, 2021

Uh oh!

thomcc commented Apr 10, 2021

Uh oh!

leonardo-m commented Apr 10, 2021

Uh oh!

thomcc commented Apr 10, 2021 •

edited

Loading

Uh oh!

SkiFire13 commented Apr 11, 2021

Uh oh!

thomcc commented Apr 12, 2021

Uh oh!

the8472 commented Apr 14, 2021

Uh oh!

thomcc commented Apr 14, 2021

Uh oh!

jyn514 commented Apr 14, 2021

Uh oh!

thomcc commented Apr 14, 2021

Uh oh!

Uh oh!

Add unstable <[T]>::{position, rposition} functions #84058

Add unstable <[T]>::{position, rposition} functions #84058

Uh oh!

Conversation

thomcc commented Apr 10, 2021

Uh oh!

rust-highfive commented Apr 10, 2021

Uh oh!

the8472 commented Apr 10, 2021

Uh oh!

thomcc commented Apr 10, 2021

Uh oh!

the8472 commented Apr 10, 2021

Uh oh!

thomcc commented Apr 10, 2021

Uh oh!

leonardo-m commented Apr 10, 2021

Uh oh!

thomcc commented Apr 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SkiFire13 commented Apr 11, 2021

Uh oh!

thomcc commented Apr 12, 2021

Uh oh!

the8472 commented Apr 14, 2021

Uh oh!

thomcc commented Apr 14, 2021

Uh oh!

jyn514 commented Apr 14, 2021

Uh oh!

thomcc commented Apr 14, 2021

Uh oh!

Uh oh!

thomcc commented Apr 10, 2021 •

edited

Loading