Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unstable <[T]>::{position, rposition} functions #84058

Closed
wants to merge 1 commit into from

Conversation

thomcc
Copy link
Member

@thomcc thomcc commented Apr 10, 2021

These are essentially analogous to what <[T]>::contains is to slice::Iter::any.

The APIs in question are (see the commit diff for the doc comments and such, ofc)

impl [T] {
    pub fn position(&self, x: &T) -> Option<usize> where T: PartialEq;

    pub fn rposition(&self, x: &T) -> Option<usize> where T: PartialEq;
}

The primary reason for adding these when the iterator methods already exist, is that for primitives they can be specialized to use significantly faster search operations. Right now, this is done for u8 and i8, since we already have a serviceable implementation of memchr right there (even if it's not the fastest, it will likely eventually be improved, and it's easily faster than iter().position()).

Note however, that i8/u8 are not the only reasons to add these. Eventually (such as when portable simd is more usable) a specialization for u16, i16, u32, i32 f32, char... (and really, all the primitives) could be provided with simd and give slices of those types a substantial performance boost.

Regarding the name, I'm not really that tied to this one, but it seems unambiguous given that it mirrors the name on iterators.

I'll file a tracking issue once I have a better idea if these will land or not.

@rust-highfive
Copy link
Collaborator

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 10, 2021
@the8472
Copy link
Member

the8472 commented Apr 10, 2021

The primary reason for adding these when the iterator methods already exist, is that for primitives they can be specialized to use significantly faster search operations.

We don't need separate methods for that though specialization on slice iterators can achieve the same.

@thomcc
Copy link
Member Author

thomcc commented Apr 10, 2021

The slice iterator functions take closures. There's no way for the specialization to know the closure is just performing an equality check, so no, sadly that's not possible.

@the8472
Copy link
Member

the8472 commented Apr 10, 2021

Hrrm, granted, but do the loops not end up vectorizing to the same result?

@thomcc
Copy link
Member Author

thomcc commented Apr 10, 2021

Nope, IME not even close! Sometimes it tries and does a... middling job at best. It's very hard for autovectorization to beat a hand-tuned method like this, and will also benefit from any optimizations we do to the libcore memchr implementation in the future.

@leonardo-m
Copy link

Could you show one difference (with iterator vs this function) for the generated Asm?

@thomcc
Copy link
Member Author

thomcc commented Apr 10, 2021

Our memchr function isn't inlined, so you can't really see it. Also, I can't call it on godbolt. That said, you can look at it's source https://github.com/rust-lang/rust/blob/master/library/core/src/slice/memchr.rs and know that it processes things in 2x-unrolled size_of::<usize>() chunks.

The position loop turns into something like this https://rust.godbolt.org/z/xsPYPWc5E

example::foo:
        test    rsi, rsi
        je      .LBB0_1
        xor     eax, eax
        xor     edx, edx
.LBB0_3:
        cmp     byte ptr [rdi + rdx], 0
        je      .LBB0_4
        add     rdx, 1
        cmp     rsi, rdx
        jne     .LBB0_3
        mov     rdx, rsi
        ret
.LBB0_1:
        xor     edx, edx
        xor     eax, eax
        ret
.LBB0_4:
        mov     eax, 1
        ret

E.g. byte at a time processing. It's hard for it to do much better with this style of loop (which requires knowing the position of the match), and in practice, even if LLVM were to do a good job on it, it's unclear that it would be a good thing to inline, since it would lead to code bloat (this is often a problem with LLVM's autovectorization IME however).

In practice you can get at least a size_of::<usize>()-times speedup by using a memchr like this on reasonably sized inputs (Probably twice that).

We should be able to improve that further with SIMD, and improve the current code on smaller and less aligned slices too (which is something this memchr does okay for, but the memchr crate does worse on the larger the vector width you use gets). For example we can do similar things to what I did in <[u8]>::is_ascii with unaligned loads in order to avoid per-byte processing on the unaligned prefix/suffix (which contrary to popular belief, is a win on most modern arches, which for the most part have cheap unaligned loads, but even if they don't, we do fewer non-load operations that way).

That said, that's unrelated, and while I'd like to do it (it would benefit all uses of memchr inside libcore/libstd — personally notably is the \0 search done in the slices for std::ffi::{CStr,CString}), there are lower hanging fruits.


Anyway, I get that maybe we wish that Iterator::position could be optimal, but honestly I don't even think the existence of this function is unpalatable — it's the same idea as indexOf or lastIndexOf in other langs, just without an extra step.

That is, I don't have to remember to do |p| *p == o rather than |p| p == o, or use iter().copied()... I actually think this is a nice function from an ergonomics perspective, similar to how I don't thing that <[T]>::contains is a bad thing to have given the existence of Iterator::any.

@SkiFire13
Copy link
Contributor

Iterator::position and [T]::position would have the same name but with different APIs. Can't we find a better name?

The reason we can't specialize Iterator::position is because it takes a closure but this has nothing to do with Iterator itself. Can't we add a method with the same API on Iterator and specialize that?

@thomcc
Copy link
Member Author

thomcc commented Apr 12, 2021

Iterator::position and [T]::position would have the same name but with different APIs. Can't we find a better name?

Sure. I'm not picky about names at all, and always open to suggestions. Perhaps index_of and last_index_of have enough precedent in other languages to be better?

The reason we can't specialize Iterator::position is because it takes a closure but this has nothing to do with Iterator itself. Can't we add a method with the same API on Iterator and specialize that?

This is viable, but I'm not a fan for several reasons:

I think it's way more out of place for this to be on Iterator, since the vast majority of iterators would be unable to specialize this — basically only iterators that boil down to operating on slices would.

On Iterator, it's more obviously a hack for optimization, whereas IMO on slice there's a slight ergonomics benefit as well (it's a direct answer to a new user's question of "where is the equivalent of indexOf", for example, and .position(&v) is meaningfully less to type and IMO to think about than .iter().position(|x| *x == v))

Despite that it doesn't apply to most iterators, "iterators that boil down to operating on slices" is a large number of iterators in the stdlib alone. For example, if a function for this is on Iterator, it plausibly should be specialized for:

  • slice::Iter
  • slice::IterMut
  • alloc::vec::IntoIter
  • iter::Copied<slice::Iter>
  • ...

And several more. This is... tedious. Even if these are not done initially, over time we should expect to gain them as people file optimization PRs for their own specific cases. Putting it on Iterator for the purposes of being specializable, to me, implies we're open to this.

Note that this is especially true when it needs to be specialized for the iterated type as well, e.g. u8/i8 in this case, but we should eventually expect it to include other primitives as well.

Finally, putting it on slices already has the precedent of being where <[T]>::contains is as well.

@the8472
Copy link
Member

the8472 commented Apr 14, 2021

Since this is mostly about using memchr for performance reasons how about only putting them on [u8] and [i8]? For anything else the iterator methods should do. Ideally it would be for any [T] where {size_of::<T>() == 1} but last time I checked associated type bounds still weren't powerful enough to express something like that.

@thomcc
Copy link
Member Author

thomcc commented Apr 14, 2021

As I mentioned a few times, it's not solely about i8/u8:

Note however, that i8/u8 are not the only reasons to add these. Eventually (such as when portable simd is more usable) a specialization for u16, i16, u32, i32 f32, char... (and really, all the primitives) could be provided with simd and give slices of those types a substantial performance boost.

@jyn514
Copy link
Member

jyn514 commented Apr 14, 2021

Sure. I'm not picky about names at all, and always open to suggestions. Perhaps index_of and last_index_of have enough precedent in other languages to be better?

str::find is somewhat close:

pub fn find<'a, P>(&'a self, pat: P) -> Option<usize> where
    P: Pattern<'a>;

Maybe we could name this 'find'?

@jyn514 jyn514 added A-slice Area: `[T]` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 14, 2021
@thomcc
Copy link
Member Author

thomcc commented Apr 14, 2021

Hm, I guess this is redundant with #56345. I guess this can be closed since presumably someday someone will implement that mammoth of an RFC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-slice Area: `[T]` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants