
Commit c8eff68

Auto merge of #33090 - bluss:special-zip-2, r=aturon
Specialize .zip() for efficient slice and slice iteration

The idea is to introduce a private trait TrustedRandomAccess and specialize .zip() for random access iterators into a counted loop. The implementation in the PR is internal and has no visible effect in the API.

Why a counted loop? So that each slice iterator compiles to just a pointer, and both pointers are indexed with the same loop counter value in the generated code. When this succeeds, copying loops are readily recognized and replaced with memcpy, and addition loops autovectorize well.

The TrustedRandomAccess approach works very well on the surface. Microbenchmarks optimize well, following the ideas above, and that is a dramatic improvement of .zip()'s codegen.

```rust
// old zip before this PR: bad, byte-for-byte loop
// with specialized zip: memcpy
pub fn copy_zip(xs: &[u8], ys: &mut [u8]) {
    for (a, b) in ys.iter_mut().zip(xs) {
        *a = *b;
    }
}

// old zip before this PR: single addition per iteration
// with specialized zip: vectorized
pub fn add_zip(xs: &[f32], ys: &mut [f32]) {
    for (a, b) in ys.iter_mut().zip(xs) {
        *a += *b;
    }
}

// old zip before this PR: single addition per iteration
// with specialized zip: vectorized (!!)
pub fn add_zip3(xs: &[f32], ys: &[f32], zs: &mut [f32]) {
    for ((a, b), c) in zs.iter_mut().zip(xs).zip(ys) {
        *a += *b * *c;
    }
}
```

Yet in more complex situations, the .zip() loop can still fall back to its old behavior, where phantom null checks throw in fake premature end-of-loop conditionals. Remember that a NULL inside Option<(&T, &T)> makes it a `None` value and thus a premature (in this case) end of the loop. So even if we have 1) an explicit `Some` in the code and 2) pointer types `&T` or `&mut T`, which are non-null, we can still get a phantom null check at that point.

One example that illustrates the difference is `copy_zip` with slice versus Vec arguments. The involved iterator types are exactly the same, but the Vec version doesn't compile down to memcpy. Investigating this, the function argument metadata emitted to LLVM plays the biggest role. As eddyb summarized, we need `nonnull` for the loop to autovectorize and `noalias` for it to be replaced with memcpy.

There was an experiment to use `assume` to add a non-null assumption on each of the two elements in the specialized zip iterator, but this only helped some of the test cases and regressed others. Instead, I think the nonnull/noalias metadata issue is something we need to solve separately anyway.

These have conditionally implemented TrustedRandomAccess:

- Enumerate
- Zip

These have not implemented it:

- Map is side-effectful. The forward case would be workable, but the double-ended case is complicated.
- Chain: exact length semantics unclear.
- Filter, FilterMap, FlatMap and many others don't offer random access and/or exact length.
2 parents be203ac + 5df05c6 commit c8eff68
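
To make the slice-versus-Vec point in the message concrete, here is a minimal sketch (not part of this commit; the name `copy_zip_vec` is made up for illustration) of the Vec-argument variant. The zip iterator built inside it has exactly the same type as in `copy_zip`, yet at the time of this PR the different `noalias`/`nonnull` argument metadata kept LLVM from turning the loop into memcpy:

```rust
// Hypothetical companion to copy_zip: identical loop, but the buffers
// arrive behind &Vec<u8>/&mut Vec<u8> instead of plain slices. The
// iterator types are the same; only the argument metadata seen by LLVM
// differs, and that was enough to lose the memcpy recognition.
pub fn copy_zip_vec(xs: &Vec<u8>, ys: &mut Vec<u8>) {
    for (a, b) in ys.iter_mut().zip(xs.iter()) {
        *a = *b;
    }
}
```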

7 files changed: +263 −21 lines changed


Diff for: src/libcore/iter/iterator.rs

+2 −1

```diff
@@ -23,6 +23,7 @@ use super::{Chain, Cycle, Cloned, Enumerate, Filter, FilterMap, FlatMap, Fuse,
 use super::ChainState;
 use super::{DoubleEndedIterator, ExactSizeIterator, Extend, FromIterator,
             IntoIterator};
+use super::ZipImpl;
 
 fn _assert_is_object_safe(_: &Iterator<Item=()>) {}
 
@@ -383,7 +384,7 @@ pub trait Iterator {
     fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter> where
         Self: Sized, U: IntoIterator
     {
-        Zip{a: self, b: other.into_iter()}
+        Zip::new(self, other.into_iter())
     }
 
     /// Takes a closure and creates an iterator which calls that closure on each
```

Diff for: src/libcore/iter/mod.rs

+165 −20

```diff
@@ -301,7 +301,9 @@
 
 use clone::Clone;
 use cmp;
+use default::Default;
 use fmt;
+use iter_private::TrustedRandomAccess;
 use ops::FnMut;
 use option::Option::{self, Some, None};
 use usize;
@@ -622,7 +624,8 @@ impl<A, B> DoubleEndedIterator for Chain<A, B> where
 #[stable(feature = "rust1", since = "1.0.0")]
 pub struct Zip<A, B> {
     a: A,
-    b: B
+    b: B,
+    spec: <(A, B) as ZipImplData>::Data,
 }
 
 #[stable(feature = "rust1", since = "1.0.0")]
@@ -631,29 +634,13 @@ impl<A, B> Iterator for Zip<A, B> where A: Iterator, B: Iterator
     type Item = (A::Item, B::Item);
 
     #[inline]
-    fn next(&mut self) -> Option<(A::Item, B::Item)> {
-        self.a.next().and_then(|x| {
-            self.b.next().and_then(|y| {
-                Some((x, y))
-            })
-        })
+    fn next(&mut self) -> Option<Self::Item> {
+        ZipImpl::next(self)
     }
 
     #[inline]
     fn size_hint(&self) -> (usize, Option<usize>) {
-        let (a_lower, a_upper) = self.a.size_hint();
-        let (b_lower, b_upper) = self.b.size_hint();
-
-        let lower = cmp::min(a_lower, b_lower);
-
-        let upper = match (a_upper, b_upper) {
-            (Some(x), Some(y)) => Some(cmp::min(x,y)),
-            (Some(x), None) => Some(x),
-            (None, Some(y)) => Some(y),
-            (None, None) => None
-        };
-
-        (lower, upper)
+        ZipImpl::size_hint(self)
     }
 }
 
@@ -664,6 +651,61 @@ impl<A, B> DoubleEndedIterator for Zip<A, B> where
 {
     #[inline]
     fn next_back(&mut self) -> Option<(A::Item, B::Item)> {
+        ZipImpl::next_back(self)
+    }
+}
+
+// Zip specialization trait
+#[doc(hidden)]
+trait ZipImpl<A, B> {
+    type Item;
+    fn new(a: A, b: B) -> Self;
+    fn next(&mut self) -> Option<Self::Item>;
+    fn size_hint(&self) -> (usize, Option<usize>);
+    fn next_back(&mut self) -> Option<Self::Item>
+        where A: DoubleEndedIterator + ExactSizeIterator,
+              B: DoubleEndedIterator + ExactSizeIterator;
+}
+
+// Zip specialization data members
+#[doc(hidden)]
+trait ZipImplData {
+    type Data: 'static + Clone + Default + fmt::Debug;
+}
+
+#[doc(hidden)]
+impl<T> ZipImplData for T {
+    default type Data = ();
+}
+
+// General Zip impl
+#[doc(hidden)]
+impl<A, B> ZipImpl<A, B> for Zip<A, B>
+    where A: Iterator, B: Iterator
+{
+    type Item = (A::Item, B::Item);
+    default fn new(a: A, b: B) -> Self {
+        Zip {
+            a: a,
+            b: b,
+            spec: Default::default(), // unused
+        }
+    }
+
+    #[inline]
+    default fn next(&mut self) -> Option<(A::Item, B::Item)> {
+        self.a.next().and_then(|x| {
+            self.b.next().and_then(|y| {
+                Some((x, y))
+            })
+        })
+    }
+
+    #[inline]
+    default fn next_back(&mut self) -> Option<(A::Item, B::Item)>
+        where A: DoubleEndedIterator + ExactSizeIterator,
+              B: DoubleEndedIterator + ExactSizeIterator
+    {
         let a_sz = self.a.len();
         let b_sz = self.b.len();
         if a_sz != b_sz {
@@ -680,12 +722,106 @@ impl<A, B> DoubleEndedIterator for Zip<A, B> where
             _ => unreachable!(),
         }
     }
+
+    #[inline]
+    default fn size_hint(&self) -> (usize, Option<usize>) {
+        let (a_lower, a_upper) = self.a.size_hint();
+        let (b_lower, b_upper) = self.b.size_hint();
+
+        let lower = cmp::min(a_lower, b_lower);
+
+        let upper = match (a_upper, b_upper) {
+            (Some(x), Some(y)) => Some(cmp::min(x,y)),
+            (Some(x), None) => Some(x),
+            (None, Some(y)) => Some(y),
+            (None, None) => None
+        };
+
+        (lower, upper)
+    }
+}
+
+#[doc(hidden)]
+#[derive(Default, Debug, Clone)]
+struct ZipImplFields {
+    index: usize,
+    len: usize,
+}
+
+#[doc(hidden)]
+impl<A, B> ZipImplData for (A, B)
+    where A: TrustedRandomAccess, B: TrustedRandomAccess
+{
+    type Data = ZipImplFields;
+}
+
+#[doc(hidden)]
+impl<A, B> ZipImpl<A, B> for Zip<A, B>
+    where A: TrustedRandomAccess, B: TrustedRandomAccess
+{
+    fn new(a: A, b: B) -> Self {
+        let len = cmp::min(a.len(), b.len());
+        Zip {
+            a: a,
+            b: b,
+            spec: ZipImplFields {
+                index: 0,
+                len: len,
+            }
+        }
+    }
+
+    #[inline]
+    fn next(&mut self) -> Option<(A::Item, B::Item)> {
+        if self.spec.index < self.spec.len {
+            let i = self.spec.index;
+            self.spec.index += 1;
+            unsafe {
+                Some((self.a.get_unchecked(i), self.b.get_unchecked(i)))
+            }
+        } else {
+            None
+        }
+    }
+
+    #[inline]
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        let len = self.spec.len - self.spec.index;
+        (len, Some(len))
+    }
+
+    #[inline]
+    fn next_back(&mut self) -> Option<(A::Item, B::Item)>
+        where A: DoubleEndedIterator + ExactSizeIterator,
+              B: DoubleEndedIterator + ExactSizeIterator
+    {
+        if self.spec.index < self.spec.len {
+            self.spec.len -= 1;
+            let i = self.spec.len;
+            unsafe {
+                Some((self.a.get_unchecked(i), self.b.get_unchecked(i)))
+            }
+        } else {
+            None
+        }
+    }
 }
 
 #[stable(feature = "rust1", since = "1.0.0")]
 impl<A, B> ExactSizeIterator for Zip<A, B>
     where A: ExactSizeIterator, B: ExactSizeIterator {}
 
+#[doc(hidden)]
+unsafe impl<A, B> TrustedRandomAccess for Zip<A, B>
+    where A: TrustedRandomAccess,
+          B: TrustedRandomAccess,
+{
+    unsafe fn get_unchecked(&mut self, i: usize) -> (A::Item, B::Item) {
+        (self.a.get_unchecked(i), self.b.get_unchecked(i))
+    }
+
+}
+
 /// An iterator that maps the values of `iter` with `f`.
 ///
 /// This `struct` is created by the [`map()`] method on [`Iterator`]. See its
@@ -982,6 +1118,15 @@ impl<I> DoubleEndedIterator for Enumerate<I> where
 #[stable(feature = "rust1", since = "1.0.0")]
 impl<I> ExactSizeIterator for Enumerate<I> where I: ExactSizeIterator {}
 
+#[doc(hidden)]
+unsafe impl<I> TrustedRandomAccess for Enumerate<I>
+    where I: TrustedRandomAccess
+{
+    unsafe fn get_unchecked(&mut self, i: usize) -> (usize, I::Item) {
+        (self.count + i, self.iter.get_unchecked(i))
+    }
+}
+
 /// An iterator with a `peek()` that returns an optional reference to the next
 /// element.
 ///
```

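For orientation, the specialized `ZipImpl` above boils the iteration down to the shape sketched below. This is a simplified illustration, not code from the diff: one index bounded by the shorter length, with both sides read at the same index, which is the pattern LLVM recognizes for memcpy and autovectorization.

```rust
use std::cmp;

// Simplified sketch of the loop shape the specialized Zip produces.
// The real implementation uses get_unchecked reads guarded by `len`;
// safe indexing is used here to keep the sketch self-contained.
fn copy_counted(xs: &[u8], ys: &mut [u8]) {
    let len = cmp::min(xs.len(), ys.len());
    let mut i = 0;
    while i < len {
        ys[i] = xs[i];
        i += 1;
    }
}
```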
Diff for: src/libcore/iter_private.rs

+27

```diff
@@ -0,0 +1,27 @@
+// Copyright 2016 The Rust Project Developers. See the COPYRIGHT
+// file at the top-level directory of this distribution and at
+// http://rust-lang.org/COPYRIGHT.
+//
+// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+// option. This file may not be copied, modified, or distributed
+// except according to those terms.
+
+
+use iter::ExactSizeIterator;
+
+/// An iterator whose items are random accessible efficiently
+///
+/// # Safety
+///
+/// The iterator's .len() and size_hint() must be exact.
+///
+/// .get_unchecked() must return distinct mutable references for distinct
+/// indices (if applicable), and must return a valid reference if index is in
+/// 0..self.len().
+#[doc(hidden)]
+pub unsafe trait TrustedRandomAccess : ExactSizeIterator {
+    unsafe fn get_unchecked(&mut self, i: usize) -> Self::Item;
+}
+
```

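To illustrate the safety contract documented above, here is a hedged, self-contained sketch of what satisfying it looks like. The trait is private to libcore, so the sketch defines a local stand-in with the same shape; `Counter` is a made-up example type, not something in the PR.

```rust
// Local stand-in for the private trait, so this sketch compiles on its own.
pub unsafe trait TrustedRandomAccess: ExactSizeIterator {
    unsafe fn get_unchecked(&mut self, i: usize) -> Self::Item;
}

// A trivial counting iterator: len() is exact, and get_unchecked(i) is
// valid for every i in 0..self.len(), as the contract requires.
struct Counter { next: usize, end: usize }

impl Iterator for Counter {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        if self.next < self.end {
            self.next += 1;
            Some(self.next - 1)
        } else {
            None
        }
    }
    // An exact size_hint is what makes the ExactSizeIterator bound honest.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = self.end - self.next;
        (len, Some(len))
    }
}

impl ExactSizeIterator for Counter {}

unsafe impl TrustedRandomAccess for Counter {
    // i is always in 0..self.len(), so next + i never passes end.
    unsafe fn get_unchecked(&mut self, i: usize) -> usize {
        self.next + i
    }
}
```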
Diff for: src/libcore/lib.rs

+1

```diff
@@ -156,4 +156,5 @@ pub mod hash;
 pub mod fmt;
 
 // note: does not need to be public
+mod iter_private;
 mod tuple;
```

Diff for: src/libcore/slice.rs

+15

```diff
@@ -50,6 +50,7 @@ use result::Result::{Ok, Err};
 use ptr;
 use mem;
 use marker::{Copy, Send, Sync, self};
+use iter_private::TrustedRandomAccess;
 
 #[repr(C)]
 struct Repr<T> {
@@ -1942,3 +1943,17 @@ macro_rules! impl_marker_for {
 
 impl_marker_for!(BytewiseEquality,
                  u8 i8 u16 i16 u32 i32 u64 i64 usize isize char bool);
+
+#[doc(hidden)]
+unsafe impl<'a, T> TrustedRandomAccess for Iter<'a, T> {
+    unsafe fn get_unchecked(&mut self, i: usize) -> &'a T {
+        &*self.ptr.offset(i as isize)
+    }
+}
+
+#[doc(hidden)]
+unsafe impl<'a, T> TrustedRandomAccess for IterMut<'a, T> {
+    unsafe fn get_unchecked(&mut self, i: usize) -> &'a mut T {
+        &mut *self.ptr.offset(i as isize)
+    }
+}
```

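Since the specialization is internal, calling code is unchanged; with the `Iter`/`IterMut` impls above, ordinary slice-zip loops now take the counted path. A small usage sketch (not part of the diff):

```rust
// Ordinary slice zip; nothing about the call site changes, but the loop
// now goes through the specialized, counted Zip and can autovectorize.
fn dot(xs: &[f32], ys: &[f32]) -> f32 {
    xs.iter().zip(ys).map(|(x, y)| x * y).sum()
}
```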
Diff for: src/libcoretest/iter.rs

+31

```diff
@@ -13,6 +13,7 @@ use core::{i8, i16, isize};
 use core::usize;
 
 use test::Bencher;
+use test::black_box;
 
 #[test]
 fn test_lt() {
@@ -1030,3 +1031,33 @@ fn bench_max(b: &mut Bencher) {
         it.map(scatter).max()
     })
 }
+
+pub fn copy_zip(xs: &[u8], ys: &mut [u8]) {
+    for (a, b) in ys.iter_mut().zip(xs) {
+        *a = *b;
+    }
+}
+
+pub fn add_zip(xs: &[f32], ys: &mut [f32]) {
+    for (a, b) in ys.iter_mut().zip(xs) {
+        *a += *b;
+    }
+}
+
+#[bench]
+fn bench_zip_copy(b: &mut Bencher) {
+    let source = vec![0u8; 16 * 1024];
+    let mut dst = black_box(vec![0u8; 16 * 1024]);
+    b.iter(|| {
+        copy_zip(&source, &mut dst)
+    })
+}
+
+#[bench]
+fn bench_zip_add(b: &mut Bencher) {
+    let source = vec![1.; 16 * 1024];
+    let mut dst = vec![0.; 16 * 1024];
+    b.iter(|| {
+        add_zip(&source, &mut dst)
+    });
+}
```

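The diff benchmarks only the copy and add cases. As a hedged sketch, a matching benchmark for the commit message's three-iterator example could look like the following, in the same style (the names `add_zip3` and `bench_zip_add3` are not in the PR):

```rust
pub fn add_zip3(xs: &[f32], ys: &[f32], zs: &mut [f32]) {
    for ((a, b), c) in zs.iter_mut().zip(xs).zip(ys) {
        *a += *b * *c;
    }
}

#[bench]
fn bench_zip_add3(b: &mut Bencher) {
    let source = vec![1.; 16 * 1024];
    let scale = vec![2.; 16 * 1024];
    let mut dst = black_box(vec![0.; 16 * 1024]);
    b.iter(|| {
        add_zip3(&source, &scale, &mut dst)
    });
}
```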
Diff for: src/test/codegen/zip.rs

+22

```diff
@@ -0,0 +1,22 @@
+// Copyright 2016 The Rust Project Developers. See the COPYRIGHT
+// file at the top-level directory of this distribution and at
+// http://rust-lang.org/COPYRIGHT.
+//
+// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+// option. This file may not be copied, modified, or distributed
+// except according to those terms.
+
+// compile-flags: -C no-prepopulate-passes -O
+
+#![crate_type = "lib"]
+
+// CHECK-LABEL: @zip_copy
+#[no_mangle]
+pub fn zip_copy(xs: &[u8], ys: &mut [u8]) {
+    // CHECK: memcpy
+    for (x, y) in xs.iter().zip(ys) {
+        *y = *x;
+    }
+}
```
