Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] New implementation of the slice iterators using NonZero #46223

Closed
wants to merge 1 commit into from

Conversation

bluss
Copy link
Member

@bluss bluss commented Nov 23, 2017

Reimplement the slice iterators using NonZero where possible.

  1. Iter, IterMut now have a non-nullable field
    a. The usual enum layout optimizations apply
    b. Automatic and more principled nonnull metadata for that pointer
  2. Big change for zero sized elements. The end field is a counter.
    The produced pointer value is equivalent to slice.as_ptr() and
    thus &slice[i] for each index i in this case.

Implementation Notes

Regular case, T is not zero-sized

(Nothing changes in this case, but this is the documentation in the code.)

The iterator represents a contiguous range of elements.
ptr is a pointer to the start of the range.
end is a pointer to one past the end of the range.

Initial values are ptr = slice.as_ptr() and end = ptr.offset(slice.len())

The iterator is empty when ptr == end.
The iterator is stepped at the front by incrementing ptr, the
yielded element is the old value of ptr ("post increment" semantic).
The iterator is stepped at the back by decrementing end, the
yielded element is the new value of end ("pre decrement" semantic).

When T is a Zero-sized type (ZST)

ptr is a pointer to the start of the range.
end is used as an integer: the count of elements in the range.

Initial values are ptr = slice.as_ptr() and end = self.len() as *const T

The iterator is empty when end == 0
The iterator is stepped at the front by decrementing end.
The iterator is stepped at the back by decrementing end.

For both front and back, the yielded element is ptr, which does not change
during the iteration.

NonZero

ptr is derived from slice.as_ptr() and just like it is is non-null.
end is allowed to be null (used by the ZST implementation); this is
required since for iterating a [(); usize::MAX] all values of usize (and
equivalently the raw pointer) are used, including zero.

@bluss bluss changed the title New implementation of the slice iterators using NonZero [WIP] New implementation of the slice iterators using NonZero Nov 23, 2017
@bluss
Copy link
Member Author

bluss commented Nov 23, 2017

I haven't run so many benchmarks yet, but the ones I do are unchanged. These ones below are of very simple loops.

from ndarray "cargo bench --bench higher-order":

 name                   xx ns/iter  yy ns/iter  diff ns/iter  diff %
 fold_axis              681         676                   -5  -0.73%
 map_axis_0             451         452                    1   0.22%
 map_axis_1             452         453                    1   0.22%
 map_regular            172         169                   -3  -1.74%
 map_stride_double_f64  504         506                    2   0.40%
 map_stride_f64         652         666                   14   2.15%
 map_stride_u32         690         681                   -9  -1.30%

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 23, 2017
@Gankra
Copy link
Contributor

Gankra commented Nov 23, 2017

These differences seem small enough to just be noise; is the generated assembly actually identical?

@eddyb
Copy link
Member

eddyb commented Nov 23, 2017

cc @rust-lang/libs

@bluss
Copy link
Member Author

bluss commented Nov 23, 2017

@gankro Sure, I intended to present them as exactly that, noise. However I'll have to find better benchmarks. I removed the zip benchmarks -- those are entirely in a specialization and don't go through Iterator::next().

The PR posted here so that the full test suite is ran and for comments.

@bluss
Copy link
Member Author

bluss commented Nov 24, 2017

Having looked through some benchmarks, nothing stood out. (Some examples a few pct better and apparently some a few pct worse, but it needs to be reduced to very small examples to tell us something.)

I found this code gen example that autovectorizes on nightly but not using a compiler with this PR:

#[no_mangle]
pub fn zip_and(a: &[i32], b: &[i32]) -> i32 {
    let mut result = 0;
    let mut iter_a = a.iter();
    let mut iter_b = b.iter();
    loop {
        if let Some(&elt_a) = iter_a.next() {
            if let Some(&elt_b) = iter_b.next() {
                result |= elt_a | elt_b;
                continue;
            }
        }
        break;
    }
    result
}

A "small" example of 450 lines of unoptimized IR without debug info.

This PR removes the assume(!end.is_null()) or equivalent, and maybe that causes this regression. Otherwise I am open to suggestions. (Edit: I checked, it doesn't seem to help to use a NonZero temprary around end in Iterator::next, for the sized case. 😕 )

@eddyb
Copy link
Member

eddyb commented Nov 24, 2017

Yeah, LLVM is inconsistent with its metadata, keeping the assume would still be useful. The only reason we don't automatically introduce assumes is because we're assuming they're expensive.

Reimplement the slice iterators using `NonZero` where possible.

1. Iter, IterMut now have a non-nullable field
   a. The usual enum layout optimizations apply
   b. Automatic and more principled nonnull metadata for that pointer
2. Big change for zero sized elements. The end field is a counter.
   The produced pointer value is equivalent to `slice.as_ptr()` and
   thus `&slice[i]` for each index `i` in this case.

* Implementation Notes

** Regular case, T is not zero-sized

(Nothing changes in this case, but this is the documentation in the code.)

The iterator represents a contiguous range of elements.
`ptr` is a pointer to the start of the range.
`end` is a pointer to one past the end of the range.

Initial values are ptr = slice.as_ptr() and end = ptr.offset(slice.len())

The iterator is empty when ptr == end.
The iterator is stepped at the front by incrementing ptr, the
yielded element is the old value of ptr ("post increment" semantic).
The iterator is stepped at the back by decrementing end, the
yielded element is the new value of end ("pre decrement" semantic).

** When T is a Zero-sized type (ZST)

`ptr` is a pointer to the start of the range.
`end` is used as an integer: the count of elements in the range.

Initial values are ptr = slice.as_ptr() and end = self.len() as *const T

The iterator is empty when end == 0
The iterator is stepped at the front by decrementing end.
The iterator is stepped at the back by decrementing end.

For both front and back, the yielded element is `ptr`, which does not change
during the iteration.

** NonZero

`ptr` is derived from `slice.as_ptr()` and just like it is is non-null.
`end` is allowed to be null (used by the ZST implementation); this is
required since for iterating a [(); usize::MAX] all values of usize (and
equivalently the raw pointer) are used, including zero.
@bluss
Copy link
Member Author

bluss commented Nov 26, 2017

(Rebased & updated with some cleanups, no major changes).

I've officially given up trying to fix the regression in #46223 (comment) I prototyped this out of tree, which was a bit quicker, and that "regression" does not care about assumes or other changes I thought were natural. The change to make_ref is literally the culprit for the "regression".

That's the macro that changes from:

macro_rules! make_ref {
    ($ptr:expr) => {{
        let ptr = $ptr;
        if size_from_ptr(ptr) == 0 {
            // Use a non-null pointer value
            &*(1 as *mut _)
        } else {
            &*ptr
        }
    }};
}

to this:

macro_rules! make_ref {
    ($ptr:expr) => { &*$ptr };
}

Even funnier, the macro can even be this (which is nonsense but correct in the new impl) and the vectorization does not regress:

macro_rules! make_ref {
    ($ptr:expr) => {{
        let ptr = $ptr;
        if false && size_from_ptr(ptr) == 0 {
            // Use a non-null pointer value
            &*(1 as *mut _)
        } else {
            &*ptr
        }
    }};
}

but we must refuse to insert nonsense of that degree; then I think there's no principle but some strange accident that is causing the original #46223 (comment) code to compile well.

@eddyb
Copy link
Member

eddyb commented Nov 28, 2017

@bluss Oh, have you tried having a ptr variable, instead of applying &* directly on $ptr?

@bluss
Copy link
Member Author

bluss commented Nov 28, 2017

@eddyb oh I think so, I tried a lot of things. I have resolved one bounds check regression in the nom testsuite (related to the unrolling / try_fold code ping @arthurprs ; unregressing it involved just tweaking how the ptrdistance was computed in that loop. This WIP PR is not entirely up to date since I'm still trying to fix regressions locally.)

@arthurprs
Copy link
Contributor

You mean this #45501? That's an awesome side effect 😃

@bluss
Copy link
Member Author

bluss commented Nov 28, 2017

Oh just maybe, it was actually just regress (due to this PR) and unregress (due to WIP on this PR). Just thinking of it since it's the same issue.

@kennytm
Copy link
Member

kennytm commented Dec 6, 2017

Hi @bluss, is there anything supposed to happen for this PR to proceed?

@bluss
Copy link
Member Author

bluss commented Dec 7, 2017

@kennytm I've been trying to work on a no-performance-regressions version with no success. I'll have to come back to this with a different goal, probably demonstrating some code savings in some places and code gen regressions in other places.

I was using nom benchmarks for the no-performance-regressions goal. cc @Geal

@aidanhs
Copy link
Member

aidanhs commented Dec 14, 2017

Visiting for triage: just to be clear, it sounds like you're going to continue work in this PR rather than create a new one when ready?

@bluss bluss closed this Dec 21, 2017
@bluss
Copy link
Member Author

bluss commented Dec 21, 2017

Yes. I just haven't had time, sorry. Someone else can also pick it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants