Iterator methods produce slow code #18193
I'm getting different results:
$ rustc --opt-level=3 --test slow_position.rs
running 4 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
(the assembly version segfaulted on me).
Wow. What platform?
On a MacBook. Here's another run without a bunch of stuff in the background screwing up the variance:
$ rustc --opt-level=3 --test slow_position.rs
running 4 tests
The IL for
.loop.preheader:                                  ; preds = %.noexc15
  br label %.loop
for_loopback.i:                                   ; preds = %for_body.i14
  %16 = icmp eq i8* %18, %14
  br i1 %16, label %.noexc6.loopexit, label %.exit
.loop:                                            ; preds = %.loop.preheader, %for_loopback.i
  %i = phi i64 [ %21, %for_loopback.i ], [ 0, %.loop.preheader ]
  %ix = phi i8* [ %18, %for_loopback.i ], [ %.pre, %.loop.preheader ]
  %17 = icmp eq i8* %ix, null                     ; WTF? spurious null check
  br i1 %17, label %.noexc6.loopexit, label %for_body.i14
for_body.i14:                                     ; preds = .loop
  %18 = getelementptr inbounds i8* %ix, i64 1
  %19 = load i8* %ix, align 1
  %20 = icmp eq i8 %19, 1
  %21 = add i64 %i, 1
  br i1 %20, label %.noexc6.loopexit, label %for_loopback.i
Which looks sane, except for the spurious null check (which remains in the assembly). Benchmarks on my machine:
The spurious null check is the issue – writing a
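One plausible reading of where that null check comes from: an iterator's `next` returns `Option<&u8>`, and Rust represents `None` for an optional reference as the null pointer, so every "is it `Some`?" test is a pointer-against-null comparison unless the optimizer can prove the pointer is non-null. The sketch below illustrates that representation; `next_byte` and `find_one` are hypothetical helpers written for this example, not the standard library's iterator code.

```rust
use std::mem::size_of;

/// `Option<&u8>` carries no separate tag: `None` is stored as the null
/// pointer, so the option is exactly as large as the reference itself.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// Hypothetical stand-in for a slice iterator's `next`: hands out the first
/// element and advances the cursor, or returns `None` when the cursor is empty.
pub fn next_byte<'a>(cur: &mut &'a [u8]) -> Option<&'a u8> {
    let s: &'a [u8] = *cur;
    let (first, rest) = s.split_first()?;
    *cur = rest;
    Some(first)
}

/// Searches for the byte 1, counting elements by hand.
pub fn find_one(data: &[u8]) -> Option<usize> {
    let mut cur = data;
    let mut index = 0usize;
    // Matching on `Some(..)` is conceptually a "pointer != null" test.
    while let Some(&b) = next_byte(&mut cur) {
        if b == 1 {
            return Some(index);
        }
        index += 1;
    }
    None
}
```

Calling `find_one(&[0, 0, 1])` walks the slice through `next_byte` and stops at the third element.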
New code:
#![feature(asm)]
#![allow(unstable)]
extern crate test;
use test::{Bencher};

#[inline(never)]
fn gen() -> Vec<u8> {
    (0..1024*65).map(|_| 0).collect()
}

#[bench]
fn position(b: &mut Bencher) {
    let v = gen();
    b.iter(|| {
        test::black_box(v.as_slice().iter().position(|&c| c == 1));
    });
}

#[bench]
fn iter(b: &mut Bencher) {
    let v = gen();
    b.iter(|| {
        let mut res = None;
        let mut i = 0us;
        for &b in v.as_slice().iter() {
            if b == 1 {
                res = Some(i);
                break;
            }
            i += 1;
        }
        test::black_box(res);
    });
}

#[bench]
fn enumerate(b: &mut Bencher) {
    let v = gen();
    b.iter(|| {
        let mut res = None;
        for (i, &b) in v.as_slice().iter().enumerate() {
            if b == 1 {
                res = Some(i);
                break;
            }
        }
        test::black_box(res);
    });
}

#[bench]
fn _range(b: &mut Bencher) {
    let v = gen();
    b.iter(|| {
        let mut res = None;
        for i in (0..v.len()) {
            if v[i] == 1 {
                res = Some(i);
                break;
            }
        }
        test::black_box(res);
    });
}

#[bench]
fn assembly(b: &mut Bencher) {
    let v = gen();
    b.iter(|| {
        unsafe {
            let mut start = v.as_ptr();
            let end = start.offset(v.len() as isize);
            asm!("
                dec $0
                .align 16, 0x90
                AGAIN:
                inc $0
                cmp $0, $1
                je EXIT
                cmpb $$1, ($0)
                jne AGAIN
                EXIT:
                " : "+r"(start) : "r"(end));
            if start < end {
                test::black_box(Some(start as usize - v.as_ptr() as usize));
            } else {
                test::black_box(None::<u8>);
            }
        }
    });
}
New results:
Unacceptable performance.
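The benchmark code in this thread uses pre-1.0 syntax (`0us`, `as_slice()`, the old `asm!` form) and no longer compiles. For anyone who wants to repeat the comparison, here is a rough sketch of the four safe search loops as plain functions in current Rust. The `find_*` names are made up for this example, and the sketch only shows that the variants are interchangeable; it says nothing about how they perform on a modern compiler.

```rust
/// `Iterator::position`: the abstraction the issue argues should be fastest.
pub fn find_position(v: &[u8]) -> Option<usize> {
    v.iter().position(|&c| c == 1)
}

/// Plain slice iterator with a hand-maintained counter (the `iter` bench).
pub fn find_iter(v: &[u8]) -> Option<usize> {
    let mut i = 0usize;
    for &b in v {
        if b == 1 {
            return Some(i);
        }
        i += 1;
    }
    None
}

/// `enumerate` supplies the index (the `enumerate` bench).
pub fn find_enumerate(v: &[u8]) -> Option<usize> {
    for (i, &b) in v.iter().enumerate() {
        if b == 1 {
            return Some(i);
        }
    }
    None
}

/// Index-based loop with bounds-checked access (the `_range` bench).
pub fn find_range(v: &[u8]) -> Option<usize> {
    for i in 0..v.len() {
        if v[i] == 1 {
            return Some(i);
        }
    }
    None
}
```

All four return the index of the first byte equal to 1, or `None` for the all-zero vector that the original `gen()` produces.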
The data pointer used in the slice is never null; using assume() to tell LLVM about it gets rid of various unneeded null checks when iterating over the slice.

Since the snapshot compiler is still using an older LLVM version, omit the call in stage0, because compile times explode otherwise.

Benchmarks from rust-lang#18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes rust-lang#18193
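To make the idea in that commit message concrete, here is a minimal sketch of giving the optimizer a "this pointer is never null" fact by hand. The commit itself used the compiler-internal `assume()`; this sketch swaps in the stable `std::hint::unreachable_unchecked` idiom, which states the same kind of fact. `position_with_hint` is a hypothetical name, and on current compilers the hint is likely redundant.

```rust
use std::hint::unreachable_unchecked;

/// Pointer-walking search with an explicit non-null hint.
/// Illustration only: not the code from the actual patch.
pub fn position_with_hint(v: &[u8], needle: u8) -> Option<usize> {
    let start = v.as_ptr();
    // SAFETY: a slice's data pointer is never null (empty slices use a
    // dangling but non-null pointer), so this branch can never be taken.
    // Declaring it unreachable lets the optimizer drop later null checks.
    unsafe {
        if start.is_null() {
            unreachable_unchecked();
        }
    }
    // SAFETY: `start + len` is the one-past-the-end pointer of the slice.
    let end = unsafe { start.add(v.len()) };
    let mut cur = start;
    while cur != end {
        // SAFETY: `cur` lies in `[start, end)`, so it points at a valid byte.
        if unsafe { *cur } == needle {
            // Elements are one byte wide, so the address difference is the index.
            return Some(cur as usize - start as usize);
        }
        // SAFETY: `cur < end`, so advancing by one stays within bounds
        // or lands exactly on `end`.
        cur = unsafe { cur.add(1) };
    }
    None
}
```

For example, `position_with_hint(&[0, 0, 1], 1)` returns the same answer as `position` on the same slice.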
Consider the following benchmark:
Which produces the following output:
test _range ... bench: 65200 ns/iter (+/- 1033)
test assembly ... bench: 60802 ns/iter (+/- 248)
test enumerate ... bench: 64441 ns/iter (+/- 566)
test iter ... bench: 91170 ns/iter (+/- 465)
test position ... bench: 91112 ns/iter (+/- 384)
`position` is the correct abstraction for this, but its code is 50% slower than the naive assembly implementation and 40% slower than `enumerate`.
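The percentages follow directly from the ns/iter figures above. A small sketch of the arithmetic, with `slowdown_percent` as a hypothetical helper:

```rust
/// How much slower `slow_ns` is than `fast_ns`, as a percentage.
pub fn slowdown_percent(slow_ns: f64, fast_ns: f64) -> f64 {
    (slow_ns / fast_ns - 1.0) * 100.0
}
```

With the numbers from the output, `slowdown_percent(91112.0, 60802.0)` is about 49.9 (`position` vs. `assembly`) and `slowdown_percent(91112.0, 64441.0)` is about 41.4 (`position` vs. `enumerate`).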