Skip to content

Conversation

@mzabaluev
Copy link
Contributor

Instead of a fast branch with a sized iterator falling back to a potentially poorly optimized iterate-and-push loop, a single efficient loop can serve all cases.

In my benchmark runs, I see some good gains, but also some regressions, possibly due to different inlining choices by the compiler. YMMV.

Use one loop, efficient for both sized and size-ignorant iterators
(including iterators lying about their size).
For the first ever element to put into a vector, the branching
conditions are more predictable.
Implement both Vec::from_iter and extend in terms of an internal
method working with Iterator. Otherwise, the code below ends up
using two monomorphizations of extend, differing only in the
implementation of IntoIterator:

let mut v = Vector::from_iter(iterable1);
v.extend(iterable2);
@rust-highfive
Copy link
Contributor

r? @gankro

(rust_highfive has picked a reviewer for you, use r? to override)

@Gankra
Copy link
Contributor

Gankra commented Feb 22, 2015

Can you post your experimental results.

@mahkoh
Copy link
Contributor

mahkoh commented Feb 22, 2015

Does it optimize to a memcpy if the src is contiguous? If not then it will have to be changed anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use while let here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, good point.

@mzabaluev
Copy link
Contributor Author

Results of benchmarking on i686: https://gist.github.com/mzabaluev/df41c2f50464416b0a26

tl;dr: Notable losers are (observed consistently over multiple bench runs):

Test Percentage
bit::bit_vec_bench::bench_bit_vec_big_iter 189%
slice::bench::sort_big_random_small 143%
slice::bench::zero_1kb_from_elem 184%
vec::tests::bench_extend_0000_1000 119%
vec::tests::bench_from_elem_0100 129%
vec::tests::bench_from_iter_0010 121%
vec::tests::bench_from_iter_0100 125%

Winners:

Test Percentage
string::tests::bench_push_char_two_bytes 95%
string::tests::from_utf8_lossy_100_invalid 38%
vec::tests::bench_extend_0000_0010 87%
vec::tests::bench_extend_0000_0100 89%
vec::tests::bench_from_elem_1000 48%
vec::tests::bench_from_iter_1000 81%

Overall, I don't think too much trust should be put in microbenchmarks repeatedly testing one call-site optimized function.

@huonw
Copy link
Contributor

huonw commented Feb 22, 2015

How much is "winning" and "losing'"?

@mzabaluev
Copy link
Contributor Author

@mahkoh These changes should not drastically change the performance with slice iterators, in which case it's as close to reserve + type-aligned memcpy as the optimizer can make it. The main purpose is to provide comparable efficiency for size-unaware or pessimistic iterators.

@mzabaluev
Copy link
Contributor Author

@huonw I've added percentages to the comment above.

@pczarn
Copy link
Contributor

pczarn commented Feb 23, 2015

Two of these methods have #[inline], but extend_desugared does not; are benchmarks affected? Is there some benefit to having extend_desugared vs calling vector.extend(iterator)? Does extend_desugared change inlining choices, too?

@mzabaluev
Copy link
Contributor Author

I actually ran benchmarks once with #[inline] on extend_desugared, but the results did not look more different than some jitter. I don't think #[inline] has any effect on generic methods, as they are always available in metadata, so the compiler can always choose to inline them across crates. Even less so for unit tests.

@mzabaluev
Copy link
Contributor Author

There should be more benchmarks with extending from sizeless/pessimistic iterators. Any good candidates in libcollections?

@Gankra
Copy link
Contributor

Gankra commented Feb 23, 2015

BTreeMap's range iter, VecMap's iter, and bitvset's iter have no idea how many elements they contain, but they're all also doing a non-trivial amount of work to actually yield those elements.

@mzabaluev
Copy link
Contributor Author

@pczarn Without looking too closely into the results (I get easily frustrated with the long build times of the library crates and their tests), I assume the optimizer has more "reason" to share extend_desugared if it is used in various places, unless the tradeoff between cache/branch predictor efficiency and removing the overhead of calling leans towards the latter.

@steveklabnik
Copy link
Contributor

How do we all feel about this PR today?

@Gankra
Copy link
Contributor

Gankra commented Apr 22, 2015

Oh geez, this slipped right through the cracks! I don't have a great gut on this since it seems to just be shuffling perf around. r? @huonw

@rust-highfive rust-highfive assigned huonw and unassigned Gankra Apr 22, 2015
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, does this actually improve performance, over the simpler:

let mut vector = Vec::with_capacity(iterator.size_hint().0);
vector.extend_desugared(iterator);
vector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids a branch that is present in extend_desugared. That branch tends to be not taken, but in the first iteration the vector is always expanded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why it does it avoid that branch? It seems that if the iterator has a non-zero size hint the with_capacity will ensure that it isn't taken, and in the case that the iterator has a zero-size hint it seems both with and without this are equally bad off?

Basically, I'm asking if this was noticeable in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, and it was somehow lost on me that Vec::with_capacity(1) allocates exactly one element. I should simplify the code to your suggestion; the performance difference will likely be negligible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the suggested change, vec::bench_from_iter_0000 regresses
from 29 ns/iter (+/- 2) to 87 ns/iter (+/- 48) in my testing
(rebased against commit 3dbfa74), the other benchmarks seemingly unaffected. I wonder if it can be considered an edge case.

@Gankra
Copy link
Contributor

Gankra commented May 19, 2015

@huonw @mzabaluev What's up with this?

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 26, 2015
@Gankra
Copy link
Contributor

Gankra commented May 28, 2015

Closing due to inactivity.

@Gankra Gankra closed this May 28, 2015
@mzabaluev
Copy link
Contributor Author

@gankro Ah sorry, I let your request slip by. I believe the branch is good as it is. @huonw's suggestion, while good from code clarity point of view, resulted in some performance degradation as mentioned in the discussion above.

@Gankra
Copy link
Contributor

Gankra commented May 30, 2015

@huonw Any reason not to merge this?

@Gankra
Copy link
Contributor

Gankra commented Jun 4, 2015

huon's busy

@Gankra Gankra reopened this Jun 4, 2015
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling size_hint in a loop seems really bad. This is not necessarily a straight-forward or cheap method. Is hoisting it out not worth it in your testing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called exponentially rarely, so it seems fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh derp, sure.

@Gankra
Copy link
Contributor

Gankra commented Jun 16, 2015

At worst I like this better than the old code. r+

@huonw
Copy link
Contributor

huonw commented Jun 16, 2015

@bors r+

@bors
Copy link
Collaborator

bors commented Jun 16, 2015

📌 Commit 7b464d3 has been approved by huonw

@bors
Copy link
Collaborator

bors commented Jun 17, 2015

⌛ Testing commit 7b464d3 with merge 0250ff9...

bors added a commit that referenced this pull request Jun 17, 2015
Instead of a fast branch with a sized iterator falling back to a potentially poorly optimized iterate-and-push loop, a single efficient loop can serve all cases.

In my benchmark runs, I see some good gains, but also some regressions, possibly due to different inlining choices by the compiler. YMMV.
@bors bors merged commit 7b464d3 into rust-lang:master Jun 17, 2015
@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Jun 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants