Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient Vec::reserve strategy #14264

Closed
wants to merge 1 commit into from

Conversation

stepancheg
Copy link
Contributor

Before this commit new capacity was always chosen as the next power of
two for requested capacity.

It is inefficient in scenarios where several elements are pushed to the
vector at once, and it is final operations. Example:

let v = Vec::new();
v.grow(10, false);
// do something with vector

With patch applied, new capacity is chosen as max of current capacity
doubled and requested capacity.

Before this patch resulting capacity of the vector from the example
above is 16. With patch applied, capacity is 10.

Similar strategy is used, for instance, to reserve capacity of a vector in
libc++ (1) (vector::__recommend()) or to reserve capacity of ArrayList
in OpenJDK (2) (ArrayList.grow()).

Patch also includes minor adjustment that does not affect reallocations:
capacity is compared to self.cap, not to self.len in reserve().

(1) http://llvm.org/viewvc/llvm-project/libcxx/trunk/include/vector?view=markup
(2) http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/ArrayList.java

Before this commit new capacity was always chosen as the next power of
two for requested capacity.

It is inefficient in scenarios where several elements are pushed to the
vector at once, and it is final operations. Example:

```
let v = Vec::new();
v.grow(10, false);
// do something with vector
```

With patch applied, new capacity is chosen as max of current capacity
doubled and requested capacity.

Before this patch resulting capacity of the vector from the example
above is 16. With patch applied, capacity is 10.

Similar strategy is used, for instance, to reserve capacity of a vector in
libc++ (1) (`vector::__recommend()`) or to reserve capacity of ArrayList
in OpenJDK (2) (`ArrayList.grow()`).

Patch also includes minor adjustment that does not affect reallocations:
capacity is compared to `self.cap`, not to `self.len` in `reserve()`.

(1) http://llvm.org/viewvc/llvm-project/libcxx/trunk/include/vector?view=markup
(2) http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/ArrayList.java
@lilyball
Copy link
Contributor

This is only more efficient when you're doing a single mutation to the vector. If you push anything else to the vector afterwards, then it's markedly less efficient:

let v = Vec::new();
v.grow(10, false);
v.grow(6, true);

You cannot assume that any given operation that lengthens a vector is the only operation that's going to be applied. In fact, it's relatively common to append several collections to a vector in a row.

If you know you're only going to be doing the one mutation operation, you can use .reserve_exact() ahead of time to avoid the excess capacity. But libstd doesn't have enough knowledge to know when this is appropriate. Only clients of the library can make that call.

@stepancheg
Copy link
Contributor Author

@kballard Consider a.push_all(b) Basically, there are two mutually exclusive cases:

  1. a.len() <= b.len(). In this case patched reserve is as efficient as original reserve
    statistically: for random sets of lengths of a and b. For example, patched reserve loses in grow(10); grow(6); and wins in grow(10); grow(7);
  2. a.len() < b.len(). In this case patched reserve is clear winner.

Since probablility of case 2 is greater than zero, patched reserve is statistically better.

reserve_exact is not panacea, because reserve* are rarely used directly, but rather implicitly in push_all, push_all_move, append and similar operations.

@lilyball
Copy link
Contributor

@stepancheg

Asymptotically? Vectors don't grow asymptotically. And the patched reserve doesn't win in grow(10); grow(7);. Both the original and the patched behavior has to reallocate on the grow(7), which means that it's at best a tie. Although the patched reserve will end up with a capacity of 20 and the original with a capacity of 32. In the general case, it's hard to say which is better (it largely depends on whether the vector will continue to grow).

I also disagree that the a.len() < b.len() case is a "clear winner". This is equivalent to the grow(7) case, in that both approaches will have to reallocate, but the patched reserve will end up with a smaller capacity. In fact, the patched reserve is very likely to be worse-performing here, because even appending a single element will require another reallocation (as it did an exact reserve).

Which is to say, once again, the patched reserve definitively loses if you have to grow the vector again.

But of course we're also missing the obvious reason why this is bad, which is that it's disregarding proper allocator behavior. Allocating powers of 2 is generally the best way to handle allocators (although I believe @thestinger has indicated that, for jemalloc, we should actually be using powers of 2 only once the allocation size is >=4096, and should be using a different scheme with IIRC a 1.5x multiplier below that).

@stepancheg
Copy link
Contributor Author

@kballard I meant statistically.

Although the patched reserve will end up with a capacity of 20 and the original with a capacity of 32. In the general case, it's hard to say which is better (it largely depends on whether the vector will continue to grow).

I also disagree that the a.len() < b.len() case is a "clear winner". This is equivalent to the grow(7) case, in that both approaches will have to reallocate, but the patched reserve will end up with a smaller capacity. In fact, the patched reserve is very likely to be worse-performing here, because even appending a single element will require another reallocation (as it did an exact reserve).

If vector continues to grow, it is a tie (statistically).

Which is to say, once again, the patched reserve definitively loses if you have to grow the vector again.

No, it is not. It has the same number of reallocations and the same average capacity overhead. Patched realloc wins if last push pushes more than vec already contains.

But of course we're also missing the obvious reason why this is bad, which is that it's disregarding proper allocator behavior. Allocating powers of 2 is generally the best way to handle allocators (although I believe @thestinger has indicated that, for jemalloc, we should actually be using powers of 2 only once the allocation size is >=4096, and should be using a different scheme with IIRC a 1.5x multiplier below that).

I guess, it is quite the contrary: allocators use powers of two for small allocations. But it is not important right now, and you made a good point.

BTW, if rust's going to rely on implementation details of concrete allocator (jemalloc), I suppose it should query allocator for actual allocated memory size (something like, je_sallocx function) and store it in vector cap field. In this case, patched realloc will be even better.

Anyway, I think this patch is not very important, so I better close this PR.

@stepancheg stepancheg closed this May 18, 2014
@thestinger
Copy link
Contributor

@kballard: The crossover point is actually far larger than 4096 bytes, but otherwise you're remembering correctly. Using two as the multiplier is the worst case at small sizes because it misses any chance to do in-place growth.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 27, 2025
The `IoBufRead` diagnostic has been added during the latest rustup.

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants