Tiny Vecs are dumb. #72227

nnethercote · 2020-05-15T07:20:25Z

Currently, if you repeatedly push to an empty vector, the capacity
growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the
relevant code (the "amortized" growth strategy) to skip 1 and 2, instead
using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using
the "exact" growth strategy, e.g. via reserve_exact().)

This idea (along with the phrase "tiny Vecs are dumb") comes from the
"doubling" growth strategy that was removed from RawVec in #72013.
That strategy was barely ever used -- only when a VecDeque was grown,
oddly enough -- which is why it was removed in #72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for repeated push case. In other words, this commit makes
Vecs behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by
10% or more. It speeds up rustc, and will also speed up any other Rust
program that uses Vecs a lot.

In theory, the change could increase memory usage, but in practice it
doesn't. It would be an unusual program where very small Vecs having a
capacity of 4 rather than 1 or 2 would make a difference. You'd need a
lot of very small Vecs, and/or some very small Vecs with very
large elements.

r? @Amanieu

nnethercote · 2020-05-15T07:23:03Z

Some local instruction count results, for check-full and debug-full runs:

many-assoc-items-check
        avg: -3.8%      min: -3.8%      max: -3.8%
many-assoc-items-debug
        avg: -3.7%      min: -3.7%      max: -3.7%
issue-46449-debug
        avg: -3.4%      min: -3.4%      max: -3.4%
unused-warnings-check
        avg: -3.1%      min: -3.1%      max: -3.1%
deeply-nested-debug
        avg: -2.8%      min: -2.8%      max: -2.8%
unused-warnings-debug
        avg: -2.8%      min: -2.8%      max: -2.8%
regression-31157-debug
        avg: -2.7%      min: -2.7%      max: -2.7%
cargo-debug
        avg: -2.7%      min: -2.7%      max: -2.7%
webrender-debug
        avg: -2.6%      min: -2.6%      max: -2.6%
hyper-2-debug
        avg: -2.6%      min: -2.6%      max: -2.6%
syn-debug
        avg: -2.5%      min: -2.5%      max: -2.5%
encoding-check
        avg: -2.5%      min: -2.5%      max: -2.5%
regex-check
        avg: -2.4%      min: -2.4%      max: -2.4%
piston-image-debug
        avg: -2.4%      min: -2.4%      max: -2.4%
clap-rs-debug
        avg: -2.3%      min: -2.3%      max: -2.3%
syn-check
        avg: -2.2%      min: -2.2%      max: -2.2%
webrender-check
        avg: -2.1%      min: -2.1%      max: -2.1%
cranelift-codegen-debug
        avg: -2.1%      min: -2.1%      max: -2.1%
ripgrep-check
        avg: -2.1%      min: -2.1%      max: -2.1%
futures-debug
        avg: -2.1%      min: -2.1%      max: -2.1%
encoding-debug
        avg: -2.1%      min: -2.1%      max: -2.1%
packed-simd-debug
        avg: -2.0%      min: -2.0%      max: -2.0%
hyper-2-check
        avg: -2.0%      min: -2.0%      max: -2.0%
packed-simd-check
        avg: -2.0%      min: -2.0%      max: -2.0%
piston-image-check
        avg: -1.9%      min: -1.9%      max: -1.9%
regex-debug
        avg: -1.9%      min: -1.9%      max: -1.9%
cranelift-codegen-check
        avg: -1.8%      min: -1.8%      max: -1.8%
cargo-check
        avg: -1.8%      min: -1.8%      max: -1.8%
futures-check
        avg: -1.8%      min: -1.8%      max: -1.8%
tokio-webpush-simple-debug
        avg: -1.8%      min: -1.8%      max: -1.8%
unicode_normalization-debug
        avg: -1.7%      min: -1.7%      max: -1.7%
clap-rs-check
        avg: -1.7%      min: -1.7%      max: -1.7%
serde-debug
        avg: -1.6%      min: -1.6%      max: -1.6%
webrender-wrench-check
        avg: -1.6%      min: -1.6%      max: -1.6%
ripgrep-debug
        avg: -1.5%      min: -1.5%      max: -1.5%
serde-check
        avg: -1.5%      min: -1.5%      max: -1.5%
webrender-wrench-debug
        avg: -1.5%      min: -1.5%      max: -1.5%
regression-31157-check
        avg: -1.5%      min: -1.5%      max: -1.5%
tokio-webpush-simple-check
        avg: -1.4%      min: -1.4%      max: -1.4%
html5ever-debug
        avg: -1.4%      min: -1.4%      max: -1.4%
unicode_normalization-check
        avg: -1.3%      min: -1.3%      max: -1.3%
await-call-tree-debug
        avg: -1.3%      min: -1.3%      max: -1.3%
html5ever-check
        avg: -1.1%      min: -1.1%      max: -1.1%
ucd-debug
        avg: -1.1%      min: -1.1%      max: -1.1%
await-call-tree-check
        avg: -1.0%      min: -1.0%      max: -1.0%

@bors try @rust-timer queue

rust-timer · 2020-05-15T07:23:05Z

Awaiting bors try build completion

bors · 2020-05-15T07:23:15Z

⌛ Trying commit dac8c474a4a0cc81ba4936bf0d55841c79e1fb0d with merge 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac...

Amanieu · 2020-05-15T09:06:31Z

I wonder if we can do the same thing for HashMap: rust-lang/hashbrown#47. I left that issue open because I wasn't sure how often small (1-3 elements) hashmaps are used.

leonardo-m · 2020-05-15T09:24:48Z

I left that issue open because I wasn't sure how often small (1-3 elements) hashmaps are used.

In Python even small associative arrays are used often, in Rust I think they are quite less common (for various reasons, one reason is the lack of handy hashmap literal syntax).

bors · 2020-05-15T10:21:39Z

☀️ Try build successful - checks-azure
Build commit: 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac (291f8c65f12a6ed4401e0c6cb477f3429c32b9ac)

rust-timer · 2020-05-15T10:21:40Z

Queued 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac with parent 0271499, future comparison URL.

ollie27 · 2020-05-15T11:14:03Z

Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for repeated push case.

I thought the same. It turns out that used to be the case until #50739. We will definitely need a regression test for this.

rust-timer · 2020-05-15T12:57:55Z

Finished benchmarking try commit 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac, comparison URL.

Amanieu · 2020-05-15T14:21:43Z

max-rss shows a regressions in the results, but I'm not sure how reliable that indicator is or how much we care about it.

Amanieu · 2020-05-15T14:23:25Z

I wonder how much more speedup we can get by skipping straight to 8 elements, but I am hesitant since we don't seem to have a good way of tracking memory usage.

hanna-kruppe · 2020-05-15T16:43:59Z

I'm not sure how representative the rustc results really are, since SmallVec is used heavily wherever profiling revealed that many vectors with few elements were allocated. In other words, rustc went through a lot of effort to not have tons of small (say, 1-3 elements) vectors in the first place, while many other programs with similar workloads probably haven't.

nnethercote · 2020-05-15T22:30:18Z

@Amanieu: The bad news is that max-rss is highly unreliable, alas. E.g. look at this noise run, which compares two revisions with no signficant differences.

The good news is that I can get accurate peak heap memory measurements with DHAT. I can do some measurements with skip-to-4 and skip-to-8 on Monday.

nnethercote · 2020-05-15T22:37:05Z

rustc went through a lot of effort to not have tons of small (say, 1-3 elements) vectors in the first place, while many other programs with similar workloads probably haven't.

It's true that rustc uses SmallVec a lot. But this means that the rustc measurements might underestimate the speed improvements that other programs might get. Think about vecs with two elements:

If you use SmallVec<[T; 2]>, you've already eliminated the allocations and this PR has no effect.
If you use Vec<T>, this PR changes the number of allocations from 2 to 1, saving time.

For vecs with four or more elements, the number of allocations in the Vec<T> case is reduced by 2, even better.

nnethercote · 2020-05-16T00:07:25Z

Having said that, I can see that rustc's heavy use of SmallVec might underplay the memory usage impact as well.

But in general I'm not worried about memory usage. Vec makes no promises about capacities. If you have a program where the capacity of 1 and 2 length Vec's has a critical impact on memory usage (e.g. due to having many short vectors and/or short vectors with very large elements) then you should use a collection type that provides clear guarantees about capacity, such as SmallVec. And I think such cases will be rare, anyway.

hanna-kruppe · 2020-05-16T09:55:15Z

rustc went through a lot of effort to not have tons of small (say, 1-3 elements) vectors in the first place, while many other programs with similar workloads probably haven't.

It's true that rustc uses SmallVec a lot. But this means that the rustc measurements might underestimate the speed improvements that other programs might get. Think about vecs with two elements:
* If you use `SmallVec<[T; 2]>`, you've already eliminated the allocations and this PR has no effect.

* If you use `Vec<T>`, this PR changes the number of allocations from 2 to 1, saving time.
For vecs with four or more elements, the number of allocations in the Vec<T> case is reduced by 2, even better.

Fair point, but there are also factors pulling in the opposite direction. For Vecs with 1 element (also a common case in some parts of rustc), you don't save any reallocations. Even with 2 elements, you "only" save one reallocation instead of two, so depending on the distribution of Vec sizes in a program you'll see a proportionally smaller speed-up (imagine an application spending about as much time on reallocating Vecs as rustc does today, but more of it on tiny Vecs). And of course, the impact on memory usage is not entirely separate from performance (e.g. in a typical size-class-based allocator, Vecs with 1-2 small elements may waste half of each cache line that could hold other useful data).

None of this is to say I think this is a bad change overall, it's just not quite so obvious to me how the effects on rustc will translate to other code bases and I really wish we had a good benchmark suite for runtime (not just compile time) of Rust applications.

ChrisJefferson · 2020-05-17T09:59:29Z

Would it be worth making the size increase dependent on the size of the unserlying object? In particular, if it is very small (1 or 2 bytes), you probably lose nothing at all starting at an 8 or 16 byte allocation (depending on memory manager)

nnethercote · 2020-05-17T19:22:03Z

@ChrisJefferson: that's a good idea. And if the elements are really big (256B? 1024B?) we could ratchet down the minimum capacity. I will do some measurements.

nnethercote · 2020-05-18T04:39:23Z

Here are the results for four different minimum non-zero sizes: 1 (original), 2, 4, and 8.

CACHEGRIND: INSTRUCTIONS EXECUTED (M)
            Tiny1     Tiny2          Tiny4          Tiny8
cargo       27,675M   27,420(-255)   27,179(-241)   27,131(-48)
cranelift   21,654M   21,441(-213)   21,264(-177)   21,246(-18)
futures      2,799M    2,774(-25)     2,748(-26)     2,737(-11)
many-assoc   6,433M    6,392(-41)     6,189(-203)    6,275(+86)
regex        3,005M    2,969(-36)     2,933(-36)     2,922(-11)
webrender   11,130M   11,007(-123)   10,895(-112)   10,871(-24)

Tiny2 is a clear improvement. Tiny4 is roughly 2x better than Tiny4 on all except many-assoc-items where it is almost 6x better. Tiny8 is a little better than Tiny4 on all except many-assoc-items where it is notably worse.

DHAT: PEAK HEAP SIZE (MB)
            Tiny1  Tiny2     Tiny4     Tiny8     
cargo       248    250(+2)   258(+8)   276(+18)  
cranelift   314    315(+1)   321(+6)   336(+15)  
futures      30     30(+0)    30(+0)    32(+ 2)  
many-assoc  101    102(+1)   104(+2)   109(+ 5)  
regex        53     53(+0)    54(+1)    57(+ 3)  
webrender   151    152(+1)   155(+3)   164(+ 9)

Tiny2 gives a tiny increase, Tiny4 a little bigger, and Tiny8 much more.

Things to note:

DHAT measures the requested size, not the actual size, which is often larger than the requested size due to rounding up. This means that the numbers above represent an upper limit for the peak increase.
The heap is only part of total memory usage; there is also stacks, static data, and code. max-rss figures for the above benchmarks are roughly double the peak heap, which means the max-rss increases will be roughly half the proportions in this table.

Tiny4 looks like the sweet spot, giving near-maximal speed improvements for a modest peak memory cost. I have uploaded code containing Tiny4 with slight modifications:

It skips to 8 if the element size is 1.
It skips to 1 if the element size is > 1024.

I measured that too and it gave results incredibly similar to Tiny4, but it could help with some cases that might show up sometimes.

nnethercote · 2020-05-18T04:39:53Z

r? @Amanieu

Currently, if you repeatedly push to an empty vector, the capacity growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the relevant code (the "amortized" growth strategy) to skip 1 and 2 in most cases, instead using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using the "exact" growth strategy, e.g. via `reserve_exact()`.) This idea (along with the phrase "tiny Vecs are dumb") comes from the "doubling" growth strategy that was removed from `RawVec` in rust-lang#72013. That strategy was barely ever used -- only when a `VecDeque` was grown, oddly enough -- which is why it was removed in rust-lang#72013. (Fun fact: until just a few days ago, I thought the "doubling" strategy was used for repeated push case. In other words, this commit makes `Vec`s behave the way I always thought they behaved.) This change reduces the number of allocations done by rustc itself by 10% or more. It speeds up rustc, and will also speed up any other Rust program that uses `Vec`s a lot.

src/liballoc/raw_vec.rs

ChrisJefferson · 2020-05-18T14:21:26Z

With regards the explicit check for size 1, it looks to me like when you allocate memory in Rust you get back the "actually allocated" amount of space -- is that right? In that case, assuming that the malloc never returns less than 8 bytes of space there is no need for an explicit check. However, there seems to be various different memroy interfaces with different properties in this area.

(I was trying to do this myself, but got stuck trying to add debugging output inside liballoc, sorry)

nnethercote · 2020-05-19T01:36:15Z

With regards the explicit check for size 1, it looks to me like when you allocate memory in Rust you get back the "actually allocated" amount of space -- is that right?

That's a good question!

The short answer is "no", and for the purposes of this PR, it means that choosing a capacity of 8 for single-byte elements is reasonable, and so I don't need to make any changes to this PR's code.

The long answer is murkier.

RawVec uses AllocRef::alloc to allocate new storage, and AllocRef::grow to reallocate existing storage. In the normal case, the AllocRef trait is implemented by Global. First, the docs.

The current docs for AllocRef::alloc are here. They say "On success, returns a MemoryBlock meeting the size and alignment guarantees of layout. The returned block may have a larger size than specified by layout.size()". Which suggests that the returned size need not be the actual size.
The current docs for AllocRef::grow are here. They say something stronger: "Returns a new MemoryBlock containing a pointer and the actual size of the allocated memory." The "actual" suggests it should return the actual size, not the requested size.

Next, the implementation.

The current implementation of Global::alloc is here. It returns the requested size.
The current implementation of Global::grow is here. It returns the requested size.

In conclusion: Global currently always returns the requested size, and the docs are a bit contradictory and unclear about whether that's the right thing to do. And this example shows that if you push two single-byte elements onto an empty Vec you get a capacity of 1 and then 2, which fits with that.

There is also GlobalAlloc, which is separate, and doesn't implement the AllocRef trait. I'm not sure the relationship between Global and GlobalAlloc.

It's a shame that there doesn't seem to be a way to accurately get the actual size of an allocation with Global. (One could write an implementation of AllocRef that always accurately returns the actual size, but that wouldn't help in the general case.) As you suggest, we could do better with the capacity sizing that way.

@Amanieu: have I got all that right? Any additional thoughts?

@Amanieu

…owup, r=Amanieu Adjust the zero check in `RawVec::grow`. This was supposed to land as part of rust-lang#72227. (I wish `git push` would abort when you have uncommited changes.) r? @Amanieu

@Amanieu

…owup, r=Amanieu Adjust the zero check in `RawVec::grow`. This was supposed to land as part of rust-lang#72227. (I wish `git push` would abort when you have uncommited changes.) r? @Amanieu

upsuper · 2020-05-28T07:58:44Z

In theory, the change could increase memory usage, but in practice it
doesn't. It would be an unusual program where very small Vecs having a
capacity of 4 rather than 1 or 2 would make a difference. You'd need a
lot of very small Vecs, and/or some very small Vecs with very
large elements.

I'd like to chime in and mention that, this does indeed increase memory usage in practice, not theoretically.

In the past, Rust has exactly this behavior, and when we worked on integrating Servo's CSS engine into Gecko (the Stylo project), we found this to be one important reason (among several others) contributing to a much larger (actually double the) memory consumption on Stylo than Gecko with Gmail. See Gecko bug 1392314.

The reason is that, Gmail had lots of background-image for emojis, and background-image supports multiple values in it, and an image item is a relatively large type in the style system, so the overallocation of the Vec hurt quite a bit, and we ended up explicitly disable overallocating in the code.

In my experience, it is actually very common at least in browser development that you have a Vec which in majority of time holds zero or one item, and in very rare case it goes to two, and only in some extreme cases can it go even further. And in that kind of scenario, probably it is tolerable that item size is large as you are unlikely to reallocate a lot.

If Vec is going to overallocate at the very start, I think it would be good to at least have document mentioning that, if a Vec is unlikely go beyond one or two items, it is advised to use with_capacity to avoid possible overallocating.

WDYT?

ChrisJefferson · 2020-05-28T08:17:49Z

This patch should help with that problem, as it only over-allocates if the objects allocated are not too big.

I do wonder if the point where objects are considered "large" (1024 bytes) could be too large? Something much smaller (32 or even 16 bytes) might reduce memory wastage?

mati865 · 2020-05-28T08:32:28Z

That's sounds like perfect scenario for SmallVec/TinyVec TBH. Although https://doc.rust-lang.org/std/vec/struct.Vec.html#capacity-and-reallocation could be slightly expanded for really small vectors.

upsuper · 2020-05-28T09:41:43Z

This patch should help with that problem, as it only over-allocates if the objects allocated are not too big.

Good point. I didn't notice that.

I do wonder if the point where objects are considered "large" (1024 bytes) could be too large? Something much smaller (32 or even 16 bytes) might reduce memory wastage?

Based on the analysis in the related bug, each item was like 192 bytes, and having to allocate 4 making it 768 bytes aligned by the allocator to 1k, and there are ~12k such Vecs, costing 12MB of memory.

With no overallocating, 192 bytes would take exactly 192 bytes (because of the allocator strategy), costing only ~2MB.

That's sounds like perfect scenario for SmallVec/TinyVec TBH.

Not necessarily. SmallVec/TinyVec in that case would need extra space on the stack or other struct, which can lead to extra memory copy and further bloated structs which holding it. This is especially a problem when the most common case is zero item.

nnethercote · 2020-05-28T11:26:11Z

I will repeat my comment from above: "Vec makes no promises about capacities. If you have a program where the capacity of 1 and 2 length Vec's has a critical impact on memory usage (e.g. due to having many short vectors and/or short vectors with very large elements) then you should use a collection type that provides clear guarantees about capacity."

the8472 · 2020-05-29T17:53:47Z

Note that this change doesn't just affect dumb collect() implementations that push repeatedly without a good capacity estimate. This would also change the behavior of small extend() calls with an ExactSizeIterator or TrustedLen. That's probably not a significant edge-case but it is something where we have better information that could be acted on.

E.g. this new behavior could be limited to cases where needed_extra_capacity == 1.

redbaron · 2020-05-31T10:29:01Z

Facebook did some research for their FBVector and picked 1.5 growth factor. They explicitly moved away from 2x growth because of cache unfriendliness.

I know next to nothing about memory allocation in rust, but thought it worth mentioning in case it is applicable here.

nnethercote · 2020-05-31T23:15:05Z

Facebook did some research for their FBVector and picked 1.5 growth factor. They explicitly moved away from 2x growth because of cache unfriendliness.

The argument for 1.5x is about memory reuse, rather than cache friendliness. It's a silly argument, IMO. That document says this:

When there's a request to grow the vector, the vector (assuming no in-place resizing, see the appropriate section in this document) will allocate a chunk of memory next to its current chunk, copy its existing data to the new chunk, and then deallocate the old chunk.

The text I have emphasised is false. Modern allocators typically have size classes, which means that allocations of different sizes (e.g. 128 bytes vs 192 bytes) have no chance of being put next to each other. jemalloc, which Facebook uses, is such an allocator.

Indeed, the next section of that document then goes on to talk about jemalloc's size classes, without realizing that they invalidate the reasoning in the previous section.

RalfJung · 2020-05-31T23:20:00Z

But it seems like with a growth factor of 1.5, we get a guarantee that at least 66% of the allocated memory is actually used (assuming only pushes), whereas a growth factor of 2 makes that 50%? That could waste a lot of memory.

nnethercote · 2020-05-31T23:29:44Z

Sure, there's a trade-off between frequency of reallocation and potential for unused memory no matter what growth factor you use.

It's the "2x is the theoretical worst possible number" argument that I object to. It's false, and it also overlooks the benefits of 2x: 2x is simple, it is the cheapest possible multiplication, and powers of two are most likely to result in allocation sizes that match allocator size classes.

redbaron · 2020-06-01T00:01:54Z

The argument for 1.5x is about memory reuse, rather than cache friendliness

Isn't it cache friendly to reuse memory?

It's the "2x is the theoretical worst possible number" argument that I object to.

I guess it is easy to measure if it has any visible effect on performance and/or memory usage.

ChrisJefferson · 2020-06-01T09:57:40Z

While we can tune this, until the memory allocator can accurately tell us the "true" amount of allocated memory, numbers like 1.5x are also much more likely to lead to entirely wasted memory. On most systems if you allocate (say) 600 bytes, you are probably getting 1024 anyway, and once you get up to page size you get whole pages. (NOTE: That is my belief from reading one memory manager long ago. I haven't actually read carefully the different mallocs which rust uses, which is I suppose proving my point, we don't know what we "actually" get).

nnethercote · 2020-06-01T10:07:20Z

Yes. Here are the size classes from an older version of jemalloc.

Small: [8], [16, 32, 48, ..., 128], [192, 256, 320, ..., 512], [768, 1024, 1280, ..., 3840]
Large: [4 KiB, 8 KiB, 12 KiB, ..., 4072 KiB]
Huge: [4 MiB, 8 MiB, 12 MiB, ...]

The spacings increase more smoothly in recent versions (I can't find an exact list right now) but the general idea still holds.

upsuper · 2020-06-01T10:10:44Z

You can find the latest size classes from jemalloc's man page.

RalfJung · 2020-06-01T10:15:32Z

Jemalloc is not the default allocator for Rust though. Do you know how the platform allocators that we use by default behave? Seems to make most sense to measure with the default setup.

mati865 · 2020-06-01T11:08:52Z

@RalfJung it is. At least when building Rust for Linux on it's CI:

rust/src/ci/docker/dist-x86_64-linux/Dockerfile

Line 105 in 475c713

--set rust.jemalloc

Manual builds use system allocator.

RalfJung · 2020-06-01T11:12:19Z

@mati865 rustc uses jemalloc. But the default allocator for Rust projects in general is the system allocator, not jemalloc -- rustc (a specific program written in Rust) just happens to overwrite that default.

Most users of Vec are going to be outside rustc, so rustc's choice of allocator is insignificant here IMO.

mati865 · 2020-06-01T11:15:46Z

But perf.rlo can only measure Rust performance. So all "official" measurements are done with jemalloc.

RalfJung · 2020-06-01T11:18:44Z

That is correct. So what?

The question was if 2x or 1.5x (or something else) is the better growth factor for Vec in general Rust programs (not for rustc specifically). The fact you stated does nothing to help resolve that question, as far as I can see.

ollie27 · 2020-06-01T11:46:03Z

There is a dedicated issue for the growth strategy of Vec: #29931. This discussion should probably be happening there.

Layer::grow relies on reserving exactly as many bytes as specified in the argument. And it apparently has worked as long as the argument was a power of 2, which it was. This has changed for small vectors since: rust-lang/rust#72227 The fix is to use `reserve_exact` instead.

rw · 2020-08-05T09:58:56Z

This micro-optimization makes Vecs harder to reason about, in my opinion.

IIUC, given elements of size 1024 bytes, these Vecs will start off life in the memory allocator pools meant for 4*1024 = 4096 bytes. That is a big change from starting off in the pools for 1024 byte objects.

Instead, what about calculating the cutoff such that the preallocation fits in to a cache line? Or just recommend that users call Vec::with_capacity?

cc @nnethercote
edit: cc #29931

Do not inline finish_grow Fixes rust-lang#78471. Looking at libgkrust.a in Firefox, the sizes for the `gkrust.*.o` file is: - 18584816 (text) 582418 (data) with unmodified master - 17937659 (text) 582554 (data) with rust-lang#72227 reverted - 17968228 (text) 582858 (data) with `#[inline(never)]` on `grow_amortized` and `grow_exact`, but that has some performance consequences - 17927760 (text) 582322 (data) with this change So in terms of size, at least in the case of Firefox, this patch more than undoes the regression. I don't think it should affect performance, but we'll see.

rust-highfive assigned Amanieu May 15, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 15, 2020

nnethercote force-pushed the tiny-vecs-are-dumb branch from dac8c47 to 37dd3b6 Compare May 18, 2020 04:34

nnethercote force-pushed the tiny-vecs-are-dumb branch from 37dd3b6 to 3cbc23e Compare May 18, 2020 04:42

nnethercote force-pushed the tiny-vecs-are-dumb branch from 3cbc23e to f4b9dc3 Compare May 18, 2020 05:27

Amanieu reviewed May 18, 2020

View reviewed changes

src/liballoc/raw_vec.rs Show resolved Hide resolved

This was referenced Oct 28, 2020

Code bloat from RawVec::grow_amortized #78471

Closed

Do not inline finish_grow #78682

Merged

connorskees mentioned this pull request Jun 30, 2021

Performance Improvements mccolljr/segvec#1

Open

3 tasks

Mark-Simulacrum mentioned this pull request Sep 4, 2022

Grow RawVec to fill the allocator bins tighter [please bench it] #101341

Closed

4 tasks

mxinden mentioned this pull request May 28, 2025

Consider not pre-allocating each UDP datagram in output_path mozilla/neqo#2670

Open

Tiny Vecs are dumb. #72227

Tiny Vecs are dumb. #72227

Uh oh!

Conversation

nnethercote commented May 15, 2020

Uh oh!

nnethercote commented May 15, 2020

Uh oh!

rust-timer commented May 15, 2020

Uh oh!

bors commented May 15, 2020

Uh oh!

Amanieu commented May 15, 2020

Uh oh!

leonardo-m commented May 15, 2020

Uh oh!

bors commented May 15, 2020

Uh oh!

rust-timer commented May 15, 2020

Uh oh!

ollie27 commented May 15, 2020

Uh oh!

rust-timer commented May 15, 2020

Uh oh!

Amanieu commented May 15, 2020

Uh oh!

Amanieu commented May 15, 2020

Uh oh!

hanna-kruppe commented May 15, 2020

Uh oh!

nnethercote commented May 15, 2020

Uh oh!

nnethercote commented May 15, 2020

Uh oh!

nnethercote commented May 16, 2020

Uh oh!

hanna-kruppe commented May 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChrisJefferson commented May 17, 2020

Uh oh!

nnethercote commented May 17, 2020

Uh oh!

nnethercote commented May 18, 2020

Uh oh!

nnethercote commented May 18, 2020

Uh oh!

Uh oh!

ChrisJefferson commented May 18, 2020

Uh oh!

nnethercote commented May 19, 2020

Uh oh!

upsuper commented May 28, 2020

Uh oh!

ChrisJefferson commented May 28, 2020

Uh oh!

mati865 commented May 28, 2020

Uh oh!

upsuper commented May 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented May 28, 2020

Uh oh!

the8472 commented May 29, 2020

Uh oh!

redbaron commented May 31, 2020

Uh oh!

nnethercote commented May 31, 2020

Uh oh!

RalfJung commented May 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented May 31, 2020

Uh oh!

redbaron commented Jun 1, 2020

Uh oh!

ChrisJefferson commented Jun 1, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented Jun 1, 2020

hanna-kruppe commented May 16, 2020 •

edited

Loading

upsuper commented May 28, 2020 •

edited

Loading

RalfJung commented May 31, 2020 •

edited

Loading

ChrisJefferson commented Jun 1, 2020 •

edited

Loading

RalfJung commented Jun 1, 2020 •

edited

Loading

rw commented Aug 5, 2020 •

edited

Loading