Double the capacity when BlobVec is full #11167

garychia · 2024-01-01T09:47:01Z

Objective

Fixes BlobVec::push is linear #10797

Solution

Double the capacity of a full BlobVec before pushing a new element.

stepancheg · 2024-01-01T15:25:44Z

This solution won't fix most of the issues.

In many cases, bevy calls BlobVec::reserve_exact which has exactly the same problem, and after reserve_exact, push won't double the capacity.

A proper fix would be (basically, copy Vec's behavior):

- add a reserve function (in addition to reserve_exact) that doubles the capacity (or maybe does something smarter, see Vec)
- call reserve where reserve_exact is currently called (perhaps in a separate PR)
- have push call reserve(1) (instead of reserve_exact(1))
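
To make the difference between the two growth policies concrete, here is a minimal simulation that only counts reallocations while pushing elements one at a time. It is a sketch, not BlobVec code: `count_reallocations`, `grow_exact`, and `grow_amortized` are hypothetical names invented for this illustration, and no memory is actually allocated.

```rust
/// Simulates pushing `n` elements one at a time and returns how many times
/// the buffer would have to be reallocated.
/// `grow` maps (current capacity, elements needed) to the new capacity.
fn count_reallocations(n: usize, grow: impl Fn(usize, usize) -> usize) -> usize {
    let mut capacity = 0usize;
    let mut reallocations = 0usize;
    for len in 0..n {
        if len == capacity {
            // The buffer is full: grow before the push.
            capacity = grow(capacity, 1);
            reallocations += 1;
        }
    }
    reallocations
}

/// Hypothetical stand-in for `reserve_exact`-style growth: add exactly what is needed.
fn grow_exact(capacity: usize, additional: usize) -> usize {
    capacity + additional
}

/// Hypothetical stand-in for amortized growth: grow by at least the current capacity.
fn grow_amortized(capacity: usize, additional: usize) -> usize {
    capacity + capacity.max(additional)
}

fn main() {
    let n = 1_000;
    println!("exact growth:     {} reallocations", count_reallocations(n, grow_exact));
    println!("amortized growth: {} reallocations", count_reallocations(n, grow_amortized));
}
```

With exact growth every push into a full buffer reallocates, so the count grows linearly with the number of pushes; with doubling it grows logarithmically.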

garychia · 2024-01-02T06:21:38Z

Sure. I will leave push mostly unchanged and implement the reserve function instead. At this point I can't come up with a fancier solution. My reserve function basically just ensures that the capacity at least doubles if there is not enough space.

stepancheg · 2024-01-02T16:50:57Z

crates/bevy_ecs/src/storage/blob_vec.rs

+            let extra_space = self.capacity.max(additional - available_space);
+            // SAFETY: `additional - available_space > 0` so `extra_space` is non-zero
+            let increment = unsafe { NonZeroUsize::new_unchecked(extra_space) };


Let's call it extra_capacity, because "space" can mean both "len" and "capacity".

Also, let's give extra_space and increment the same name, because they hold the same value.

Also, new_unchecked is not really needed here; the safe new() followed by unwrap() should work equally well. The compiler is able to get rid of this trivial check, and even if it doesn't, this code is executed rarely anyway.
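
As an illustration of the suggestions above, the increment can be computed under a single name and converted with the safe constructor. This is a sketch only: `extra_capacity` is written here as a hypothetical free function taking plain integers, whereas the PR computes the value inline from the BlobVec fields.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: how much capacity to add so that `additional` more
/// elements fit, growing by at least the current capacity.
/// The caller must ensure `additional > capacity - len`.
fn extra_capacity(capacity: usize, len: usize, additional: usize) -> NonZeroUsize {
    let available = capacity - len;
    let extra = capacity.max(additional - available);
    // Safe constructor instead of `new_unchecked`: `additional > available`
    // implies `extra > 0`, so this never panics when the precondition holds.
    NonZeroUsize::new(extra).expect("additional exceeds the available capacity")
}

fn main() {
    // Full buffer of 8 elements: doubling dominates.
    assert_eq!(extra_capacity(8, 8, 1).get(), 8);
    // A large request dominates the doubling.
    assert_eq!(extra_capacity(8, 6, 100).get(), 98);
    // An empty buffer still grows by at least one slot.
    assert_eq!(extra_capacity(0, 0, 1).get(), 1);
}
```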

stepancheg · 2024-01-02T16:58:06Z

Overall, looks good.

stepancheg · 2024-01-03T16:04:01Z

To polish this code further, one more change would be to split the reserve function into reserve and do_reserve.

The body of reserve would be:

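#[inline]
fn reserve(&mut self, additional: usize) {
    // Fast path: a subtraction and a comparison when there is already room.
    if self.capacity - self.len < additional {
        self.do_reserve(additional);
    }
}

// Slow path: kept out of line so that reserve stays small enough to inline.
#[cold]
fn do_reserve(&mut self, additional: usize) { ... }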

The idea is this: push is marked #[inline] and should be fast. reserve is not marked #[inline], so it might not be inlined into push, even though in most cases it is just an integer subtraction and a comparison.

If we mark reserve #[inline] as is, it will be inlined into push, but the resulting push function might be too large to be inlined into the code that calls it.

Splitting the reserve function fixes this issue.
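
A self-contained version of this split, for experimentation. `Buf` is a hypothetical toy type that only tracks bookkeeping (no memory is allocated), and `grow_calls` exists purely so the slow path can be observed; neither is part of BlobVec.

```rust
/// Toy stand-in for a growable buffer; only the bookkeeping is modelled.
struct Buf {
    len: usize,
    capacity: usize,
    /// Number of times the slow path ran (for illustration only).
    grow_calls: usize,
}

impl Buf {
    /// Fast path: small enough to be inlined into `push`.
    #[inline]
    fn reserve(&mut self, additional: usize) {
        if self.capacity - self.len < additional {
            self.do_reserve(additional);
        }
    }

    /// Slow path: marked cold so the optimizer treats calls to it as unlikely.
    #[cold]
    fn do_reserve(&mut self, additional: usize) {
        let available = self.capacity - self.len;
        // Grow by at least the current capacity.
        self.capacity += self.capacity.max(additional - available);
        self.grow_calls += 1;
    }

    #[inline]
    fn push(&mut self) {
        self.reserve(1);
        self.len += 1;
    }
}

fn main() {
    let mut buf = Buf { len: 0, capacity: 0, grow_calls: 0 };
    for _ in 0..100 {
        buf.push();
    }
    println!("len={} capacity={} grow_calls={}", buf.len, buf.capacity, buf.grow_calls);
}
```

Whether the compiler actually inlines `reserve` into `push` depends on optimization settings; the attributes are hints, not guarantees.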

james7132

The status quo seems to have been an artifact of #1525, when we forked from hecs. It seems that hecs also uses a doubling strategy, as of July 2021: Ralith/hecs@a8545a2, which was roughly 4-5 months after ECS V2. I checked this with @Ralith and @cart on Discord: https://discord.com/channels/691052431525675048/749335865876021248/1192580492164341820.

With the history here established, I think this is a good idea in general. However, if you check the usage of BlobVec::push, it's only used in Column::push, which is only used in ComponentSparseSet::insert, Resource::insert, and their variants. This won't impact any of the table-based storage, which is what almost all components use, but it should improve performance when mass inserting/spawning SparseSet components.

Code generally looks good, though there are some things I want addressed.

james7132 · 2024-01-05T16:01:41Z

crates/bevy_ecs/src/storage/blob_vec.rs

+    /// Similar to `reserve_exact`. This method ensures that the capacity will grow at least `self.capacity()` if there is no
+    /// enough space to hold `additional` more elements.
+    #[cold]
+    fn do_reserve(&mut self, additional: usize) {


I'd follow the style the Rust project uses for their Vec implementation, and scope the cold function within the reserve function itself: https://github.com/rust-lang/rust/blob/6bc08a725f888a06ea3c6844f3d0cc2d2ebc5142/library/alloc/src/raw_vec.rs#L294.
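
A rough sketch of that shape, using a hypothetical toy type `Buf` rather than BlobVec or the real RawVec code: the cold function is declared inside reserve, so nothing else can call it, and it takes the receiver as an ordinary parameter (named `slf` here, since `self` is not available in a nested function).

```rust
/// Toy stand-in for a growable buffer; only the bookkeeping is modelled.
struct Buf {
    len: usize,
    capacity: usize,
}

impl Buf {
    #[inline]
    fn reserve(&mut self, additional: usize) {
        // Nested item: only `reserve` can call it, so the precondition
        // `additional > capacity - len` is established in exactly one place.
        #[cold]
        fn do_reserve(slf: &mut Buf, additional: usize) {
            let available = slf.capacity - slf.len;
            slf.capacity += slf.capacity.max(additional - available);
        }

        if self.capacity - self.len < additional {
            do_reserve(self, additional);
        }
    }
}

fn main() {
    let mut buf = Buf { len: 4, capacity: 4 };
    buf.reserve(1); // full, so the capacity doubles
    buf.reserve(2); // already enough room, nothing happens
    println!("len={} capacity={}", buf.len, buf.capacity);
}
```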

james7132 · 2024-01-05T16:19:43Z

crates/bevy_ecs/src/storage/blob_vec.rs

+    #[cold]
+    fn do_reserve(&mut self, additional: usize) {
+        let available_space = self.capacity - self.len;
+        if available_space < additional && self.item_layout.size() > 0 {


If we are already doing the checks in reserve, we don't need to be repeating them here. We can just assume we have already met the requisite conditions.

stepancheg · 2024-01-06T06:41:25Z

crates/bevy_ecs/src/storage/blob_vec.rs

+            if slf.item_layout.size() > 0 {
+                let increment = slf.capacity.max(additional - (slf.capacity - slf.len));
+                let increment = NonZeroUsize::new(increment).unwrap();
+                // SAFETY: not called for ZSTs
+                unsafe { slf.grow_exact(increment) };
+            }


After #10799 merged, this code is actually meant to call grow_exact for ZSTs.

garychia

Does that mean we no longer need that check and can just call grow_exact immediately?

stepancheg

Yes, we don't need the slf.item_layout.size() > 0 check.

atlv24

logic checks out

atlv24 · 2024-01-20T23:21:46Z

benchmarks are a bit noisy but we do win in the big sparse_set case

$ critcmp before after
group                                     after                                  before
-----                                     -----                                  ------
add_remove/sparse_set                     1.00   494.9±11.07µs        ? ?/sec    1.02   504.0±31.49µs        ? ?/sec
add_remove/table                          1.02   766.3±14.54µs        ? ?/sec    1.00   753.0±14.05µs        ? ?/sec
add_remove_big/sparse_set                 1.00   497.3±11.90µs        ? ?/sec    1.08   535.2±76.61µs        ? ?/sec
add_remove_big/table                      1.01  1751.1±35.25µs        ? ?/sec    1.00  1726.0±23.07µs        ? ?/sec
added_archetypes/archetype_count/100      1.03     38.0±0.24µs        ? ?/sec    1.00     36.9±0.18µs        ? ?/sec
added_archetypes/archetype_count/1000     1.02    403.9±5.28µs        ? ?/sec    1.00    395.1±2.28µs        ? ?/sec
added_archetypes/archetype_count/10000    1.02      7.1±0.17ms        ? ?/sec    1.00      7.0±0.22ms        ? ?/sec
added_archetypes/archetype_count/200      1.00     73.5±0.65µs        ? ?/sec    1.00     73.6±0.51µs        ? ?/sec
added_archetypes/archetype_count/2000     1.02   831.5±10.76µs        ? ?/sec    1.00    816.4±8.05µs        ? ?/sec
added_archetypes/archetype_count/500      1.02    197.7±1.11µs        ? ?/sec    1.00    194.4±1.10µs        ? ?/sec
added_archetypes/archetype_count/5000     1.02      2.7±0.08ms        ? ?/sec    1.00      2.6±0.07ms        ? ?/sec
insert_simple/base                        1.01    251.4±3.25µs        ? ?/sec    1.00    247.8±2.45µs        ? ?/sec
insert_simple/unbatched                   1.01   581.4±19.07µs        ? ?/sec    1.00   578.4±14.18µs        ? ?/sec
no_archetypes/system_count/0              1.00      4.7±0.05ns        ? ?/sec    1.00      4.7±0.03ns        ? ?/sec
no_archetypes/system_count/100            1.00   838.1±13.51ns        ? ?/sec    1.00   836.7±10.64ns        ? ?/sec
no_archetypes/system_count/20             1.00    163.3±1.67ns        ? ?/sec    1.04    169.6±6.60ns        ? ?/sec
no_archetypes/system_count/40             1.01    347.0±5.50ns        ? ?/sec    1.00   343.8±10.40ns        ? ?/sec
no_archetypes/system_count/60             1.00    503.1±9.43ns        ? ?/sec    1.01    506.0±8.72ns        ? ?/sec
no_archetypes/system_count/80             1.01    678.1±5.77ns        ? ?/sec    1.00    674.6±9.58ns        ? ?/sec

one run was particularly harsh on these two benches:

group                                     after                                  before
-----                                     -----                                  ------
insert_simple/unbatched                   1.09   629.3±14.79µs        ? ?/sec    1.00   578.4±14.18µs        ? ?/sec
no_archetypes/system_count/40             1.28    440.7±1.86ns        ? ?/sec    1.00   343.8±10.40ns        ? ?/sec

growing by 1.5x instead of 2x looks pretty much the same

$ critcmp before after
group                                     after                                  before
-----                                     -----                                  ------
add_remove/sparse_set                     1.00   486.2±12.53µs        ? ?/sec    1.04   504.0±31.49µs        ? ?/sec
add_remove/table                          1.00   751.5±15.29µs        ? ?/sec    1.00   753.0±14.05µs        ? ?/sec
add_remove_big/sparse_set                 1.00   499.5±12.71µs        ? ?/sec    1.07   535.2±76.61µs        ? ?/sec
add_remove_big/table                      1.02  1768.9±29.77µs        ? ?/sec    1.00  1726.0±23.07µs        ? ?/sec
added_archetypes/archetype_count/100      1.02     37.8±0.62µs        ? ?/sec    1.00     36.9±0.18µs        ? ?/sec
added_archetypes/archetype_count/1000     1.02    402.5±3.34µs        ? ?/sec    1.00    395.1±2.28µs        ? ?/sec
added_archetypes/archetype_count/10000    1.00      7.0±0.20ms        ? ?/sec    1.00      7.0±0.22ms        ? ?/sec
added_archetypes/archetype_count/200      1.02     74.7±0.37µs        ? ?/sec    1.00     73.6±0.51µs        ? ?/sec
added_archetypes/archetype_count/2000     1.02    830.4±9.54µs        ? ?/sec    1.00    816.4±8.05µs        ? ?/sec
added_archetypes/archetype_count/500      1.02    198.0±1.64µs        ? ?/sec    1.00    194.4±1.10µs        ? ?/sec
added_archetypes/archetype_count/5000     1.02      2.6±0.06ms        ? ?/sec    1.00      2.6±0.07ms        ? ?/sec
insert_simple/base                        1.00    245.4±4.32µs        ? ?/sec    1.01    247.8±2.45µs        ? ?/sec
insert_simple/unbatched                   1.00   576.1±15.01µs        ? ?/sec    1.00   578.4±14.18µs        ? ?/sec
no_archetypes/system_count/0              1.00      4.7±0.03ns        ? ?/sec    1.00      4.7±0.03ns        ? ?/sec
no_archetypes/system_count/100            1.02    850.8±6.63ns        ? ?/sec    1.00   836.7±10.64ns        ? ?/sec
no_archetypes/system_count/20             1.00    168.6±6.18ns        ? ?/sec    1.01    169.6±6.60ns        ? ?/sec
no_archetypes/system_count/40             1.00    342.3±9.41ns        ? ?/sec    1.00   343.8±10.40ns        ? ?/sec
no_archetypes/system_count/60             1.00   500.7±10.08ns        ? ?/sec    1.01    506.0±8.72ns        ? ?/sec
no_archetypes/system_count/80             1.01    678.4±7.95ns        ? ?/sec    1.00    674.6±9.58ns        ? ?/sec
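
For some intuition on why the two factors land so close together, the following sketch counts reallocations for one-at-a-time pushes under each factor. It is a back-of-the-envelope model only: `reallocations` is a hypothetical helper, and it ignores allocator behavior, copy cost, and everything else the benchmarks above actually measure.

```rust
/// Counts reallocations when pushing `n` elements one at a time into a buffer
/// that grows by a factor of `numer / denom` whenever it is full.
fn reallocations(n: usize, numer: usize, denom: usize) -> usize {
    let mut capacity = 0usize;
    let mut count = 0usize;
    for len in 0..n {
        if len == capacity {
            // Grow by the factor, but always by at least one slot.
            capacity = (capacity * numer / denom).max(capacity + 1);
            count += 1;
        }
    }
    count
}

fn main() {
    let n = 10_000;
    println!("2x:   {} reallocations", reallocations(n, 2, 1));
    println!("1.5x: {} reallocations", reallocations(n, 3, 2));
}
```

Both counts are logarithmic in the number of pushes, which is consistent with the benchmark differences between the two factors being within noise.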

mockersf added the A-ECS (Entities, components, systems, and events) and C-Performance (A change motivated by improving speed, memory usage or compile times) labels Jan 1, 2024

garychia force-pushed the blobvec_push branch from a625547 to 38d595e January 2, 2024 06:20

stepancheg reviewed Jan 2, 2024

stepancheg approved these changes Jan 2, 2024

james7132 reviewed Jan 5, 2024

stepancheg reviewed Jan 6, 2024

Double the capacity of a full BlobVec after push

feb0a7f

garychia force-pushed the blobvec_push branch from bc07407 to feb0a7f January 6, 2024 07:36

atlv24 approved these changes Jan 20, 2024

SkiFire13 approved these changes Jan 20, 2024

alice-i-cecile added the S-Ready-For-Final-Review label (This PR has been approved by the community. It's ready for a maintainer to consider merging it) Jan 21, 2024

alice-i-cecile added this pull request to the merge queue Jan 22, 2024

Merged via the queue into bevyengine:main with commit 8ad1b93 Jan 22, 2024
23 checks passed
