NonMaxUsize for SparseArrays #11843

tygyh · 2024-02-13T10:04:48Z

Objective

Finish [bevy_<util/ecs>] Better SparseArray performance #2104.

Solution

Rebase and solve change conflicts.

tygyh · 2024-02-13T10:11:20Z

There is a conflict between this and #2227 which I do not know how to resolve. I am very open to suggestions on how to resolve their changes in 'insert' in 'sparse_set'.

james7132 · 2024-02-13T17:48:14Z

crates/bevy_utils/src/num.rs

@@ -0,0 +1,85 @@
+macro_rules! impl_non_max_fmt {


There is a nonmax crate that already implements this. We should try using that first.

james7132 · 2024-02-13T17:52:29Z

crates/bevy_ecs/src/storage/sparse_set.rs

            }
        } else {
-            self.sparse.insert(index.clone(), self.dense.len());
+            self.sparse
+                .insert(index.clone(), NonMaxUsize::new(self.dense.len()).unwrap());


This isn't an uncommon branch to take, and it introduces a potential panic. Even if we never panic in practice, this will have a negative impact on performance. We can't use `debug_checked_unwrap` here either, since we know for a fact that this is a case we can hit.

Does that mean you want me to revert that line?

I honestly am not sure how to deal with this. This branch is unavoidable if we want to avoid unsoundness. If we can find a way to address this without impacting performance, this seems feasible, but otherwise, this is potentially on the hotpath for a lot of operations within the engine, so any tangible performance regression is probably unacceptable. Definitely need to benchmark this in some way to validate.

What is the potential performance issue here? Is it xor operation or unwrap()?

The best way to deal with this is to improve the Rust language itself (there are a lot of discussions about this). But I'm afraid we're not getting it in the next ten years. :(

> What is the potential performance issue here? Is it xor operation or unwrap()?

The unwrap. The XOR has consistently been shown to pipeline easily on current CPUs without adding much overhead, but the extra branch and codegen from panicking have been shown to have a significant impact in code that is this hot.
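To make the branch under discussion concrete, here is a minimal sketch of an insert path that stores dense indices as a non-max value. `NonMax` and `TinySparseSet` are hypothetical stand-ins written for illustration; they are not the `nonmax` crate's source or Bevy's `SparseSet`, and the layout is deliberately simplified.

```rust
use std::num::NonZeroUsize;

/// Hypothetical stand-in for a non-max integer: stores `!value` in a `NonZeroUsize`.
#[derive(Clone, Copy)]
struct NonMax(NonZeroUsize);

impl NonMax {
    /// Returns `None` only when `value == usize::MAX` (its complement is 0).
    fn new(value: usize) -> Option<Self> {
        NonZeroUsize::new(!value).map(Self)
    }

    /// Undoes the complement applied in `new`.
    fn get(self) -> usize {
        !self.0.get()
    }
}

/// Hypothetical, simplified sparse set (assumption: not Bevy's actual type).
pub struct TinySparseSet<T> {
    dense: Vec<T>,
    sparse: Vec<Option<NonMax>>,
}

impl<T> TinySparseSet<T> {
    pub fn new() -> Self {
        Self { dense: Vec::new(), sparse: Vec::new() }
    }

    pub fn insert(&mut self, index: usize, value: T) {
        if let Some(dense_index) = self.sparse.get(index).copied().flatten() {
            // Existing entry: overwrite in place, no conversion needed.
            self.dense[dense_index.get()] = value;
        } else {
            // New entry: this conversion is the fallible step under review.
            // The `unwrap` adds a check and a panic path to this branch.
            let dense_index = NonMax::new(self.dense.len()).unwrap();
            if index >= self.sparse.len() {
                self.sparse.resize(index + 1, None);
            }
            self.sparse[index] = Some(dense_index);
            self.dense.push(value);
        }
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        let dense_index = self.sparse.get(index).copied().flatten()?;
        self.dense.get(dense_index.get())
    }
}
```

The complement in `NonMax::get` is the cheap part; the `unwrap` in the new-entry branch is where the extra check and panic codegen come from, which is why benchmarking that path matters.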

james7132 · 2024-02-23T20:42:45Z

@tygyh I want to experiment with this a bit more. Do you mind if I resolve the merge conflicts on this branch?

tygyh · 2024-02-23T20:45:01Z

> @tygyh I want to experiment with this a bit more. Do you mind if I resolve the merge conflicts on this branch?

Go ahead

james7132 · 2024-02-23T21:30:23Z

Actually, it might be easier to rewrite this from scratch given how much has changed. Do you mind if I adopt this? I'm seeing results on a local re-implementation that should at least be neutral in CPU time, while also saving memory.

tygyh · 2024-02-24T07:19:11Z

@james7132 Re-adopt this if you want to. I am finished with it.

james7132 · 2024-02-24T09:48:27Z

Closing in favor of #12083. Will credit you as a part of that PR.

@NathanSWard

# Objective

Adoption of #2104 and #11843. The `Option<usize>` wastes 3-7 bytes of memory per potential entry, and represents a scaling memory overhead as the ID space grows.

The goal of this PR is to reduce memory usage without significantly impacting common use cases.

Co-Authored By: @NathanSWard
Co-Authored By: @tygyh

## Solution

Replace `usize` in `SparseSet`'s sparse array with `nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a bitwise NOT to the value when accessing it. This allows the compiler to niche the value and eliminate the extra padding used for the `Option` inside the sparse array, while moving the niche value from 0 to usize::MAX instead.

Checking the [diff in x86 generated assembly](james7132/bevy_asm_tests@6e4da65), this change actually results in fewer instructions generated. One potential downside is that it seems to have moved a load before a branch, which means we may be incurring a cache miss even if the element is not there.

Note: unlike #2104 and #11843, this PR only targets the metadata stores for the ECS and not the component storage itself. Due to #9907 targeting `Entity::generation` instead of `Entity::index`, `ComponentSparseSet` storing only up to `u32::MAX` elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as now there is a potential for a panic. These costs are really only incurred when constructing a new Table, Archetype, or Resource that has never been seen before by the World. These are all operations that are fairly cold and not on any particular hotpath, even for command application.

---

## Changelog

Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements instead of `usize::MAX`.
Changed: `SparseSet` now uses 33-50% less memory overhead per stored item.
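The niche optimization described under Solution can be sketched with the standard library alone. `NonMax` below is a hypothetical stand-in for `nonmax::NonMaxUsize`, written for illustration and not copied from the crate; `slot_sizes` is likewise a made-up helper. The bitwise NOT is equivalent to XOR with `usize::MAX`, which is the operation discussed in the review thread.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical stand-in for `nonmax::NonMaxUsize` (assumption: illustrative only).
/// Stores `!value` in a `NonZeroUsize`, so the one forbidden value is
/// `usize::MAX` (whose complement is 0) rather than 0.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NonMax(NonZeroUsize);

impl NonMax {
    /// Returns `None` only for `usize::MAX`.
    pub fn new(value: usize) -> Option<Self> {
        NonZeroUsize::new(!value).map(Self)
    }

    /// Undoes the complement applied in `new`.
    pub fn get(self) -> usize {
        !self.0.get()
    }
}

/// Bytes per sparse-array slot: (`Option<usize>`, `Option<NonMax>`).
pub fn slot_sizes() -> (usize, usize) {
    (size_of::<Option<usize>>(), size_of::<Option<NonMax>>())
}
```

On typical targets the first size is twice the second, because `Option<usize>` needs a separate discriminant plus padding, while `Option<NonMax>` can reuse the unused all-zero bit pattern as `None`.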


NonMaxUsize for SparseArrays

4adb1ae

james7132 added the S-Needs-Benchmarking, A-ECS, and C-Performance labels Feb 13, 2024

james7132 reviewed Feb 13, 2024


james7132 mentioned this pull request Feb 24, 2024

Use NonMaxUsize for non-component SparseSets #12083

Merged

james7132 closed this Feb 24, 2024

tygyh deleted the nward/sparse-array-option branch February 24, 2024 11:36
