-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NonMaxUsize for SparseArrays #11843
NonMaxUsize for SparseArrays #11843
Conversation
There is a conflict between this and #2227 which I do not know how to resolve. I am very open to suggestions on how to resolve their changes in 'insert' in 'sparse_set'. |
@@ -0,0 +1,85 @@ | |||
macro_rules! impl_non_max_fmt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a nonmax
crate that already implements this. We should try using that first.
} | ||
} else { | ||
self.sparse.insert(index.clone(), self.dense.len()); | ||
self.sparse | ||
.insert(index.clone(), NonMaxUsize::new(self.dense.len()).unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't an uncommon branch to take, and this introduces a potential panic, even if we never panic normally, this will have a negative impact on performance, and it's not something we can use debug_checked_unwrap
on either since we know for a fact that it's something that we can hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean you want me to revert that line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly am not sure how to deal with this. This branch is unavoidable if we want to avoid unsoundness. If we can find a way to address this without impacting performance, this seems feasible, but otherwise, this is potentially on the hotpath for a lot of operations within the engine, so any tangible performance regression is probably unacceptable. Definitely need to benchmark this in some way to validate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the potential performance issue here? Is it xor
operation or unwrap()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best way to deal with this is to improve Rust language (there're a lot of discussions like this). But I'm afraid we're not getting it in the next ten years. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the potential performance issue here? Is it xor operation or unwrap()?
The unwrap, the XOR has persistently shown to be easily pipelinable in any current CPUs without adding too much here, but the extra branch and codegen from panicking has shown to have a signficant impact when working with code that is this hot.
@tygyh I want to experiment with this a bit more. Do you mind if I resolve the merge conflicts on this branch? |
|
Actually, this might be easier to rewrite this from scratch given how much has changed. Do you mind if I adopt this? I'm seeing some results on a local re-implementation that at least should be neutral in CPU time performance, while also saving the memory. |
@james7132 Re-adopt this if you want to. I am finished with it. |
Closing in favor of #12083. Will credit you as a part of that PR. |
# Objective Adoption of #2104 and #11843. The `Option<usize>` wastes 3-7 bytes of memory per potential entry, and represents a scaling memory overhead as the ID space grows. The goal of this PR is to reduce memory usage without significantly impacting common use cases. Co-Authored By: @NathanSWard Co-Authored By: @tygyh ## Solution Replace `usize` in `SparseSet`'s sparse array with `nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a bitwise NOT to the value when accessing it. This allows the compiler to niche the value and eliminate the extra padding used for the `Option` inside the sparse array, while moving the niche value from 0 to usize::MAX instead. Checking the [diff in x86 generated assembly](james7132/bevy_asm_tests@6e4da65), this change actually results in fewer instructions generated. One potential downside is that it seems to have moved a load before a branch, which means we may be incurring a cache miss even if the element is not there. Note: unlike #2104 and #11843, this PR only targets the metadata stores for the ECS and not the component storage itself. Due to #9907 targeting `Entity::generation` instead of `Entity::index`, `ComponentSparseSet` storing only up to `u32::MAX` elements would become a correctness issue. This will come with a cost when inserting items into the SparseSet, as now there is a potential for a panic. These cost are really only incurred when constructing a new Table, Archetype, or Resource that has never been seen before by the World. All operations that are fairly cold and not on any particular hotpath, even for command application. --- ## Changelog Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements instead of `usize::MAX`. Changed: `SparseSet` now uses 33-50% less memory overhead per stored item.
# Objective Adoption of bevyengine#2104 and bevyengine#11843. The `Option<usize>` wastes 3-7 bytes of memory per potential entry, and represents a scaling memory overhead as the ID space grows. The goal of this PR is to reduce memory usage without significantly impacting common use cases. Co-Authored By: @NathanSWard Co-Authored By: @tygyh ## Solution Replace `usize` in `SparseSet`'s sparse array with `nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a bitwise NOT to the value when accessing it. This allows the compiler to niche the value and eliminate the extra padding used for the `Option` inside the sparse array, while moving the niche value from 0 to usize::MAX instead. Checking the [diff in x86 generated assembly](james7132/bevy_asm_tests@6e4da65), this change actually results in fewer instructions generated. One potential downside is that it seems to have moved a load before a branch, which means we may be incurring a cache miss even if the element is not there. Note: unlike bevyengine#2104 and bevyengine#11843, this PR only targets the metadata stores for the ECS and not the component storage itself. Due to bevyengine#9907 targeting `Entity::generation` instead of `Entity::index`, `ComponentSparseSet` storing only up to `u32::MAX` elements would become a correctness issue. This will come with a cost when inserting items into the SparseSet, as now there is a potential for a panic. These cost are really only incurred when constructing a new Table, Archetype, or Resource that has never been seen before by the World. All operations that are fairly cold and not on any particular hotpath, even for command application. --- ## Changelog Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements instead of `usize::MAX`. Changed: `SparseSet` now uses 33-50% less memory overhead per stored item.
Objective
Solution