[bevy_<util/ecs>] Better SparseArray performance #2104
I'd love to see benchmarks on this if you have them <3 Not doubting that it's faster, but I'm curious about the impact (and it's nice for release notes). |
Note, I'm totally up for bike-shedding on the |
Is there a reason we can't use types with niches for all of our sparse set indexes? I haven't had a chance to look through all our keys and see if this is possible, but it would be nice to know before doing something like this. Edit: after looking through it, it seems like everything we use here is either a usize or contains a usize, and so could we just use |
What about

    #[inline]
    pub const fn empty() -> ArchetypeId {
        ArchetypeId(0)
    }

Also, I'm pretty sure we need to allow the values to be |
Yeah the primary purpose of these sparse sets is to be used in the context of arrays, so NonZeroUsize isn't really viable. I don't want to push "dummy data" into the zero index of each array or need to remember to add 1 in the context of sparse sets. |
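To make the "remember to add 1" concern concrete, here is a minimal sketch of what storing array indices in a `NonZeroUsize` would involve. The helpers `to_slot` and `from_slot` are hypothetical names used only for illustration; they are not part of Bevy.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: store array index `i` as `i + 1` so that index 0
/// becomes representable in a `NonZeroUsize`.
fn to_slot(index: usize) -> Option<NonZeroUsize> {
    // `checked_add` returns None for usize::MAX, which cannot be shifted up.
    NonZeroUsize::new(index.checked_add(1)?)
}

/// Hypothetical helper: undo the offset before indexing.
fn from_slot(slot: NonZeroUsize) -> usize {
    slot.get() - 1
}

fn main() {
    let items = ["a", "b", "c"];
    let slot = to_slot(0).unwrap();

    // Correct use: remove the offset before indexing.
    assert_eq!(items[from_slot(slot)], "a");

    // Forgetting the offset silently reads the neighbouring element.
    assert_eq!(items[slot.get()], "b");
}
```

Every read and write site has to apply the offset consistently, which is the bookkeeping the comment above wants to avoid.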
I'd also like to see benches, as the primary motivator here is perf. |
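Another interesting find.

    use std::num::NonZeroUsize;

    /// A `usize` that can never be `usize::MAX`. The value is stored XORed
    /// with `usize::MAX`, so the forbidden all-zero pattern maps to `usize::MAX`.
    #[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
    pub struct NonMaxUsize(NonZeroUsize);

    impl NonMaxUsize {
        #[inline]
        pub const fn new(n: usize) -> Option<Self> {
            // `?` cannot be used in a `const fn`, so match explicitly.
            match NonZeroUsize::new(n ^ usize::MAX) {
                Some(inner) => Some(Self(inner)),
                None => None,
            }
        }

        /// Safety: `n` must not be `usize::MAX`.
        #[inline]
        pub const unsafe fn new_unchecked(n: usize) -> Self {
            Self(NonZeroUsize::new_unchecked(n ^ usize::MAX))
        }

        #[inline(always)]
        pub const fn get(self) -> usize {
            self.0.get() ^ usize::MAX
        }
    }

Granted the big downside is that whenever you want to access the value you have to do a This only optimizes that path that we have a You can see the branch/commit I used for testing here. |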
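The memory effect of the wrapper above can be checked with `size_of`. This is a minimal sketch that redefines a cut-down `NonMaxUsize` locally so it is self-contained; the size of `Option<NonMaxUsize>` relies on the niche optimization that current rustc applies to a struct wrapping `NonZeroUsize`, which is an assumption about the compiler rather than a documented guarantee for this exact type.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Cut-down local copy of the wrapper, for illustration only.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonMaxUsize(NonZeroUsize);

impl NonMaxUsize {
    pub fn new(n: usize) -> Option<Self> {
        // usize::MAX XORs to zero, which NonZeroUsize rejects.
        NonZeroUsize::new(n ^ usize::MAX).map(Self)
    }

    pub fn get(self) -> usize {
        self.0.get() ^ usize::MAX
    }
}

fn main() {
    // A plain usize has no spare bit pattern, so Option needs extra space.
    assert!(size_of::<Option<usize>>() > size_of::<usize>());

    // The wrapper exposes a niche, so None fits in the same machine word.
    assert_eq!(size_of::<Option<NonMaxUsize>>(), size_of::<usize>());

    // Zero is a valid value; usize::MAX is the one excluded value.
    assert_eq!(NonMaxUsize::new(0).map(NonMaxUsize::get), Some(0));
    assert_eq!(NonMaxUsize::new(usize::MAX), None);
}
```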
Yep, they're the same 😄 Also in case performance is a concern, they both compile down to the same assembly. |
I find the "!" form clearer, but I don't feel strongly about it :p |
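Assuming the two forms being compared here are `n ^ usize::MAX` and `!n` (the question itself did not survive in this thread), a quick check that they agree and that the encoding is its own inverse might look like this. The function names are hypothetical.

```rust
/// Encode by XOR with an all-ones mask.
fn encode_xor(n: usize) -> usize {
    n ^ usize::MAX
}

/// Encode by bitwise NOT.
fn encode_not(n: usize) -> usize {
    !n
}

fn main() {
    for n in [0, 1, 42, usize::MAX - 1, usize::MAX] {
        // Both spellings flip every bit, so they always agree.
        assert_eq!(encode_xor(n), encode_not(n));
        // Applying the encoding twice returns the original value.
        assert_eq!(encode_not(encode_not(n)), n);
    }
    // usize::MAX is the only input that encodes to zero.
    assert_eq!(encode_not(usize::MAX), 0);
}
```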
I just pushed changes changing the implementation to use the However, I would like someone to run the benches as well just to confirm :) |
I ran the benchmarks against the current main branch. Note that the insertion benchmarks are fairly noisy, but the iteration ones are fairly stable. The table is sorted roughly by relative timing difference, from biggest regression first to biggest improvement last. Variance is quite high in some cases, with error bars bigger than the absolute difference, so take those results with a grain of salt.
ping @cart for opinions/views on the perf tests. |
@Frizi If you are on Linux could you try using |
I'm a Windows desktop user :) Also with HyperV and WSL enabled, thus making the Windows kernel a guest OS. So probably not much I can do about the high variance without drastically changing my setup. |
This LGTM, but I prefer the approach taken in #3678. |
# Objective

Adoption of #2104 and #11843. The `Option<usize>` wastes 3-7 bytes of memory per potential entry, and represents a scaling memory overhead as the ID space grows.

The goal of this PR is to reduce memory usage without significantly impacting common use cases.

Co-Authored By: @NathanSWard
Co-Authored By: @tygyh

## Solution

Replace `usize` in `SparseSet`'s sparse array with `nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a bitwise NOT to the value when accessing it. This allows the compiler to niche the value and eliminate the extra padding used for the `Option` inside the sparse array, while moving the niche value from 0 to usize::MAX instead.

Checking the [diff in x86 generated assembly](james7132/bevy_asm_tests@6e4da65), this change actually results in fewer instructions generated. One potential downside is that it seems to have moved a load before a branch, which means we may be incurring a cache miss even if the element is not there.

Note: unlike #2104 and #11843, this PR only targets the metadata stores for the ECS and not the component storage itself. Due to #9907 targeting `Entity::generation` instead of `Entity::index`, `ComponentSparseSet` storing only up to `u32::MAX` elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as now there is a potential for a panic. These costs are really only incurred when constructing a new Table, Archetype, or Resource that has never been seen before by the World. These are all fairly cold operations that are not on any particular hot path, even for command application.

---

## Changelog

Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements instead of `usize::MAX`.
Changed: `SparseSet` now uses 33-50% less memory overhead per stored item.
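
As a rough illustration of the layout described in the Solution section, the sketch below stores dense indices as `Option<NonMaxUsize>` in the sparse array. `TinySparseSet` and the local `NonMaxUsize` are hypothetical, simplified stand-ins written for this example; they are not Bevy's `SparseSet` nor the `nonmax` crate's type.

```rust
use std::num::NonZeroUsize;

/// Simplified stand-in for a non-max integer: stores `!n` in a NonZeroUsize.
#[derive(Copy, Clone)]
struct NonMaxUsize(NonZeroUsize);

impl NonMaxUsize {
    fn new(n: usize) -> Option<Self> {
        NonZeroUsize::new(!n).map(Self)
    }

    fn get(self) -> usize {
        !self.0.get()
    }
}

/// Hypothetical sparse set: `sparse[key]` holds the index into `dense`.
struct TinySparseSet<V> {
    sparse: Vec<Option<NonMaxUsize>>,
    dense: Vec<V>,
}

impl<V> TinySparseSet<V> {
    fn new() -> Self {
        Self { sparse: Vec::new(), dense: Vec::new() }
    }

    fn insert(&mut self, key: usize, value: V) {
        if key >= self.sparse.len() {
            self.sparse.resize(key + 1, None);
        }
        let slot = self.sparse[key];
        match slot {
            // Key already present: overwrite the stored value.
            Some(i) => self.dense[i.get()] = value,
            None => {
                // This is the insertion-time check that can panic:
                // a dense index of usize::MAX is not representable.
                let dense_index = NonMaxUsize::new(self.dense.len())
                    .expect("dense index must be below usize::MAX");
                self.sparse[key] = Some(dense_index);
                self.dense.push(value);
            }
        }
    }

    fn get(&self, key: usize) -> Option<&V> {
        // Out-of-range keys and empty slots both yield None.
        let i = self.sparse.get(key).copied().flatten()?;
        self.dense.get(i.get())
    }
}

fn main() {
    let mut set = TinySparseSet::new();
    set.insert(5, "five");
    set.insert(2, "two");
    set.insert(5, "FIVE"); // overwrite, no new dense entry

    assert_eq!(set.get(5), Some(&"FIVE"));
    assert_eq!(set.get(2), Some(&"two"));
    assert_eq!(set.get(3), None);
    assert_eq!(set.get(100), None);
}
```

Lookups never hit the `expect`; only the first insertion of a new key does, which matches the claim that the added cost sits on cold paths.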
This has been implemented for non-component sparse sets with #12083, a PR that was (twice) adopted from this one, and it's likely infeasible to implement for component sparse sets without it being a correctness issue for the rest of the ECS. Closing this out. |
Problem:
Option<T> internally to represent whether the value is valid or not.
Option's discriminator.
Solution:
NonMaxUsize class which guarantees Option<NonMaxUsize> is the same size as NonMaxUsize.
Fixes #1558