Stop using `ArchetypeComponentId` in the executor #16885
Conversation
Ah, so we're moving from factual to hypothetical conflicts here. Interesting. I think that this is the right direction, to reduce pain with relations, and also to allow us to precompute more about the schedule ordering ahead of time in the future.
Many components stress test

Times below are the update schedule times in ms. Negative changes mean this PR is slower.

Note: we see that the frame time decreases as the number of components grows. This happens because each system ends up matching fewer entities and so has less work to do.

Trying to interpret these results

We see there are some regressions on the 100 components-1600 systems results. This is somewhat expected, as these are a worst-case scenario for this PR: we end up with fewer systems that can run in parallel because of the more pessimistic access check. We can see this in the tracing spans below; this PR is much more sparse than main.

We also see some improvements. This seems to be due to there not being too many new conflicts between systems, and the systems spawning faster and starting another thread or two. It helps when there is enough work for all the threads, but when there isn't, starting extra threads hurts, as in the 50000-100-100 results.

Overall I think these results are as expected. We lose some parallelism, but gain some speed spawning system tasks due to cheaper checks when there's a large number of archetype components. I'll probably investigate a little more with varying the number of systems to get a better idea of how things scale as the number of systems increases. But this doesn't answer the question of "What does a more realistic schedule look like?" I would expect conflicts to be low or nonexistent, since those should get detected as ambiguities between systems and then get ordered.

I also ran many_foxes and changes seemed to be within noise. Some runs ended up slightly slower and some ran slightly faster.

Some future work could be to remove conflicts between systems that are already ordered.
Fully agree with this analysis, and great benchmarking overall. If we had as-needed ordering (which would check archetype-component access) I'd quibble with you, but as it stands, production projects are going to get very little out of the kinds of parallelism that the old checks enable and these checks forbid. Based on my conversations with users, I think that in practice our current parallelism is too fine-grained for realistic projects. Swapping to single-threaded schedules for your main update loop shouldn't be speeding things up! Both the checks and the system task dispatch are too expensive for all but the heaviest systems. We can at least help improve the former here.
@inodentry, if you could test this PR on one or two of your projects I'd be very curious about the relative change in performance. I know scheduling overhead has been a bugbear of yours, and this should help improve it a bit (in addition to clearing a blocker for relations).
Note that this PR is going to conflict with #16784. I think my preference is to merge this PR first, as it will avoid the important perf regressions in that PR. Any other perf regressions should just be at schedule build time.
Just to clarify: they won't cause a merge conflict in git, right? I think the only interaction is that this PR mitigates some of the cost of that one. We can do even better by completely removing `ArchetypeComponentId`.
Yup! I think we'd want to move the calculation out of the executor and into the schedule graph to do that, since that's where we have the transitive ordering available. That would also let the ambiguity checker share these more precise checks! It currently ignores filters, so it reports things that I consider false positives.
That would be very nice. It would be great if we could standardize on one "check if things conflict" mechanism, both for simplicity and for teaching.
I was confused. I thought that PR modified the access checks in the multithreaded executor too. But looking at it again, it doesn't seem to conflict.
I'm not in a hurry to merge this PR, since it seems like we'll be getting non-fragmenting relations in the short term instead of the fragmenting version. Ideally, we'd get some testing from users before merging this to make sure it doesn't regress things.
Isn't using …
There are definitely interesting things worth testing in this space! I made some attempts at building a static graph ahead of time, but I wasn't able to find anything that improved performance on the schedules I was testing. I'd like to keep this PR focused on doing a minimal change to remove the dependency on `ArchetypeComponentId`.
I think this is a better default, and it unlocks a lot of functionality. As we clean up the executor code, we should re-add an archetype-parallel executor for people to compare against.
Oh! I had been planning a follow-up PR to completely remove `ArchetypeComponentId`. Should I create a PR to remove it?
Remove it! That's a very strong argument.
# Objective

Stop using `ArchetypeComponentId` in the executor. These IDs will grow even more quickly with relations, and the size may start to degrade performance.

## Solution

Have systems expose their `FilteredAccessSet<ComponentId>`, and have the executor use that to determine which systems conflict. This can be determined statically, so determine all conflicts during initialization and only perform bit tests when running.

## Testing

I ran many_foxes and didn't see any performance changes. It's probably worth testing this with a wider range of realistic schedules to see whether the reduced concurrency has a cost in practice, but I don't know what sort of test cases to use.

## Migration Guide

The schedule will now prevent systems from running in parallel if there *could* be an archetype that they conflict on, even if there aren't actually any. For example, these systems will now conflict even if no entity has both `Player` and `Enemy` components:

```rust
fn player_system(query: Query<(&mut Transform, &Player)>) {}
fn enemy_system(query: Query<(&mut Transform, &Enemy)>) {}
```

To allow them to run in parallel, use `Without` filters, just as you would to allow both queries in a single system:

```rust
// Either one of these changes alone would be enough
fn player_system(query: Query<(&mut Transform, &Player), Without<Enemy>>) {}
fn enemy_system(query: Query<(&mut Transform, &Enemy), Without<Player>>) {}
```
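For intuition, here is a minimal sketch of the precompute-then-bit-test idea the Solution describes, using a deliberately simplified access representation. `Access`, `conflicts_with`, `ExecutorState`, and `can_run` are illustrative names for this sketch, not Bevy's actual internals:

```rust
use fixedbitset::FixedBitSet;

/// Simplified stand-in for `FilteredAccessSet<ComponentId>`: just the
/// read and write component sets, ignoring `With`/`Without` filters.
struct Access {
    reads: FixedBitSet,
    writes: FixedBitSet,
}

impl Access {
    /// Two systems conflict if either writes a component the other
    /// reads or writes.
    fn conflicts_with(&self, other: &Access) -> bool {
        !self.writes.is_disjoint(&other.reads)
            || !self.writes.is_disjoint(&other.writes)
            || !other.writes.is_disjoint(&self.reads)
    }
}

struct ExecutorState {
    /// `conflicts[i]` has bit `j` set when systems `i` and `j` could
    /// ever conflict; computed once at schedule initialization.
    conflicts: Vec<FixedBitSet>,
    /// Bit `i` is set while system `i` is running.
    running: FixedBitSet,
}

impl ExecutorState {
    /// Pairwise comparison of every system's access, done once.
    fn build(accesses: &[Access]) -> Self {
        let n = accesses.len();
        let mut conflicts = vec![FixedBitSet::with_capacity(n); n];
        for i in 0..n {
            for j in (i + 1)..n {
                if accesses[i].conflicts_with(&accesses[j]) {
                    conflicts[i].insert(j);
                    conflicts[j].insert(i);
                }
            }
        }
        Self {
            conflicts,
            running: FixedBitSet::with_capacity(n),
        }
    }

    /// Per-dispatch check: a single bitset intersection test instead
    /// of walking archetype-component access.
    fn can_run(&self, system: usize) -> bool {
        self.conflicts[system].is_disjoint(&self.running)
    }
}
```

The point is that the expensive pairwise comparison of access sets happens once at build time; at dispatch time each check is one bitset operation, independent of how many archetypes exist.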
# Objective

Remove `ArchetypeComponentId` and `archetype_component_access`. Following #16885, they are no longer used by the engine, so we can stop spending time calculating them or space storing them.

## Solution

Remove `ArchetypeComponentId` and everything that touches it.

The `System::update_archetype_component_access` method no longer needs to update `archetype_component_access`. We do still need to update query caches, but we no longer need to do so *before* running the system. We'd have to touch every caller anyway if we gave the method a better name, so just remove `System::update_archetype_component_access` and `SystemParam::new_archetype` entirely, and update the query cache in `Query::get_param`.

The `Single` and `Populated` params also need their query caches updated in `SystemParam::validate_param`, so change `validate_param` to take `&mut Self::State` instead of `&Self::State`.
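Paraphrasing the last change as a stripped-down illustration (the real `SystemParam` trait has more items, more parameters, and unsafe requirements; only the `&Self::State` to `&mut Self::State` shift is shown):

```rust
// Illustrative only, not Bevy's actual trait definitions.
trait ValidateBefore {
    type State;
    // Validation could only read the param state, so query-cache-backed
    // params couldn't refresh their cache here.
    fn validate_param(state: &Self::State) -> bool;
}

trait ValidateAfter {
    type State;
    // `&mut` access lets params like `Single` and `Populated` refresh
    // their cached query state as part of validation.
    fn validate_param(state: &mut Self::State) -> bool;
}
```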
- State now initialized separately.
- Updating archetype access no longer needed.
- `new_archetype` no longer exists and the archetype cache needs to be updated in `get_param` (see the sketch after this list).
- `ReplicationRules` resource no longer needs to be cloned.

For details see bevyengine/bevy#16885 and bevyengine/bevy#19143.
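As an illustration of the third bullet, code that kept its own cached query state and refreshed it eagerly in `new_archetype` now refreshes it lazily when the param is fetched. `MyParamState` is a hypothetical state type, and the surrounding `SystemParam` plumbing is omitted:

```rust
// Hypothetical state type for a manual SystemParam implementation.
// struct MyParamState {
//     query_state: QueryState<...>,
// }

// Before: the scheduler called `new_archetype` for each new archetype
// ahead of the system run, keeping the cache fresh eagerly.
//
// fn new_archetype(state: &mut MyParamState, archetype: &Archetype) {
//     state.query_state.new_archetype(archetype);
// }

// After: `new_archetype` is gone; refresh the cache when the param is
// fetched, e.g. via `QueryState::update_archetypes`.
//
// fn get_param(state: &mut MyParamState, world: &World) -> /* ... */ {
//     state.query_state.update_archetypes(world);
//     /* build the param from the fresh cache */
// }
```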
* Bump Bevy version
* Migrate to the new `Entity` layout. For details see bevyengine/bevy#19121 and bevyengine/bevy#18704. The niche is now in the index, which makes the compression logic even simpler.
* Migrate to the new `SystemParam` changes. For details see bevyengine/bevy#16885 and bevyengine/bevy#19143.
* Remove `*_trigger_targets`
* Simplify fns logic. We no longer need custom ser/de to additionally serialize targets. This allows us to express things a bit nicer using conversion traits.
* Rename all "event" into "message".
* Rename all "trigger" into "event".
* Rename "resend locally" into just "send locally". Fits better.
* Split channel methods

Co-authored-by: UkoeHB <37489173+UkoeHB@users.noreply.github.com>