Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requiring mutable access to several common shared types prevents queue phase parallelization. #3548

Open
james7132 opened this issue Jan 4, 2022 · 0 comments
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times

Comments

@james7132
Copy link
Member

james7132 commented Jan 4, 2022

What problem does this solve or what need does it fill?

Query<&mut RenderPhase<T>>, RenderPipelineCache, SpecializedPipelines (to a lesser degree) requires mutable access to use. Each of these can limit the parallelism of the queue phase when rendering.

  • Query<&mut RenderPhase<T>> limits it to one system per phase at a time. Generally allows parallelism across mutliple phases at the same time, unless there is one system that requires access to multiple render phases.
  • RenderPipelineCache limits any dependent pipeline from parallelizing.
  • SpecializedPipelines limits parallelization across multiple phases within a single extract-prepare-queue pipeline. This is less of an issue than the other two.

What solution would you like?

  • Merge SpecializedPipelines functionality into RenderPipelines and make it generic over a SpecializedPipeline. This removes the global pipeline cache as a blocking requirement. Might also improve the ergonomics of writing those kinds of systems.
  • Switch any/all of the associated types to use internal mutability. Most notably RenderPhase<T> might benefit the most from this. Adding mutexes may be too much of a cost here. Perhaps a good place to try out lock-free data structures? Worth benchmarking.

What alternative(s) have you considered?

  • Recreate render pipelines on every executions.
  • Limit time in each queue job by aggressively batching it's inputs in the prepare phase.
@james7132 james7132 added C-Feature A new feature, making something new possible S-Needs-Triage This issue needs to be labelled labels Jan 4, 2022
@james7132 james7132 changed the title Requiring mutable access to several types prevents queue phase parallelization. Requiring mutable access to several common shared types prevents queue phase parallelization. Jan 4, 2022
@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times and removed S-Needs-Triage This issue needs to be labelled C-Feature A new feature, making something new possible labels Jan 4, 2022
github-merge-queue bot pushed a commit that referenced this issue Feb 20, 2024
# Objective
Partially addresses #3548. `queue_shadows` and `queue_material_meshes`
cannot parallelize because of the `ResMut<RenderMeshInstances>`
parameter for `queue_material_meshes`.

## Solution
Change the `material_bind_group` field to use atomics instead of needing
full mutable access. Change the `ResMut` to a `Res`, which should allow
both sets of systems to parallelize without issue.

## Performance
Tested against `many_foxes`, this has a significant improvement over the
entire render schedule. (Yellow is this PR, red is main)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on
`queue_material_meshes` (roughly a 8.29% increase in time spent in the
system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time
spent) in the system.

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to
be slightly *faster* by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
msvbg pushed a commit to msvbg/bevy that referenced this issue Feb 26, 2024
# Objective
Partially addresses bevyengine#3548. `queue_shadows` and `queue_material_meshes`
cannot parallelize because of the `ResMut<RenderMeshInstances>`
parameter for `queue_material_meshes`.

## Solution
Change the `material_bind_group` field to use atomics instead of needing
full mutable access. Change the `ResMut` to a `Res`, which should allow
both sets of systems to parallelize without issue.

## Performance
Tested against `many_foxes`, this has a significant improvement over the
entire render schedule. (Yellow is this PR, red is main)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on
`queue_material_meshes` (roughly a 8.29% increase in time spent in the
system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time
spent) in the system.

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to
be slightly *faster* by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
msvbg pushed a commit to msvbg/bevy that referenced this issue Feb 26, 2024
# Objective
Partially addresses bevyengine#3548. `queue_shadows` and `queue_material_meshes`
cannot parallelize because of the `ResMut<RenderMeshInstances>`
parameter for `queue_material_meshes`.

## Solution
Change the `material_bind_group` field to use atomics instead of needing
full mutable access. Change the `ResMut` to a `Res`, which should allow
both sets of systems to parallelize without issue.

## Performance
Tested against `many_foxes`, this has a significant improvement over the
entire render schedule. (Yellow is this PR, red is main)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on
`queue_material_meshes` (roughly a 8.29% increase in time spent in the
system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time
spent) in the system.

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to
be slightly *faster* by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
2 participants