Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Mesh-related queue phase systems to parallelize #11804

Merged
merged 8 commits into from
Feb 20, 2024

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Feb 10, 2024

Objective

Partially addresses #3548. queue_shadows and queue_material_meshes cannot parallelize because of the ResMut<RenderMeshInstances> parameter for queue_material_meshes.

Solution

Change the material_bind_group field to use atomics instead of needing full mutable access. Change the ResMut to a Res, which should allow both sets of systems to parallelize without issue.

Performance

Tested against many_foxes, this has a significant improvement over the entire render schedule. (Yellow is this PR, red is main)
image

The use of atomics does seem to have a negative effect on queue_material_meshes (roughly a 8.29% increase in time spent in the system).
image

queue_shadows seems to be ever so slightly slower (1.6% more time spent) in the system.
image

batch_and_prepare_render_phase seems to be a mix, but overall seems to be slightly faster by about 5%.
image

@james7132 james7132 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Feb 10, 2024
crates/bevy_pbr/src/lightmap/mod.rs Outdated Show resolved Hide resolved
crates/bevy_pbr/src/material.rs Outdated Show resolved Hide resolved
@JMS55 JMS55 added this to the 0.13 milestone Feb 10, 2024
@james7132 james7132 requested a review from pcwalton February 10, 2024 07:53
@alice-i-cecile alice-i-cecile removed this from the 0.13 milestone Feb 12, 2024
@james7132 james7132 added this to the 0.14 milestone Feb 19, 2024
@hymm
Copy link
Contributor

hymm commented Feb 19, 2024

Does this have a significant effect on single threaded perf?

@superdump
Copy link
Contributor

superdump commented Feb 19, 2024

Merge conflict. Code looks good. Could you test with a heavier single-material-type no-contention load like many_cubes?

Copy link
Contributor

@IceSentry IceSentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming conflicts are fixed

@james7132
Copy link
Member Author

james7132 commented Feb 19, 2024

Merge conflict. Code looks good. Could you test with a heavier single-material-type no-contention load like many_cubes?

Checked against main and this PR. There seems to be next to zero impact on the average time spent, which makes sense: uncontended atomics on x86 machines is essentially free. There's a bit more variance, but not by much. Here's queue_material_meshes with many_cubes.
image

@james7132
Copy link
Member Author

Does this have a significant effect on single threaded perf?

Most modern platforms (with maybe the exception of wasm) treat all load/store operations as atomic, and so long as there is no contention on the same cache line, it's identical to single threaded performance. The only thing you might be missing out on is the compiler reordering your operations to be more optimal, which isn't a particular concern for these systems.

@james7132 james7132 enabled auto-merge February 20, 2024 00:00
@james7132 james7132 added this pull request to the merge queue Feb 20, 2024
Merged via the queue into bevyengine:main with commit 6d547d7 Feb 20, 2024
23 checks passed
msvbg pushed a commit to msvbg/bevy that referenced this pull request Feb 26, 2024
# Objective
Partially addresses bevyengine#3548. `queue_shadows` and `queue_material_meshes`
cannot parallelize because of the `ResMut<RenderMeshInstances>`
parameter for `queue_material_meshes`.

## Solution
Change the `material_bind_group` field to use atomics instead of needing
full mutable access. Change the `ResMut` to a `Res`, which should allow
both sets of systems to parallelize without issue.

## Performance
Tested against `many_foxes`, this has a significant improvement over the
entire render schedule. (Yellow is this PR, red is main)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on
`queue_material_meshes` (roughly a 8.29% increase in time spent in the
system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time
spent) in the system.

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to
be slightly *faster* by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
msvbg pushed a commit to msvbg/bevy that referenced this pull request Feb 26, 2024
# Objective
Partially addresses bevyengine#3548. `queue_shadows` and `queue_material_meshes`
cannot parallelize because of the `ResMut<RenderMeshInstances>`
parameter for `queue_material_meshes`.

## Solution
Change the `material_bind_group` field to use atomics instead of needing
full mutable access. Change the `ResMut` to a `Res`, which should allow
both sets of systems to parallelize without issue.

## Performance
Tested against `many_foxes`, this has a significant improvement over the
entire render schedule. (Yellow is this PR, red is main)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on
`queue_material_meshes` (roughly a 8.29% increase in time spent in the
system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time
spent) in the system.

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to
be slightly *faster* by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
@re0312 re0312 mentioned this pull request Jun 20, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 14, 2024
# Objective

- After #11804 , The queue_prepass_material_meshes function is now
executed in parallel with other queue_* systems. This optimization
introduced a potential issue where mesh_instance.should_batch() could
return false in queue_prepass_material_meshes due to an unset
material_bind_group_id.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants