Commit 4dadebd

pcwalton and james7132 authored
Improve performance by binning together opaque items instead of sorting them. (#12453)
Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process.

This commit makes Bevy instead place applicable phase items into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects.

Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic.

Frame time results (in ms/frame) are as follows:

| Benchmark              | `binning` | `main`  | Speedup |
| ---------------------- | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179   | 312.123 | 34.43%  |
| `many_cubes -nfc`      | 25.874    | 30.117  | 16.40%  |
| `many_foxes`           | 3.276     | 3.515   | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.)

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step.

## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance.

## Tracy graphs

`many_cubes --no-frustum-culling`, `main` branch:

<img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many_cubes --no-frustum-culling`, this branch:

<img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`:

<img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
1 parent df76fd4 commit 4dadebd
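
The binning idea from the commit message can be illustrated with a small, self-contained sketch. This is not Bevy's implementation: `BinKey`, `BinnedPhase`, and their fields are hypothetical names, entities are plain `u32` values, and a `BTreeMap` stands in for whatever storage the real `BinnedRenderPhase` uses. It only shows why batch boundaries become implicit once items sharing a key live in the same bin, and why ordering touches keys rather than items.

```rust
use std::collections::BTreeMap;

/// Hypothetical bin key: items with equal keys are assumed to be batchable.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct BinKey {
    pipeline: u32,
    material: u32,
}

/// Hypothetical binned phase: one `Vec` of entities per bin key.
#[derive(Default)]
struct BinnedPhase {
    bins: BTreeMap<BinKey, Vec<u32>>,
}

impl BinnedPhase {
    /// Queueing appends the entity to its bin; individual items are never sorted.
    fn add(&mut self, key: BinKey, entity: u32) {
        self.bins.entry(key).or_default().push(entity);
    }

    /// Every bin is one batch, so no scan is needed to find batch boundaries.
    /// The map keeps the keys ordered, so only keys are ever compared.
    fn batches(&self) -> impl Iterator<Item = (&BinKey, &[u32])> {
        self.bins.iter().map(|(key, entities)| (key, entities.as_slice()))
    }
}

fn main() {
    let mut phase = BinnedPhase::default();
    let a = BinKey { pipeline: 0, material: 1 };
    let b = BinKey { pipeline: 0, material: 0 };
    phase.add(a, 10);
    phase.add(b, 11);
    phase.add(a, 12);

    for (key, entities) in phase.batches() {
        println!("{key:?}: {} instance(s)", entities.len());
    }
}
```

With few distinct keys, as in `many_cubes --no-frustum-culling`, the ordering work is proportional to the number of bins rather than the number of entities, which matches the near-zero sorting time the commit message reports.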

31 files changed (+1059, -418 lines changed)

crates/bevy_core_pipeline/src/core_2d/main_pass_2d_node.rs

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ use bevy_render::{
     camera::ExtractedCamera,
     diagnostic::RecordDiagnostics,
     render_graph::{Node, NodeRunError, RenderGraphContext},
-    render_phase::RenderPhase,
+    render_phase::SortedRenderPhase,
     render_resource::RenderPassDescriptor,
     renderer::RenderContext,
     view::{ExtractedView, ViewTarget},
@@ -16,7 +16,7 @@ pub struct MainPass2dNode {
     query: QueryState<
         (
             &'static ExtractedCamera,
-            &'static RenderPhase<Transparent2d>,
+            &'static SortedRenderPhase<Transparent2d>,
             &'static ViewTarget,
         ),
         With<ExtractedView>,

crates/bevy_core_pipeline/src/core_2d/mod.rs

Lines changed: 17 additions & 15 deletions
@@ -38,7 +38,7 @@ use bevy_render::{
     render_graph::{EmptyNode, RenderGraphApp, ViewNodeRunner},
     render_phase::{
         sort_phase_system, CachedRenderPipelinePhaseItem, DrawFunctionId, DrawFunctions, PhaseItem,
-        RenderPhase,
+        SortedPhaseItem, SortedRenderPhase,
     },
     render_resource::CachedRenderPipelineId,
     Extract, ExtractSchedule, Render, RenderApp, RenderSet,
@@ -96,29 +96,16 @@ pub struct Transparent2d {
 }
 
 impl PhaseItem for Transparent2d {
-    type SortKey = FloatOrd;
-
     #[inline]
     fn entity(&self) -> Entity {
         self.entity
     }
 
-    #[inline]
-    fn sort_key(&self) -> Self::SortKey {
-        self.sort_key
-    }
-
     #[inline]
     fn draw_function(&self) -> DrawFunctionId {
         self.draw_function
     }
 
-    #[inline]
-    fn sort(items: &mut [Self]) {
-        // radsort is a stable radix sort that performed better than `slice::sort_by_key` or `slice::sort_unstable_by_key`.
-        radsort::sort_by_key(items, |item| item.sort_key().0);
-    }
-
     #[inline]
     fn batch_range(&self) -> &Range<u32> {
         &self.batch_range
@@ -140,6 +127,21 @@ impl PhaseItem for Transparent2d {
     }
 }
 
+impl SortedPhaseItem for Transparent2d {
+    type SortKey = FloatOrd;
+
+    #[inline]
+    fn sort_key(&self) -> Self::SortKey {
+        self.sort_key
+    }
+
+    #[inline]
+    fn sort(items: &mut [Self]) {
+        // radsort is a stable radix sort that performed better than `slice::sort_by_key` or `slice::sort_unstable_by_key`.
+        radsort::sort_by_key(items, |item| item.sort_key().0);
+    }
+}
+
 impl CachedRenderPipelinePhaseItem for Transparent2d {
     #[inline]
     fn cached_pipeline(&self) -> CachedRenderPipelineId {
@@ -155,7 +157,7 @@ pub fn extract_core_2d_camera_phases(
         if camera.is_active {
             commands
                 .get_or_spawn(entity)
-                .insert(RenderPhase::<Transparent2d>::default());
+                .insert(SortedRenderPhase::<Transparent2d>::default());
         }
     }
 }
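
The `Transparent2d` change above is the pattern the migration guide describes for custom phase items: the sorting members move out of the `PhaseItem` impl into a separate `SortedPhaseItem` impl. The following standalone sketch mirrors that shape with simplified stand-in traits. It is an assumption-laden illustration, not Bevy's real API: the traits here carry only a fraction of the real methods, entities are plain `u32` values, the default `sort` body is an assumption, and `MyTransparentItem` is a hypothetical type.

```rust
/// Simplified stand-in for Bevy's `PhaseItem`; the real trait has more methods.
trait PhaseItem: Sized {
    fn entity(&self) -> u32;
}

/// Simplified stand-in for `SortedPhaseItem`: the sort key and `sort` live here.
trait SortedPhaseItem: PhaseItem {
    type SortKey: Ord;

    fn sort_key(&self) -> Self::SortKey;

    /// Assumed default for this sketch: an unstable sort by key.
    /// An impl can override it, as `Transparent2d` does with radsort.
    fn sort(items: &mut [Self]) {
        items.sort_unstable_by_key(|item| item.sort_key());
    }
}

/// Hypothetical custom phase item that still needs back-to-front ordering.
struct MyTransparentItem {
    entity: u32,
    depth_bits: u32,
}

// After migration, this impl no longer mentions sorting at all.
impl PhaseItem for MyTransparentItem {
    fn entity(&self) -> u32 {
        self.entity
    }
}

// The members that used to sit in the `PhaseItem` impl move here.
impl SortedPhaseItem for MyTransparentItem {
    type SortKey = u32;

    fn sort_key(&self) -> Self::SortKey {
        self.depth_bits
    }
}

fn main() {
    let mut items = vec![
        MyTransparentItem { entity: 1, depth_bits: 30 },
        MyTransparentItem { entity: 2, depth_bits: 10 },
        MyTransparentItem { entity: 3, depth_bits: 20 },
    ];
    MyTransparentItem::sort(&mut items);
    let order: Vec<u32> = items.iter().map(|item| item.entity()).collect();
    println!("{order:?}");
}
```

A phase that does not need ordering would implement a binned counterpart instead, so the sort step is skipped entirely.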

crates/bevy_core_pipeline/src/core_3d/main_opaque_pass_3d_node.rs

Lines changed: 8 additions & 6 deletions
@@ -7,7 +7,7 @@ use bevy_render::{
     camera::ExtractedCamera,
     diagnostic::RecordDiagnostics,
     render_graph::{NodeRunError, RenderGraphContext, ViewNode},
-    render_phase::{RenderPhase, TrackedRenderPass},
+    render_phase::{BinnedRenderPhase, TrackedRenderPass},
     render_resource::{CommandEncoderDescriptor, PipelineCache, RenderPassDescriptor, StoreOp},
     renderer::RenderContext,
     view::{ViewDepthTexture, ViewTarget, ViewUniformOffset},
@@ -17,14 +17,16 @@ use bevy_utils::tracing::info_span;
 
 use super::AlphaMask3d;
 
-/// A [`bevy_render::render_graph::Node`] that runs the [`Opaque3d`] and [`AlphaMask3d`] [`RenderPhase`].
+/// A [`bevy_render::render_graph::Node`] that runs the [`Opaque3d`]
+/// [`BinnedRenderPhase`] and [`AlphaMask3d`]
+/// [`bevy_render::render_phase::SortedRenderPhase`]s.
 #[derive(Default)]
 pub struct MainOpaquePass3dNode;
 impl ViewNode for MainOpaquePass3dNode {
     type ViewQuery = (
         &'static ExtractedCamera,
-        &'static RenderPhase<Opaque3d>,
-        &'static RenderPhase<AlphaMask3d>,
+        &'static BinnedRenderPhase<Opaque3d>,
+        &'static BinnedRenderPhase<AlphaMask3d>,
         &'static ViewTarget,
         &'static ViewDepthTexture,
         Option<&'static SkyboxPipelineId>,
@@ -80,14 +82,14 @@ impl ViewNode for MainOpaquePass3dNode {
             }
 
             // Opaque draws
-            if !opaque_phase.items.is_empty() {
+            if !opaque_phase.is_empty() {
                 #[cfg(feature = "trace")]
                 let _opaque_main_pass_3d_span = info_span!("opaque_main_pass_3d").entered();
                 opaque_phase.render(&mut render_pass, world, view_entity);
             }
 
             // Alpha draws
-            if !alpha_mask_phase.items.is_empty() {
+            if !alpha_mask_phase.is_empty() {
                 #[cfg(feature = "trace")]
                 let _alpha_mask_main_pass_3d_span = info_span!("alpha_mask_main_pass_3d").entered();
                 alpha_mask_phase.render(&mut render_pass, world, view_entity);

crates/bevy_core_pipeline/src/core_3d/main_transmissive_pass_3d_node.rs

Lines changed: 4 additions & 3 deletions
@@ -4,7 +4,7 @@ use bevy_ecs::{prelude::*, query::QueryItem};
 use bevy_render::{
     camera::ExtractedCamera,
     render_graph::{NodeRunError, RenderGraphContext, ViewNode},
-    render_phase::RenderPhase,
+    render_phase::SortedRenderPhase,
     render_resource::{Extent3d, RenderPassDescriptor, StoreOp},
     renderer::RenderContext,
     view::{ViewDepthTexture, ViewTarget},
@@ -13,15 +13,16 @@ use bevy_render::{
 use bevy_utils::tracing::info_span;
 use std::ops::Range;
 
-/// A [`bevy_render::render_graph::Node`] that runs the [`Transmissive3d`] [`RenderPhase`].
+/// A [`bevy_render::render_graph::Node`] that runs the [`Transmissive3d`]
+/// [`SortedRenderPhase`].
 #[derive(Default)]
 pub struct MainTransmissivePass3dNode;
 
 impl ViewNode for MainTransmissivePass3dNode {
     type ViewQuery = (
         &'static ExtractedCamera,
         &'static Camera3d,
-        &'static RenderPhase<Transmissive3d>,
+        &'static SortedRenderPhase<Transmissive3d>,
         &'static ViewTarget,
         Option<&'static ViewTransmissionTexture>,
         &'static ViewDepthTexture,

crates/bevy_core_pipeline/src/core_3d/main_transparent_pass_3d_node.rs

Lines changed: 4 additions & 3 deletions
@@ -4,22 +4,23 @@ use bevy_render::{
     camera::ExtractedCamera,
     diagnostic::RecordDiagnostics,
     render_graph::{NodeRunError, RenderGraphContext, ViewNode},
-    render_phase::RenderPhase,
+    render_phase::SortedRenderPhase,
     render_resource::{RenderPassDescriptor, StoreOp},
     renderer::RenderContext,
     view::{ViewDepthTexture, ViewTarget},
 };
 #[cfg(feature = "trace")]
 use bevy_utils::tracing::info_span;
 
-/// A [`bevy_render::render_graph::Node`] that runs the [`Transparent3d`] [`RenderPhase`].
+/// A [`bevy_render::render_graph::Node`] that runs the [`Transparent3d`]
+/// [`SortedRenderPhase`].
 #[derive(Default)]
 pub struct MainTransparentPass3dNode;
 
 impl ViewNode for MainTransparentPass3dNode {
     type ViewQuery = (
         &'static ExtractedCamera,
-        &'static RenderPhase<Transparent3d>,
+        &'static SortedRenderPhase<Transparent3d>,
         &'static ViewTarget,
         &'static ViewDepthTexture,
     );
