Commit 357d4ad

Implement experimental GPU two-phase occlusion culling for the standard 3D mesh pipeline.

*Occlusion culling* allows the GPU to skip the vertex and fragment shading overhead for objects that can be quickly proved to be invisible because they're behind other geometry. A depth prepass already eliminates most fragment shading overhead for occluded objects, but the vertex shading overhead, as well as the cost of testing and rejecting fragments against the Z-buffer, is presently unavoidable for standard meshes. We currently perform occlusion culling only for meshlets. But other meshes, such as skinned meshes, can benefit from occlusion culling too in order to avoid the transform and skinning overhead for unseen meshes.

This commit adapts the same [*two-phase occlusion culling*] technique that meshlets use to Bevy's standard 3D mesh pipeline when the new `OcclusionCulling` component, as well as the `DepthPrepass` component, are present on the camera. It has these steps:

1. *Early depth prepass*: We use the hierarchical Z-buffer from the previous frame to cull meshes for the initial depth prepass, effectively rendering only the meshes that were visible in the last frame.

2. *Early depth downsample*: We downsample the depth buffer to create another hierarchical Z-buffer, this time with the current view transform.

3. *Late depth prepass*: We use the new hierarchical Z-buffer to test all meshes that weren't rendered in the early depth prepass. Any meshes that pass this check are rendered.

4. *Late depth downsample*: Again, we downsample the depth buffer to create a hierarchical Z-buffer in preparation for the early depth prepass of the next frame. This step is done after all the rendering, in order to account for custom phase items that might write to the depth buffer.

Note that this patch has no effect on the per-mesh CPU overhead for occluded objects, which remains high for a GPU-driven renderer due to the lack of `cold-specialization` and retained bins. If `cold-specialization` and retained bins weren't on the horizon, then a more traditional approach like potentially visible sets (PVS) or low-res CPU rendering would probably be more efficient than the GPU-driven approach that this patch implements for most scenes. However, at this point the amount of effort required to implement a PVS baking tool or a low-res CPU renderer would probably be greater than landing `cold-specialization` and retained bins, and the GPU driven approach is the more modern one anyway. It does mean that the performance improvements from occlusion culling as implemented in this patch *today* are likely to be limited, because of the high CPU overhead for occluded meshes.

Note also that this patch currently doesn't implement occlusion culling for 2D objects or shadow maps. Those can be addressed in a follow-up. Additionally, note that the techniques in this patch require compute shaders, which excludes support for WebGL 2.

This PR is marked experimental because of known precision issues with the downsampling approach when applied to non-power-of-two framebuffer sizes (i.e. most of them). These precision issues can, in rare cases, cause objects to be judged occluded that in fact are not. (I've never seen this in practice, but I know it's possible; it tends to be likelier to happen with small meshes.) As a follow-up to this patch, we desire to switch to the [SPD-based hi-Z buffer shader from the Granite engine], which doesn't suffer from these problems, at which point we should be able to graduate this feature from experimental status. I opted not to include that rewrite in this patch for two reasons: (1) @JMS55 is planning on doing the rewrite to coincide with the new availability of image atomic operations in Naga; (2) to reduce the scope of this patch.

[*two-phase occlusion culling*]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[Aaltonen SIGGRAPH 2015]: https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf
[Some literature]: https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452
[SPD-based hi-Z buffer shader from the Granite engine]: https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp
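
The four steps in the commit message can be illustrated with a CPU-side toy model. What follows is a minimal sketch, not the code in this commit. It assumes a one-dimensional "screen", objects that cover a range of columns at a constant depth, reversed-Z depth (larger values are closer, cleared to 0.0), and it tests against the full-resolution depth buffer instead of a real hierarchical Z-buffer. `Object`, `is_visible`, `rasterize`, and `cull_frame` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical screen-space object: covers columns `start..end` at one depth.
/// Depth is reversed-Z in this sketch: larger values are closer to the camera.
#[derive(Clone, Copy, Debug)]
struct Object {
    start: usize,
    end: usize,
    depth: f32,
}

/// Conservative test: the object may be visible if it is at least as close as
/// the farthest depth already stored anywhere in its footprint.
fn is_visible(depth_buffer: &[f32], obj: &Object) -> bool {
    let farthest = depth_buffer[obj.start..obj.end]
        .iter()
        .copied()
        .fold(f32::INFINITY, f32::min);
    obj.depth >= farthest
}

/// Writes the object into the depth buffer, keeping the closest value.
fn rasterize(depth_buffer: &mut [f32], obj: &Object) {
    for texel in &mut depth_buffer[obj.start..obj.end] {
        *texel = texel.max(obj.depth);
    }
}

/// Runs one frame. Returns the indices of the objects that were drawn (early
/// phase first, then late phase) and the depth buffer to reuse next frame.
fn cull_frame(objects: &[Object], previous_depth: &[f32]) -> (Vec<usize>, Vec<f32>) {
    let mut depth = vec![0.0_f32; previous_depth.len()];
    let mut drawn = Vec::new();
    let mut deferred = Vec::new();

    // Step 1, early prepass: test against the depth from the previous frame.
    for (index, obj) in objects.iter().enumerate() {
        if is_visible(previous_depth, obj) {
            rasterize(&mut depth, obj);
            drawn.push(index);
        } else {
            deferred.push(index);
        }
    }

    // Step 2, early downsample: snapshot the depth written so far. A real
    // implementation would build a mip chain here instead of cloning.
    let early_depth = depth.clone();

    // Step 3, late prepass: retest everything the early phase rejected.
    for index in deferred {
        if is_visible(&early_depth, &objects[index]) {
            rasterize(&mut depth, &objects[index]);
            drawn.push(index);
        }
    }

    // Step 4, late downsample: the final depth feeds the next frame.
    (drawn, depth)
}
```

Feeding each frame's returned depth buffer into the next call plays the role of the late depth downsample. In this model, an object behind a wall is skipped by both phases, and if the wall disappears the object fails the early test (which uses stale depth) but is recovered by the late test in the same frame.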
1 parent b66c3ce commit 357d4ad

39 files changed, +4021 -909 lines changed

Cargo.toml

Lines changed: 11 additions & 0 deletions
@@ -4062,3 +4062,14 @@ name = "Directional Navigation"
 description = "Demonstration of Directional Navigation between UI elements"
 category = "UI (User Interface)"
 wasm = true
+
+[[example]]
+name = "occlusion_culling"
+path = "examples/3d/occlusion_culling.rs"
+doc-scrape-examples = true
+
+[package.metadata.example.occlusion_culling]
+name = "Occlusion Culling"
+description = "Demonstration of Occlusion Culling"
+category = "3D Rendering"
+wasm = false

crates/bevy_core_pipeline/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ nonmax = "0.5"
 smallvec = "1"
 thiserror = { version = "2", default-features = false }
 tracing = { version = "0.1", default-features = false, features = ["std"] }
+bytemuck = { version = "1" }
 
 [lints]
 workspace = true

crates/bevy_core_pipeline/src/core_2d/mod.rs

Lines changed: 2 additions & 0 deletions
@@ -312,6 +312,8 @@ impl PhaseItem for AlphaMask2d {
 }
 
 impl BinnedPhaseItem for AlphaMask2d {
+    // Since 2D meshes presently can't be multidrawn, the batch set key is
+    // irrelevant.
     type BatchSetKey = BatchSetKey2d;
 
     type BinKey = AlphaMask2dBinKey;

crates/bevy_core_pipeline/src/core_3d/mod.rs

Lines changed: 30 additions & 6 deletions
@@ -16,7 +16,9 @@ pub mod graph {
     #[derive(Debug, Hash, PartialEq, Eq, Clone, RenderLabel)]
     pub enum Node3d {
         MsaaWriteback,
-        Prepass,
+        EarlyPrepass,
+        EarlyDownsampleDepth,
+        LatePrepass,
         DeferredPrepass,
         CopyDeferredLightingId,
         EndPrepasses,
@@ -25,6 +27,7 @@ pub mod graph {
         MainTransmissivePass,
         MainTransparentPass,
         EndMainPass,
+        LateDownsampleDepth,
         Taa,
         MotionBlur,
         Bloom,
@@ -67,9 +70,10 @@ use core::ops::Range;
 
 use bevy_render::{
     batching::gpu_preprocessing::{GpuPreprocessingMode, GpuPreprocessingSupport},
+    experimental::occlusion_culling::OcclusionCulling,
     mesh::allocator::SlabId,
     render_phase::PhaseItemBatchSetKey,
-    view::{NoIndirectDrawing, RetainedViewEntity},
+    view::{prepare_view_targets, NoIndirectDrawing, RetainedViewEntity},
 };
 pub use camera_3d::*;
 pub use main_opaque_pass_3d_node::*;
@@ -114,8 +118,9 @@ use crate::{
     },
     dof::DepthOfFieldNode,
     prepass::{
-        node::PrepassNode, AlphaMask3dPrepass, DeferredPrepass, DepthPrepass, MotionVectorPrepass,
-        NormalPrepass, Opaque3dPrepass, OpaqueNoLightmap3dBatchSetKey, OpaqueNoLightmap3dBinKey,
+        node::{EarlyPrepassNode, LatePrepassNode},
+        AlphaMask3dPrepass, DeferredPrepass, DepthPrepass, MotionVectorPrepass, NormalPrepass,
+        Opaque3dPrepass, OpaqueNoLightmap3dBatchSetKey, OpaqueNoLightmap3dBinKey,
         ViewPrepassTextures, MOTION_VECTOR_PREPASS_FORMAT, NORMAL_PREPASS_FORMAT,
     },
     skybox::SkyboxPlugin,
@@ -161,6 +166,9 @@ impl Plugin for Core3dPlugin {
                 (
                     sort_phase_system::<Transmissive3d>.in_set(RenderSet::PhaseSort),
                     sort_phase_system::<Transparent3d>.in_set(RenderSet::PhaseSort),
+                    configure_occlusion_culling_view_targets
+                        .after(prepare_view_targets)
+                        .in_set(RenderSet::ManageViews),
                     prepare_core_3d_depth_textures.in_set(RenderSet::PrepareResources),
                     prepare_core_3d_transmission_textures.in_set(RenderSet::PrepareResources),
                     prepare_prepass_textures.in_set(RenderSet::PrepareResources),
@@ -169,7 +177,8 @@ impl Plugin for Core3dPlugin {
 
         render_app
             .add_render_sub_graph(Core3d)
-            .add_render_graph_node::<ViewNodeRunner<PrepassNode>>(Core3d, Node3d::Prepass)
+            .add_render_graph_node::<ViewNodeRunner<EarlyPrepassNode>>(Core3d, Node3d::EarlyPrepass)
+            .add_render_graph_node::<ViewNodeRunner<LatePrepassNode>>(Core3d, Node3d::LatePrepass)
             .add_render_graph_node::<ViewNodeRunner<DeferredGBufferPrepassNode>>(
                 Core3d,
                 Node3d::DeferredPrepass,
@@ -200,7 +209,8 @@ impl Plugin for Core3dPlugin {
             .add_render_graph_edges(
                 Core3d,
                 (
-                    Node3d::Prepass,
+                    Node3d::EarlyPrepass,
+                    Node3d::LatePrepass,
                     Node3d::DeferredPrepass,
                     Node3d::CopyDeferredLightingId,
                     Node3d::EndPrepasses,
@@ -898,6 +908,20 @@ pub fn prepare_core_3d_transmission_textures(
     }
 }
 
+/// Sets the `TEXTURE_BINDING` flag on the depth texture if necessary for
+/// occlusion culling.
+///
+/// We need that flag to be set in order to read from the texture.
+fn configure_occlusion_culling_view_targets(
+    mut view_targets: Query<&mut Camera3d, (With<OcclusionCulling>, With<DepthPrepass>)>,
+) {
+    for mut camera_3d in &mut view_targets {
+        let mut depth_texture_usages = TextureUsages::from(camera_3d.depth_texture_usages);
+        depth_texture_usages |= TextureUsages::TEXTURE_BINDING;
+        camera_3d.depth_texture_usages = depth_texture_usages.into();
+    }
+}
+
 // Disable MSAA and warn if using deferred rendering
 pub fn check_msaa(mut deferred_views: Query<&mut Msaa, (With<Camera>, With<DeferredPrepass>)>) {
     for mut msaa in deferred_views.iter_mut() {
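
The new `configure_occlusion_culling_view_targets` system follows a read-modify-write pattern on a usage bitmask: convert the stored value to flags, OR in `TEXTURE_BINDING`, and store it back, but only for cameras that have both components. The sketch below models that pattern without any Bevy or wgpu dependency. `Usages`, `Camera`, and `configure_occlusion_culling` are hypothetical stand-ins, and the bit values are arbitrary rather than the real `TextureUsages` values.

```rust
use std::ops::BitOrAssign;

/// Hypothetical stand-in for a texture-usage bitmask. The bit positions are
/// arbitrary and are not claimed to match any real graphics API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Usages(u32);

impl Usages {
    const RENDER_ATTACHMENT: Usages = Usages(1 << 0);
    const TEXTURE_BINDING: Usages = Usages(1 << 1);

    /// Returns true if every bit of `other` is set in `self`.
    fn contains(self, other: Usages) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOrAssign for Usages {
    fn bitor_assign(&mut self, rhs: Usages) {
        self.0 |= rhs.0;
    }
}

/// Hypothetical camera record. The two booleans model the presence of the
/// `OcclusionCulling` and `DepthPrepass` components on the camera.
struct Camera {
    depth_texture_usages: u32,
    occlusion_culling: bool,
    depth_prepass: bool,
}

/// Adds the texture-binding flag on cameras that opted into occlusion culling
/// and have a depth prepass. All other cameras are left untouched.
fn configure_occlusion_culling(cameras: &mut [Camera]) {
    for camera in cameras
        .iter_mut()
        .filter(|camera| camera.occlusion_culling && camera.depth_prepass)
    {
        let mut usages = Usages(camera.depth_texture_usages);
        usages |= Usages::TEXTURE_BINDING;
        camera.depth_texture_usages = usages.0;
    }
}
```

Because the flag is ORed in rather than assigned, any usages the camera already requested are preserved, and running the function repeatedly gives the same result.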

crates/bevy_pbr/src/meshlet/downsample_depth.wgsl renamed to crates/bevy_core_pipeline/src/experimental/mip_generation/downsample_depth.wgsl

Lines changed: 27 additions & 3 deletions
@@ -1,8 +1,16 @@
 #ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 @group(0) @binding(0) var<storage, read> mip_0: array<u64>; // Per pixel
 #else
+#ifdef MESHLET
 @group(0) @binding(0) var<storage, read> mip_0: array<u32>; // Per pixel
-#endif
+#else // MESHLET
+#ifdef MULTISAMPLE
+@group(0) @binding(0) var mip_0: texture_depth_multisampled_2d;
+#else // MULTISAMPLE
+@group(0) @binding(0) var mip_0: texture_depth_2d;
+#endif // MULTISAMPLE
+#endif // MESHLET
+#endif // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 @group(0) @binding(1) var mip_1: texture_storage_2d<r32float, write>;
 @group(0) @binding(2) var mip_2: texture_storage_2d<r32float, write>;
 @group(0) @binding(3) var mip_3: texture_storage_2d<r32float, write>;
@@ -304,9 +312,25 @@ fn load_mip_0(x: u32, y: u32) -> f32 {
     let i = y * constants.view_width + x;
 #ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
     return bitcast<f32>(u32(mip_0[i] >> 32u));
-#else
+#else // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
+#ifdef MESHLET
     return bitcast<f32>(mip_0[i]);
-#endif
+#else // MESHLET
+    // Downsample the top level.
+#ifdef MULTISAMPLE
+    // The top level is multisampled, so we need to loop over all the samples
+    // and reduce them to 1.
+    var result = textureLoad(mip_0, vec2(x, y), 0);
+    let sample_count = i32(textureNumSamples(mip_0));
+    for (var sample = 1; sample < sample_count; sample += 1) {
+        result = min(result, textureLoad(mip_0, vec2(x, y), sample));
+    }
+    return result;
+#else // MULTISAMPLE
+    return textureLoad(mip_0, vec2(x, y), 0);
+#endif // MULTISAMPLE
+#endif // MESHLET
+#endif // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 }
 
 fn reduce_4(v: vec4f) -> f32 {
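
The multisample branch of `load_mip_0` collapses all samples of a texel to a single depth with `min`, and the rest of the shader then reduces texels into coarser mips. The following CPU-side sketch shows both reductions under stated assumptions: depth is reversed-Z (larger is closer), so `min` keeps the farthest and therefore most conservative value, and the source dimensions are even, which sidesteps the non-power-of-two case that the commit message flags as a known precision issue. `resolve_samples` and `downsample_min` are hypothetical names and do not mirror the shader's actual structure.

```rust
/// Reduces the samples of one multisampled texel to a single depth by taking
/// the minimum, mirroring the loop in the MULTISAMPLE branch.
/// Panics if `samples` is empty.
fn resolve_samples(samples: &[f32]) -> f32 {
    samples[1..]
        .iter()
        .fold(samples[0], |result, &sample| result.min(sample))
}

/// Builds the next mip level of a row-major depth image by taking the minimum
/// of each 2x2 block. This sketch requires even dimensions.
fn downsample_min(src: &[f32], width: usize, height: usize) -> Vec<f32> {
    assert!(width % 2 == 0 && height % 2 == 0, "sketch assumes even sizes");
    assert_eq!(src.len(), width * height);

    let (dst_width, dst_height) = (width / 2, height / 2);
    let mut dst = Vec::with_capacity(dst_width * dst_height);
    for y in 0..dst_height {
        for x in 0..dst_width {
            let (sx, sy) = (2 * x, 2 * y);
            let top_left = src[sy * width + sx];
            let top_right = src[sy * width + sx + 1];
            let bottom_left = src[(sy + 1) * width + sx];
            let bottom_right = src[(sy + 1) * width + sx + 1];
            // Keep the farthest depth of the block (reversed-Z assumption).
            dst.push(top_left.min(top_right).min(bottom_left.min(bottom_right)));
        }
    }
    dst
}
```

Calling `downsample_min` repeatedly on its own output yields successively coarser levels. Each texel of a coarse level then bounds the farthest depth of the region it covers, which is what lets a culling pass test an object's whole screen-space footprint with a small number of reads.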
