Skip to content

Commit

Permalink
Multithreaded render command encoding (#9172)
Browse files Browse the repository at this point in the history
# Objective
- Encoding many GPU commands (such as in a renderpass with many draws,
such as the main opaque pass) onto a `wgpu::CommandEncoder` is very
expensive, and takes a long time.
- To improve performance, we want to perform the command encoding for
these heavy passes in parallel.

## Solution
- `RenderContext` can now queue up "command buffer generation tasks"
which are closures that will generate a command buffer when called.
- When finalizing the render context to produce the final list of
command buffers, these tasks are run in parallel on the
`ComputeTaskPool` to produce their corresponding command buffers.
- The general idea is that the node graph will run in serial, but in a
node, instead of doing rendering work, you can add tasks to do render
work in parallel with other node's tasks that get ran at the end of the
graph execution.

## Nodes Parallelized
- `MainOpaquePass3dNode`
- `PrepassNode`
- `DeferredGBufferPrepassNode`
- `ShadowPassNode` (One task per view)


## Future Work
- For large number of draws calls, might be worth further subdividing
passes into 2+ tasks.
- Extend this to UI, 2d, transparent, and transmissive nodes?
- Needs testing - small command buffers are inefficient - it may be
worth reverting to the serial command encoder usage for render phases
with few items.
- All "serial" (traditional) rendering work must finish before parallel
rendering tasks (the new stuff) can start to run.
- There is still only one submission to the graphics queue at the end of
the graph execution. There is still no ability to submit work earlier.

## Performance Improvement
Thanks to @Elabajaba for testing on Bistro.


![image](https://github.com/bevyengine/bevy/assets/47158642/be50dafa-85eb-4da5-a5cd-c0a044f1e76f)


TLDR: Without shadow mapping, this PR has no impact. _With_ shadow
mapping, this PR gives **~40 more fps** than main.

---

## Changelog
- `MainOpaquePass3dNode`, `PrepassNode`, `DeferredGBufferPrepassNode`,
and each shadow map within `ShadowPassNode` are now encoded in parallel,
giving _greatly_ increased CPU performance, mainly when shadow mapping
is enabled.
  - Does not work on WASM or AMD+Windows+Vulkan.
- Added `RenderContext::add_command_buffer_generation_task()`.
- `RenderContext::new()` now takes adapter info
- Some render graph and Node related types and methods now have
additional lifetime constraints.


## Migration Guide
`RenderContext::new()` now takes adapter info
- Some render graph and Node related types and methods now have
additional lifetime constraints.

---------

Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
Co-authored-by: François <mockersf@gmail.com>
  • Loading branch information
3 people authored Feb 9, 2024
1 parent 5313730 commit f4dab8a
Show file tree
Hide file tree
Showing 7 changed files with 260 additions and 137 deletions.
101 changes: 59 additions & 42 deletions crates/bevy_core_pipeline/src/core_3d/main_opaque_pass_3d_node.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@ use bevy_ecs::{prelude::World, query::QueryItem};
use bevy_render::{
camera::ExtractedCamera,
render_graph::{NodeRunError, RenderGraphContext, ViewNode},
render_phase::RenderPhase,
render_resource::{PipelineCache, RenderPassDescriptor, StoreOp},
render_phase::{RenderPhase, TrackedRenderPass},
render_resource::{CommandEncoderDescriptor, PipelineCache, RenderPassDescriptor, StoreOp},
renderer::RenderContext,
view::{ViewDepthTexture, ViewTarget, ViewUniformOffset},
};
Expand All @@ -31,10 +31,10 @@ impl ViewNode for MainOpaquePass3dNode {
&'static ViewUniformOffset,
);

fn run(
fn run<'w>(
&self,
graph: &mut RenderGraphContext,
render_context: &mut RenderContext,
render_context: &mut RenderContext<'w>,
(
camera,
opaque_phase,
Expand All @@ -44,52 +44,69 @@ impl ViewNode for MainOpaquePass3dNode {
skybox_pipeline,
skybox_bind_group,
view_uniform_offset,
): QueryItem<Self::ViewQuery>,
world: &World,
): QueryItem<'w, Self::ViewQuery>,
world: &'w World,
) -> Result<(), NodeRunError> {
// Run the opaque pass, sorted by pipeline key and mesh id to greatly improve batching.
// NOTE: Scoped to drop the mutable borrow of render_context
#[cfg(feature = "trace")]
let _main_opaque_pass_3d_span = info_span!("main_opaque_pass_3d").entered();
let color_attachments = [Some(target.get_color_attachment())];
let depth_stencil_attachment = Some(depth.get_attachment(StoreOp::Store));

// Setup render pass
let mut render_pass = render_context.begin_tracked_render_pass(RenderPassDescriptor {
label: Some("main_opaque_pass_3d"),
color_attachments: &[Some(target.get_color_attachment())],
depth_stencil_attachment: Some(depth.get_attachment(StoreOp::Store)),
timestamp_writes: None,
occlusion_query_set: None,
});
let view_entity = graph.view_entity();
render_context.add_command_buffer_generation_task(move |render_device| {
#[cfg(feature = "trace")]
let _main_opaque_pass_3d_span = info_span!("main_opaque_pass_3d").entered();

if let Some(viewport) = camera.viewport.as_ref() {
render_pass.set_camera_viewport(viewport);
}
// Command encoder setup
let mut command_encoder =
render_device.create_command_encoder(&CommandEncoderDescriptor {
label: Some("main_opaque_pass_3d_command_encoder"),
});

let view_entity = graph.view_entity();
// Render pass setup
let render_pass = command_encoder.begin_render_pass(&RenderPassDescriptor {
label: Some("main_opaque_pass_3d"),
color_attachments: &color_attachments,
depth_stencil_attachment,
timestamp_writes: None,
occlusion_query_set: None,
});
let mut render_pass = TrackedRenderPass::new(&render_device, render_pass);
if let Some(viewport) = camera.viewport.as_ref() {
render_pass.set_camera_viewport(viewport);
}

// Opaque draws
opaque_phase.render(&mut render_pass, world, view_entity);
// Opaque draws
if !opaque_phase.items.is_empty() {
#[cfg(feature = "trace")]
let _opaque_main_pass_3d_span = info_span!("opaque_main_pass_3d").entered();
opaque_phase.render(&mut render_pass, world, view_entity);
}

// Alpha draws
if !alpha_mask_phase.items.is_empty() {
alpha_mask_phase.render(&mut render_pass, world, view_entity);
}
// Alpha draws
if !alpha_mask_phase.items.is_empty() {
#[cfg(feature = "trace")]
let _alpha_mask_main_pass_3d_span = info_span!("alpha_mask_main_pass_3d").entered();
alpha_mask_phase.render(&mut render_pass, world, view_entity);
}

// Draw the skybox using a fullscreen triangle
if let (Some(skybox_pipeline), Some(SkyboxBindGroup(skybox_bind_group))) =
(skybox_pipeline, skybox_bind_group)
{
let pipeline_cache = world.resource::<PipelineCache>();
if let Some(pipeline) = pipeline_cache.get_render_pipeline(skybox_pipeline.0) {
render_pass.set_render_pipeline(pipeline);
render_pass.set_bind_group(
0,
&skybox_bind_group.0,
&[view_uniform_offset.offset, skybox_bind_group.1],
);
render_pass.draw(0..3, 0..1);
// Skybox draw using a fullscreen triangle
if let (Some(skybox_pipeline), Some(SkyboxBindGroup(skybox_bind_group))) =
(skybox_pipeline, skybox_bind_group)
{
let pipeline_cache = world.resource::<PipelineCache>();
if let Some(pipeline) = pipeline_cache.get_render_pipeline(skybox_pipeline.0) {
render_pass.set_render_pipeline(pipeline);
render_pass.set_bind_group(
0,
&skybox_bind_group.0,
&[view_uniform_offset.offset, skybox_bind_group.1],
);
render_pass.draw(0..3, 0..1);
}
}
}

drop(render_pass);
command_encoder.finish()
});

Ok(())
}
Expand Down
66 changes: 40 additions & 26 deletions crates/bevy_core_pipeline/src/deferred/node.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@ use bevy_ecs::prelude::*;
use bevy_ecs::query::QueryItem;
use bevy_render::render_graph::ViewNode;

use bevy_render::render_resource::StoreOp;
use bevy_render::render_phase::TrackedRenderPass;
use bevy_render::render_resource::{CommandEncoderDescriptor, StoreOp};
use bevy_render::{
camera::ExtractedCamera,
render_graph::{NodeRunError, RenderGraphContext},
Expand Down Expand Up @@ -33,21 +34,19 @@ impl ViewNode for DeferredGBufferPrepassNode {
&'static ViewPrepassTextures,
);

fn run(
fn run<'w>(
&self,
graph: &mut RenderGraphContext,
render_context: &mut RenderContext,
render_context: &mut RenderContext<'w>,
(
camera,
opaque_deferred_phase,
alpha_mask_deferred_phase,
view_depth_texture,
view_prepass_textures,
): QueryItem<Self::ViewQuery>,
world: &World,
): QueryItem<'w, Self::ViewQuery>,
world: &'w World,
) -> Result<(), NodeRunError> {
let view_entity = graph.view_entity();

let mut color_attachments = vec![];
color_attachments.push(
view_prepass_textures
Expand Down Expand Up @@ -107,49 +106,64 @@ impl ViewNode for DeferredGBufferPrepassNode {
.map(|deferred_lighting_pass_id| deferred_lighting_pass_id.get_attachment()),
);

// If all color attachments are none: clear the color attachment list so that no fragment shader is required
if color_attachments.iter().all(Option::is_none) {
// All attachments are none: clear the attachment list so that no fragment shader is required.
color_attachments.clear();
}

{
// Set up the pass descriptor with the depth attachment and optional color attachments.
let mut render_pass = render_context.begin_tracked_render_pass(RenderPassDescriptor {
let depth_stencil_attachment = Some(view_depth_texture.get_attachment(StoreOp::Store));

let view_entity = graph.view_entity();
render_context.add_command_buffer_generation_task(move |render_device| {
#[cfg(feature = "trace")]
let _deferred_span = info_span!("deferred").entered();

// Command encoder setup
let mut command_encoder =
render_device.create_command_encoder(&CommandEncoderDescriptor {
label: Some("deferred_command_encoder"),
});

// Render pass setup
let render_pass = command_encoder.begin_render_pass(&RenderPassDescriptor {
label: Some("deferred"),
color_attachments: &color_attachments,
depth_stencil_attachment: Some(view_depth_texture.get_attachment(StoreOp::Store)),
depth_stencil_attachment,
timestamp_writes: None,
occlusion_query_set: None,
});

let mut render_pass = TrackedRenderPass::new(&render_device, render_pass);
if let Some(viewport) = camera.viewport.as_ref() {
render_pass.set_camera_viewport(viewport);
}

// Always run deferred pass to ensure the deferred gbuffer and deferred_lighting_pass_id are cleared.
{
// Run the prepass, sorted front-to-back.
// Opaque draws
if !opaque_deferred_phase.items.is_empty() {
#[cfg(feature = "trace")]
let _opaque_prepass_span = info_span!("opaque_deferred").entered();
opaque_deferred_phase.render(&mut render_pass, world, view_entity);
}

// Alpha masked draws
if !alpha_mask_deferred_phase.items.is_empty() {
// Run the deferred, sorted front-to-back.
#[cfg(feature = "trace")]
let _alpha_mask_deferred_span = info_span!("alpha_mask_deferred").entered();
alpha_mask_deferred_phase.render(&mut render_pass, world, view_entity);
}
}

if let Some(prepass_depth_texture) = &view_prepass_textures.depth {
// Copy depth buffer to texture.
render_context.command_encoder().copy_texture_to_texture(
view_depth_texture.texture.as_image_copy(),
prepass_depth_texture.texture.texture.as_image_copy(),
view_prepass_textures.size,
);
}
drop(render_pass);

// Copy prepass depth to the main depth texture
if let Some(prepass_depth_texture) = &view_prepass_textures.depth {
command_encoder.copy_texture_to_texture(
view_depth_texture.texture.as_image_copy(),
prepass_depth_texture.texture.texture.as_image_copy(),
view_prepass_textures.size,
);
}

command_encoder.finish()
});

Ok(())
}
Expand Down
77 changes: 45 additions & 32 deletions crates/bevy_core_pipeline/src/prepass/node.rs
Original file line number Diff line number Diff line change
@@ -1,12 +1,10 @@
use bevy_ecs::prelude::*;
use bevy_ecs::query::QueryItem;
use bevy_render::render_graph::ViewNode;
use bevy_render::render_resource::StoreOp;
use bevy_render::{
camera::ExtractedCamera,
render_graph::{NodeRunError, RenderGraphContext},
render_phase::RenderPhase,
render_resource::RenderPassDescriptor,
render_graph::{NodeRunError, RenderGraphContext, ViewNode},
render_phase::{RenderPhase, TrackedRenderPass},
render_resource::{CommandEncoderDescriptor, RenderPassDescriptor, StoreOp},
renderer::RenderContext,
view::ViewDepthTexture,
};
Expand All @@ -31,22 +29,20 @@ impl ViewNode for PrepassNode {
Option<&'static DeferredPrepass>,
);

fn run(
fn run<'w>(
&self,
graph: &mut RenderGraphContext,
render_context: &mut RenderContext,
render_context: &mut RenderContext<'w>,
(
camera,
opaque_prepass_phase,
alpha_mask_prepass_phase,
view_depth_texture,
view_prepass_textures,
deferred_prepass,
): QueryItem<Self::ViewQuery>,
world: &World,
): QueryItem<'w, Self::ViewQuery>,
world: &'w World,
) -> Result<(), NodeRunError> {
let view_entity = graph.view_entity();

let mut color_attachments = vec![
view_prepass_textures
.normal
Expand All @@ -56,55 +52,72 @@ impl ViewNode for PrepassNode {
.motion_vectors
.as_ref()
.map(|motion_vectors_texture| motion_vectors_texture.get_attachment()),
// Use None in place of Deferred attachments
// Use None in place of deferred attachments
None,
None,
];

// If all color attachments are none: clear the color attachment list so that no fragment shader is required
if color_attachments.iter().all(Option::is_none) {
// all attachments are none: clear the attachment list so that no fragment shader is required
color_attachments.clear();
}

{
// Set up the pass descriptor with the depth attachment and optional color attachments
let mut render_pass = render_context.begin_tracked_render_pass(RenderPassDescriptor {
let depth_stencil_attachment = Some(view_depth_texture.get_attachment(StoreOp::Store));

let view_entity = graph.view_entity();
render_context.add_command_buffer_generation_task(move |render_device| {
#[cfg(feature = "trace")]
let _prepass_span = info_span!("prepass").entered();

// Command encoder setup
let mut command_encoder =
render_device.create_command_encoder(&CommandEncoderDescriptor {
label: Some("prepass_command_encoder"),
});

// Render pass setup
let render_pass = command_encoder.begin_render_pass(&RenderPassDescriptor {
label: Some("prepass"),
color_attachments: &color_attachments,
depth_stencil_attachment: Some(view_depth_texture.get_attachment(StoreOp::Store)),
depth_stencil_attachment,
timestamp_writes: None,
occlusion_query_set: None,
});
let mut render_pass = TrackedRenderPass::new(&render_device, render_pass);
if let Some(viewport) = camera.viewport.as_ref() {
render_pass.set_camera_viewport(viewport);
}

// Always run opaque pass to ensure screen is cleared
{
// Run the prepass, sorted front-to-back
// Opaque draws
if !opaque_prepass_phase.items.is_empty() {
#[cfg(feature = "trace")]
let _opaque_prepass_span = info_span!("opaque_prepass").entered();
opaque_prepass_phase.render(&mut render_pass, world, view_entity);
}

// Alpha masked draws
if !alpha_mask_prepass_phase.items.is_empty() {
// Run the prepass, sorted front-to-back
#[cfg(feature = "trace")]
let _alpha_mask_prepass_span = info_span!("alpha_mask_prepass").entered();
alpha_mask_prepass_phase.render(&mut render_pass, world, view_entity);
}
}
if deferred_prepass.is_none() {
// Copy if deferred isn't going to
if let Some(prepass_depth_texture) = &view_prepass_textures.depth {
// Copy depth buffer to texture
render_context.command_encoder().copy_texture_to_texture(
view_depth_texture.texture.as_image_copy(),
prepass_depth_texture.texture.texture.as_image_copy(),
view_prepass_textures.size,
);

drop(render_pass);

// Copy prepass depth to the main depth texture if deferred isn't going to
if deferred_prepass.is_none() {
if let Some(prepass_depth_texture) = &view_prepass_textures.depth {
command_encoder.copy_texture_to_texture(
view_depth_texture.texture.as_image_copy(),
prepass_depth_texture.texture.texture.as_image_copy(),
view_prepass_textures.size,
);
}
}
}

command_encoder.finish()
});

Ok(())
}
}
Loading

0 comments on commit f4dab8a

Please sign in to comment.