Commit 2dd9e60

Add release notes for GPU-driven rendering (#2097)
1 parent 2c11f13 commit 2dd9e60

1 file changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
## GPU-Driven Rendering
2+
3+
Over the years, the trend in real-time rendering has increasingly been to move work from the CPU to the GPU. One of the latest developments in this area has been *GPU-driven rendering*, in which the GPU takes a representation of the scene and essentially works out what to draw on its own. While Bevy 0.15 had support for GPU-driven rendering for virtual geometry only, Bevy 0.16 supports GPU-driven rendering for most other types of 3D meshes, including skinned ones. This dramatically reduces the amount of CPU time that the renderer needs for larger scenes. On platforms that support it, upgrading to Bevy 0.16 enables GPU-driven rendering for your meshes automatically, unless your application hooks into the rendering pipeline.
4+
5+
To explain how GPU-driven rendering operates, it's easiest to first describe how CPU-driven rendering works:
6+
7+
1. The CPU determines the objects that are visible, via frustum culling and perhaps occlusion culling.
8+
9+
2. For each such object:
10+
11+
a. The CPU sends the object's transform to the GPU, possibly in addition to other data such as joint weights.
12+
13+
b. The CPU tells the GPU where the mesh data is.
14+
15+
c. The CPU writes the material data to the GPU.
16+
17+
d. The CPU tells the GPU where the textures and other buffers needed to render the object (light data, etc.) are.
18+
19+
e. The CPU issues a drawcall.
20+
21+
f. The GPU renders the object.
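
The per-object loop above can be sketched in plain Rust, with no Bevy or GPU API involved. Every name here (`Object`, `Frustum`, `Command`, `cpu_driven_frame`) is a hypothetical stand-in rather than a Bevy type, and the frustum is reduced to a near/far depth range. The sketch is only meant to show that the CPU records a full group of commands, including one drawcall, for every visible object.

```rust
/// Hypothetical stand-in for a renderable object; not a Bevy type.
#[derive(Clone, Copy, Debug)]
pub struct Object {
    pub id: u32,
    /// Distance along the view axis, standing in for a full transform.
    pub depth: f32,
}

/// A drastically simplified "frustum": just a near/far depth range.
#[derive(Clone, Copy, Debug)]
pub struct Frustum {
    pub near: f32,
    pub far: f32,
}

impl Frustum {
    pub fn contains(&self, object: &Object) -> bool {
        object.depth >= self.near && object.depth <= self.far
    }
}

/// Commands the CPU records; the payload is the object ID.
#[derive(Debug, PartialEq)]
pub enum Command {
    UploadTransform(u32),
    BindMesh(u32),
    WriteMaterial(u32),
    BindResources(u32),
    Draw(u32),
}

/// Records one frame the CPU-driven way.
pub fn cpu_driven_frame(objects: &[Object], frustum: &Frustum) -> Vec<Command> {
    let mut commands = Vec::new();
    // Step 1: visibility is decided on the CPU.
    for object in objects.iter().filter(|o| frustum.contains(o)) {
        // Steps 2a to 2e: five CPU-side commands per visible object.
        commands.push(Command::UploadTransform(object.id));
        commands.push(Command::BindMesh(object.id));
        commands.push(Command::WriteMaterial(object.id));
        commands.push(Command::BindResources(object.id));
        commands.push(Command::Draw(object.id));
    }
    commands
}

/// Counts how many drawcalls the CPU issued.
pub fn draw_call_count(commands: &[Command]) -> usize {
    commands
        .iter()
        .filter(|c| matches!(c, Command::Draw(_)))
        .count()
}
```

In this model the length of the command list grows linearly with the number of visible objects, which is the CPU cost that GPU-driven rendering aims to remove.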
22+
23+
In contrast, GPU-driven rendering in Bevy works like this:
24+
25+
1. The CPU supplies a single buffer containing transform information for all objects to the GPU, so that shaders can process many objects at once.
26+
27+
2. If new objects have been spawned since the last frame, the CPU fills out tables specifying where the mesh data for the new objects is.
28+
29+
3. If materials have been modified since the last frame, the CPU uploads information about those materials to the GPU.
30+
31+
4. The CPU creates lists of objects to be rendered this frame. Each object is simply referenced by an integer ID, so these lists are small. The number of lists depends on the size and complexity of the scene, but there are rarely more than 15 such lists even for large scenes.
32+
33+
5. For each such list:
34+
35+
a. The CPU issues a *single* drawcall.
36+
37+
b. The GPU processes all objects in the list, determining which ones are truly visible.
38+
39+
c. The GPU renders each such visible object.
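
The same frame can be modeled the GPU-driven way. This is a CPU-side simulation under stated assumptions, not Bevy's implementation: `InstanceData`, `DrawList`, `FrameStats`, and `gpu_driven_frame` are hypothetical names, visibility is again reduced to a depth range, and the function `gpu_process_list` stands in for work that would really run in shaders.

```rust
/// Hypothetical per-object data, stored in one buffer for all objects (step 1).
#[derive(Clone, Copy, Debug)]
pub struct InstanceData {
    /// Distance along the view axis, standing in for a full transform.
    pub depth: f32,
}

/// One list of objects to render (step 4). Objects are referenced by integer ID only.
pub struct DrawList {
    pub object_ids: Vec<u32>,
}

/// What happened during the simulated frame.
pub struct FrameStats {
    pub cpu_draw_calls: usize,
    pub objects_rendered: usize,
}

/// Stand-in for the GPU side of one drawcall: cull (5b), then render (5c).
/// Returns the number of objects that survived culling.
fn gpu_process_list(instances: &[InstanceData], list: &DrawList, near: f32, far: f32) -> usize {
    list.object_ids
        .iter()
        .filter(|&&id| {
            let depth = instances[id as usize].depth;
            depth >= near && depth <= far
        })
        .count()
}

/// Simulates one frame: the CPU issues exactly one drawcall per list (5a).
pub fn gpu_driven_frame(
    instances: &[InstanceData],
    lists: &[DrawList],
    near: f32,
    far: f32,
) -> FrameStats {
    let mut stats = FrameStats { cpu_draw_calls: 0, objects_rendered: 0 };
    for list in lists {
        stats.cpu_draw_calls += 1;
        stats.objects_rendered += gpu_process_list(instances, list, near, far);
    }
    stats
}
```

With 1,000 objects split across two lists, this model issues two drawcalls regardless of how many objects end up visible, whereas a per-object loop would issue one drawcall per visible object.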
40+
41+
For large scenes that may have tens of thousands of objects, GPU-driven rendering frequently reduces CPU rendering overhead by a factor of 3 or more. It's also necessary for occlusion culling, because of the GPU transform step (5(b) above).
42+
43+
Internally, GPU-driven rendering is less a single technique than a combination of several techniques. These include:
44+
45+
* *Multi-draw indirect* (MDI), a GPU API that allows multiple meshes to be drawn in a single drawcall, the details of which the GPU provides by filling out tables in GPU memory. In order to use MDI effectively, Bevy uses a new subsystem, the *mesh allocator*, which manages the details of packing meshes together in GPU memory.
46+
47+
* *Multi-draw indirect count* (MDIC), an extension to multi-draw indirect that allows the GPU to determine the *number* of meshes to draw with minimal overhead.
48+
49+
* *Bindless resources*, which allow Bevy to supply the textures (and other resources) for many objects as a group, instead of having to bind textures one-by-one on the CPU. These resources are managed by a new subsystem known as the *material allocator*.
50+
51+
* *GPU transform and cull*, which allows Bevy to compute the position and visibility of every object from the camera's perspective on the GPU instead of on the CPU.
52+
53+
* The *retained render world*, which allows the CPU to avoid processing and uploading data that hasn't changed since the last frame.
54+
55+
* *Cached pipeline specialization*, which leverages Bevy's component-level change detection to more quickly determine when the rendering state for meshes is unchanged from the previous frame.
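
To make the "tables in GPU memory" used by multi-draw indirect more concrete, here is a minimal sketch. The field layout of `IndexedIndirectArgs` follows the indexed indirect draw arguments defined by Vulkan-style APIs (five 32-bit values). Everything else is an assumption for illustration: `MeshSlice` is a hypothetical record of where a mesh was packed into shared buffers, and `build_indirect_commands` is a CPU-side stand-in for a culling shader. It is not Bevy's actual code.

```rust
/// Indexed indirect draw arguments: five 32-bit values, 20 bytes in total.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct IndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Hypothetical record of where one mesh lives inside shared vertex/index buffers.
#[derive(Clone, Copy, Debug)]
pub struct MeshSlice {
    pub first_index: u32,
    pub index_count: u32,
    pub base_vertex: i32,
}

/// Stand-in for a GPU culling pass. For every visible candidate it writes one
/// indirect command into `out`, packed tightly from the front, and returns how
/// many commands were written. That return value plays the role of the count
/// consumed by multi-draw indirect count.
pub fn build_indirect_commands(
    candidates: &[(u32, MeshSlice)], // (instance index, mesh location)
    visible: impl Fn(u32) -> bool,
    out: &mut [IndexedIndirectArgs],
) -> u32 {
    let mut count = 0u32;
    for &(instance, mesh) in candidates {
        if !visible(instance) {
            continue;
        }
        // Stop if the output table is full.
        let Some(slot) = out.get_mut(count as usize) else { break };
        *slot = IndexedIndirectArgs {
            index_count: mesh.index_count,
            instance_count: 1,
            first_index: mesh.first_index,
            base_vertex: mesh.base_vertex,
            // Lets the shader look up this object's data in the shared buffer.
            first_instance: instance,
        };
        count += 1;
    }
    count
}
```

Because the commands and the count both live in GPU memory, the CPU never needs to know how many objects survived culling. It only issues one multi-draw call that points at the table.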
56+
57+
At the moment, not all platforms offer full support for this feature. The following table summarizes the platform support for the various parts of GPU-driven rendering:
58+
59+
| OS | Graphics API | GPU transform | Multi-draw & GPU cull | Bindless resources |
60+
|---------|--------------|---------------|-----------------------|--------------------|
61+
| Windows | Vulkan ||||
62+
| Windows | Direct3D 12 ||||
63+
| Windows | OpenGL ||||
64+
| Linux | Vulkan ||||
65+
| Linux | OpenGL ||||
66+
| macOS | Metal |||➖¹ |
67+
| iOS | Metal |||➖¹ |
68+
| Android | Vulkan | ➖² |➖² |➖² |
69+
| Web | WebGPU ||||
70+
| Web | WebGL 2 ||||
71+
72+
¹ Bevy does support bindless resources on Metal, but the limits are currently significantly lower, potentially resulting in more drawcalls.
73+
74+
² Some Android drivers that are known to exhibit bugs in Bevy's workloads are denylisted and will cause Bevy to fall back to CPU-driven rendering.
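
The fallback behavior described above can be pictured as a small decision function. This is a hypothetical sketch only: `Capabilities`, `RenderPath`, and `select_render_path` are invented for illustration, and the assumption that multi-draw and GPU culling build on top of GPU transform is ours, taken from the column order of the table rather than from Bevy's source.

```rust
/// Hypothetical summary of what a platform offers; the fields mirror the table columns.
#[derive(Clone, Copy, Debug, Default)]
pub struct Capabilities {
    pub gpu_transform: bool,
    pub multi_draw_and_gpu_cull: bool,
    pub bindless: bool,
    /// True if the driver is known to misbehave (footnote 2).
    pub driver_denylisted: bool,
}

/// Hypothetical rendering paths, from least to most GPU-driven.
#[derive(Debug, PartialEq, Eq)]
pub enum RenderPath {
    CpuDriven,
    GpuTransformOnly,
    GpuDriven { bindless: bool },
}

/// Picks the most capable path that the platform supports.
pub fn select_render_path(caps: Capabilities) -> RenderPath {
    // Denylisted drivers fall back to CPU-driven rendering.
    if caps.driver_denylisted || !caps.gpu_transform {
        return RenderPath::CpuDriven;
    }
    // Assumption: multi-draw and GPU culling require GPU transform.
    if !caps.multi_draw_and_gpu_cull {
        return RenderPath::GpuTransformOnly;
    }
    RenderPath::GpuDriven { bindless: caps.bindless }
}
```

A denylisted driver ends up on `RenderPath::CpuDriven` in this sketch even when every feature flag is set, which matches the behavior footnote 2 describes.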
75+
76+
In most cases, you don't need to do anything special in order for your application to support GPU-driven rendering. There are two main exceptions:
77+
78+
1. Materials with custom WGSL shaders will continue to use CPU-driven rendering by default. In order for your materials to use GPU-driven rendering, you'll want to use the new `#[bindless]` feature on `AsBindGroup`. See the `AsBindGroup` documentation and the `shader_material_bindless` example for more details. If you're using `ExtendedMaterial`, check out the new `extended_material_bindless` example.
79+
80+
2. Applications and plugins that hook into the renderer at a low level will need to be updated to support GPU-driven rendering. The newly updated `custom_phase_item` and `specialized_mesh_pipeline` examples may prove useful as a guide for doing this.
81+
82+
Bevy's current GPU-driven rendering isn't the end of the story. There's a sizable amount of potential future work to be done:
83+
84+
* Bevy 0.16 only supports GPU-driven rendering for the 3D pipeline, but the techniques are equally applicable to the 2D pipeline. Future versions of Bevy should support GPU-driven rendering for 2D mesh rendering, sprites, UI, and so on.
85+
86+
* Bevy currently draws objects with morph targets using CPU-driven rendering. This is something we plan to address in the future. Note that the presence of objects with morph targets doesn't prevent objects that don't have morph targets from being drawn with GPU-driven rendering.
87+
88+
* In the future, a portion of the GPU-driven rendering infrastructure could be ported to platforms that don't support the full set of features, offering some performance improvements on those platforms. For example, even on WebGL 2 the renderer could make use of the material allocator to pack data more efficiently.
89+
90+
* We're watching new API features, such as [Vulkan device generated commands] and [Direct3D 12 work graphs], with interest. These would allow future versions of Bevy to offload even more work to the GPU, such as sorting of transparent objects. While figuring out how to unify these disparate APIs in a single renderer will be a challenge, the future possibilities in this space are exciting.
91+
92+
If you're interested in any of these tasks, feel free to ask in Discord or via GitHub issues.
93+
94+
[Vulkan device generated commands]: https://www.supergoodcode.com/device-generated-commands/
95+
96+
[Direct3D 12 work graphs]: https://devblogs.microsoft.com/directx/d3d12-work-graphs/
