[Merged by Bors] - Allow unbatched render phases to use unstable sorts #5049

james7132 · 2022-06-19T08:00:05Z

Objective

Partially addresses #4291.

Speed up the sort phase for unbatched render phases.

Solution

Split out one of the optimizations in #4899 and allow implementers of PhaseItem to change what kind of sort is used when sorting the items in the phase. Given a &mut [Self], implementers must sort the slice in a way that is consistent with downstream requirements (for example, batching). A default implementation that uses slice::sort_unstable_by_key is provided.
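The default-method pattern described above can be sketched with a simplified trait. This is an illustration under assumptions, not the actual bevy_render code: the real `PhaseItem` has further members (such as the draw function), and `DepthItem` is a hypothetical item type.

```rust
/// Simplified, hypothetical stand-in for a phase item trait.
pub trait PhaseItem: Sized {
    type SortKey: Ord;

    /// Key the phase is ordered by.
    fn sort_key(&self) -> Self::SortKey;

    /// Sorts all items of a phase. The default uses an unstable sort;
    /// implementers can override it with whatever ordering their phase needs.
    fn sort(items: &mut [Self]) {
        items.sort_unstable_by_key(|item| item.sort_key());
    }
}

/// Hypothetical item keyed by an integer depth.
#[derive(Debug, Clone, PartialEq)]
pub struct DepthItem {
    pub depth: u32,
    pub entity: u32,
}

impl PhaseItem for DepthItem {
    type SortKey = u32;

    fn sort_key(&self) -> u32 {
        self.depth
    }
    // `sort` is not overridden, so the default unstable sort applies.
}
```

Because `sort` takes the whole slice rather than returning a comparator, an implementer is free to swap in any algorithm, not just a different key.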

Performance

This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On many_cubes, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

Future Work

There were prior discussions in #4291 about adding support for faster radix sorts, which should in theory run in O(n) rather than O(n log n) time. voracious has been proposed, but it seems to be optimized for use cases with more than 30,000 items, which may be atypical for most systems.
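For reference, a least-significant-digit radix sort over a 32-bit key makes a fixed number of linear passes (here, four byte-wide counting-sort passes) instead of O(n log n) comparisons. This is a generic textbook sketch, not the implementation of voracious or any other crate; `radix_sort_by_key` is a hypothetical helper.

```rust
/// Sorts `items` ascending by a `u32` key with an LSD radix sort:
/// four stable counting-sort passes, one per byte of the key.
pub fn radix_sort_by_key<T: Clone, F: Fn(&T) -> u32>(items: &mut [T], key: F) {
    let mut buf: Vec<T> = items.to_vec();
    for shift in [0u32, 8, 16, 24] {
        // Histogram of the current byte.
        let mut counts = [0usize; 256];
        for item in items.iter() {
            counts[((key(item) >> shift) & 0xFF) as usize] += 1;
        }
        // Exclusive prefix sum: the starting index of each bucket.
        let mut offsets = [0usize; 256];
        let mut total = 0;
        for (bucket, &count) in counts.iter().enumerate() {
            offsets[bucket] = total;
            total += count;
        }
        // Scatter in input order, which keeps each pass stable.
        for item in items.iter() {
            let bucket = ((key(item) >> shift) & 0xFF) as usize;
            buf[offsets[bucket]] = item.clone();
            offsets[bucket] += 1;
        }
        items.clone_from_slice(&buf);
    }
}
```

Each pass is O(n + 256), so the total work grows linearly with the item count; the fixed per-pass overhead is one reason radix sorts tend to pay off only for larger inputs.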

Another optimization included in #4899 is to shrink a few of the IDs commonly used in PhaseItem implementations, making the item types smaller and swapping/sorting faster. Both CachedPipelineId and DrawFunctionId could be reduced to u32 instead of usize.
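To make the size argument concrete, here is a sketch comparing two hypothetical item layouts. The field names are assumptions chosen for illustration, not the real Bevy phase item definitions. On a typical 64-bit target the `usize` version should be 24 bytes and the `u32` version 16 bytes, so every swap during a sort moves a third less data.

```rust
use std::mem::size_of;

/// Hypothetical phase item layout with `usize` IDs (8 bytes each on 64-bit targets).
#[allow(dead_code)]
pub struct WideItem {
    pub distance: f32,
    pub pipeline: usize,
    pub draw_function: usize,
    pub entity: u32,
}

/// The same hypothetical fields with the IDs narrowed to `u32`.
#[allow(dead_code)]
pub struct NarrowItem {
    pub distance: f32,
    pub pipeline: u32,
    pub draw_function: u32,
    pub entity: u32,
}

/// Returns (wide, narrow) sizes in bytes: the amount of data moved per swap.
pub fn item_sizes() -> (usize, usize) {
    (size_of::<WideItem>(), size_of::<NarrowItem>())
}
```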

Ideally, this should automatically change to use stable sorts when BatchedPhaseItem is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

Changelog

Added: PhaseItem::sort

Migration Guide

RenderPhases now default to an unstable sort (via slice::sort_unstable_by_key). This can typically improve sort phase performance, but may produce incorrect batching results when implementing BatchedPhaseItem. To revert to the older stable sort, manually implement PhaseItem::sort with a stable sort (e.g. via slice::sort_by_key).
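A sketch of that migration, assuming a simplified trait shaped like the one this PR describes; `BatchedSprite` and its fields are hypothetical. Overriding `sort` with `sort_by_key` keeps items with equal keys in the order they were queued, which is the ordering batching relied on before this change.

```rust
/// Simplified, hypothetical stand-in for `PhaseItem` with an unstable default sort.
pub trait PhaseItem: Sized {
    type SortKey: Ord;

    fn sort_key(&self) -> Self::SortKey;

    fn sort(items: &mut [Self]) {
        items.sort_unstable_by_key(|item| item.sort_key());
    }
}

/// Hypothetical batched item: `z` is the sort key, `batch` records queue order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BatchedSprite {
    pub z: i32,
    pub batch: u32,
}

impl PhaseItem for BatchedSprite {
    type SortKey = i32;

    fn sort_key(&self) -> i32 {
        self.z
    }

    /// Restores the pre-change behavior: a stable sort preserves the
    /// relative order of items whose keys compare equal.
    fn sort(items: &mut [Self]) {
        items.sort_by_key(|item| item.sort_key());
    }
}
```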

crates/bevy_core_pipeline/src/core_2d/mod.rs

crates/bevy_render/src/render_phase/draw.rs

superdump

Just some information in a comment but then lgtm.

crates/bevy_render/src/render_phase/draw.rs

superdump

Reworded a bit and then it is good to go.

crates/bevy_render/src/render_phase/draw.rs

Co-authored-by: Robert Swain <robert.swain@gmail.com>

tim-blackbird

This is a nice improvement, one nit though.

crates/bevy_render/src/render_phase/draw.rs

james7132 · 2022-06-21T23:53:47Z

On further thought, perhaps PhaseItem::sort(&mut [Self]) might be a more apt approach to this than just hard-coding a solution. This would allow more flexibility for implementors to use their own sort implementations if they choose to do so.

# Objective The descriptions included in the API docs of `entity` module, `Entity` struct, and `Component` trait have some issues: 1. the concept of entity is not clearly defined, 2. descriptions are a little bit out of place, 3. in a case the description leak too many details about the implementation, 4. some descriptions are not exhaustive, 5. there are not enough examples, 6. the content can be formatted in a much better way. ## Solution 1. ~~Stress the fact that entity is an abstract and elementary concept. Abstract because the concept of entity is not hardcoded into the library but emerges from the interaction of `Entity` with every other part of `bevy_ecs`, like components and world methods. Elementary because it is a fundamental concept that cannot be defined with other terms (like point in euclidean geometry, or time in classical physics).~~ We decided to omit the definition of entity in the API docs ([see why]). It is only described in its relationship with components. 2. Information has been moved to relevant places and links are used instead in the other places. 3. Implementation details about `Entity` have been reduced. 4. Descriptions have been made more exhaustive by stating how to obtain and use items. Entity operations are enriched with `World` methods. 5. Examples have been added or enriched. 6. Sections have been added to organize content. Entity operations are now laid out in a table. ### Todo list - [x] Break lines at sentence-level. ## For reviewers - ~~I added a TODO over `Component` docs, make sure to check it out and discuss it if necessary.~~ ([Resolved]) - You can easily check the rendered documentation by doing `cargo doc -p bevy_ecs --no-deps --open`. [see why]: bevyengine#4767 (comment) [Resolved]: bevyengine#4767 (comment)

…el (bevyengine#4663) # Objective Further speed up visibility checking by removing the main sources of contention for the system. ## Solution - ~~Make `ComputedVisibility` a resource wrapping a `FixedBitset`.~~ - ~~Remove `ComputedVisibility` as a component.~~ ~~This adds a one-bit overhead to every entity in the app world. For a game with 100,000 entities, this is 12.5KB of memory. This is still small enough to fit entirely in most L1 caches. Also removes the need for a per-Entity change detection tick. This reduces the memory footprint of ComputedVisibility 72x.~~ ~~The decreased memory usage and less fragmented memory locality should provide significant performance benefits.~~ ~~Clearing visible entities should be significantly faster than before:~~ - ~~Setting one `u32` to 0 clears 32 entities per cycle.~~ - ~~No archetype fragmentation to contend with.~~ - ~~Change detection is applied to the resource, so there is no per-Entity update tick requirement.~~ ~~The side benefit of this design is that it removes one more "computed component" from userspace. Though accessing the values within it are now less ergonomic.~~ This PR changes `crossbeam_channel` in `check_visibility` to use a `Local<ThreadLocal<Cell<Vec<Entity>>>` to mark down visible entities instead. Co-Authored-By: TheRawMeatball <therawmeatball@gmail.com> Co-Authored-By: Aevyrie <aevyrie@gmail.com>

# Objective Closes bevyengine#1557. Partially addresses bevyengine#3362. Cleanup the public facing API for storage types. Most of these APIs are difficult to use safely when directly interfacing with these types, and is also currently impossible to interact with in normal ECS use as there is no `World::storages_mut`. The majority of these types should be easy enough to read, and perhaps mutate the contents, but never structurally altered without the same checks in the rest of bevy_ecs code. This both cleans up the public facing types and helps use unused code detection to remove a few of the APIs we're not using internally. ## Solution - Mark all APIs that take `&mut T` under `bevy_ecs::storage` as `pub(crate)` or `pub(super)` - Cleanup after it all. Entire type visibility changes: - `BlobVec` is `pub(super)`, only storage code should be directly interacting with it. - `SparseArray` is now `pub(crate)` for the entire type. It's an implementation detail for `Table` and `(Component)SparseSet`. - `TableMoveResult` is now `pub(crate) --- ## Changelog TODO ## Migration Guide Dear God, I hope not.

# Objective - Builds on top of bevyengine#4938 - Make clustered-forward PBR lighting/shadows functionality callable - See bevyengine#3969 for details ## Solution - Add `PbrInput` struct type containing a `StandardMaterial`, occlusion, world_position, world_normal, and frag_coord - Split functionality to calculate the unit view vector, and normal-mapped normal into `bevy_pbr::pbr_functions` - Split high-level shading flow into `pbr(in: PbrInput, N: vec3<f32>, V: vec3<f32>, is_orthographic: bool)` function in `bevy_pbr::pbr_functions` - Rework `pbr.wgsl` fragment stage entry point to make use of the new functions - This has been benchmarked on an M1 Max using `many_cubes -- sphere`. `main` had a median frame time of 15.88ms, this PR 15.99ms, which is a 0.69% frame time increase, which is within noise in my opinion. --- ## Changelog - Added: PBR shading code is now callable. Import `bevy_pbr::pbr_functions` and its dependencies, create a `PbrInput`, calculate the unit view and normal-mapped normal vectors and whether the projection is orthographic, and call `pbr()`!

@alice-i-cecile

…gine#4716) # Objective DioxusLabs and Bevy have taken over maintaining what was our abandoned ui layout dependency [stretch](https://github.com/vislyhq/stretch). Dioxus' fork has had a lot of work done on it by @alice-i-cecile, @Weibye , @jkelleyrtp, @mockersf, @HackerFoo, @TimJentzsch and a dozen other contributors and now is in much better shape than stretch was. The updated crate is called taffy and is available on github [here](https://github.com/DioxusLabs/taffy) ([taffy](https://crates.io/crates/taffy) on crates.io). The goal of this PR is to replace stretch v0.3.2 with taffy v0.1.0. ## Solution I changed the bevy_ui Cargo.toml to depend on taffy instead of stretch and fixed all the errors rustc complained about. --- ## Changelog Changed bevy_ui layout dependency from stretch to taffy (the maintained fork of stretch). fixes bevyengine#677 ## Migration Guide The public api of taffy is different from that of stretch so please advise me on what to do here @alice-i-cecile.

james7132 · 2022-06-22T01:36:52Z

Not sure what's going on with that merge.

On further thought, perhaps PhaseItem::sort(&mut [Self]) might be a more apt approach to this than just hard-coding a solution. This would allow more flexibility for implementors to use their own sort implementations if they choose to do so.

Went through with this. It allows us to provide custom per-phase sorting behavior. Looking at #4291 and @superdump's test results, radsort seems to produce some of the better results for unstable/unbatched phases, and it produces similar results on my machine.
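Radix sorts work on integer bits, so a radix-sorted phase ordered by an `f32` view distance needs an order-preserving float-to-integer key; radix-sort crates that accept float keys have to do something equivalent internally. Below is a sketch of the usual bit trick. `sortable_bits`, `OpaqueItem`, and `sort_front_to_back` are hypothetical names, and std's unstable sort stands in for a radix sort to keep the example dependency-free.

```rust
/// Maps an `f32` to a `u32` whose unsigned order matches the float's
/// total order (`f32::total_cmp`), so it can serve as a radix-sort key.
pub fn sortable_bits(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 {
        // Negative: flip every bit so larger magnitudes sort first.
        !bits
    } else {
        // Non-negative: set the sign bit so these sort after all negatives.
        bits | 0x8000_0000
    }
}

/// Hypothetical opaque phase item ordered by view distance.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OpaqueItem {
    pub distance: f32,
    pub entity: u32,
}

/// Per-phase sort sketch: orders items front to back by the integer key.
/// A radix sort could consume exactly the same `u32` key.
pub fn sort_front_to_back(items: &mut [OpaqueItem]) {
    items.sort_unstable_by_key(|item| sortable_bits(item.distance));
}
```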

With the same benchmark, many_cubes, this further reduces the time spent sorting Opaque3d phase items from 846us to 144.18us, an 82.96% reduction in the system.

superdump

I like the structure and flexibility, but I'd like to either benchmark and choose the best radix sort crate based on the criteria discussed in #4291 or, if we want to do that separately, not introduce a new dependency that we will probably replace almost straight away.

crates/bevy_core_pipeline/Cargo.toml

crates/bevy_render/src/render_phase/draw.rs

Co-authored-by: Robert Swain <robert.swain@gmail.com>

james7132 · 2022-06-23T10:32:24Z

Tested the four aforementioned sorts on many_cubes -- sphere.

| Sort | `sort_phase_system` mean time (Opaque3d) |
| --- | --- |
| `slice::sort_unstable_by` | 477.62us |
| `radsort` | 144.18us |
| `voracious` (unstable) | 163.52us |
| `voracious` (stable) | 169.72us |
| `rdst` | 413.77us |
| `rdst` (single threaded) | 173.29us |

With these results, I think we should stick with radsort. These results mirror that of those seen in #4291 as well.

superdump · 2022-06-23T10:50:25Z

Yup, ok. I think voracious/rdst may be faster than radsort for sprites, but ok. I'm on board with taking that separately.

superdump · 2022-06-23T10:52:33Z

bors r+

# Objective

Partially addresses #4291.

Speed up the sort phase for unbatched render phases.

## Solution

Split out one of the optimizations in #4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance

This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work

There were prior discussions in #4291 about adding support for faster radix sorts, which should in theory run in `O(n)` rather than `O(n log n)` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimized for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in #4899 is to shrink a few of the IDs commonly used in `PhaseItem` implementations, making the item types smaller and swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog

Added: `PhaseItem::sort`

## Migration Guide

RenderPhases now default to an unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` with a stable sort (e.g. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
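The three options named in this commit message (Stable, Unstable, Unsorted, defaulting to Unstable) can be sketched as a small enum plus a dispatch function. The names `SortMode` and `sort_phase` are hypothetical and not taken from the PR; the sketch only shows how each option maps onto the standard slice methods.

```rust
/// Hypothetical enum mirroring the three sort options named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SortMode {
    /// `sort_by_key`: items with equal keys keep their insertion order.
    Stable,
    /// `sort_unstable_by_key`: usually faster; order of equal keys is unspecified.
    #[default]
    Unstable,
    /// No sorting, e.g. when a depth prepass already takes care of ordering.
    Unsorted,
}

/// Sorts `items` by `key` according to `mode` (hypothetical helper).
pub fn sort_phase<T, K: Ord>(mode: SortMode, items: &mut [T], key: impl FnMut(&T) -> K) {
    match mode {
        SortMode::Stable => items.sort_by_key(key),
        SortMode::Unstable => items.sort_unstable_by_key(key),
        SortMode::Unsorted => {}
    }
}
```

The `PhaseItem::sort` method that was ultimately merged subsumes this design: each of these options is just a different body for that method.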

bors · 2022-06-23T11:11:17Z

Pull request successfully merged into main.

Build succeeded.

Split sort phase implementations based on what's required

a66cc70

james7132 added A-Rendering C-Performance labels Jun 19, 2022

james7132 requested a review from superdump June 19, 2022 08:03

Fix CI

2ef7a7c

mockersf reviewed Jun 21, 2022


crates/bevy_core_pipeline/src/core_2d/mod.rs (outdated)

mockersf reviewed Jun 21, 2022
