Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine visibility queries in check_visibility_system #10196

Merged

Conversation

rparrett
Copy link
Contributor

@rparrett rparrett commented Oct 19, 2023

Objective

Alternative to #7310

Solution

Implemented the suggestion from #7310 (comment)

I am guessing that these were originally split as an optimization, but I am not sure since I believe the original author of the code is the one speculating about combining them up there.

Benchmarks

I ran three benchmarks to compare main, this PR, and the approach from #7310 (updated to the same commit on main).

This seems to perform slightly better than main in scenarios where most entities have AABBs, and a bit worse when they don't (many_lights). That seems to make sense to me.

Either way, the difference is ~-20 microseconds in the more common scenarios or ~+100 microseconds in the less common scenario. I would speculate that this might perform very slightly worse in single-threaded scenarios.

Benches were run in release mode for 2000 frames while capturing a trace with tracy.

bench commit check_visibility_system mean μs
many_cubes main 929.5
many_cubes this 914.0
many_cubes 7310 1003.5
many_foxes main 191.6
many_foxes this 173.2
many_foxes 7310 167.9
many_lights main 619.3
many_lights this 703.7
many_lights 7310 842.5

Notes

Technically this behaves slightly differently -- prior to this PR, view visibility was determined even for entities without GlobalTransform. I don't think this has any practical impact though.

IMO, I don't think we need to do this. But I opened a PR because it seemed like the handiest way to share the code / benchmarks.

TODO

I have done some rudimentary testing with the examples above, but I can do some screenshot diffing if it seems like we want to do this.

@rparrett rparrett added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Oct 19, 2023
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should lower the per-view overhead, which is what the gains in the stress tests seem to show, since there aren't any non-AABB entities in them. This should also increase the thread utilization in the case there is a mix of archetypes. The only downside might be that the per-entity time spent might increase, but that's already probably dominated by the actual math being done, and/or mitigated by the fact that Option<Aabb> will be the same for a given archetype and will result in better branch prediction.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong feelings on the performance changes, but I quite like how much this simplifies the code here. Should we experiment with adding par_iter_mut into the remaining loop? The second loop used it, presumably for good reason.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Oct 20, 2023
@alice-i-cecile alice-i-cecile requested a review from hymm October 20, 2023 16:06
@rparrett
Copy link
Contributor Author

Should we experiment with adding par_iter_mut into the remaining loop?

Both used par_iter_mut, I just removed one. Github's diff is leaving out that context.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Oct 31, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 31, 2023
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Nov 2, 2023
Merged via the queue into bevyengine:main with commit 09c2090 Nov 2, 2023
25 checks passed
aevyrie added a commit to aevyrie/bevy that referenced this pull request Nov 4, 2023
Fix branding inconsistencies

don't Implement `Display` for `Val` (bevyengine#10345)

- Revert bevyengine#10296

- Avoid implementing `Display` without a justification
- `Display` implementation is a guarantee without a direct use, takes
additional time to compile and require work to maintain
- `Debug`, `Reflect` or `Serialize` should cover all needs

Combine visibility queries in check_visibility_system (bevyengine#10196)

Alternative to bevyengine#7310

Implemented the suggestion from
bevyengine#7310 (comment)

I am guessing that these were originally split as an optimization, but I
am not sure since I believe the original author of the code is the one
speculating about combining them up there.

I ran three benchmarks to compare main, this PR, and the approach from
([updated](https://github.com/rparrett/bevy/commits/rebased-parallel-check-visibility)
to the same commit on main).

This seems to perform slightly better than main in scenarios where most
entities have AABBs, and a bit worse when they don't (`many_lights`).
That seems to make sense to me.

Either way, the difference is ~-20 microseconds in the more common
scenarios or ~+100 microseconds in the less common scenario. I would
speculate that this might perform **very slightly** worse in
single-threaded scenarios.

Benches were run in release mode for 2000 frames while capturing a trace
with tracy.

| bench | commit | check_visibility_system mean μs |
| -- | -- | -- |
| many_cubes | main | 929.5 |
| many_cubes | this | 914.0 |
| many_cubes | 7310 | 1003.5 |
| | |
| many_foxes | main | 191.6 |
| many_foxes | this | 173.2 |
| many_foxes | 7310 | 167.9 |
| | |
| many_lights | main | 619.3 |
| many_lights | this | 703.7 |
| many_lights | 7310 | 842.5 |

Technically this behaves slightly differently -- prior to this PR, view
visibility was determined even for entities without `GlobalTransform`. I
don't think this has any practical impact though.

IMO, I don't think we need to do this. But I opened a PR because it
seemed like the handiest way to share the code / benchmarks.

I have done some rudimentary testing with the examples above, but I can
do some screenshot diffing if it seems like we want to do this.

Make VERTEX_COLORS usable in prepass shader, if available (bevyengine#10341)

I was working with forward rendering prepass fragment shaders and ran
into an issue of not being able to access vertex colors in the prepass.
I was able to access vertex colors in regular fragment shaders as well
as in deferred shaders.

It seems like this `if` was nested unintentionally as moving it outside
of the `deferred` block works.

---

Enable vertex colors in forward rendering prepass fragment shaders

allow DeferredPrepass to work without other prepass markers (bevyengine#10223)

fix crash / misbehaviour when `DeferredPrepass` is used without
`DepthPrepass`.

- Deferred lighting requires the depth prepass texture to be present, so
that the depth texture is available for binding. without it the deferred
lighting pass will use 0 for depth of all meshes.
- When `DeferredPrepass` is used without other prepass markers, and with
any materials that use `OpaqueRenderMode::Forward`, those entities will
try to queue to the `Opaque3dPrepass` render phase, which doesn't exist,
causing a crash.

- check if the prepass phases exist before queueing
- generate prepass textures if `Opaque3dDeferred` is present
- add a note to the DeferredPrepass marker to note that DepthPrepass is
also required by the default deferred lighting pass
- also changed some `With<T>.is_some()`s to `Has<T>`s

UI batching Fix (bevyengine#9610)

Reimplement bevyengine#8793 on top of the recent rendering changes.

The batch creation logic is quite convoluted, but I tested it on enough
examples to convince myself that it works.

The initial value of `batch_image_handle` is changed from
`HandleId::Id(Uuid::nil(), u64::MAX)` to `DEFAULT_IMAGE_HANDLE.id()`,
which allowed me to make the if-block simpler I think.

The default image from `DEFAULT_IMAGE_HANDLE` is always inserted into
`UiImageBindGroups` even if it's not used. I tried to add a check so
that it would be only inserted when there is only one batch using the
default image but this crashed.

---

`prepare_uinodes`
* Changed the initial value of `batch_image_handle` to
`DEFAULT_IMAGE_HANDLE.id()`.
* The default image is added to the UI image bind groups before
assembling the batches.
* A new `UiBatch` isn't created when the next `ExtractedUiNode`s image
is set to `DEFAULT_IMAGE_HANDLE` (unless it is the first item in the UI
phase items list).

Increase default normal bias to avoid common artifacts (bevyengine#10346)

Bevy's default bias values for directional and spot lights currently
cause significant artifacts. We should fix that so shadows look good by
default!

This is a less controversial/invasive alternative to bevyengine#10188, which might
enable us to keep the default bias value low, but also has its own sets
of concerns and caveats that make it a risky choice for Bevy 0.12.

Bump the default normal bias from `0.6` to `1.8`. There is precedent for
values in this general area as Godot has a default normal bias of `2.0`.

![image](https://github.com/superdump/bevy/assets/2694663/a5828011-33fc-4427-90ed-f093d7389053)

![image](https://github.com/bevyengine/bevy/assets/2694663/0f2b16b0-c116-41ab-9886-1ace9e00efd6)

The default `shadow_normal_bias` value for `DirectionalLight` and
`SpotLight` has changed to accommodate artifacts introduced with the new
shadow PCF changes. It is unlikely (especially given the new PCF shadow
behaviors with these values), but you might need to manually tweak this
value if your scene requires a lower bias and it relied on the previous
default value.

Make `DirectionalLight` `Cascades` computation generic over `CameraProjection` (bevyengine#9226)

Fixes bevyengine#9077 (see this issue for
motivations)

Implement 1 and 2 of the "How to fix it" section of
bevyengine#9077

`update_directional_light_cascades` is split into
`clear_directional_light_cascades` and a generic
`build_directional_light_cascades`, to clear once and potentially insert
many times.

---

`DirectionalLight`'s computation is now generic over `CameraProjection`
and can work with custom camera projections.

If you have a component `MyCustomProjection` that implements
`CameraProjection`:
- You need to implement a new required associated method,
`get_frustum_corners`, returning an array of the corners of a subset of
the frustum with given `z_near` and `z_far`, in local camera space.
- You can now add the
`build_directional_light_cascades::<MyCustomProjection>` system in
`SimulationLightSystems::UpdateDirectionalLightCascades` after
`clear_directional_light_cascades` for your projection to work with
directional lights.

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>

Update default `ClearColor` to better match Bevy's branding (bevyengine#10339)

- Changes the default clear color to match the code block color on
Bevy's website.

- Changed the clear color, updated text in examples to ensure adequate
contrast. Inconsistent usage of white text color set to use the default
color instead, which is already white.
- Additionally, updated the `3d_scene` example to make it look a bit
better, and use bevy's branding colors.

![image](https://github.com/bevyengine/bevy/assets/2632925/540a22c0-826c-4c33-89aa-34905e3e313a)

Corrected incorrect doc comment on read_asset_bytes (bevyengine#10352)

Fixes bevyengine#10302

- Removed the incorrect comment.

Allow AccessKit to react to WindowEvents before they reach the engine (bevyengine#10356)

- Adopt bevyengine#10239 to get it in time for the release
- Fix accessibility on macOS and linux

- call `on_event` from AcccessKit adapter on winit events

---------

Co-authored-by: Nolan Darilek <nolan@thewordnerd.info>
Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>

Fix typo in window.rs (bevyengine#10358)

Fixes a small typo in `bevy_window/src/window.rs`

Change `Should be used instead 'scale_factor' when set.` to `Should be
used instead of 'scale_factor' when set.`

Add UI Materials (bevyengine#9506)

- Add Ui Materials so that UI can render more complex and animated
widgets.
- Fixes bevyengine#5607

- Create a UiMaterial trait for specifying a Shader Asset and Bind Group
Layout/Data.
- Create a pipeline for rendering these Materials inside the Ui
layout/tree.
- Create a MaterialNodeBundle for simple spawning.

- Created a `UiMaterial` trait for specifying a Shader asset and Bind
Group.
- Created a `UiMaterialPipeline` for rendering said Materials.
- Added Example [`ui_material`
](https://github.com/MarkusTheOrt/bevy/blob/ui_material/examples/ui/ui_material.rs)
for example usage.
- Created
[`UiVertexOutput`](https://github.com/MarkusTheOrt/bevy/blob/ui_material/crates/bevy_ui/src/render/ui_vertex_output.wgsl)
export as VertexData for shaders.
- Created
[`material_ui`](https://github.com/MarkusTheOrt/bevy/blob/ui_material/crates/bevy_ui/src/render/ui_material.wgsl)
shader as default for both Vertex and Fragment shaders.

---------

Co-authored-by: ickshonpe <david.curthoys@googlemail.com>
Co-authored-by: François <mockersf@gmail.com>

support file operations in single threaded context (bevyengine#10312)

- Fixes bevyengine#10209
- Assets should work in single threaded

- In single threaded mode, don't use `async_fs` but fallback on
`std::fs` with a thin layer to mimic the async API
- file `file_asset.rs` is the async imps from `mod.rs`
- file `sync_file_asset.rs` is the same with `async_fs` APIs replaced by
`std::fs`
- which module is used depends on the `multi-threaded` feature

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>

Fix gizmo crash when prepass enabled (bevyengine#10360)

- Fix gizmo crash when prepass enabled

- Add the prepass to the view key

Fixes: bevyengine#10347
ameknite pushed a commit to ameknite/bevy that referenced this pull request Nov 6, 2023
# Objective

Alternative to bevyengine#7310

## Solution

Implemented the suggestion from
bevyengine#7310 (comment)

I am guessing that these were originally split as an optimization, but I
am not sure since I believe the original author of the code is the one
speculating about combining them up there.

## Benchmarks

I ran three benchmarks to compare main, this PR, and the approach from
bevyengine#7310
([updated](https://github.com/rparrett/bevy/commits/rebased-parallel-check-visibility)
to the same commit on main).

This seems to perform slightly better than main in scenarios where most
entities have AABBs, and a bit worse when they don't (`many_lights`).
That seems to make sense to me.

Either way, the difference is ~-20 microseconds in the more common
scenarios or ~+100 microseconds in the less common scenario. I would
speculate that this might perform **very slightly** worse in
single-threaded scenarios.

Benches were run in release mode for 2000 frames while capturing a trace
with tracy.

| bench | commit | check_visibility_system mean μs |
| -- | -- | -- |
| many_cubes | main | 929.5 |
| many_cubes | this | 914.0 |
| many_cubes | 7310 | 1003.5 |
| | |
| many_foxes | main | 191.6 |
| many_foxes | this | 173.2 |
| many_foxes | 7310 | 167.9 |
| | |
| many_lights | main | 619.3 |
| many_lights | this | 703.7 |
| many_lights | 7310 | 842.5 |

## Notes

Technically this behaves slightly differently -- prior to this PR, view
visibility was determined even for entities without `GlobalTransform`. I
don't think this has any practical impact though.

IMO, I don't think we need to do this. But I opened a PR because it
seemed like the handiest way to share the code / benchmarks.

## TODO

I have done some rudimentary testing with the examples above, but I can
do some screenshot diffing if it seems like we want to do this.
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024
# Objective

Alternative to bevyengine#7310

## Solution

Implemented the suggestion from
bevyengine#7310 (comment)

I am guessing that these were originally split as an optimization, but I
am not sure since I believe the original author of the code is the one
speculating about combining them up there.

## Benchmarks

I ran three benchmarks to compare main, this PR, and the approach from
bevyengine#7310
([updated](https://github.com/rparrett/bevy/commits/rebased-parallel-check-visibility)
to the same commit on main).

This seems to perform slightly better than main in scenarios where most
entities have AABBs, and a bit worse when they don't (`many_lights`).
That seems to make sense to me.

Either way, the difference is ~-20 microseconds in the more common
scenarios or ~+100 microseconds in the less common scenario. I would
speculate that this might perform **very slightly** worse in
single-threaded scenarios.

Benches were run in release mode for 2000 frames while capturing a trace
with tracy.

| bench | commit | check_visibility_system mean μs |
| -- | -- | -- |
| many_cubes | main | 929.5 |
| many_cubes | this | 914.0 |
| many_cubes | 7310 | 1003.5 |
| | |
| many_foxes | main | 191.6 |
| many_foxes | this | 173.2 |
| many_foxes | 7310 | 167.9 |
| | |
| many_lights | main | 619.3 |
| many_lights | this | 703.7 |
| many_lights | 7310 | 842.5 |

## Notes

Technically this behaves slightly differently -- prior to this PR, view
visibility was determined even for entities without `GlobalTransform`. I
don't think this has any practical impact though.

IMO, I don't think we need to do this. But I opened a PR because it
seemed like the handiest way to share the code / benchmarks.

## TODO

I have done some rudimentary testing with the examples above, but I can
do some screenshot diffing if it seems like we want to do this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants