Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Queue Phase parallelization and other small optimizations #4899

Closed
wants to merge 12 commits into from

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Jun 2, 2022

Objective

Fixes #3548. A mutable reference to PipelineCache prevents a good chunk of the systems in queue from parallelizing with each other, even though it's primary use with Specialized*Pipelines<T> should rarely require mutation. Likewise &mut RenderPhase<I> on each render phase also requires exclusive access, which prohibits multiple systems from enqueuing phase items at the same time.

Solution

Selectively leverage internal mutability and thread-local in a way that avoids adding per-entity overheads.

RenderPhase

  • Use ThreadLocal inside RenderPhase to create thread local queues of phase items.
  • Add a phase_scope for enqueuing on each thread separately without locking overhead.
  • Collect all of the thread-local phase items into one vec before sorting.
  • Shrink the size of IDs used to reduce the amount of memory being shuffled around.
  • Use Vec::sort_unstable_by_key for a slight sorting speedup.
  • Change all of the &mut RenderPhase<T> to &RenderPhase<T>, allow for increased parallelism.
  • Can we just shove all of these into a std::collections::BinaryHeap and drain?

PipelineCache

  • Introduce LockablePipelineCache, a wrapper around RwLock<PipelineCache>.
  • Change Specialized*Pipelines to take a &LockablePipelineCache instead of &mut PipelineCache. Update systems to match.
  • Introduce two systems lock_pipeline_cache (in Extract) and unlock_pipeline_cache (in PhaseSort) that take the existing PipelineCache resource and wraps it into a LockablePipelineCache and vice versa. This allows Render phase draw functions to read from the cache without any contention, at the cost of one small command buffer.

Performance

I tested this on the default configuration of many_foxes, which has several heavy queue systems.

The direct effects are as expected. The queue phase results show a 30% speedup on my machine due to the increased parallelism. (yellow is this PR, red is main)

image

For sort phase, there is a slight regression due to the additional copy into the sorted vec before sorting. This is slightly alleviated by shrinking the draw function type sizes. This can be further addressed by using more optimized sorting algorithms (i.e. voracious).

image

Overall, this sees a rough 0.3ms improvement (2 FPS, 73 -> 75) improvement on my machine.

image

Future Work

The changes to enable internal mutability in a thread-safe manner can be extended to also allow internal parallelism in heavy queue tasks.


Changelog

TODO

Migration Guide

TODO

@james7132 james7132 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide labels Jun 2, 2022
@james7132 james7132 requested a review from superdump June 9, 2022 12:16
bors bot pushed a commit that referenced this pull request Jun 23, 2022
# Objective

Partially addresses #4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in #4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in #4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in #4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
inodentry pushed a commit to IyesGames/bevy that referenced this pull request Aug 8, 2022
# Objective

Partially addresses bevyengine#4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in bevyengine#4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in bevyengine#4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in bevyengine#4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
@Weibye Weibye added the S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! label Aug 10, 2022
@james7132 james7132 removed the S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! label Sep 14, 2022
james7132 added a commit to james7132/bevy that referenced this pull request Oct 28, 2022
# Objective

Partially addresses bevyengine#4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in bevyengine#4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in bevyengine#4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in bevyengine#4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
bors bot pushed a commit that referenced this pull request Dec 20, 2022
# Objective
This includes one part of #4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since #4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
bors bot pushed a commit that referenced this pull request Dec 20, 2022
# Objective
This includes one part of #4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since #4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
bors bot pushed a commit that referenced this pull request Dec 20, 2022
# Objective
This includes one part of #4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since #4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
bors bot pushed a commit that referenced this pull request Dec 20, 2022
# Objective
This includes one part of #4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since #4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
bors bot pushed a commit that referenced this pull request Dec 20, 2022
# Objective
This includes one part of #4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since #4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
@alice-i-cecile
Copy link
Member

Closing in favor of #7205.

@james7132
Copy link
Member Author

There's still the other half of this that splays the RenderPhase into thread local queues. I'll open a separate PR for that.

alradish pushed a commit to alradish/bevy that referenced this pull request Jan 22, 2023
# Objective
This includes one part of bevyengine#4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since bevyengine#4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

Partially addresses bevyengine#4291.

Speed up the sort phase for unbatched render phases.

## Solution
Split out one of the optimizations in bevyengine#4899 and allow implementors of `PhaseItem` to change what kind of sort is used when sorting the items in the phase. This currently includes Stable, Unstable, and Unsorted. Each of these corresponds to `Vec::sort_by_key`, `Vec::sort_unstable_by_key`, and no sorting at all. The default is `Unstable`. The last one can be used as a default if users introduce a preliminary depth prepass.

## Performance
This will not impact the performance of any batched phases, as it is still using a stable sort. 2D's only phase is unchanged. All 3D phases are unbatched currently, and will benefit from this change.

On `many_cubes`, where the primary phase is opaque, this change sees a speed up from 907.02us -> 477.62us, a 47.35% reduction.

![image](https://user-images.githubusercontent.com/3137680/174471253-22424874-30d5-4db5-b5b4-65fb2c612a9c.png)

## Future Work
There were prior discussions to add support for faster radix sorts in bevyengine#4291, which in theory should be a `O(n)` instead of a `O(nlog(n))` time. [`voracious`](https://crates.io/crates/voracious_radix_sort) has been proposed, but it seems to be optimize for use cases with more than 30,000 items, which may be atypical for most systems.

Another optimization included in bevyengine#4899 is to reduce the size of a few of the IDs commonly used in `PhaseItem` implementations to shrink the types to make swapping/sorting faster. Both `CachedPipelineId` and `DrawFunctionId` could be reduced to `u32` instead of `usize`.

Ideally, this should automatically change to use stable sorts when `BatchedPhaseItem` is implemented on the same phase item type, but this requires specialization, which may not land in stable Rust for a short while.

---

## Changelog
Added: `PhaseItem::sort`

## Migration Guide
RenderPhases now default to a unstable sort (via `slice::sort_unstable_by_key`). This can typically improve sort phase performance, but may produce incorrect batching results when implementing `BatchedPhaseItem`. To revert to the older stable sort, manually implement `PhaseItem::sort` to implement a stable sort (i.e. via `slice::sort_by_key`).

Co-authored-by: Federico Rinaldi <gisquerin@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
Co-authored-by: colepoirier <colepoirier@gmail.com>
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective
This includes one part of bevyengine#4899. The aim is to improve CPU-side rendering performance by reducing the memory footprint and bandwidth required.

## Solution
Shrink `DrawFunctionId` to `u32`. Enforce that `u32 as usize` conversions are always safe by forbidding compilation on 16-bit platforms. This shouldn't be a breaking change since bevyengine#4736 disabled compilation of `bevy_ecs` on those platforms.

Shrinking `DrawFunctionId` shrinks all of the `PhaseItem` types, which is integral to sort and render phase performance.

Testing against `many_cubes`, the sort phase improved by 22% (174.21us -> 141.76us per frame).

![image](https://user-images.githubusercontent.com/3137680/207345422-a512b4cf-1680-46e0-9973-ea72494ebdfe.png)

The main opaque pass also imrproved by 9% (5.49ms -> 5.03ms)

![image](https://user-images.githubusercontent.com/3137680/207346436-cbee7209-6450-4964-b566-0b64cfa4b4ea.png)

Overall frame time improved by 5% (14.85ms -> 14.09ms)

![image](https://user-images.githubusercontent.com/3137680/207346895-9de8676b-ef37-4cb9-8445-8493f5f90003.png)

There will be a followup PR that likewise shrinks `CachedRenderPipelineId` which should yield similar results on top of these improvements.
github-merge-queue bot pushed a commit that referenced this pull request Feb 19, 2024
# Objective
There's a repeating pattern of `ThreadLocal<Cell<Vec<T>>>` which is very
useful for low overhead, low contention multithreaded queues that have
cropped up in a few places in the engine. This pattern is surprisingly
useful when building deferred mutation across multiple threads, as noted
by it's use in `ParallelCommands`.

However, `ThreadLocal<Cell<Vec<T>>>` is not only a mouthful, it's also
hard to ensure the thread-local queue is replaced after it's been
temporarily removed from the `Cell`.

## Solution
Wrap the pattern into `bevy_utils::Parallel<T>` which codifies the
entire pattern and ensures the user follows the contract. Instead of
fetching indivdual cells, removing the value, mutating it, and replacing
it, `Parallel::get` returns a `ParRef<'a, T>` which contains the
temporarily removed value and a reference back to the cell, and will
write the mutated value back to the cell upon being dropped.

I would like to use this to simplify the remaining part of #4899 that
has not been adopted/merged.

---

## Changelog
TODO

---------

Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
msvbg pushed a commit to msvbg/bevy that referenced this pull request Feb 26, 2024
# Objective
There's a repeating pattern of `ThreadLocal<Cell<Vec<T>>>` which is very
useful for low overhead, low contention multithreaded queues that have
cropped up in a few places in the engine. This pattern is surprisingly
useful when building deferred mutation across multiple threads, as noted
by it's use in `ParallelCommands`.

However, `ThreadLocal<Cell<Vec<T>>>` is not only a mouthful, it's also
hard to ensure the thread-local queue is replaced after it's been
temporarily removed from the `Cell`.

## Solution
Wrap the pattern into `bevy_utils::Parallel<T>` which codifies the
entire pattern and ensures the user follows the contract. Instead of
fetching indivdual cells, removing the value, mutating it, and replacing
it, `Parallel::get` returns a `ParRef<'a, T>` which contains the
temporarily removed value and a reference back to the cell, and will
write the mutated value back to the cell upon being dropped.

I would like to use this to simplify the remaining part of bevyengine#4899 that
has not been adopted/merged.

---

## Changelog
TODO

---------

Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
msvbg pushed a commit to msvbg/bevy that referenced this pull request Feb 26, 2024
# Objective
There's a repeating pattern of `ThreadLocal<Cell<Vec<T>>>` which is very
useful for low overhead, low contention multithreaded queues that have
cropped up in a few places in the engine. This pattern is surprisingly
useful when building deferred mutation across multiple threads, as noted
by it's use in `ParallelCommands`.

However, `ThreadLocal<Cell<Vec<T>>>` is not only a mouthful, it's also
hard to ensure the thread-local queue is replaced after it's been
temporarily removed from the `Cell`.

## Solution
Wrap the pattern into `bevy_utils::Parallel<T>` which codifies the
entire pattern and ensures the user follows the contract. Instead of
fetching indivdual cells, removing the value, mutating it, and replacing
it, `Parallel::get` returns a `ParRef<'a, T>` which contains the
temporarily removed value and a reference back to the cell, and will
write the mutated value back to the cell upon being dropped.

I would like to use this to simplify the remaining part of bevyengine#4899 that
has not been adopted/merged.

---

## Changelog
TODO

---------

Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Requiring mutable access to several common shared types prevents queue phase parallelization.
5 participants