
Options for disabling all parallelism for single-threaded performance #6689

Closed
james7132 opened this issue Nov 19, 2022 · 4 comments · Fixed by #6690

@james7132
Member

What problem does this solve or what need does it fill?

In certain environments, like low-cost VPS game hosting, it is often more efficient to host multiple single-threaded game instances than a single multi-threaded one. For these use cases, the additional synchronization overhead of the many Send/Sync types can be quite high, particularly with atomics.

What solution would you like?

A feature flag on `bevy_ecs` and `bevy_tasks` to:

  • disable the `ParallelExecutor` as the default runner
  • disable the multi-threaded `TaskPool`
  • internally replace `Query::par_for_each` calls with `for_each`
  • switch to `!Send`/`!Sync` versions of common types (i.e. `Mutex` -> `RefCell`, `Arc` -> `Rc`) (questionable whether this is possible, as we already avoid these)

In these target environments, rendering is typically not required. Ideally this shouldn't require much code change on the user's end beyond toggling a few crate features, so that code can easily be shared between client and server.
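A minimal sketch of the kind of compile-time fallback being requested, using a toy helper rather than Bevy's actual API (the `multi_threaded` feature name and the standalone `par_for_each` function below are assumptions for illustration): with the feature enabled, work is split across scoped threads; without it, the same call collapses to a plain sequential loop with no synchronization.

```rust
/// Toy stand-in for the requested behavior; this is NOT Bevy's API, and the
/// `multi_threaded` feature name is an assumption used for illustration.
#[cfg(feature = "multi_threaded")]
pub fn par_for_each<T: Sync>(items: &[T], batch_size: usize, f: impl Fn(&T) + Sync) {
    let f = &f;
    std::thread::scope(|s| {
        // Split the slice into batches and process each on a scoped thread.
        for chunk in items.chunks(batch_size.max(1)) {
            s.spawn(move || chunk.iter().for_each(f));
        }
    });
}

/// Single-threaded build: identical signature, but just a plain loop with no
/// task pool, no atomics, and no synchronization.
#[cfg(not(feature = "multi_threaded"))]
pub fn par_for_each<T: Sync>(items: &[T], _batch_size: usize, f: impl Fn(&T) + Sync) {
    items.iter().for_each(f);
}
```

Caller code stays the same either way, e.g. `par_for_each(&positions, 1024, |p| { /* ... */ })`; the feature flag alone decides whether the work is actually parallelized.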

What alternative(s) have you considered?

Leave it as is, eat the cost of atomics in these systems.
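To make "the cost of atomics" concrete, here is a small standalone microbenchmark (not from the issue; numbers vary by machine) comparing non-atomic `Rc` reference counting against atomic `Arc` reference counting on a single thread:

```rust
use std::hint::black_box;
use std::rc::Rc;
use std::sync::Arc;
use std::time::Instant;

fn main() {
    const N: usize = 10_000_000;

    // Rc: plain (non-atomic) reference counts.
    let rc = Rc::new(0u64);
    let start = Instant::now();
    for _ in 0..N {
        black_box(Rc::clone(&rc)); // clone + drop, no atomic instructions
    }
    println!("Rc  clone/drop: {:?}", start.elapsed());

    // Arc: atomic reference counts, paid for even on a single thread.
    let arc = Arc::new(0u64);
    let start = Instant::now();
    for _ in 0..N {
        black_box(Arc::clone(&arc)); // clone + drop via atomic RMW ops
    }
    println!("Arc clone/drop: {:?}", start.elapsed());
}
```

Run with `cargo run --release`; on a single-vCPU host like the one benchmarked below, the atomic version typically measures noticeably slower, which is exactly the overhead the proposed feature flag is meant to avoid.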

Additional context

Original discussion: https://www.reddit.com/r/rust/comments/ytiv2a/comment/iw4q6ed/?utm_source=share&utm_medium=web2x&context=3

@james7132 james7132 added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times A-Tasks Tools for parallel and async work labels Nov 19, 2022
@alice-i-cecile
Member

Strongly agree, I've wanted this for tests and parallel scientific computing workloads too.

A single global setting would be perfect.

@recatek
Contributor

recatek commented Nov 19, 2022

For reference, here are numbers I captured on a single vCPU $5/mo Vultr VPS.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel Core Processor (Broadwell, no TSX, IBRS)
Stepping:            2
CPU MHz:             2394.454
BogoMIPS:            4788.90
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat
simple_insert/naive		[141.58 µs 143.77 µs 146.19 µs]
simple_insert/legion		[416.46 µs 420.21 µs 424.61 µs]
simple_insert/bevy		[744.48 µs 747.96 µs 751.76 µs]
simple_insert/hecs		[349.97 µs 355.33 µs 361.15 µs]
simple_insert/planck_ecs	[514.21 µs 517.47 µs 521.31 µs]
simple_insert/shipyard		[788.10 µs 793.08 µs 798.77 µs]
simple_insert/specs		[2.5694 ms 2.5890 ms 2.6094 ms]

simple_iter/naive		[12.766 µs 12.818 µs 12.877 µs]
simple_iter/legion		[13.086 µs 13.124 µs 13.164 µs]
simple_iter/legion (packed)	[13.209 µs 13.316 µs 13.436 µs]
simple_iter/bevy		[24.174 µs 24.248 µs 24.332 µs]
simple_iter/hecs		[12.985 µs 13.095 µs 13.262 µs]
simple_iter/planck_ecs		[86.476 µs 86.705 µs 86.969 µs]
simple_iter/shipyard		[48.300 µs 48.642 µs 49.029 µs]
simple_iter/specs		[41.561 µs 41.694 µs 41.826 µs]

fragmented_iter/naive		[168.48 ns 169.44 ns 170.70 ns]
fragmented_iter/legion		[452.24 ns 453.31 ns 454.47 ns]
fragmented_iter/bevy		[1.5242 µs 1.5473 µs 1.5770 µs]
fragmented_iter/hecs		[592.52 ns 596.52 ns 601.09 ns]
fragmented_iter/planck_ecs	[2.7075 µs 2.7221 µs 2.7403 µs]
fragmented_iter/shipyard	[116.42 ns 117.45 ns 118.67 ns]
fragmented_iter/specs		[2.0717 µs 2.0798 µs 2.0891 µs]

schedule/naive			[26.381 µs 26.537 µs 26.751 µs]
schedule/legion			[38.080 µs 38.298 µs 38.544 µs]
schedule/legion (packed)	[37.693 µs 37.891 µs 38.131 µs]
schedule/bevy (manual)		[154.93 µs 155.42 µs 155.95 µs]
schedule/bevy (parallel)	[740.08 µs 756.14 µs 771.69 µs]
schedule/bevy (single)		[207.09 µs 207.76 µs 208.52 µs]
schedule/hecs (manual)		[54.626 µs 55.048 µs 55.562 µs]
schedule/planck_ecs		[621.18 µs 625.25 µs 630.15 µs]
schedule/shipyard		[310.94 µs 312.46 µs 314.15 µs]
schedule/specs			[250.86 µs 253.45 µs 258.18 µs]

heavy_compute/naive		[7.9072 ms 8.0099 ms 8.1350 ms]
heavy_compute/legion		[5.1729 ms 5.2026 ms 5.2359 ms]
heavy_compute/legion (packed)	[5.2143 ms 5.2452 ms 5.2780 ms]
heavy_compute/bevy		[7.6051 ms 7.6712 ms 7.7331 ms]
heavy_compute/hecs		[5.2735 ms 5.3189 ms 5.3700 ms]
heavy_compute/shipyard		[5.2730 ms 5.3152 ms 5.3694 ms]
heavy_compute/specs		[5.1762 ms 5.2146 ms 5.2572 ms]

add_remove_component/legion	[4.0691 ms 4.0942 ms 4.1229 ms]
add_remove_component/hecs	[797.80 µs 799.87 µs 801.92 µs]
add_remove_component/planck_ecs	[74.963 µs 75.710 µs 76.702 µs]
add_remove_component/shipyard	[159.94 µs 160.66 µs 161.46 µs]
add_remove_component/specs	[99.491 µs 100.71 µs 102.54 µs]
add_remove_component/bevy	[1.7468 ms 1.7634 ms 1.7841 ms]

Using my fork of ecs_bench_suite with Bevy 0.9: https://github.com/recatek/ecs_bench_suite/tree/9caa6e401e393267f0d7539eaf53c27e03d3aa88

The benchmark upgrades are a best guess, so if anyone has recommendations for getting more accurate results, let me know.

@Guvante
Contributor

Guvante commented Jan 15, 2023

I saw a comment somewhere saying something like this could eliminate access tracking. While care would need to be taken to avoid accidentally introducing ambiguity between systems, it would be neat to skip the access-tracking metadata everywhere, since everything is single-threaded anyway.

@james7132 james7132 added this to the 0.11 milestone Mar 4, 2023
@jdobrzanski

Much needed feature for me running deterministic simulations, hope it works out!

@alice-i-cecile alice-i-cecile modified the milestones: 0.11, 0.12 Jun 19, 2023
github-merge-queue bot pushed a commit that referenced this issue Jul 9, 2023
# Objective
Fixes #6689.

## Solution
Add `single-threaded` as an optional non-default feature to `bevy_ecs` and `bevy_tasks` that:

 - disables the `ParallelExecutor` as the default runner
 - disables the multi-threaded `TaskPool`
 - internally replaces `QueryParIter::for_each` calls with `Query::for_each`

Removed the `Mutex` and `Arc` usage in the single-threaded task pool.


![image](https://user-images.githubusercontent.com/3137680/202833253-dd2d520f-75e6-4c7b-be2d-5ce1523cbd38.png)

## Future Work/TODO
Create type aliases for `Mutex` and `Arc` that change to single-threaded equivalents where possible.
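As a rough sketch of that TODO (the module, alias names, and feature name below are assumptions, not Bevy's actual code), the aliases could be swapped at compile time; a real version would also need a thin wrapper, since `Mutex::lock` and `RefCell::borrow_mut` have different APIs:

```rust
// Hypothetical aliasing scheme; `multi_threaded`, `Shared`, and `Lock` are
// illustrative names, not Bevy's API.
#[cfg(feature = "multi_threaded")]
mod sharing {
    pub type Shared<T> = std::sync::Arc<T>; // atomic reference count
    pub type Lock<T> = std::sync::Mutex<T>; // real lock, Send + Sync
}

#[cfg(not(feature = "multi_threaded"))]
mod sharing {
    pub type Shared<T> = std::rc::Rc<T>;      // plain reference count, !Send + !Sync
    pub type Lock<T> = std::cell::RefCell<T>; // runtime borrow checking only
}
```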

---

## Changelog
Added: optional default feature `multi-threaded` that enables multi-threaded
parallelism in the engine. Disabling it turns off all multithreading in
exchange for higher single-threaded performance. It does nothing on WASM
targets.

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@cart cart closed this as completed in #6690 Jul 9, 2023