
Options for disabling all parallelism for single-threaded performance #6689

Closed
james7132 opened this issue Nov 19, 2022 · 4 comments · Fixed by #6690

@james7132
Member

What problem does this solve or what need does it fill?

In certain environments, like low-cost VPS game hosting, it is often more efficient to host multiple single-threaded game instances than a single multi-threaded one. For these use cases, the additional synchronization overhead of the many Send/Sync types can be quite high, particularly with atomics.

What solution would you like?

A feature flag on `bevy_ecs` and `bevy_tasks` to:

  • disable the `ParallelExecutor` as the default runner
  • disable the multi-threaded `TaskPool`
  • internally replace `Query::par_for_each` calls with `for_each`
  • switch to `!Send`/`!Sync` versions of common types (i.e. `Mutex` -> `RefCell`, `Arc` -> `Rc`) (questionable whether this is possible, as we already avoid these)

In these target environments, rendering is typically not required. Ideally this shouldn't require much code change on the user's end beyond toggling a few crate features, so that code can easily be shared between client and server.
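A minimal sketch of the kind of compile-time fallback being requested, using a toy helper rather than Bevy's actual API (the `multi_threaded` feature name and the standalone `par_for_each` function below are assumptions for illustration): with the feature enabled, work is split across scoped threads; without it, the same call collapses to a plain sequential loop with no synchronization.

```rust
/// Toy stand-in for the requested behavior; this is NOT Bevy's API, and the
/// `multi_threaded` feature name is an assumption used for illustration.
#[cfg(feature = "multi_threaded")]
pub fn par_for_each<T: Sync>(items: &[T], batch_size: usize, f: impl Fn(&T) + Sync) {
    let f = &f;
    std::thread::scope(|s| {
        // Split the slice into batches and process each on a scoped thread.
        for chunk in items.chunks(batch_size.max(1)) {
            s.spawn(move || chunk.iter().for_each(f));
        }
    });
}

/// Single-threaded build: identical signature, but just a plain loop with no
/// task pool, no atomics, and no synchronization.
#[cfg(not(feature = "multi_threaded"))]
pub fn par_for_each<T: Sync>(items: &[T], _batch_size: usize, f: impl Fn(&T) + Sync) {
    items.iter().for_each(f);
}
```

Caller code stays the same either way, e.g. `par_for_each(&positions, 1024, |p| { /* ... */ })`; the feature flag alone decides whether the work is actually parallelized.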

What alternative(s) have you considered?

Leave it as is, eat the cost of atomics in these systems.
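To make "the cost of atomics" concrete, here is a small standalone microbenchmark (not from the issue; numbers vary by machine) comparing non-atomic `Rc` reference counting against atomic `Arc` reference counting on a single thread:

```rust
use std::hint::black_box;
use std::rc::Rc;
use std::sync::Arc;
use std::time::Instant;

fn main() {
    const N: usize = 10_000_000;

    // Rc: plain (non-atomic) reference counts.
    let rc = Rc::new(0u64);
    let start = Instant::now();
    for _ in 0..N {
        black_box(Rc::clone(&rc)); // clone + drop, no atomic instructions
    }
    println!("Rc  clone/drop: {:?}", start.elapsed());

    // Arc: atomic reference counts, paid for even on a single thread.
    let arc = Arc::new(0u64);
    let start = Instant::now();
    for _ in 0..N {
        black_box(Arc::clone(&arc)); // clone + drop via atomic RMW ops
    }
    println!("Arc clone/drop: {:?}", start.elapsed());
}
```

Run with `cargo run --release`; on a single-vCPU host like the one benchmarked below, the atomic version typically measures noticeably slower, which is exactly the overhead the proposed feature flag is meant to avoid.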

Additional context

Original discussion: https://www.reddit.com/r/rust/comments/ytiv2a/comment/iw4q6ed/?utm_source=share&utm_medium=web2x&context=3

@james7132 james7132 added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times A-Tasks Tools for parallel and async work labels Nov 19, 2022
@alice-i-cecile
Member

Strongly agree, I've wanted this for tests and parallel scientific computing workloads too.

A single global setting would be perfect.

@recatek
Contributor

recatek commented Nov 19, 2022

For reference, here are numbers I captured on a single vCPU $5/mo Vultr VPS.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel Core Processor (Broadwell, no TSX, IBRS)
Stepping:            2
CPU MHz:             2394.454
BogoMIPS:            4788.90
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat
simple_insert/naive		[141.58 µs 143.77 µs 146.19 µs]
simple_insert/legion		[416.46 µs 420.21 µs 424.61 µs]
simple_insert/bevy		[744.48 µs 747.96 µs 751.76 µs]
simple_insert/hecs		[349.97 µs 355.33 µs 361.15 µs]
simple_insert/planck_ecs	[514.21 µs 517.47 µs 521.31 µs]
simple_insert/shipyard		[788.10 µs 793.08 µs 798.77 µs]
simple_insert/specs		[2.5694 ms 2.5890 ms 2.6094 ms]

simple_iter/naive		[12.766 µs 12.818 µs 12.877 µs]
simple_iter/legion		[13.086 µs 13.124 µs 13.164 µs]
simple_iter/legion (packed)	[13.209 µs 13.316 µs 13.436 µs]
simple_iter/bevy		[24.174 µs 24.248 µs 24.332 µs]
simple_iter/hecs		[12.985 µs 13.095 µs 13.262 µs]
simple_iter/planck_ecs		[86.476 µs 86.705 µs 86.969 µs]
simple_iter/shipyard		[48.300 µs 48.642 µs 49.029 µs]
simple_iter/specs		[41.561 µs 41.694 µs 41.826 µs]

fragmented_iter/naive		[168.48 ns 169.44 ns 170.70 ns]
fragmented_iter/legion		[452.24 ns 453.31 ns 454.47 ns]
fragmented_iter/bevy		[1.5242 µs 1.5473 µs 1.5770 µs]
fragmented_iter/hecs		[592.52 ns 596.52 ns 601.09 ns]
fragmented_iter/planck_ecs	[2.7075 µs 2.7221 µs 2.7403 µs]
fragmented_iter/shipyard	[116.42 ns 117.45 ns 118.67 ns]
fragmented_iter/specs		[2.0717 µs 2.0798 µs 2.0891 µs]

schedule/naive			[26.381 µs 26.537 µs 26.751 µs]
schedule/legion			[38.080 µs 38.298 µs 38.544 µs]
schedule/legion (packed)	[37.693 µs 37.891 µs 38.131 µs]
schedule/bevy (manual)		[154.93 µs 155.42 µs 155.95 µs]
schedule/bevy (parallel)	[740.08 µs 756.14 µs 771.69 µs]
schedule/bevy (single)		[207.09 µs 207.76 µs 208.52 µs]
schedule/hecs (manual)		[54.626 µs 55.048 µs 55.562 µs]
schedule/planck_ecs		[621.18 µs 625.25 µs 630.15 µs]
schedule/shipyard		[310.94 µs 312.46 µs 314.15 µs]
schedule/specs			[250.86 µs 253.45 µs 258.18 µs]

heavy_compute/naive		[7.9072 ms 8.0099 ms 8.1350 ms]
heavy_compute/legion		[5.1729 ms 5.2026 ms 5.2359 ms]
heavy_compute/legion (packed)	[5.2143 ms 5.2452 ms 5.2780 ms]
heavy_compute/bevy		[7.6051 ms 7.6712 ms 7.7331 ms]
heavy_compute/hecs		[5.2735 ms 5.3189 ms 5.3700 ms]
heavy_compute/shipyard		[5.2730 ms 5.3152 ms 5.3694 ms]
heavy_compute/specs		[5.1762 ms 5.2146 ms 5.2572 ms]

add_remove_component/legion	[4.0691 ms 4.0942 ms 4.1229 ms]
add_remove_component/hecs	[797.80 µs 799.87 µs 801.92 µs]
add_remove_component/planck_ecs	[74.963 µs 75.710 µs 76.702 µs]
add_remove_component/shipyard	[159.94 µs 160.66 µs 161.46 µs]
add_remove_component/specs	[99.491 µs 100.71 µs 102.54 µs]
add_remove_component/bevy	[1.7468 ms 1.7634 ms 1.7841 ms]

Using my fork of ecs_bench_suite with Bevy 0.9: https://github.com/recatek/ecs_bench_suite/tree/9caa6e401e393267f0d7539eaf53c27e03d3aa88

The benchmark upgrades are a best guess, so if anyone has recommendations for getting more accurate results, let me know.

@Guvante
Contributor

Guvante commented Jan 15, 2023

I saw a comment somewhere saying something like this could eliminate access tracking. While care would need to be taken to avoid accidentally introducing ambiguity between systems, it would be neat to skip the access-tracking metadata everywhere, since everything is single-threaded anyway.

@james7132 james7132 added this to the 0.11 milestone Mar 4, 2023
@jdobrzanski

Much needed feature for me running deterministic simulations, hope it works out!

@alice-i-cecile alice-i-cecile modified the milestones: 0.11, 0.12 Jun 19, 2023
github-merge-queue bot pushed a commit that referenced this issue Jul 9, 2023
# Objective
Fixes #6689.

## Solution
Add `single-threaded` as an optional non-default feature to `bevy_ecs` and `bevy_tasks` that:

 - disables the `ParallelExecutor` as the default runner
 - disables the multi-threaded `TaskPool`
 - internally replaces `QueryParIter::for_each` calls with `Query::for_each`

Removed the `Mutex` and `Arc` usage in the single-threaded task pool.


![image](https://user-images.githubusercontent.com/3137680/202833253-dd2d520f-75e6-4c7b-be2d-5ce1523cbd38.png)

## Future Work/TODO
Create type aliases for `Mutex` and `Arc` that change to single-threaded equivalents where possible.
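As a rough sketch of that TODO (the module, alias names, and feature name below are assumptions, not Bevy's actual code), the aliases could be swapped at compile time; a real version would also need a thin wrapper, since `Mutex::lock` and `RefCell::borrow_mut` have different APIs:

```rust
// Hypothetical aliasing scheme; `multi_threaded`, `Shared`, and `Lock` are
// illustrative names, not Bevy's API.
#[cfg(feature = "multi_threaded")]
mod sharing {
    pub type Shared<T> = std::sync::Arc<T>; // atomic reference count
    pub type Lock<T> = std::sync::Mutex<T>; // real lock, Send + Sync
}

#[cfg(not(feature = "multi_threaded"))]
mod sharing {
    pub type Shared<T> = std::rc::Rc<T>;      // plain reference count, !Send + !Sync
    pub type Lock<T> = std::cell::RefCell<T>; // runtime borrow checking only
}
```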

---

## Changelog
Added: optional default feature `multi-threaded` that enables multi-threaded
parallelism in the engine. Disabling it turns off all multithreading in
exchange for higher single-threaded performance. It does nothing on WASM
targets.

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@cart cart closed this as completed in #6690 Jul 9, 2023