
Use multiple single threaded runtimes #23

Open · programatik29 opened this issue Nov 11, 2021 · 9 comments

@programatik29 (Contributor)

With recent updates to tokio, the CPU can't be utilized to 100% when connections are driven via tokio::spawn with hyper. rewrk's performance can be increased by using multiple single-threaded runtimes instead.

I think this is a good approach because all connection tasks spawned by rewrk are identical, so the benefits of work stealing are minimal.
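For context, the idea is to spawn one OS thread per core and give each thread its own current-thread tokio runtime, rather than running everything on a single multi-threaded, work-stealing runtime. Below is a minimal sketch of that shape; it is not the code from the linked branch, and the worker body is a placeholder where rewrk would drive its share of the connections:

```rust
use std::thread;

fn main() {
    // One OS thread per core, each driving its own single-threaded runtime,
    // so no work-stealing scheduler is involved.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            thread::spawn(move || {
                let rt = tokio::runtime::Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .expect("failed to build current-thread runtime");

                rt.block_on(async move {
                    // Placeholder: in rewrk this is where this worker's share of
                    // the benchmark connections would be opened and driven.
                    println!("worker {id}: running on its own runtime");
                });
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```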

@programatik29 (Contributor, Author)

This branch is using multiple single-threaded runtimes.

Results on my computer:

Multiple single threaded runtimes:

```
Beginning round 1...
Benchmarking 500 connections @ http://localhost:3000 for 10 second(s)
  Latencies:
    Avg      Stdev    Min      Max
    1.71ms   1.73ms   0.03ms   68.96ms
  Requests:
    Total: 2857417 Req/Sec: 286210.95
  Transfer:
    Total: 313.38 MB Transfer Rate: 31.39 MB/Sec
```

Current approach, regular tokio::spawn:

```
Beginning round 1...
Benchmarking 500 connections @ http://localhost:3000 for 10 second(s)
  Latencies:
    Avg      Stdev    Min      Max
    2.21ms   1.25ms   0.04ms   51.12ms
  Requests:
    Total: 2247341 Req/Sec: 225330.90
  Transfer:
    Total: 246.47 MB Transfer Rate: 24.71 MB/Sec
```

@jschwe commented Dec 14, 2022

When using tokio with our proposed BWoS work-stealing queue, I measured the following improvements in hyper's throughput by changing rewrk's tokio queue backend. The following throughput measurements only modified rewrk and left hyper untouched (using rust-web-benchmarks).

500 connections

Rewrk 0.3.2 with BWoS queue:

| Framework Name | Latency.Avg | Latency.Stdev | Latency.Min | Latency.Max | Request.Total | Request.Req/Sec | Transfer.Total | Transfer.Rate | Max. Memory Usage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| hyper | 0.89ms | 0.40ms | 0.03ms | 212.56ms | 16401497 | 546698.77 | 1.36GB | 46.40MB/Sec | 16.5MB |

Rewrk 0.3.2 with original tokio (1.22):

| Framework Name | Latency.Avg | Latency.Stdev | Latency.Min | Latency.Max | Request.Total | Request.Req/Sec | Transfer.Total | Transfer.Rate | Max. Memory Usage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| hyper | 1.04ms | 0.44ms | 0.03ms | 9.12ms | 14002501 | 466735.26 | 1.16GB | 39.62MB/Sec | 14.6MB |

1000 connections

Rewrk 0.3.2 with BWoS queue:

| Framework Name | Latency.Avg | Latency.Stdev | Latency.Min | Latency.Max | Request.Total | Request.Req/Sec | Transfer.Total | Transfer.Rate | Max. Memory Usage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| hyper | 1.18ms | 0.69ms | 0.03ms | 20.85ms | 24647183 | 821524.47 | 2.04GB | 69.73MB/Sec | 26.5MB |

Rewrk 0.3.2 with original tokio (1.22):

| Framework Name | Latency.Avg | Latency.Stdev | Latency.Min | Latency.Max | Request.Total | Request.Req/Sec | Transfer.Total | Transfer.Rate | Max. Memory Usage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| hyper | 1.28ms | 0.53ms | 0.05ms | 9.23ms | 23019349 | 767279.90 | 1.91GB | 65.12MB/Sec | 26.5MB |

The throughput increase (if only rewrk is changed) is not as large as what you measured, but I still thought this might be interesting.

@programatik29 (Contributor, Author)

@jschwe Interesting... How does that compare to rewrk in this pull request?

@jschwe commented Dec 14, 2022

Ah, I did not see there was a pull request related to this issue.

I'll try this out tomorrow.

Edit: Initial results do show that using multiple single-threaded runtimes still offers significant performance improvements. This is probably related to parking/unparking overhead. I'll post some more details tomorrow.

@ChillFish8 (Member)

It would be interesting to see. If the performance is close enough, I'm somewhat tempted to go with the approach you mentioned over the single-threaded runtimes, just for convenience.

@jschwe commented Dec 16, 2022

I compared the single-threaded approach with the original strategy and different BWoS strategies on an x86 machine with 2 NUMA nodes. rewrk was bound to NUMA node 1 (44 cores including hyperthreads) and the hyper benchmark was bound to NUMA node 0. This is the benchmarking script I used.

It seems that the single-threaded runtimes do offer clear advantages, especially with fewer connections. This could be due to overhead from parking/unparking cores when there is not enough work, but I haven't investigated this further.

500 connections

| Hyper tokio stealing strategy | rewrk strategy | rewrk throughput (MB/s) |
| --- | --- | --- |
| original | single_thread | 65.65 |
| original | original | 37.29 |
| original | bwos_steal_half | 44.96 |
| original | bwos_steal_block | 45.78 |
| original | bwos_steal_1 | 37.02 |
| bwos_steal_half | single_thread | 91.87 |
| bwos_steal_block | single_thread | 89.18 |
| bwos_steal_block | bwos_steal_block | 71.69 |

750 connections

| Hyper tokio stealing strategy | rewrk strategy | rewrk throughput (MB/s) |
| --- | --- | --- |
| original | single_thread | 71.81 |
| original | original | 52.08 |
| original | bwos_steal_half | 53.9 |
| original | bwos_steal_block | 55.41 |
| original | bwos_steal_1 | 46.66 |
| bwos_steal_half | single_thread | 82.23 |
| bwos_steal_block | single_thread | 83.96 |
| bwos_steal_block | bwos_steal_block | 77.51 |

1000 connections

| Hyper tokio stealing strategy | rewrk strategy | rewrk throughput (MB/s) |
| --- | --- | --- |
| original | single_thread | 75.4 |
| original | original | 64.33 |
| original | bwos_steal_half | 63.21 |
| original | bwos_steal_block | 63.36 |
| original | bwos_steal_1 | 56.37 |
| bwos_steal_half | single_thread | 81.54 |
| bwos_steal_block | single_thread | 82.48 |
| bwos_steal_block | bwos_steal_block | 79.37 |

@programatik29 (Contributor, Author)

@jschwe So is it free performance if this design gets merged into tokio?

@jschwe commented Dec 18, 2022

@programatik29 Our current proposal is for our queue to be integrated as an alternative backend, which could be selected via a runtime flavor. So this is not going to be a drop-in change for now, but would require the downstream user to select a different flavor to get the benefits of the new queue. That's a very minor change for downstream users, though.

We do think our queue should be better in basically all scenarios, but there are situations where the queue is not the bottleneck, so switching it out wouldn't change much.
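To make "select a different flavor" concrete: downstream users already pick a scheduler through tokio's runtime Builder, and under this proposal the BWoS-backed scheduler would be one more option chosen the same way. A minimal sketch of today's flavor selection (the BWoS flavor itself is only hinted at in a comment, since its exact API is not specified in this thread):

```rust
use tokio::runtime::Builder;

fn main() {
    // The default multi-threaded, work-stealing flavor.
    let multi = Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build multi-thread runtime");

    // The single-threaded (current-thread) flavor.
    let single = Builder::new_current_thread()
        .enable_all()
        .build()
        .expect("failed to build current-thread runtime");

    // Hypothetically, a BWoS-backed scheduler would be selected here as another
    // Builder constructor or option; the exact API is not decided in this thread.

    multi.block_on(async { println!("multi-threaded flavor") });
    single.block_on(async { println!("current-thread flavor") });
}
```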

@ChillFish8 (Member)

This behaviour will be implemented in rewrk-core; moving the CLI tool over to it will close this issue.
