refactor(bin): use 32k stream IO buffer #2008

mxinden · 2024-07-26T14:54:48Z

Firefox by default uses a 32k IO buffer for streams.

https://searchfox.org/mozilla-central/rev/f6e3b81aac49e602f06c204f9278da30993cdc8a/modules/libpref/init/all.js#3212

This commit makes neqo-bin use the same buffer size across http09/3 and client/server.

Along the way it consolidates various buffer logic and reuses buffers wherever feasible.
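
To make the buffer-reuse idea concrete, here is a minimal sketch, not the code from this pull request. It assumes a hypothetical constant `STREAM_IO_BUFFER_SIZE` and a hypothetical helper `drain_stream`, and uses `std::io::Read` as a stand-in for a QUIC stream. One 32 KiB buffer is allocated once and lent to every read, instead of allocating per stream or per read.

```rust
use std::io::{self, Read};

/// Hypothetical constant name; 32 KiB mirrors the Firefox default cited above.
const STREAM_IO_BUFFER_SIZE: usize = 32 * 1024;

/// Reads `stream` to the end through the caller-provided buffer, so no
/// allocation happens per read. Returns the total number of bytes read.
fn drain_stream<R: Read>(stream: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    loop {
        match stream.read(buf) {
            Ok(0) => return Ok(total), // end of stream
            Ok(n) => total += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // Allocated once, then shared by every stream that needs to be read.
    let mut buf = vec![0u8; STREAM_IO_BUFFER_SIZE];
    let payload = vec![7u8; 100_000];

    let mut first = io::Cursor::new(&payload[..]);
    let mut second = io::Cursor::new(&payload[..10]);
    println!("first:  {} bytes", drain_stream(&mut first, &mut buf)?);
    println!("second: {} bytes", drain_stream(&mut second, &mut buf)?);
    Ok(())
}
```

The buffer size bounds how much data a single read can return, which is why matching Firefox's size makes the benchmark behave more like the real consumer.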

neqo-bin/src/server/http09.rs

neqo-bin/src/server/http3.rs

github-actions · 2024-07-26T15:10:07Z

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

aioquic vs. neqo-latest: C1
lsquic vs. neqo-latest: run cancelled after 20 min
mvfst vs. neqo-latest: Z 3 A L1 C1
ngtcp2 vs. neqo-latest: run cancelled after 20 min
quiche vs. neqo-latest: 3
quinn vs. neqo-latest: V2
xquic vs. neqo-latest: M

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. go-x-net: DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R 3 B U A L2 C1 C2 6 V2
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C2 6 V2
neqo-latest vs. msquic: H DC LR C20 M S R B U L2 C1 C2 6 V2
neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L2 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C2 6 V2
neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L2 C2 6 V2
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. s2n-quic: H DC LR C20 M S 3 B U E A L1 L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L2 C2 6

neqo-latest as server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C2 6 V2
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
msquic vs. neqo-latest: H DC LR C20 M S R Z B U A L1 L2 C1 C2 6 V2
mvfst vs. neqo-latest: H DC LR M B L2 C2 6
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
quiche vs. neqo-latest: H DC LR M S R Z B A L1 L2 C1 C2 6
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
neqo-latest vs. haproxy: E
neqo-latest vs. kwik: E
neqo-latest vs. msquic: 3 E
neqo-latest vs. mvfst: C20 S E V2
neqo-latest vs. nginx: E V2
neqo-latest vs. quic-go: E V2
neqo-latest vs. quiche: E V2
neqo-latest vs. quinn: V2
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. xquic: S E V2

neqo-latest as server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
xquic vs. neqo-latest: E V2

codecov · 2024-07-26T15:19:17Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.36%. Comparing base (f801c29) to head (bad3673).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2008      +/-   ##
==========================================
- Coverage   95.36%   95.36%   -0.01%     
==========================================
  Files         112      112              
  Lines       36475    36475              
==========================================
- Hits        34784    34783       -1     
- Misses       1691     1692       +1

github-actions · 2024-07-26T15:35:42Z

Benchmark results

Performance differences relative to f801c29.

coalesce_acked_from_zero 1+1 entries: 💔 Performance has regressed.

       time:   [196.61 ns 197.05 ns 197.53 ns]
       change: [+1.4739% +1.8071% +2.1494%] (p = 0.00 < 0.05)
Found 16 outliers among 100 measurements (16.00%)

1 (1.00%) low mild

6 (6.00%) high mild

9 (9.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.

       time:   [237.21 ns 237.92 ns 238.71 ns]
       change: [+0.0346% +0.3689% +0.7299%] (p = 0.04 < 0.05)
Found 16 outliers among 100 measurements (16.00%)

16 (16.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.

       time:   [236.98 ns 238.05 ns 239.36 ns]
       change: [-0.1969% +0.3499% +0.9639%] (p = 0.23 > 0.05)
Found 9 outliers among 100 measurements (9.00%)

9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.

       time:   [215.66 ns 215.88 ns 216.12 ns]
       change: [-0.0640% +0.7259% +1.4315%] (p = 0.06 > 0.05)
Found 10 outliers among 100 measurements (10.00%)

4 (4.00%) high mild

6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.

       time:   [119.60 ms 119.68 ms 119.76 ms]
       change: [+0.8078% +0.9063% +1.0011%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild

transfer/pacing-false/varying-seeds: No change in performance detected.

       time:   [40.423 ms 42.503 ms 44.609 ms]
       change: [-5.6161% +0.7236% +7.8640%] (p = 0.83 > 0.05)
Found 3 outliers among 100 measurements (3.00%)

1 (1.00%) low mild

2 (2.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.

       time:   [57.763 ms 60.818 ms 63.932 ms]
       change: [+2.2889% +10.117% +18.409%] (p = 0.01 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

transfer/pacing-false/same-seed: No change in performance detected.

       time:   [47.965 ms 49.768 ms 51.492 ms]
       change: [-6.1545% -1.5034% +3.4935%] (p = 0.55 > 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) low mild

transfer/pacing-true/same-seed: No change in performance detected.

       time:   [60.923 ms 67.805 ms 74.674 ms]
       change: [-18.847% -6.3929% +7.1145%] (p = 0.35 > 0.05)

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.

       time:   [126.55 ms 127.76 ms 129.52 ms]
       thrpt:  [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]
change:
       time:   [-27.403% -26.268% -24.915%] (p = 0.00 < 0.05)
       thrpt:  [+33.182% +35.626% +37.747%]
Found 8 outliers among 100 measurements (8.00%)

7 (7.00%) low mild

1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.

       time:   [408.49 ms 411.58 ms 414.65 ms]
       thrpt:  [24.117 Kelem/s 24.296 Kelem/s 24.481 Kelem/s]
change:
       time:   [-0.9293% +0.2571% +1.4515%] (p = 0.66 > 0.05)
       thrpt:  [-1.4308% -0.2564% +0.9380%]
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.

       time:   [46.648 ms 47.456 ms 48.297 ms]
       thrpt:  [20.705  elem/s 21.072  elem/s 21.437  elem/s]
change:
       time:   [-1.6077% +0.7764% +3.0788%] (p = 0.52 > 0.05)
       thrpt:  [-2.9869% -0.7704% +1.6340%]
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|---:|---:|---:|---:|
| msquic | msquic | | | 198.2 ± 84.1 | 98.9 | 322.2 | 1.00 |
| neqo | msquic | reno | on | 298.3 ± 71.2 | 259.3 | 466.2 | 1.00 |
| neqo | msquic | reno | | 279.3 ± 25.0 | 247.5 | 332.7 | 1.00 |
| neqo | msquic | cubic | on | 315.0 ± 106.3 | 255.0 | 597.3 | 1.00 |
| neqo | msquic | cubic | | 281.3 ± 42.2 | 250.4 | 398.3 | 1.00 |
| msquic | neqo | reno | on | 167.5 ± 108.4 | 85.6 | 530.2 | 1.00 |
| msquic | neqo | reno | | 157.2 ± 91.3 | 84.4 | 359.1 | 1.00 |
| msquic | neqo | cubic | on | 135.0 ± 96.7 | 84.5 | 458.3 | 1.00 |
| msquic | neqo | cubic | | 167.8 ± 96.0 | 85.4 | 346.6 | 1.00 |
| neqo | neqo | reno | on | 192.0 ± 66.5 | 151.9 | 363.2 | 1.00 |
| neqo | neqo | reno | | 190.8 ± 68.6 | 151.6 | 407.3 | 1.00 |
| neqo | neqo | cubic | on | 258.4 ± 135.3 | 159.7 | 596.3 | 1.00 |
| neqo | neqo | cubic | | 311.5 ± 175.9 | 173.7 | 664.5 | 1.00 |

github-actions · 2024-07-26T17:07:33Z

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Linux: Debug Release
macOS: Debug Release
Windows: Debug Release

mxinden · 2024-07-29T09:12:36Z

Strange. "No change in performance detected", but a 97% increase in runtime. Will look into it now.

1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.

       time:   [358.86 ms 589.33 ms 967.64 ms]
       thrpt:  [103.34 MiB/s 169.68 MiB/s 278.66 MiB/s]
change:
       time:   [+21.333% +97.949% +256.03%] (p = 0.08 > 0.05)
       thrpt:  [-71.912% -49.482% -17.582%]
       No change
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
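
One possible reading of the "No change" verdict next to a +97% point estimate: the report format looks like a Criterion-style comparison, where the significance test is consulted before the size of the change. The sketch below assumes that ordering. The function `classify`, the `Verdict` enum, and the threshold values passed in `main` are illustrative assumptions, not the harness's actual code or configuration.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,
    WithinNoise,
    Improved,
    Regressed,
}

/// Classifies a change in run time.
/// `lower` and `upper` bound the confidence interval of the relative change
/// (0.10 means +10%), with `lower <= upper`. `significance` and `noise` are
/// assumed, configurable thresholds.
fn classify(p_value: f64, lower: f64, upper: f64, significance: f64, noise: f64) -> Verdict {
    if p_value > significance {
        // Not statistically significant: the effect size is never examined.
        Verdict::NoChange
    } else if upper < -noise {
        Verdict::Improved
    } else if lower > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // Values from the report quoted above: p = 0.08, change in [+21.333%, +256.03%].
    println!("{:?}", classify(0.08, 0.21333, 2.5603, 0.05, 0.01));
    // Values from the earlier Download result: p = 0.00, change in [-27.403%, -24.915%].
    println!("{:?}", classify(0.00, -0.27403, -0.24915, 0.05, 0.01));
}
```

Under these assumptions, a run with only 10 measurements and a very wide interval can fail the significance test even when the point estimate is large, which would produce exactly this kind of report.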

mxinden · 2024-08-04T09:01:29Z

Still debugging the performance regression. I expected a performance improvement.

Interesting difference in --stats of a ~1GB download with rx: 819060 on main and rx: 25662 on 32k.

branch main

9s627ms INFO stats for Client ...
  rx: 819060 drop 1 dup 0 saved 1
  tx: 119448 lost 14789 lateack 4 ptoack 28
  pmtud: 8 sent 6 acked 0 lost 0 change
  resumed: false
  frames rx:
    crypto 6 done 5 token 4 close 0
    ack 2449 (max 119418) ping 4 padding 2666
    stream 817514 reset 0 stop 0
    max: stream 0 data 0 stream_data 0
    blocked: stream 0 data 0 stream_data 1299
    datagram 0
    ncid 35 rcid 0 pchallenge 0 presponse 0
    ack_frequency 10
  frames tx:
    crypto 3 done 0 token 0 close 3
    ack 117360 (max 820038) ping 46 padding 8
    stream 8 reset 0 stop 0
    max: stream 0 data 0 stream_data 2521
    blocked: stream 0 data 0 stream_data 0
    datagram 0
    ncid 0 rcid 0 pchallenge 0 presponse 0
    ack_frequency 2

branch 32k

3s488ms INFO stats for Client ...
  rx: 25662 drop 1 dup 0 saved 1
  tx: 15781 lost 0 lateack 0 ptoack 187
  pmtud: 10 sent 10 acked 0 lost 0 change
  resumed: false
  frames rx:
    crypto 3 done 1 token 1 close 0
    ack 2811 (max 15765) ping 78 padding 375850
    stream 24043 reset 0 stop 0
    max: stream 0 data 0 stream_data 0
    blocked: stream 0 data 0 stream_data 2670
    datagram 0
    ncid 7 rcid 0 pchallenge 0 presponse 0
    ack_frequency 3
  frames tx:
    crypto 2 done 0 token 0 close 3
    ack 13998 (max 26763) ping 149 padding 10
    stream 5 reset 0 stop 0
    max: stream 0 data 0 stream_data 2764
    blocked: stream 0 data 0 stream_data 0
    datagram 0
    ncid 0 rcid 0 pchallenge 0 presponse 0
    ack_frequency 1

mxinden · 2024-08-04T09:24:49Z

Without pacing, this pull request outperforms main:

➜  neqo-bin git:(main) ✗ critcmp main main-no-pacing 32k 32k-no-pacing -f "Download" --list
1-conn/1-100mb-resp (aka. Download)/client
------------------------------------------
32k-no-pacing      1.00    114.8±49.78ms   871.3 MB/sec
main-no-pacing     1.05    121.0±83.97ms   826.4 MB/sec
main               1.45    165.9±56.69ms   602.8 MB/sec
32k               11.39  1307.8±1038.97ms   76.5 MB/sec

neqo-bin/src/lib.rs

mxinden · 2024-08-11T12:16:42Z

This is ready for a review. #2035 resolved the performance regressions. In addition, as expected, this now improves the Download benchmark:

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [126.55 ms 127.76 ms 129.52 ms]
       thrpt:  [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]
change:
       time:   [-27.403% -26.268% -24.915%] (p = 0.00 < 0.05)
       thrpt:  [+33.182% +35.626% +37.747%]

Note: The benefit of this pull request is not the performance improvement above, but the fact that the benchmark now executes closer to how Firefox uses neqo-*.

* bench(bin/client): don't allocate upload payload upfront

  When POSTing a large request to a server, don't allocate the entire request upfront, but instead, as is done in `neqo-bin/src/server/mod.rs`, iterate over a static buffer. Reuses the same logic from `neqo-bin/src/server/mod.rs`, i.e. `SendData`. See previous similar change on server side #2008.

* Inline done()
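
The "iterate over a static buffer" approach from the commit message above can be sketched roughly as follows. This is an illustration only: `ChunkedPayload` is a hypothetical stand-in for the role `SendData` plays, not the neqo type, and the 32 KiB size of `ZEROES` is an assumption. The payload is never materialized; the sender only tracks how many bytes remain and repeatedly offers slices of one shared zero-filled buffer.

```rust
/// Shared zero-filled buffer; the 32 KiB size is an assumption for this sketch.
static ZEROES: [u8; 32 * 1024] = [0; 32 * 1024];

/// Hypothetical stand-in for the role `SendData` plays; not the neqo type.
struct ChunkedPayload {
    remaining: usize,
}

impl ChunkedPayload {
    fn new(total: usize) -> Self {
        Self { remaining: total }
    }

    /// Offers chunks to `write`, which returns how many bytes it accepted
    /// (for example, limited by flow control). Returns `true` once the whole
    /// payload has been sent, `false` if the writer stopped accepting data.
    fn send(&mut self, mut write: impl FnMut(&[u8]) -> usize) -> bool {
        while self.remaining > 0 {
            let chunk = &ZEROES[..self.remaining.min(ZEROES.len())];
            let accepted = write(chunk).min(chunk.len());
            if accepted == 0 {
                return false; // blocked; call `send` again later
            }
            self.remaining -= accepted;
        }
        true
    }
}

fn main() {
    let mut payload = ChunkedPayload::new(100_000);
    let mut budget = 70_000usize; // pretend flow-control credit
    let mut sent = 0usize;
    let mut writer = |chunk: &[u8]| {
        let n = chunk.len().min(budget);
        budget -= n;
        sent += n;
        n
    };
    let done = payload.send(&mut writer);
    println!("done: {}, sent: {}", done, sent);
}
```

Memory use stays constant regardless of the upload size, because only the `remaining` counter changes between calls.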

Trigger benchmarks

9d7810f

mxinden mentioned this pull request Jul 29, 2024

bench(bin): remove sample-size override #2018

Merged

Merge branch 'main' of https://github.com/mozilla/neqo into 32k

78df44e

mxinden mentioned this pull request Aug 4, 2024

fix(transport): don't pace below timer granularity #2035

Merged

martinthomson reviewed Aug 4, 2024

Merge branch 'main' into 32k

bad3673

mxinden marked this pull request as ready for review August 11, 2024 12:15

mxinden requested review from KershawChang and larseggert as code owners August 11, 2024 12:15

larseggert enabled auto-merge August 12, 2024 10:41

larseggert approved these changes Aug 12, 2024

larseggert added this pull request to the merge queue Aug 12, 2024

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Aug 12, 2024

larseggert added this pull request to the merge queue Aug 12, 2024

Merged via the queue into mozilla:main with commit 4a5a041 Aug 12, 2024
55 of 57 checks passed

mxinden mentioned this pull request Oct 25, 2024

bench(bin/client): don't allocate upload payload upfront #2200

Merged
