Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(bin): use 32k stream IO buffer #2008

Merged
merged 4 commits into from
Aug 12, 2024
Merged

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Jul 26, 2024

Firefox by default uses a 32k IO buffer for streams.

https://searchfox.org/mozilla-central/rev/f6e3b81aac49e602f06c204f9278da30993cdc8a/modules/libpref/init/all.js#3212

This commit makes neqo-bin use the same buffer size across http09/3 and client/server.

Along the way it consolidates various buffer logic and reuses buffers whereever feasible.

Firefox by default uses a 32k IO buffer for streams.

https://searchfox.org/mozilla-central/rev/f6e3b81aac49e602f06c204f9278da30993cdc8a/modules/libpref/init/all.js#3212

This commit makes `neqo-bin` use the same buffer size across http09/3 and
client/server.

Along the way it consolidates various buffer logic and reuses buffers whereever
feasible.
neqo-bin/src/server/http09.rs Show resolved Hide resolved
neqo-bin/src/server/http3.rs Show resolved Hide resolved
Copy link

github-actions bot commented Jul 26, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.36%. Comparing base (f801c29) to head (bad3673).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2008      +/-   ##
==========================================
- Coverage   95.36%   95.36%   -0.01%     
==========================================
  Files         112      112              
  Lines       36475    36475              
==========================================
- Hits        34784    34783       -1     
- Misses       1691     1692       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Jul 26, 2024

Benchmark results

Performance differences relative to f801c29.

coalesce_acked_from_zero 1+1 entries: 💔 Performance has regressed.
       time:   [196.61 ns 197.05 ns 197.53 ns]
       change: [+1.4739% +1.8071% +2.1494%] (p = 0.00 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low mild
6 (6.00%) high mild
9 (9.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
       time:   [237.21 ns 237.92 ns 238.71 ns]
       change: [+0.0346% +0.3689% +0.7299%] (p = 0.04 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
16 (16.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [236.98 ns 238.05 ns 239.36 ns]
       change: [-0.1969% +0.3499% +0.9639%] (p = 0.23 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [215.66 ns 215.88 ns 216.12 ns]
       change: [-0.0640% +0.7259% +1.4315%] (p = 0.06 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [119.60 ms 119.68 ms 119.76 ms]
       change: [+0.8078% +0.9063% +1.0011%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [40.423 ms 42.503 ms 44.609 ms]
       change: [-5.6161% +0.7236% +7.8640%] (p = 0.83 > 0.05)

Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [57.763 ms 60.818 ms 63.932 ms]
       change: [+2.2889% +10.117% +18.409%] (p = 0.01 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [47.965 ms 49.768 ms 51.492 ms]
       change: [-6.1545% -1.5034% +3.4935%] (p = 0.55 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [60.923 ms 67.805 ms 74.674 ms]
       change: [-18.847% -6.3929% +7.1145%] (p = 0.35 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [126.55 ms 127.76 ms 129.52 ms]
       thrpt:  [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]
change:
       time:   [-27.403% -26.268% -24.915%] (p = 0.00 < 0.05)
       thrpt:  [+33.182% +35.626% +37.747%]

Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) low mild
1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [408.49 ms 411.58 ms 414.65 ms]
       thrpt:  [24.117 Kelem/s 24.296 Kelem/s 24.481 Kelem/s]
change:
       time:   [-0.9293% +0.2571% +1.4515%] (p = 0.66 > 0.05)
       thrpt:  [-1.4308% -0.2564% +0.9380%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [46.648 ms 47.456 ms 48.297 ms]
       thrpt:  [20.705  elem/s 21.072  elem/s 21.437  elem/s]
change:
       time:   [-1.6077% +0.7764% +3.0788%] (p = 0.52 > 0.05)
       thrpt:  [-2.9869% -0.7704% +1.6340%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 198.2 ± 84.1 98.9 322.2 1.00
neqo msquic reno on 298.3 ± 71.2 259.3 466.2 1.00
neqo msquic reno 279.3 ± 25.0 247.5 332.7 1.00
neqo msquic cubic on 315.0 ± 106.3 255.0 597.3 1.00
neqo msquic cubic 281.3 ± 42.2 250.4 398.3 1.00
msquic neqo reno on 167.5 ± 108.4 85.6 530.2 1.00
msquic neqo reno 157.2 ± 91.3 84.4 359.1 1.00
msquic neqo cubic on 135.0 ± 96.7 84.5 458.3 1.00
msquic neqo cubic 167.8 ± 96.0 85.4 346.6 1.00
neqo neqo reno on 192.0 ± 66.5 151.9 363.2 1.00
neqo neqo reno 190.8 ± 68.6 151.6 407.3 1.00
neqo neqo cubic on 258.4 ± 135.3 159.7 596.3 1.00
neqo neqo cubic 311.5 ± 175.9 173.7 664.5 1.00

⬇️ Download logs

Copy link

github-actions bot commented Jul 26, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

@mxinden
Copy link
Collaborator Author

mxinden commented Jul 29, 2024

Strange. "No change in performance detected", but a 97% increase in runtime. Will look into it now.

1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.

       time:   [358.86 ms 589.33 ms 967.64 ms]
       thrpt:  [103.34 MiB/s 169.68 MiB/s 278.66 MiB/s]
change:
       time:   [+21.333% +97.949% +256.03%] (p = 0.08 > 0.05)
       thrpt:  [-71.912% -49.482% -17.582%]
       No change
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

@mxinden
Copy link
Collaborator Author

mxinden commented Aug 4, 2024

Still debugging the performance regression. I expected a performance improvement.

Interesting difference in --stats of a ~1GB download with rx: 819060 on main and rx: 25662 on 32k.

branch main

9s627ms INFO stats for Client ...
  rx: 819060 drop 1 dup 0 saved 1
  tx: 119448 lost 14789 lateack 4 ptoack 28
  pmtud: 8 sent 6 acked 0 lost 0 change
  resumed: false
  frames rx:
    crypto 6 done 5 token 4 close 0
    ack 2449 (max 119418) ping 4 padding 2666
    stream 817514 reset 0 stop 0
    max: stream 0 data 0 stream_data 0
    blocked: stream 0 data 0 stream_data 1299
    datagram 0
    ncid 35 rcid 0 pchallenge 0 presponse 0
    ack_frequency 10
  frames tx:
    crypto 3 done 0 token 0 close 3
    ack 117360 (max 820038) ping 46 padding 8
    stream 8 reset 0 stop 0
    max: stream 0 data 0 stream_data 2521
    blocked: stream 0 data 0 stream_data 0
    datagram 0
    ncid 0 rcid 0 pchallenge 0 presponse 0
    ack_frequency 2

branch 32k

3s488ms INFO stats for Client ...
  rx: 25662 drop 1 dup 0 saved 1
  tx: 15781 lost 0 lateack 0 ptoack 187
  pmtud: 10 sent 10 acked 0 lost 0 change
  resumed: false
  frames rx:
    crypto 3 done 1 token 1 close 0
    ack 2811 (max 15765) ping 78 padding 375850
    stream 24043 reset 0 stop 0
    max: stream 0 data 0 stream_data 0
    blocked: stream 0 data 0 stream_data 2670
    datagram 0
    ncid 7 rcid 0 pchallenge 0 presponse 0
    ack_frequency 3
  frames tx:
    crypto 2 done 0 token 0 close 3
    ack 13998 (max 26763) ping 149 padding 10
    stream 5 reset 0 stop 0
    max: stream 0 data 0 stream_data 2764
    blocked: stream 0 data 0 stream_data 0
    datagram 0
    ncid 0 rcid 0 pchallenge 0 presponse 0
    ack_frequency 1

@mxinden
Copy link
Collaborator Author

mxinden commented Aug 4, 2024

Without pacing, this pull request outperforms main:

➜  neqo-bin git:(main) ✗ critcmp main main-no-pacing 32k 32k-no-pacing -f "Download" --list
1-conn/1-100mb-resp (aka. Download)/client
------------------------------------------
32k-no-pacing      1.00    114.8±49.78ms   871.3 MB/sec
main-no-pacing     1.05    121.0±83.97ms   826.4 MB/sec
main               1.45    165.9±56.69ms   602.8 MB/sec
32k               11.39  1307.8±1038.97ms   76.5 MB/sec

@mxinden mxinden marked this pull request as ready for review August 11, 2024 12:15
@mxinden
Copy link
Collaborator Author

mxinden commented Aug 11, 2024

This is ready for a review. #2035 resolved the performance regressions. In addition, as expected, this now improves the Download benchmark:

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.

   time:   [126.55 ms 127.76 ms 129.52 ms]
   thrpt:  [772.10 MiB/s 782.71 MiB/s 790.18 MiB/s]

change:
time: [-27.403% -26.268% -24.915%] (p = 0.00 < 0.05)
thrpt: [+33.182% +35.626% +37.747%]

Note: The benefit of this pull request is not the performance improvement above, but the fact that the benchmark now executes closer to how Firefox uses neqo-*.

@larseggert larseggert enabled auto-merge August 12, 2024 10:41
@larseggert larseggert added this pull request to the merge queue Aug 12, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Aug 12, 2024
@larseggert larseggert added this pull request to the merge queue Aug 12, 2024
Merged via the queue into mozilla:main with commit 4a5a041 Aug 12, 2024
55 of 57 checks passed
mxinden added a commit to mxinden/neqo that referenced this pull request Oct 25, 2024
When POSTing a large request to a server, don't allocate the entire request
upfront, but instead, as is done in `neqo-bin/src/server/mod.rs`, iterate over a
static buffer.

Reuses the same logic from `neqo-bin/src/server/mod.rs`, i.e. `SendData`.

See previous similar change on server side mozilla#2008.
mxinden added a commit to mxinden/neqo that referenced this pull request Oct 26, 2024
When POSTing a large request to a server, don't allocate the entire request
upfront, but instead, as is done in `neqo-bin/src/server/mod.rs`, iterate over a
static buffer.

Reuses the same logic from `neqo-bin/src/server/mod.rs`, i.e. `SendData`.

See previous similar change on server side mozilla#2008.
github-merge-queue bot pushed a commit that referenced this pull request Oct 31, 2024
* bench(bin/client): don't allocate upload payload upfront

When POSTing a large request to a server, don't allocate the entire request
upfront, but instead, as is done in `neqo-bin/src/server/mod.rs`, iterate over a
static buffer.

Reuses the same logic from `neqo-bin/src/server/mod.rs`, i.e. `SendData`.

See previous similar change on server side #2008.

* Inline done()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants