Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Single-packet RTX in Initial/Handshake + RTX handshake ACKs #2027

Merged
merged 14 commits into from
Aug 13, 2024

Conversation

larseggert
Copy link
Collaborator

@larseggert larseggert commented Aug 1, 2024

With this change, we're now only sending one packet on PTO in the Initial and Handshake packet number spaces, for both server and client. Before, the server was allowed to send two and the client was allowed to send two during Handshake, which meant that if one packet was lost, we would RTX the lost packet and a (useless) empty PING. The server would then hit the amplification limit. With this change, we only RTX the lost packet, making the Handshake more robust under loss.

A second change here is to immediately force an ACK during the Handshake when a previous ACK was lost. Before this, the original Initial form the server contained an ACK, but any RTX did not, because we stopped the ACK timer when generating the first ACK and never restarting it.

Both these changes reduce L1 and C1 interop failures.

This is not yet complete. It works for the first RTX of a lost server
Initial, which now contains the same data as the original
transmission. A third RTX only contains pings. Something is off about
our probe logic. Will continue investigating, but want to see what
the bench and QNS impacts of this change are.
Copy link

github-actions bot commented Aug 1, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

codecov bot commented Aug 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.38%. Comparing base (bff53b7) to head (3cc3957).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2027      +/-   ##
==========================================
+ Coverage   95.35%   95.38%   +0.02%     
==========================================
  Files         112      112              
  Lines       36507    36516       +9     
==========================================
+ Hits        34812    34831      +19     
+ Misses       1695     1685      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Aug 1, 2024

Benchmark results

Performance differences relative to 4a5a041.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [191.94 ns 192.40 ns 192.94 ns]
       change: [+0.3425% +0.7406% +1.1239%] (p = 0.00 < 0.05)

Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

coalesce_acked_from_zero 3+1 entries: 💔 Performance has regressed.
       time:   [235.81 ns 236.38 ns 236.97 ns]
       change: [+1.1324% +1.5194% +1.9216%] (p = 0.00 < 0.05)

Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
4 (4.00%) high mild
10 (10.00%) high severe

coalesce_acked_from_zero 10+1 entries: 💔 Performance has regressed.
       time:   [236.01 ns 236.88 ns 237.90 ns]
       change: [+1.6128% +2.2558% +3.0309%] (p = 0.00 < 0.05)

Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
12 (12.00%) high severe

coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.
       time:   [216.95 ns 217.03 ns 217.11 ns]
       change: [+0.1359% +0.8773% +1.5342%] (p = 0.01 < 0.05)

Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [119.21 ms 119.30 ms 119.39 ms]
       change: [-0.4797% -0.3708% -0.2698%] (p = 0.00 < 0.05)

Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [38.945 ms 40.916 ms 42.944 ms]
       change: [-12.524% -6.5467% +0.0121%] (p = 0.04 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [57.063 ms 59.989 ms 62.928 ms]
       change: [-1.1142% +6.3451% +14.212%] (p = 0.10 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [48.774 ms 50.516 ms 52.214 ms]
       change: [-2.0857% +2.8388% +7.8430%] (p = 0.26 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [72.918 ms 79.133 ms 85.317 ms]
       change: [-2.4009% +8.5439% +20.421%] (p = 0.14 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
       time:   [129.18 ms 129.88 ms 130.53 ms]
       thrpt:  [766.10 MiB/s 769.97 MiB/s 774.09 MiB/s]
change:
       time:   [-7.0309% -1.3579% +2.1144%] (p = 0.76 > 0.05)
       thrpt:  [-2.0706% +1.3766% +7.5627%]

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) low mild

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: Change within noise threshold.
       time:   [409.09 ms 412.98 ms 416.82 ms]
       thrpt:  [23.991 Kelem/s 24.214 Kelem/s 24.444 Kelem/s]
change:
       time:   [-0.0080% +1.3512% +2.7025%] (p = 0.05 < 0.05)
       thrpt:  [-2.6314% -1.3332% +0.0080%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [46.490 ms 47.264 ms 48.041 ms]
       thrpt:  [20.816  elem/s 21.158  elem/s 21.510  elem/s]
change:
       time:   [-1.5841% +0.6018% +2.9852%] (p = 0.61 > 0.05)
       thrpt:  [-2.8987% -0.5982% +1.6096%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 150.5 ± 64.3 103.2 335.2 1.00
neqo msquic reno on 264.3 ± 11.5 253.2 284.2 1.00
neqo msquic reno 289.3 ± 66.0 257.4 476.1 1.00
neqo msquic cubic on 327.1 ± 94.0 260.9 513.6 1.00
neqo msquic cubic 262.9 ± 10.2 249.9 289.2 1.00
msquic neqo reno on 135.2 ± 81.2 83.5 368.5 1.00
msquic neqo reno 119.6 ± 68.9 84.4 346.9 1.00
msquic neqo cubic on 140.2 ± 85.4 85.7 387.8 1.00
msquic neqo cubic 145.4 ± 86.0 84.7 372.2 1.00
neqo neqo reno on 190.0 ± 54.9 151.8 361.5 1.00
neqo neqo reno 219.0 ± 72.8 158.7 358.3 1.00
neqo neqo cubic on 222.4 ± 85.4 152.6 513.3 1.00
neqo neqo cubic 251.6 ± 119.7 169.3 538.7 1.00

⬇️ Download logs

Copy link

github-actions bot commented Aug 1, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

@larseggert larseggert marked this pull request as ready for review August 2, 2024 10:48
@larseggert larseggert changed the title fix: Send ACKs during handshake after they are lost fix: Single-packet RTX in Initial + RTX handshake ACKs Aug 2, 2024
Copy link
Collaborator

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what my review is worth, this makes sense to me. That said, I suggest waiting for another review, given the subtle nature of the change.

neqo-transport/src/connection/tests/handshake.rs Outdated Show resolved Hide resolved
neqo-transport/src/tracking.rs Outdated Show resolved Hide resolved
@larseggert larseggert changed the title fix: Single-packet RTX in Initial + RTX handshake ACKs fix: Single-packet RTX in Initial/Handshake + RTX handshake ACKs Aug 6, 2024
@larseggert
Copy link
Collaborator Author

That last change seems to really have helped with L1 and C1. I think this is ready for review.

@larseggert larseggert requested a review from mxinden August 6, 2024 11:39
@larseggert
Copy link
Collaborator Author

larseggert commented Aug 6, 2024

And now the L1/C1 failures are back :-(

neqo-transport/src/recovery/mod.rs Outdated Show resolved Hide resolved
neqo-transport/src/tracking.rs Show resolved Hide resolved
Signed-off-by: Lars Eggert <lars@eggert.org>
Signed-off-by: Lars Eggert <lars@eggert.org>
Signed-off-by: Lars Eggert <lars@eggert.org>
@larseggert larseggert added this pull request to the merge queue Aug 13, 2024
Merged via the queue into mozilla:main with commit 16e9085 Aug 13, 2024
57 checks passed
@larseggert larseggert deleted the fix-handshake-ack branch August 13, 2024 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants