Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Don't encode large RTT guesses in tickets #2114

Merged
merged 5 commits into from
Sep 17, 2024

Conversation

larseggert
Copy link
Collaborator

Because under lossy conditions (e.g., QNS handshakeloss test), the guess can be multiple times the actual RTT, which when encoded in the resumption ticket will cause an extremely slow second handshake, often causing the test to time out.

Broken out of #1998
Fixes #2088

Because under lossy conditions (e.g., QNS `handshakeloss` test), the
guess can be multiple times the actual RTT, which when encoded in the
resumption ticket will cause an extremely slow second handshake, often
causing the test to time out.

Broken out of mozilla#1998
Fixes mozilla#2088
Copy link

github-actions bot commented Sep 16, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

codecov bot commented Sep 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.35%. Comparing base (0cb89a9) to head (6de4d70).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2114   +/-   ##
=======================================
  Coverage   95.35%   95.35%           
=======================================
  Files         112      112           
  Lines       36316    36334   +18     
=======================================
+ Hits        34628    34647   +19     
+ Misses       1688     1687    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Sep 16, 2024

Benchmark results

coalesce_acked_from_zero 1+1 entries:
       time:   [99.188 ns 99.537 ns 99.890 ns]
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe
coalesce_acked_from_zero 3+1 entries:
       time:   [116.94 ns 117.30 ns 117.68 ns]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  12 (12.00%) high severe
coalesce_acked_from_zero 10+1 entries:
       time:   [116.54 ns 117.73 ns 119.75 ns]
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
coalesce_acked_from_zero 1000+1 entries:
       time:   [97.464 ns 97.579 ns 97.710 ns]
Found 15 outliers among 100 measurements (15.00%)
  9 (9.00%) high mild
  6 (6.00%) high severe
RxStreamOrderer::inbound_frame():
       time:   [110.92 ms 111.06 ms 111.28 ms]
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) low mild
  1 (1.00%) high severe
transfer/pacing-false/varying-seeds:
       time:   [26.854 ms 27.948 ms 29.038 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
transfer/pacing-true/varying-seeds:
       time:   [33.862 ms 35.583 ms 37.332 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
transfer/pacing-false/same-seed:
       time:   [32.313 ms 33.059 ms 33.798 ms]
transfer/pacing-true/same-seed:
       time:   [40.343 ms 42.721 ms 45.125 ms]
1-conn/1-100mb-resp (aka. Download)/client:
       time:   [114.10 ms 114.42 ms 114.74 ms]
       thrpt:  [871.52 MiB/s 873.96 MiB/s 876.43 MiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
1-conn/10_000-parallel-1b-resp (aka. RPS)/client:
       time:   [313.76 ms 317.20 ms 320.61 ms]
       thrpt:  [31.190 Kelem/s 31.525 Kelem/s 31.871 Kelem/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
1-conn/1-1b-resp (aka. HPS)/client:
       time:   [33.876 ms 34.118 ms 34.381 ms]
       thrpt:  [29.086  elem/s 29.310  elem/s 29.520  elem/s]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 182.0 ± 109.8 102.1 516.5 1.00
neqo msquic reno on 287.1 ± 90.9 217.9 461.5 1.00
neqo msquic reno 241.8 ± 56.0 206.2 415.1 1.00
neqo msquic cubic on 237.9 ± 68.3 206.6 451.5 1.00
neqo msquic cubic 226.0 ± 12.0 208.4 244.6 1.00
msquic neqo reno on 166.8 ± 90.6 96.1 380.7 1.00
msquic neqo reno 140.4 ± 88.9 83.6 338.6 1.00
msquic neqo cubic on 195.3 ± 127.1 98.0 530.4 1.00
msquic neqo cubic 160.8 ± 123.4 83.8 610.1 1.00
neqo neqo reno on 204.2 ± 106.4 127.6 429.3 1.00
neqo neqo reno 234.5 ± 154.9 142.9 609.4 1.00
neqo neqo cubic on 213.8 ± 89.0 123.6 426.2 1.00
neqo neqo cubic 207.6 ± 94.9 130.8 411.3 1.00

⬇️ Download logs

Copy link

github-actions bot commented Sep 16, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

@larseggert larseggert marked this pull request as draft September 16, 2024 12:30
@larseggert
Copy link
Collaborator Author

This needs more work. I thought the code here

if self.rtt.first_sample_time().is_none() {
// When discarding a packet there might not be a good RTT estimate.
// But discards only occur after receiving something, so that means
// that there is some RTT information, which is better than nothing.
// Two cases: 1. at the client when handling a Retry and
// 2. at the server when disposing the Initial packet number space.
qinfo!(
[self],
"discarding a packet without an RTT estimate; guessing RTT={:?}",
now - sent.time_sent()
);
stats.rtt_init_guess = true;
self.rtt.update(
&self.qlog,
now - sent.time_sent(),
Duration::new(0, 0),
false,
now,
);
}

was an optimization, but the idle_timeout_crazy_rtt test hangs when I remove it.

@martinthomson do you remember why this was put in? This kind of RTT guessing isn't using packet number and so becomes very wrong very quickly when there is loss.

@larseggert larseggert marked this pull request as ready for review September 16, 2024 13:19
Copy link
Collaborator

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the elaborate test!

neqo-transport/src/rtt.rs Show resolved Hide resolved
@larseggert larseggert added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@larseggert larseggert added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@larseggert larseggert added this pull request to the merge queue Sep 17, 2024
Merged via the queue into mozilla:main with commit b72b3ba Sep 17, 2024
56 checks passed
@larseggert larseggert deleted the fix-no-large-rtt-guesses branch September 17, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Huge timeout in L1 QNS test
2 participants