feat: DPLPMTUD #1903

larseggert · 2024-05-14T15:05:21Z

This implements a simplified variant of DPLPMTUD (aka RFC 8899), which by default probes for increasingly large PMTUs using an MTU table.

~~There is currently no attempt to repeat the PMTUD at intervals.~~ There is also no attempt to detect PMTUs that are in between values in the table. ~~There is no attempt to handle the case where the PMTU shrinks.~~
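To illustrate the table-driven search described above, here is a minimal sketch. It is not the code in this PR: the table values, type names, and the "stop on first loss" rule are all assumptions made for illustration. It only shows why the result is always one of the table entries and never a value in between.

```rust
/// Hypothetical table of candidate PMTUs in bytes, sorted ascending.
/// The values are illustrative, not the table used in the PR.
const MTU_TABLE: &[usize] = &[1280, 1380, 1420, 1472, 1500, 9000];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SearchState {
    /// Probing the table entry at this index.
    Probing(usize),
    /// Search finished; no further probes are sent.
    Done,
}

#[derive(Debug)]
struct MtuSearch {
    /// Index of the largest table entry known to work.
    confirmed: usize,
    state: SearchState,
}

impl MtuSearch {
    fn new() -> Self {
        // Assume the smallest entry works and start by probing the next one.
        Self {
            confirmed: 0,
            state: SearchState::Probing(1),
        }
    }

    /// The PMTU currently considered safe to use.
    fn current_mtu(&self) -> usize {
        MTU_TABLE[self.confirmed]
    }

    /// Size of the probe to send next, if the search is still running.
    fn probe_size(&self) -> Option<usize> {
        match self.state {
            SearchState::Probing(i) => Some(MTU_TABLE[i]),
            SearchState::Done => None,
        }
    }

    /// A probe was acknowledged: raise the PMTU and move to the next entry.
    fn on_probe_acked(&mut self) {
        if let SearchState::Probing(i) = self.state {
            self.confirmed = i;
            self.state = if i + 1 < MTU_TABLE.len() {
                SearchState::Probing(i + 1)
            } else {
                SearchState::Done
            };
        }
    }

    /// A probe was declared lost: stop at the last confirmed entry.
    /// (Simplification: a real search would likely retry before giving up.)
    fn on_probe_lost(&mut self) {
        self.state = SearchState::Done;
    }
}
```

In this sketch a path whose real PMTU is 1400 bytes would settle on 1380, the largest table entry below it.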

A lot of the existing tests (~50%) break when PMTUD is enabled, so this PR disables it by default. New tests that cover PMTUD were added to this PR.

Fixes #243

github-actions · 2024-05-14T15:21:55Z

Failed Interop Tests

QUIC Interop Runner, client vs. server

aioquic vs. neqo-latest: A
go-x-net vs. neqo-latest: A
kwik vs. neqo-latest: A
lsquic vs. neqo-latest: A
msquic vs. neqo-latest: LR A L1
mvfst vs. neqo-latest: Z 3 A L1 C1
neqo vs. neqo-latest: LR A C1
neqo-latest vs. aioquic: Z L1 C1
neqo-latest vs. haproxy: Z
neqo-latest vs. kwik: Z L1
neqo-latest vs. lsquic: Z
neqo-latest vs. msquic: Z A L1 C1
neqo-latest vs. mvfst: DC U A L1 L2 C1 C2
neqo-latest vs. neqo: LR A C1
neqo-latest vs. neqo-latest: LR A C1
neqo-latest vs. nginx: L1 C1
neqo-latest vs. ngtcp2: Z
neqo-latest vs. quic-go: Z
neqo-latest vs. quinn: Z E A
neqo-latest vs. s2n-quic: R L1
neqo-latest vs. xquic: Z A
ngtcp2 vs. neqo-latest: LR A
picoquic vs. neqo-latest: R A
quic-go vs. neqo-latest: A L1 C1
quiche vs. neqo-latest: 3 A
quinn vs. neqo-latest: Z E A
s2n-quic vs. neqo-latest: E A L1 C1
xquic vs. neqo-latest: M R A C1

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B L1 L2 C1 C2 6 V2
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: H DC LR M B U L2 C2 6
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U L1 L2 C1 C2 6 V2
lsquic vs. neqo-latest: H DC LR M S R 3 B E L1 L2 C1 C2 6 V2
msquic vs. neqo-latest: H DC C20 M S R Z B U L2 C1 C2 6 V2
mvfst vs. neqo-latest: H DC LR M B L2 C2 6
neqo vs. neqo-latest: H DC C20 M S R Z 3 B U E L1 L2 C2 6 V2
neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L2 C2 6 V2
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R 3 B U A L2 C1 C2 6 V2
neqo-latest vs. lsquic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. msquic: H DC LR C20 M S R B U L2 C2 6 V2
neqo-latest vs. mvfst: H LR M R Z 3 B 6
neqo-latest vs. neqo: H DC C20 M S R Z 3 B U E L1 L2 C2 6 V2
neqo-latest vs. neqo-latest: H DC C20 M S R Z 3 B U E L1 L2 C2 6 V2
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L2 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. picoquic: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2
neqo-latest vs. quic-go: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R 3 B U L2 C2 6
neqo-latest vs. s2n-quic: H DC LR C20 M S 3 B U E A L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R 3 B U L1 L2 C1 C2 6
ngtcp2 vs. neqo-latest: H DC C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2
picoquic vs. neqo-latest: H DC LR C20 M S Z 3 B U E L1 L2 C1 C2 6 V2
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U L2 C2 6
quiche vs. neqo-latest: H DC LR M S R Z B L1 L2 C1 C2 6
quinn vs. neqo-latest: H DC LR C20 M S R 3 B U L2 C2 6
s2n-quic vs. neqo-latest: H DC LR M S R 3 B L2 C2 6
xquic vs. neqo-latest: H DC LR C20 S Z 3 B U L1 L2 C2 6

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
lsquic vs. neqo-latest: C20 Z U
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
neqo-latest vs. aioquic: E
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2
neqo-latest vs. haproxy: E
neqo-latest vs. kwik: E
neqo-latest vs. msquic: 3 E
neqo-latest vs. mvfst: C20 S E V2
neqo-latest vs. nginx: E V2
neqo-latest vs. quic-go: E V2
neqo-latest vs. quiche: E V2
neqo-latest vs. quinn: L1 C1 V2
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. xquic: S E V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
quinn vs. neqo-latest: L1 C1 V2
s2n-quic vs. neqo-latest: C20 Z U V2
xquic vs. neqo-latest: E V2

mxinden · 2024-05-14T16:15:56Z

(There are also a bunch of warnings about unused code that is actually used. I don't understand why that is, since those functions mirror existing ones such as cwnd_avail.)

As far as I can tell, the trait function `CongestionControl::cwnd_min` and its implementation `<ClassicCongestionControl<T> as CongestionControl>::cwnd_min` are only called in `PacketSender::cwnd_min`, and `PacketSender::cwnd_min` is only called in testing code. Thus, cargo complains about all three being unused.

Does that make sense @larseggert?
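One common way to avoid this kind of warning is to compile the test-only chain only for test builds. The sketch below reuses the names from the comment above, but the bodies, fields, and values are hypothetical and this is not necessarily how the PR resolved it.

```rust
// Names mirror those discussed above; bodies and values are hypothetical.
trait CongestionControl {
    fn cwnd(&self) -> usize;

    // Compiled only for test builds, so non-test builds never see an
    // unused method.
    #[cfg(test)]
    fn cwnd_min(&self) -> usize;
}

struct ClassicCongestionControl {
    congestion_window: usize,
    max_datagram_size: usize,
}

impl CongestionControl for ClassicCongestionControl {
    fn cwnd(&self) -> usize {
        // Assumption for this sketch: the window never drops below
        // two datagrams.
        self.congestion_window.max(self.max_datagram_size * 2)
    }

    #[cfg(test)]
    fn cwnd_min(&self) -> usize {
        self.max_datagram_size * 2
    }
}

struct PacketSender {
    cc: Box<dyn CongestionControl>,
}

impl PacketSender {
    fn cwnd(&self) -> usize {
        self.cc.cwnd()
    }

    // The only caller of the trait method; it is itself test-only.
    #[cfg(test)]
    fn cwnd_min(&self) -> usize {
        self.cc.cwnd_min()
    }
}
```

The gate has to be applied to every link in the chain (trait method, implementation, and wrapper), otherwise the remaining links are still reported as unused or fail to compile.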

neqo-transport/src/path.rs

github-actions · 2024-05-14T16:37:47Z

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Linux: Debug Release
macOS: Debug Release
Windows: ~~Debug~~ ~~Release~~

neqo-transport/src/pmtud.rs

neqo-transport/src/path.rs

github-actions · 2024-05-14T17:37:20Z

Benchmark results

Performance differences relative to 7d610ed.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.

       time:   [193.61 ns 194.06 ns 194.53 ns]
       change: [+0.0719% +0.4048% +0.7829%] (p = 0.02 < 0.05)
Found 13 outliers among 100 measurements (13.00%)

1 (1.00%) low mild

8 (8.00%) high mild

4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.

       time:   [234.67 ns 235.73 ns 237.08 ns]
       change: [+0.6006% +2.3382% +6.2565%] (p = 0.06 > 0.05)
Found 17 outliers among 100 measurements (17.00%)

11 (11.00%) high mild

6 (6.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.

       time:   [233.27 ns 234.03 ns 234.96 ns]
       change: [-0.1847% +0.3385% +0.8310%] (p = 0.21 > 0.05)
Found 8 outliers among 100 measurements (8.00%)

1 (1.00%) high mild

7 (7.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.

       time:   [215.28 ns 217.58 ns 222.98 ns]
       change: [-0.5398% +6.4762% +20.010%] (p = 0.44 > 0.05)
Found 10 outliers among 100 measurements (10.00%)

3 (3.00%) high mild

7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): 💔 Performance has regressed.

       time:   [120.23 ms 120.29 ms 120.37 ms]
       change: [+1.1136% +1.1966% +1.2758%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild

transfer/Run multiple transfers with varying seeds: 💚 Performance has improved.

       time:   [55.737 ms 59.125 ms 62.506 ms]
       thrpt:  [63.994 MiB/s 67.653 MiB/s 71.766 MiB/s]
change:
       time:   [-53.858% -51.136% -48.358%] (p = 0.00 < 0.05)
       thrpt:  [+93.642% +104.65% +116.72%]

transfer/Run multiple transfers with the same seed: 💚 Performance has improved.

       time:   [65.695 ms 70.817 ms 75.893 ms]
       thrpt:  [52.706 MiB/s 56.484 MiB/s 60.887 MiB/s]
change:
       time:   [-45.571% -41.490% -37.459%] (p = 0.00 < 0.05)
       thrpt:  [+59.896% +70.912% +83.726%]

1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.

       time:   [154.81 ms 160.68 ms 166.89 ms]
       thrpt:  [599.19 MiB/s 622.35 MiB/s 645.97 MiB/s]
change:
       time:   [-86.471% -85.914% -85.283%] (p = 0.00 < 0.05)
       thrpt:  [+579.47% +609.92% +639.15%]

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: Change within noise threshold.

       time:   [430.16 ms 433.60 ms 437.05 ms]
       thrpt:  [22.880 Kelem/s 23.063 Kelem/s 23.247 Kelem/s]
change:
       time:   [-2.5302% -1.4809% -0.4847%] (p = 0.00 < 0.05)
       thrpt:  [+0.4870% +1.5031% +2.5959%]

1-conn/1-1b-resp (aka. HPS)/client: 💚 Performance has improved.

       time:   [43.569 ms 44.108 ms 44.652 ms]
       thrpt:  [22.395  elem/s 22.671  elem/s 22.952  elem/s]
change:
       time:   [-3.7644% -2.5274% -1.2515%] (p = 0.00 < 0.05)
       thrpt:  [+1.2674% +2.5930% +3.9116%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|---|---|---|
| msquic | msquic | | | 111.9 ± 13.4 | 88.9 | 147.8 | 1.00 |
| neqo | msquic | reno | on | 267.4 ± 7.3 | 250.7 | 279.4 | 1.00 |
| neqo | msquic | reno | | 271.8 ± 6.0 | 264.6 | 283.5 | 1.00 |
| neqo | msquic | cubic | on | 275.0 ± 15.4 | 245.0 | 299.2 | 1.00 |
| neqo | msquic | cubic | | 267.1 ± 7.8 | 256.6 | 281.1 | 1.00 |
| msquic | neqo | reno | on | 235.3 ± 166.2 | 89.0 | 648.3 | 1.00 |
| msquic | neqo | reno | | 140.0 ± 20.3 | 111.6 | 172.6 | 1.00 |
| msquic | neqo | cubic | on | 164.5 ± 59.6 | 113.1 | 331.7 | 1.00 |
| msquic | neqo | cubic | | 206.5 ± 70.2 | 126.4 | 361.7 | 1.00 |
| neqo | neqo | reno | on | 192.5 ± 24.7 | 151.4 | 237.1 | 1.00 |
| neqo | neqo | reno | | 173.1 ± 19.6 | 144.6 | 215.4 | 1.00 |
| neqo | neqo | cubic | on | 186.7 ± 47.4 | 146.1 | 358.3 | 1.00 |
| neqo | neqo | cubic | | 171.5 ± 9.3 | 156.5 | 187.0 | 1.00 |

neqo-transport/src/crypto.rs

martinthomson

I'm not seeing PMTUD tests, which would be necessary for this.

The big question I have is the one that Christian raises about PMTUD generally: how do you know that the bytes you use on PMTUD pay you back?

There is probably a case for sending probes when you have spare sending capacity and nothing better to send. Indeed, successfully probing will let us push congestion windows up more and could even improve performance.

What I'm seeing here displaces other data. I'd like to see something that doesn't do that. There's a fundamental problem that needs analysis though. You can't predict that a connection will be used for uploads, so you don't know when probes will really help. I see a few cases:

1. The connection is short-lived or low volume. Probes are strictly wasteful.
2. The connection is long-lived and high volume, with ample idle time for probing. Probes can use gaps. This might be a video stream, where probing can fit into a warmup period. Probes are therefore easy and super-helpful.
3. The connection exists only to support a smaller upload. The upload is small enough that probes are wasteful.
4. The connection exists only to support a larger upload. The upload is large enough that spending bytes on probing early on is a good investment.

Cases 1 and 2 are easy to deal with. We could probe on an idle connection and tolerate a small amount of waste for case 1 if it makes case 2 appreciably better.

The split between 3 and 4 is rough. There is an uncertain zone between the two as well where some probing is justified, but successive rounds of probing might be wasteful as the throughput gain over the remaining time diminishes relative to the wasted effort of extra probes.

Right now, you don't send real data in probes. You are effectively betting on the probes being lost. But you could send data, which would reduce the harm in case 3. It might even make the code slightly simpler.
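The trade-off between the small-upload and large-upload cases can be made concrete with a rough byte count. The sketch below is only a back-of-the-envelope model, not anything from the PR: the function names, the fixed per-packet overhead, and the assumption that probes carry padding only are all hypothetical, and it ignores congestion-window and CPU effects entirely.

```rust
/// Bytes on the wire needed to carry `payload` bytes of application data
/// when each packet of size `mtu` spends `overhead` bytes on headers.
/// Assumes `mtu > overhead`.
fn wire_bytes(payload: u64, mtu: u64, overhead: u64) -> u64 {
    let per_packet = mtu - overhead;
    // Ceiling division: number of packets needed for the payload.
    let packets = (payload + per_packet - 1) / per_packet;
    payload + packets * overhead
}

/// Whether spending `probes` padding-only probes of size `new_mtu` is repaid
/// by the header bytes saved on the remaining `payload` bytes.
fn probing_pays_off(payload: u64, old_mtu: u64, new_mtu: u64, overhead: u64, probes: u64) -> bool {
    let before = wire_bytes(payload, old_mtu, overhead);
    let after = wire_bytes(payload, new_mtu, overhead) + probes * new_mtu;
    after < before
}
```

With an assumed 60 bytes of overhead and a single probe from 1280 to 1500 bytes, this model says a 100 kB upload does not recover the cost of the probe while a 10 MB upload does. Filling probes with real data instead of padding would shrink the `probes * new_mtu` term and move the break-even point toward smaller uploads.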

neqo-transport/src/pmtud.rs

neqo-transport/src/pace.rs

neqo-transport/src/path.rs

neqo-transport/src/pmtud.rs

neqo-transport/src/path.rs

Co-authored-by: Martin Thomson <mt@lowentropy.net> Signed-off-by: Lars Eggert <lars@eggert.org>

Co-authored-by: Max Inden <mail@max-inden.de> Signed-off-by: Lars Eggert <lars@eggert.org>

Co-authored-by: Martin Thomson <mt@lowentropy.net> Signed-off-by: Lars Eggert <lars@eggert.org>

Follow-up on mozilla#1903 (comment)

`Probe` is a small, simple enum on the stack, so the convention is to implement `Copy` rather than only `Clone` with a call to `clone()`. The following helped me in the past:

> When should my type be Copy?
>
> Generally speaking, if your type can implement Copy, it should. Keep in mind, though, that implementing Copy is part of the public API of your type. If the type might become non-Copy in the future, it could be prudent to omit the Copy implementation now, to avoid a breaking API change.

https://doc.rust-lang.org/std/marker/trait.Copy.html#when-should-my-type-be-copy
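As a small illustration of that suggestion, the sketch below derives `Copy` on a field-less enum and returns it by value from a getter. The variant names and the surrounding `Pmtud` struct are hypothetical stand-ins, not necessarily the definitions in the PR.

```rust
/// Variant names are hypothetical; the point is that the enum holds no
/// heap data, so a bitwise copy is always valid.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Probe {
    NotNeeded,
    Needed,
    Sent,
}

struct Pmtud {
    probe_state: Probe,
}

impl Pmtud {
    /// Returns the state by value. Because `Probe` is `Copy`, this compiles
    /// without an explicit `clone()` even though `self` is only borrowed.
    fn probe_state(&self) -> Probe {
        self.probe_state
    }
}
```

Without the `Copy` derive, the getter would fail to compile with a "cannot move out of borrowed content" style error and would need `self.probe_state.clone()` instead.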

…iewers,kershaw,janerik

This commit adds four Glean probes:

- http3_udp_datagram_segment_size_sent
- http3_udp_datagram_segment_size_received
- http3_udp_datagram_size_received
- http3_udp_datagram_num_segments_received

Given the performance impact of tracking Glean metrics in the UDP hot path (see https://phabricator.services.mozilla.com/D216034#7453056), this commit introduces a sample buffer per metric.

This will enable us to measure the impact of:

- Implementation of Packetization Layer Path MTU Discovery for Datagram Transports (RFC 8899) [in Neqo](mozilla/neqo#1903)
- [Fast UDP for Firefox](https://bugzilla.mozilla.org/show_bug.cgi?id=1901292)

Differential Revision: https://phabricator.services.mozilla.com/D216034

…iewers,kershaw,janerik

This commit adds four Glean probes:

- http3_udp_datagram_segment_size_sent
- http3_udp_datagram_segment_size_received
- http3_udp_datagram_size_received
- http3_udp_datagram_num_segments_received

Given the performance impact of tracking Glean metrics in the UDP hot path (see https://phabricator.services.mozilla.com/D216034#7453056), this commit introduces a sample buffer per metric.

This will enable us to measure the impact of:

- Implementation of Packetization Layer Path MTU Discovery for Datagram Transports (RFC 8899) [in Neqo](mozilla/neqo#1903)
- [Fast UDP for Firefox](https://bugzilla.mozilla.org/show_bug.cgi?id=1901292)

Differential Revision: https://phabricator.services.mozilla.com/D216034

UltraBlame original commit: 16d3f312970444dd8bc8af65629cbef7d7dbcd62

larseggert added 5 commits May 13, 2024 12:46

WIP

de41d72

Merge remote-tracking branch 'origin/main' into feat-dplpmtud

a3b0f0c

Fixes

80fd884

Minimize diff

76f3fd4

Progress

3cc307b

larseggert changed the title ~~feat: Groudwork for DPLPMTUD~~ feat: Groundwork for DPLPMTUD May 14, 2024

larseggert commented May 14, 2024

neqo-transport/src/path.rs

Fix clippy

9f51395

Reduce diff to main

9ecfda2

mxinden reviewed May 14, 2024

neqo-transport/src/pmtud.rs

neqo-transport/src/path.rs

larseggert added 3 commits May 15, 2024 14:53

Merge branch 'main' into feat-dplpmtud

213ad01

Use RefCell

19c2f44

Make Pacer use PmtudState

103ac4e

larseggert marked this pull request as ready for review May 15, 2024 16:29

larseggert requested review from KershawChang and martinthomson as code owners May 15, 2024 16:29

larseggert added 5 commits May 16, 2024 16:36

Merge branch 'main' into feat-dplpmtud

cc03dc2

Renamings

3a7a923

Fix tests broken by changing PATH_MTU_V6

b42b525

WIP

e02ddf7

Finalize

b3e4fd0

larseggert changed the title ~~feat: Groundwork for DPLPMTUD~~ feat: DPLPMTUD May 21, 2024

larseggert commented May 21, 2024

neqo-transport/src/crypto.rs

Merge branch 'main' into feat-dplpmtud

da3cbb9

martinthomson reviewed May 22, 2024

Update neqo-transport/src/path.rs

608e1c6

Co-authored-by: Martin Thomson <mt@lowentropy.net> Signed-off-by: Lars Eggert <lars@eggert.org>

larseggert and others added 6 commits July 10, 2024 07:56

Update neqo-transport/src/stats.rs

be7dcdb

Co-authored-by: Max Inden <mail@max-inden.de> Signed-off-by: Lars Eggert <lars@eggert.org>

Suggestions from Max

3a8f2c4

Update neqo-transport/src/pmtud.rs

fd2af00

Co-authored-by: Martin Thomson <mt@lowentropy.net> Signed-off-by: Lars Eggert <lars@eggert.org>

Suggestions from Martin

3d8ba19

More suggestions

1f7e277

Add TODO

5c06141

mxinden added a commit to mxinden/neqo that referenced this pull request Jul 10, 2024

Use filter fn

3808dcc

Follow-up on mozilla#1903 (comment)

mxinden mentioned this pull request Jul 10, 2024

Use filter fn larseggert/neqo#25

Merged

mxinden added a commit to mxinden/neqo that referenced this pull request Jul 10, 2024

Use filter fn

4b195a0

Follow-up on mozilla#1903 (comment)

mxinden added a commit to mxinden/neqo that referenced this pull request Jul 10, 2024

Use filter fn

32635f5

Follow-up on mozilla#1903 (comment)

mxinden added 2 commits July 10, 2024 09:54

Use filter fn

6f03b99

Follow-up on mozilla#1903 (comment)

clippy

bd58c96

mxinden mentioned this pull request Jul 10, 2024

Release neqo v0.8.0 #1971

Closed

larseggert and others added 6 commits July 10, 2024 13:03

Merge pull request #25 from mxinden/feat-dplpmtud

3a3d554

Fixes

8d1d1cd

doc fix

673216a

Merge pull request #26 from mxinden/feat-dplpmtud

2ce8db1

Make search_tables identical length, and deal with the fallout

151e802

larseggert enabled auto-merge July 10, 2024 14:05

More

5a1f659

larseggert added this pull request to the merge queue Jul 10, 2024

Merged via the queue into mozilla:main with commit 4852dc6 Jul 10, 2024
54 of 56 checks passed

larseggert deleted the feat-dplpmtud branch July 10, 2024 15:19
