Implement acknowledgement frequency #1553
And... here are some benchmarks!

**Practical details**

I'm running the server using `…`. For the scenarios with ACK frequency enabled, I'm doing so through the following code:

```rust
let mut ack_freq = AckFrequencyConfig::default();
// Allow up to 10 ack-eliciting packets before the peer must send an ACK
ack_freq.ack_eliciting_threshold(10);
// `None` (the default) leaves the extension disabled
transport.ack_frequency_config(Some(ack_freq));
```

**Conclusion**

There seems to be no significant performance difference in the benchmarks, though we probably need a more statistically rigorous approach to benchmarking if we want to be sure. @stormshield-damiend suggested we might see clearer differences in performance if there is higher latency (instead of it being on the order of microseconds), so I'll benchmark again next week with a more realistic latency.

**Baseline (before this PR)**

**ACK frequency disabled (threshold = 1)**

**ACK frequency enabled on the client (threshold = 10)**

**ACK frequency enabled on the server (threshold = 10)**
How do you feel about the benchmark results? It'd be nice to be able to measure things in a more principled way, to ensure any performance difference (or lack thereof) is statistically significant. Maybe it'd make sense to simulate the networking part in memory (I have a semi-working local prototype) to make the output more predictable. I probably won't have the time to work on that, though...
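For instance, even a simple summary over repeated runs would help judge significance. A sketch (the throughput numbers below are made up for illustration) comparing two configurations against run-to-run noise:

```rust
/// Summarize repeated benchmark runs: mean and sample standard deviation.
fn summarize(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    // Hypothetical throughput samples (MiB/s) from repeated runs.
    let baseline = [1510.3, 1495.2, 1523.8, 1488.1, 1502.6];
    let patched = [1507.9, 1512.4, 1491.0, 1519.7, 1498.3];
    for (name, runs) in [("baseline", &baseline), ("patched", &patched)] {
        let (mean, sd) = summarize(runs);
        println!("{name}: {mean:.1} ± {sd:.1} MiB/s");
    }
    // If the means differ by well under a couple of standard deviations,
    // the difference is indistinguishable from run-to-run noise.
}
```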
Today I did another round of benchmarks using the changes from #1559, so I could simulate the link's delay (using `…`). Command used: `…`

I ran the benchmarks on top of the main branch (equivalent to an ack-eliciting threshold of 0) and on top of this branch (using ack-eliciting thresholds of 1 and 10). In all cases throughput remained similar, so I'm reporting it only once for each benchmarked link delay. I assume throughput is being constrained by pacing. It would be interesting to tweak the config to start right away at a high transfer rate, because that would put real stress on Quinn and thereby reveal the true performance difference. I tried setting the initial congestion window to an absurdly high value, but somehow it didn't make any difference (see the sketch after the results below). Any clues on how to achieve higher throughput from the start? @Ralith, @djc

Another question: do these results make any sense at all? I have yet to develop a gut feeling for the relationship between delay and throughput.

**Results**

**Delay = 5ms**

**Delay = 10ms**

**Delay = 50ms**

**Delay = 200ms**
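For reference, setting the initial congestion window (the thing tried above) looks roughly like this. This is a sketch assuming quinn's `CubicConfig::initial_window` setter and congestion-controller factory hook; exact signatures vary between versions, and the helper name is made up:

```rust
use std::sync::Arc;
use quinn::{congestion::CubicConfig, TransportConfig};

/// Hypothetical helper: start with a huge initial congestion window.
fn transport_with_big_initial_window() -> TransportConfig {
    let mut cubic = CubicConfig::default();
    // 100 MiB initial window, absurdly high on purpose, so slow start
    // does not throttle the first seconds of the transfer.
    cubic.initial_window(100 * 1024 * 1024);

    let mut transport = TransportConfig::default();
    transport.congestion_controller_factory(Arc::new(cubic));
    transport
}
```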
I concur that testing on a clean loopback is probably going to make it difficult to judge the impact of this change.

This should be easy to test by hacking `…`.

It's been a while since I've done manual WAN testing, but that seems much too slow. I think we should expect throughput equal to one congestion window per RTT, and the congestion window should grow reasonably quickly unless there's packet loss, which should be impossible in the simulated context. The bandwidth you report is almost exactly inversely proportional to round-trip time, which suggests the congestion window is not growing. I wonder if `…`. It should be easy to verify whether the congestion window is growing by reporting `…`. It might be interesting to contrast the simulated network behavior in `…`.
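To spell out the arithmetic behind that hypothesis (the window size here is illustrative, not a measurement): if the congestion window stops growing, steady-state throughput is roughly window / RTT, which falls inversely with RTT:

```rust
fn main() {
    // Rule of thumb: steady-state throughput ≈ congestion window / RTT.
    // With a window stuck at ~1.2 MiB (hypothetical), throughput falls
    // inversely with RTT, matching the pattern reported above.
    let window_bytes = 1.25e6_f64;
    for rtt_ms in [5.0_f64, 10.0, 50.0, 200.0] {
        let throughput_mb_s = window_bytes / (rtt_ms / 1000.0) / 1e6;
        println!("RTT {rtt_ms:>5.0} ms -> ~{throughput_mb_s:>5.1} MB/s");
    }
}
```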
As it turns out, throughput was limited by the default send and receive windows. We were increasing the RTT yet keeping the window sizes constant, so the same amount of traffic had to be spread over a longer period, leading to low throughput. The solution:

```rust
config.stream_receive_window(VarInt::MAX);
config.send_window(u64::MAX);
```

I re-benchmarked with the config above, and now we are seeing proper performance. For instance, with delay = 5ms and ack-eliciting threshold = 10: `…`

Unfortunately, testing with different ack frequencies and delays seemed to make no difference in performance (on my system).

@Ralith @djc all this time I've been working under the assumption that this PR only makes sense if performance improves (otherwise it introduces complexity to the codebase for no benefit). Or do you think it is worthwhile to have this anyway, as long as we can show that performance doesn't worsen?

By the way, if you are interested I can prepare a branch you can use to easily benchmark locally, using the loopback interface and introducing delays with `…`.
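That explanation lines up with the bandwidth-delay product: to keep the pipe full, the flow-control windows must hold at least bandwidth × RTT bytes. A quick sketch with a made-up target rate:

```rust
fn main() {
    // To sustain `target` bytes/s over a link with round-trip time `rtt`,
    // the send and receive windows must hold at least target * rtt bytes.
    let target_bytes_per_s = 1.5e9_f64; // 1.5 GB/s, illustrative target
    for rtt_ms in [5.0_f64, 10.0, 50.0, 200.0] {
        let window_mb = target_bytes_per_s * (rtt_ms / 1000.0) / 1e6;
        println!("RTT {rtt_ms:>5.0} ms -> window >= {window_mb:>6.1} MB");
    }
    // At 200 ms that is 300 MB, far beyond typical defaults, which is why
    // raising the windows to the maximum removed the throughput ceiling.
}
```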
IIRC this draft was specifically introduced to improve performance, so if we can't reproduce that result there's likely some more digging to do. That said, as long as this doesn't introduce a (non-trivial) performance regression, I think it would still be okay to merge?
Just rebased on top of the latest main to resolve a few conflicts. Any chance this PR can be reviewed soon? I don't think there is anything left to do on my side (please let me know otherwise).
Will get this in the next day or two; thanks for your patience!
Went over everything in the main commit carefully, save for a bit of skimming in the `ack_frequency` unit test. Overall this looks great, thank you!
Thanks for the thorough review! Just rebased on top of the latest main and pushed a new version that incorporates all suggestions (unless anything slipped through the cracks).
Just rebased and pushed changes to address some of your suggestions. For those that need more clarification, I left some comments and questions.
Thanks for the review! Just rebased and pushed fixes for all points we discussed 😉 Almost there!
Is there anything else I can do to help land this? Just FYI, after July 7th it will be pretty difficult for me to find time to work on this.
This is looking really good; thanks for your persistence in driving it forward! Nits aside, just a few things left. I'll endeavor to be more responsive in the coming days, and if I don't manage that then I'll address any remaining items myself.
It didn't occur to me at first, but this also fixes #438 independently of support for the extension, which is a nice win.
Thanks again for the review! Just rebased and pushed a new version with all your comments addressed.
Sponsored by Stormshield
LGTM, thank you for seeing this through!
Sorry for taking so long to review this; thanks for all the work!
No worries, thank you and @Ralith for creating and maintaining Quinn!
Overview of the feature:

- Adds an `AckFrequencyConfig` to control the parameters, which are sent to the peer in an `ACK_FREQUENCY` frame at the beginning of the connection (technically, the extension allows modifying the ack frequency parameters multiple times during a connection, but then you would need to come up with an algorithm to automatically tune the parameters, which IMO is out of scope for this first implementation). See the usage sketch after this list.
- Handles the incoming `ACK_FREQUENCY` frame, updating the local ack frequency parameters based on the new parameters sent by the peer.
- Sends an `IMMEDIATE_ACK` frame every RTT if there are any unacked ack-eliciting packets (this is a new kind of frame to immediately elicit an ACK). This is recommended by the draft as a way to ensure congestion control and loss detection work properly.
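As a usage sketch, enabling the extension looks like the snippet benchmarked earlier in this thread. The helper name is made up, and the re-exported type names and `VarInt` conversion are assumptions about the released API:

```rust
use quinn::{AckFrequencyConfig, TransportConfig, VarInt};

/// Hypothetical helper: build a TransportConfig with the extension enabled.
fn transport_with_ack_frequency() -> TransportConfig {
    let mut ack_freq = AckFrequencyConfig::default();
    // Ask the peer to send an ACK at latest after every 10th
    // ack-eliciting packet (subject to max_ack_delay).
    ack_freq.ack_eliciting_threshold(VarInt::from_u32(10));

    let mut transport = TransportConfig::default();
    // `None` (the default) disables the extension.
    transport.ack_frequency_config(Some(ack_freq));
    transport
}
```

The resulting `TransportConfig` is then attached to a `ClientConfig` or `ServerConfig` as usual.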
I have a bunch of questions / comments:

- The implementation requires two additional timers: one to be notified when `max_ack_delay` has elapsed, and one to be notified when an RTT has elapsed (a toy sketch of the RTT path follows this list). I tried to come up with an approach that wouldn't require additional timers, but I'm not sure it is possible. It would be nice if you could double-check that, given your knowledge of the codebase.
- There is a new `reset_rtt_timer` function that arms the `Timer::Rtt` for the first time. I wanted to arm it right after we have reached the Data space and have an RTT estimate. I ended up calling it inside `process_payload`. Please let me know if there is a more suitable place.
- I rewrote the documentation of `PendingAcks::acks_sent`, which I found difficult to understand. Hopefully the new version says the same thing in clearer terms (otherwise let me know and I'll update it).
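To make the `IMMEDIATE_ACK` bullet and the RTT timer concrete, here is a toy model of the per-RTT path. All names are hypothetical; quinn-proto's actual internals are shaped differently:

```rust
use std::time::{Duration, Instant};

// Toy model of the per-RTT IMMEDIATE_ACK behavior described above.
struct AckFrequencyState {
    unacked_ack_eliciting: usize,
    rtt_estimate: Duration,
    rtt_deadline: Option<Instant>,
    immediate_ack_pending: bool,
}

impl AckFrequencyState {
    /// Called when the RTT timer (here: `rtt_deadline`) fires.
    fn on_rtt_elapsed(&mut self, now: Instant) {
        if self.unacked_ack_eliciting > 0 {
            // Queue an IMMEDIATE_ACK so the peer acknowledges right away,
            // keeping loss detection and congestion control responsive
            // even under a large ack-eliciting threshold.
            self.immediate_ack_pending = true;
        }
        // Re-arm one (estimated) RTT from now.
        self.rtt_deadline = Some(now + self.rtt_estimate);
    }
}

fn main() {
    let mut st = AckFrequencyState {
        unacked_ack_eliciting: 3,
        rtt_estimate: Duration::from_millis(50),
        rtt_deadline: None,
        immediate_ack_pending: false,
    };
    st.on_rtt_elapsed(Instant::now());
    assert!(st.immediate_ack_pending && st.rtt_deadline.is_some());
}
```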