Implement acknowledgement frequency #1553
And... here are some benchmarks!

**Practical details**

I'm running the server using `…`. For the scenarios with ACK frequency enabled, I'm doing so through the following code:

```rust
let mut ack_freq = AckFrequencyConfig::default();
// Allow up to 10 ack-eliciting packets before the peer must send an ACK
ack_freq.ack_eliciting_threshold(10);
// `None` (the default) leaves the extension disabled
transport.ack_frequency_config(Some(ack_freq));
```

**Conclusion**

There seems to be no significant performance difference in the benchmarks, though we probably need a more statistically rigorous approach to benchmarking if we want to be sure. @stormshield-damiend suggested we might see clearer differences in performance if there is higher latency (instead of it being on the order of microseconds), so I'll benchmark again next week with a more realistic latency.

**Baseline (before this PR)**

**ACK frequency disabled (threshold = 1)**

**ACK frequency enabled on the client (threshold = 10)**

**ACK frequency enabled on the server (threshold = 10)**
How do you feel about the benchmark results? It'd be nice to be able to measure things in a more principled way, to ensure any performance difference (or lack thereof) is statistically significant. Maybe it'd make sense to simulate the networking part in memory (I have a semi-working local prototype) to make the output more predictable. I probably won't have the time to work on that, though...
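For instance, even a simple summary over repeated runs would help judge significance. A sketch (the throughput numbers below are made up for illustration) comparing two configurations against run-to-run noise:

```rust
/// Summarize repeated benchmark runs: mean and sample standard deviation.
fn summarize(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    // Hypothetical throughput samples (MiB/s) from repeated runs.
    let baseline = [1510.3, 1495.2, 1523.8, 1488.1, 1502.6];
    let patched = [1507.9, 1512.4, 1491.0, 1519.7, 1498.3];
    for (name, runs) in [("baseline", &baseline), ("patched", &patched)] {
        let (mean, sd) = summarize(runs);
        println!("{name}: {mean:.1} ± {sd:.1} MiB/s");
    }
    // If the means differ by well under a couple of standard deviations,
    // the difference is indistinguishable from run-to-run noise.
}
```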
Today I did another round of benchmarks using the changes from #1559, so I could simulate the link's delay (using `…`). Command used: `…`

I ran the benchmarks on top of the main branch (equivalent to an ack-eliciting threshold of 0) and on top of this branch (using ack-eliciting thresholds of 1 and 10). In all cases throughput remained similar, so I'm reporting it only once for each benchmarked link delay. I assume throughput is being constrained by pacing. It would be interesting to tweak the config to start right away at a high transfer rate, because that would put real stress on Quinn and thereby reveal the true performance difference. I tried setting the initial congestion window to an absurdly high value, but somehow it didn't make any difference (see the sketch after the results below). Any clues on how to achieve higher throughput from the start? @Ralith, @djc

Another question: do these results make any sense at all? I have yet to develop a gut feeling for the relationship between delay and throughput.

**Results**

**Delay = 5ms**

**Delay = 10ms**

**Delay = 50ms**

**Delay = 200ms**
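For reference, setting the initial congestion window (the thing tried above) looks roughly like this. This is a sketch assuming quinn's `CubicConfig::initial_window` setter and congestion-controller factory hook; exact signatures vary between versions, and the helper name is made up:

```rust
use std::sync::Arc;
use quinn::{congestion::CubicConfig, TransportConfig};

/// Hypothetical helper: start with a huge initial congestion window.
fn transport_with_big_initial_window() -> TransportConfig {
    let mut cubic = CubicConfig::default();
    // 100 MiB initial window, absurdly high on purpose, so slow start
    // does not throttle the first seconds of the transfer.
    cubic.initial_window(100 * 1024 * 1024);

    let mut transport = TransportConfig::default();
    transport.congestion_controller_factory(Arc::new(cubic));
    transport
}
```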
I concur that testing on a clean loopback is probably going to make it difficult to judge the impact of this change.

This should be easy to test by hacking `…`.

It's been a while since I've done manual WAN testing, but that seems much too slow. I think we should expect throughput equal to one congestion window per RTT, and the congestion window should grow reasonably quickly unless there's packet loss, which should be impossible in the simulated context. The bandwidth you report is almost exactly inversely proportional to round-trip time, which suggests the congestion window is not growing. I wonder if `…`. It should be easy to verify whether the congestion window is growing by reporting `…`. It might be interesting to contrast the simulated network behavior in `…`.
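To spell out the arithmetic behind that hypothesis (the window size here is illustrative, not a measurement): if the congestion window stops growing, steady-state throughput is roughly window / RTT, which falls inversely with RTT:

```rust
fn main() {
    // Rule of thumb: steady-state throughput ≈ congestion window / RTT.
    // With a window stuck at ~1.2 MiB (hypothetical), throughput falls
    // inversely with RTT, matching the pattern reported above.
    let window_bytes = 1.25e6_f64;
    for rtt_ms in [5.0_f64, 10.0, 50.0, 200.0] {
        let throughput_mb_s = window_bytes / (rtt_ms / 1000.0) / 1e6;
        println!("RTT {rtt_ms:>5.0} ms -> ~{throughput_mb_s:>5.1} MB/s");
    }
}
```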
As it turns out, throughput was limited by the default send and receive windows. We were increasing the RTT yet keeping the window sizes constant, so the same amount of traffic had to be spread over a longer period, leading to low throughput. The solution:

```rust
config.stream_receive_window(VarInt::MAX);
config.send_window(u64::MAX);
```

I re-benchmarked with the config above, and now we are seeing proper performance. For instance, with delay = 5ms and ack-eliciting threshold = 10: `…`

Unfortunately, testing with different ack frequencies and delays seemed to make no difference in performance (on my system).

@Ralith @djc all this time I've been working under the assumption that this PR only makes sense if performance improves (otherwise it introduces complexity to the codebase for no benefit). Or do you think it is worthwhile to have this anyway, as long as we can show that performance doesn't worsen?

By the way, if you are interested I can prepare a branch you can use to easily benchmark locally, using the loopback interface and introducing delays with `…`.
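That explanation lines up with the bandwidth-delay product: to keep the pipe full, the flow-control windows must hold at least bandwidth × RTT bytes. A quick sketch with a made-up target rate:

```rust
fn main() {
    // To sustain `target` bytes/s over a link with round-trip time `rtt`,
    // the send and receive windows must hold at least target * rtt bytes.
    let target_bytes_per_s = 1.5e9_f64; // 1.5 GB/s, illustrative target
    for rtt_ms in [5.0_f64, 10.0, 50.0, 200.0] {
        let window_mb = target_bytes_per_s * (rtt_ms / 1000.0) / 1e6;
        println!("RTT {rtt_ms:>5.0} ms -> window >= {window_mb:>6.1} MB");
    }
    // At 200 ms that is 300 MB, far beyond typical defaults, which is why
    // raising the windows to the maximum removed the throughput ceiling.
}
```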
IIRC this draft was specifically introduced to improve performance, so if we can't reproduce that result there's likely some more digging to do. That said, as long as this doesn't introduce a (non-trivial) performance regression, I think it would still be okay to merge?
Just rebased on top of the latest main to resolve a few conflicts. Any chance this PR can be reviewed soon? I don't think there is anything left to do on my side (please let me know otherwise).
Will get this in the next day or two; thanks for your patience!
Went over everything in the main commit carefully, save for a bit of skimming in the `ack_frequency` unit test. Overall this looks great, thank you!
Thanks for the thorough review! Just rebased on top of the latest main and pushed a new version that incorporates all suggestions (unless anything slipped through the cracks).
Just rebased and pushed changes to address some of your suggestions. For those that need more clarification, I left some comments and questions.
Thanks for the review! Just rebased and pushed fixes for all points we discussed 😉 Almost there!
Is there anything else I can do to help land this? Just FYI, after July 7th it will be pretty difficult for me to find time to work on this.
This is looking really good; thanks for your persistence in driving it forward! Nits aside, just a few things left. I'll endeavor to be more responsive in the coming days, and if I don't manage that then I'll address any remaining items myself.
It didn't occur to me at first, but this also fixes #438 independently of support for the extension, which is a nice win.
Thanks again for the review! Just rebased and pushed a new version with all your comments addressed.
Sponsored by Stormshield
LGTM, thank you for seeing this through!
Sorry for taking so long to review this; thanks for all the work!
No worries, thank you and @Ralith for creating and maintaining Quinn!
Overview of the feature:

- Adds an `AckFrequencyConfig` to control the parameters, which are sent to the peer in an `ACK_FREQUENCY` frame at the beginning of the connection (technically, the extension allows modifying the ack frequency parameters multiple times during a connection, but then you would need to come up with an algorithm to automatically tune the parameters, which IMO is out of scope for this first implementation). See the usage sketch after this list.
- Handles the incoming `ACK_FREQUENCY` frame, updating the local ack frequency parameters based on the new parameters sent by the peer.
- Sends an `IMMEDIATE_ACK` frame every RTT if there are any unacked ack-eliciting packets (this is a new kind of frame to immediately elicit an ACK). This is recommended by the draft as a way to ensure congestion control and loss detection work properly.
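As a usage sketch, enabling the extension looks like the snippet benchmarked earlier in this thread. The helper name is made up, and the re-exported type names and `VarInt` conversion are assumptions about the released API:

```rust
use quinn::{AckFrequencyConfig, TransportConfig, VarInt};

/// Hypothetical helper: build a TransportConfig with the extension enabled.
fn transport_with_ack_frequency() -> TransportConfig {
    let mut ack_freq = AckFrequencyConfig::default();
    // Ask the peer to send an ACK at latest after every 10th
    // ack-eliciting packet (subject to max_ack_delay).
    ack_freq.ack_eliciting_threshold(VarInt::from_u32(10));

    let mut transport = TransportConfig::default();
    // `None` (the default) disables the extension.
    transport.ack_frequency_config(Some(ack_freq));
    transport
}
```

The resulting `TransportConfig` is then attached to a `ClientConfig` or `ServerConfig` as usual.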
I have a bunch of questions / comments:

- The implementation requires two additional timers: one to be notified when `max_ack_delay` has elapsed, and one to be notified when an RTT has elapsed (a toy sketch of the RTT path follows this list). I tried to come up with an approach that wouldn't require additional timers, but I'm not sure it is possible. It would be nice if you could double-check that, given your knowledge of the codebase.
- There is a new `reset_rtt_timer` function that arms the `Timer::Rtt` for the first time. I wanted to arm it right after we have reached the Data space and have an RTT estimate. I ended up calling it inside `process_payload`. Please let me know if there is a more suitable place.
- I rewrote the documentation of `PendingAcks::acks_sent`, which I found difficult to understand. Hopefully the new version says the same thing in clearer terms (otherwise let me know and I'll update it).
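To make the `IMMEDIATE_ACK` bullet and the RTT timer concrete, here is a toy model of the per-RTT path. All names are hypothetical; quinn-proto's actual internals are shaped differently:

```rust
use std::time::{Duration, Instant};

// Toy model of the per-RTT IMMEDIATE_ACK behavior described above.
struct AckFrequencyState {
    unacked_ack_eliciting: usize,
    rtt_estimate: Duration,
    rtt_deadline: Option<Instant>,
    immediate_ack_pending: bool,
}

impl AckFrequencyState {
    /// Called when the RTT timer (here: `rtt_deadline`) fires.
    fn on_rtt_elapsed(&mut self, now: Instant) {
        if self.unacked_ack_eliciting > 0 {
            // Queue an IMMEDIATE_ACK so the peer acknowledges right away,
            // keeping loss detection and congestion control responsive
            // even under a large ack-eliciting threshold.
            self.immediate_ack_pending = true;
        }
        // Re-arm one (estimated) RTT from now.
        self.rtt_deadline = Some(now + self.rtt_estimate);
    }
}

fn main() {
    let mut st = AckFrequencyState {
        unacked_ack_eliciting: 3,
        rtt_estimate: Duration::from_millis(50),
        rtt_deadline: None,
        immediate_ack_pending: false,
    };
    st.on_rtt_elapsed(Instant::now());
    assert!(st.immediate_ack_pending && st.rtt_deadline.is_some());
}
```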