-
Notifications
You must be signed in to change notification settings - Fork 848
Fix HTTP/2 Buffering Issues #11143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix HTTP/2 Buffering Issues #11143
Conversation
Is it for the case that there is long pauses during data transfer but the stream is not tunneling? I'm not keen on adding the new event for retransmission, and wondering if we can use the flush flag instead. Specifically:
I'm not sure if this really works. It may kill the thresholds if the origin/cache is slow and the reader always becomes empty. The current code for the retransmission event seems to cause a crash, if the session is destroyed before the event gets processed, although the chance is low. The scheduled event probably needs to be canceled when |
9660d90 to
82f7121
Compare
|
Thank you for the thoughts.
Definitely not for the tunneling case - we don't attempt any buffering when tunneling. What we saw was not actually long pauses, but kind of the opposite. An internal property had to turn the write buffering off because they had many spread out but very small DATA frames which didn't exceed the write buffer threshold, but the write timeout never appropriately triggered after 10ms to flush it out. This kind of "many small DATA frames" is what the new autest reproduces. Before this patch, the frames did eventually get flushed when transactions completed since we force flush on END_STREAM (and probably other cases), but their perceived experience was "jerky" rather than smooth. This patch makes it more certain that we wake up to send the data. It is possible, btw, that this is an HTTP/2 to origin only issue. I believe the property was using h2 on the origin side. But I'm not sure and unfortunately it has been a few months since I helped them with the issue.
Yes, thank you for this feedback. That's the conclusion I came to as well and changed the logic to store the event and cancel when appropriate (both when a flush happens, and when the appropriate owning objects are destroyed). |
|
I still don't understand the problematic case. Are you saying write_time_threshold is not working? As long as WRITE_READY event is issued appropriately, the data in the buffer should be flushed even without END_STREAM flag at the end. I don't get why new event is needed. In other words, the new XMIT event could try to send data where the connection or session is not really WRITE_READY. A possible bug is that WRITE_READY is not scheduled for some reason. Maybe reenable() is not called. |
Yes, sorry if I wasn't clear. To be concise: Without this patch, write_size_threshold can hold up a frame, but write_time_threshold (default of 100ms) is not functionally, or at least not consistenly, taking place. The new autest demonstrates this by sending thousands of small DATA frames, one every ms. The server should get chunks of data at least every write_size_threshold (set to 10ms for the test). Instead, without this patch, it gets a DATA frame or a few at the beginning, then none of the rest till the end.
Calling flush in the new retransmit event will call reenable. Adding this event scheduled in The new event seems like a consistent way to guarantee that the data gets flushed in the appropriate amount of time, within write_time_threshold. But if there's a better way to accomplish this without the event, I'm definitely open to that. Why is the WRITE_READY event guaranteed to be triggered presently about write_time_threshold ms after write_size_threshold holds up a packet? Currently that doesn't happen, but maybe there's something I can do to address that? |
a981c01 to
7f2235d
Compare
7f2235d to
671c222
Compare
|
[approve ci fedora] |
671c222 to
c57a58a
Compare
maskit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked with @moonchen , and now I get why we have to do this. I can live with this change, but my view is that we have to do this because NetVC does not provide a generic way to deal with the situation. We only have the buffering stuff in H2 at this moment, but buffering is not an H2 specific thing and it can happen on any application. I'd say an ideal way to deal with the issue is to add the functionality to NetVC / iocore layer.
|
Cherry-picked to v10.0.x |
When I ported #8747 into HTTP/2 to origin logic, I missed setting the watermark on outgoing connections. This essentially made it so that regardless of configuration, POST requests or the like would always send every little bit of DATA frames regardless of the configured watermark. The watermark was effectively zero on the server side. This patch fixes that by appying the configured watermark. This patch also fixes the HTTP/2 write_time_threshold configuration so it consistently wakes up ATS to transmit the frames held up behidn write_buffer_threshold. If the write_buffer_threshold is not met, then I noticed ATS never woke up to send the accumulated frames until a xmit with a flush was sent at the end of the transaction. This ensures that the frames are transmitted per the write_time_threshold by scheduling an event for the retransmission. Here's a list of changes made in this patch: * Adding a write_threshold test * Set buffer watermark for origin h2 connections * HTTP2_SESSION_EVENT_XMIT -> HTTP2_SESSION_EVENT_PRIO * schedule flushing event for write_time_threshold (cherry picked from commit 66940f5)
When working on the patch for apache#11143, I accidentally left in some Note log messages that should have been cleaned up before I merged in the changes. This removes those noisy Note messages.
When working on the patch for #11143, I accidentally left in some Note log messages that should have been cleaned up before I merged in the changes. This removes those noisy Note messages.
When working on the patch for apache#11143, I accidentally left in some Note log messages that should have been cleaned up before I merged in the changes. This removes those noisy Note messages. (cherry picked from commit c0a42a6)
When I ported #8747 into HTTP/2 to origin logic, I missed setting the watermark on outgoing connections. This essentially made it so that regardless of configuration, POST requests or the like would always send every little bit of DATA frames regardless of the configured watermark. The watermark was effectively zero on the server side. This patch fixes that by appying the configured watermark.
This patch also fixes the HTTP/2 write_time_threshold configuration so it consistently wakes up ATS to transmit the frames held up behidn write_buffer_threshold. If the write_buffer_threshold is not met, then I noticed ATS never woke up to send the accumulated frames until a xmit with a flush was sent at the end of the transaction. This ensures that the frames are transmitted per the write_time_threshold by scheduling an event for the retransmission.
For Review
I've kept the parts of this review in separate commits to hopefully help the review of this.