Skip to content

Conversation

@shinrich
Copy link
Member

Yet another solution to issue #6313.

Unlike PR #6407 and #6401, this PR does not update the H2 logic to send a READ_COMPLETE when the end-of-data bit is set or when the read_vio.ntodo() is 0. Rather if the user agent never sends the end-of-data bit, the user agent will eventually send a EOS or get timed out and this PR adds logic in the HttpSM to cleanup the server_session without crashing.

While this PR is more spec compliant, our organization may still go with PR #6407 in the short term anyway because we do have user agents that send POST data but do not set the end-of-data bit. While running with this patch on a production box, the number of H2 POST ERR_CLIENT_ABORTS increased by an order of magnitude over a 15 minute logging interval. For us tracking down the appropriate app dev teams takes some time, and getting clients to update apps takes even longer.

You'll noticed in this PR, that I added a boolean return value to Http2Stream::update_read. The idea was that it would return false if ndone > nbytes. Then the caller could issue a stream error because the client was breaking spec by sending more data than indicated by the content-length. However, in many cases nbytes is 0 because after the request header is processed, the read_vio nbytes is set to ndone which at that point is 0. For about an hour of panicked investigation, I thought that @masaori335's post optimization had broken things if the data frame arrived before the Tunnel's do_io_read was all set up. But in fact all is well. The logic in HttpSM::do_setup_post_tunnel will copy extra data from the end of the ua_buffer (which is the _request_buffer for Http2Stream) into the tunnel's buffer, so all is well there anyway.

@shinrich shinrich added this to the 10.0.0 milestone Feb 11, 2020
@shinrich shinrich self-assigned this Feb 11, 2020
@shinrich shinrich requested review from masaori335 and zwoop February 11, 2020 22:37
@randall
Copy link
Contributor

randall commented Feb 11, 2020

[approve ci autest]

Copy link
Contributor

@masaori335 masaori335 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need #6408 too, right?

// Completed successfully
c->write_success = true;
c->write_success = true;
server_entry->in_tunnel = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this single line change will fix the crash. However, calling release_server_session() before setting server_session = nullptr make sense.

Copy link
Contributor

@masaori335 masaori335 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crash looks gone with this patch, but this doesn't fire up Http2Stream to send 408.

[Feb 12 12:24:29.922] [ET_NET 1] DEBUG: <HttpTunnel.cc:1299 (consumer_handler)> (http_tunnel) [1] consumer_handler [http server post VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Feb 12 12:24:29.922] [ET_NET 1] DEBUG: <HttpSM.cc:3600 (tunnel_handler_post_server)> (http) [1] [&HttpSM::tunnel_handler_post_server, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Feb 12 12:24:35.510] [ET_NET 1] DEBUG: <HttpSM.cc:2631 (main_handler)> (http) [1] [HttpSM::main_handler, VC_EVENT_ACTIVE_TIMEOUT/TS_EVENT_VCONN_ACTIVE_TIMEOUT]
[Feb 12 12:24:35.510] [ET_NET 1] DEBUG: <HttpSM.cc:2051 (state_send_server_request_header)> (http) [1] [&HttpSM::state_send_server_request_header, VC_EVENT_ACTIVE_TIMEOUT/TS_EVENT_VCONN_ACTIVE_TIMEOUT]
[Feb 12 12:24:35.510] [ET_NET 1] DEBUG: <HttpSM.cc:5496 (handle_server_setup_error)> (http) [1] [&HttpSM::handle_server_setup_error, VC_EVENT_ACTIVE_TIMEOUT/TS_EVENT_VCONN_ACTIVE_TIMEOUT]
[Feb 12 12:24:35.510] [ET_NET 1] DEBUG: <HttpSM.cc:5506 (handle_server_setup_error)> (http) [1] [handle_server_setup_error] forwarding event VC_EVENT_ACTIVE_TIMEOUT/TS_EVENT_VCONN_ACTIVE_TIMEOUT to post tunnel

@masaori335
Copy link
Contributor

If I revert all changes of HTTP/2, I could get 408 and no crash. Updating read_vio.ndone might introduce or uncover an issue.
Let's narrow down changes of this PR to only HttpSM.cc, any HTTP/2 changes is not necessary here.

@masaori335
Copy link
Contributor

While this PR is more spec compliant, our organization may still go with PR #6407 in the short term anyway because we do have user agents that send POST data but do not set the end-of-data bit.
While running with this patch on a production box, the number of H2 POST ERR_CLIENT_ABORTS increased by an order of magnitude over a 15 minute logging interval. For us tracking down the appropriate app dev teams takes some time, and getting clients to update apps takes even longer.

I commented on #6404, I found that ATS ignore "pure END_STREAM" (DATA frame with END_STREAM & length = 0) and VC_EVENT_READ_COMPLETE is not signaled. It's possible that
this bug made some clients aborts. I prefer to polish this PR & land with #6404.

@shinrich shinrich force-pushed the fix-client-without-data-end branch from 07d2d8e to 20e0423 Compare February 25, 2020 17:17
@shinrich
Copy link
Member Author

Updated to address @masaori335 comments. Tested with PR #6404 successfully. Updating issue #6313 with details.

@zwoop zwoop added the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Feb 25, 2020
@masaori335
Copy link
Contributor

[approve ci autest]

Copy link
Contributor

@masaori335 masaori335 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Tested by nghttp with --trailer, no crash.

@shinrich shinrich merged commit 7d4f66b into apache:master Feb 27, 2020
@zwoop zwoop removed the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Feb 27, 2020
@zwoop zwoop modified the milestones: 10.0.0, 9.0.0 Feb 27, 2020
@zwoop
Copy link
Contributor

zwoop commented Feb 27, 2020

Cherry-picked to v9.0.x branch.

@zwoop zwoop modified the milestones: 9.0.0, 8.1.0 Apr 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants