Skip to content

Conversation

@shinrich
Copy link
Member

@shinrich shinrich commented Feb 7, 2020

Breaking apart the PR in #6401. This one includes the logic to better ensure that the server_session is closed and deleted in a timely manner. The crash we were seeing was due to a server_session staying around after the HttpSM had been deleted. The inactivity cop would execute the handleEvent on the deleted HttpSM continuation.

@shinrich shinrich added this to the 10.0.0 milestone Feb 7, 2020
@shinrich shinrich self-assigned this Feb 7, 2020
tunnel.deallocate_buffers();
tunnel.reset();
if (server_entry) {
server_entry->in_tunnel = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? It looks like this is done in line 5421.

SMDebug("http_tunnel", "send 408 response to client to vc %p, tunnel vc %p", ua_txn->get_netvc(), p->vc);

tunnel.chain_abort_all(p);
server_session = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the crash will be fixed by setting server_entry->in_tunnel = false here.
I'm a bit afraid that this PR is too generic for fixing a crash.

@masaori335
Copy link
Contributor

masaori335 commented Feb 10, 2020

Unfortunately, I still get the crash with this patch. Have you tried this with --trailer? Am I missing something?

nghttp -v --no-dep -s "https://127.0.0.1:4443/post/" -d /usr/local/var/www/128kb --trailer 'foo: bar'
CONFIG proxy.config.http.transaction_no_activity_timeout_in INT 3
CONFIG proxy.config.http.transaction_active_timeout_out INT 5

@shinrich
Copy link
Member Author

We also needed PR #6407 to fix the crash. Perhaps that is all we needed. Will do some experiments today with combinations of this PR #6407 and PR #6404

@shinrich
Copy link
Member Author

So far, our 9.0.x plus PR #6407 and PR #6404 is not crashing. Our crashes were intermittent, so I'll keep an eye on things today.

Unfortunately, PR #6407 and #6404 do not play together well. With the fix for #6407, the read_complete has already been sent for the post data so the HttpSM handler is state_watch_for_client_abort when the second trailer READ_COMPLETE is sent. This causes the transaction to fail and the nghttp request is responded with a 502 error.

In our case, the faulty clients are not sending trailing headers, so just fixing it in the trailing header logic is not sufficient.

@masaori335
Copy link
Contributor

masaori335 commented Feb 10, 2020

So far, our 9.0.x plus PR #6407 and PR #6404 is not crashing.

Do you mean #6407 and #6408 ?
#6407 will hide the issue which uncovered by the commit.

The reason why I asked trying nghttp with trailers is that this patch should fix the crash without #6407 nor #6404 changes.

For trailers, we should not signal READ_COMPLETE on final DATA frame. I can do that on #6404, after #6407 is merged.

@shinrich
Copy link
Member Author

I talked with @bryancall today, and since the according to the spec we should really be returning a a HTTP/2 error to the client if they are not sending a data frame EOS. I didn't get time today, but I would like to spend some more time tomorrow correctly identifying and cleaning up this case. While PR #6408 avoids the crash, it does not return the error to the client.

@shinrich
Copy link
Member Author

I think we can take this one out of the mix

@shinrich shinrich closed this Feb 21, 2020
@zwoop zwoop removed this from the 10.0.0 milestone Feb 21, 2020
@zwoop
Copy link
Contributor

zwoop commented Feb 21, 2020

Remember to clear Milestone / Project when closing a PR without merging :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants