http: add evhttp_connection_set_closecb to avoid g_requests hang #27909

Crypt-iQ · 2023-06-17T15:14:08Z

Prior to this patch, it was possible that if an RPC was interrupted before the reply was received, the success callback evhttp_request_set_on_complete_cb would never be called and g_requests wouldn't be cleaned up. When attempting to shutdown bitcoind, it would hang waiting for g_requests.empty().

Fixes #27722

I am currently working on testing this with a libevent version of 2.2.1 to ensure that the patch also works for that version.

DrahtBot · 2023-06-17T15:14:10Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Approach ACK	stickies-v

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28551 (http: bugfix: allow server shutdown in case of remote client disconnection by stickies-v)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

fanquake · 2023-06-20T09:17:53Z

cc @fjahr

stickies-v

Concept ACK.

Was able to reproduce the bug, and this fixes the issue for me (libevent 2.1.12, macOS 13.4).

Need to read up on the libevent workaround a bit more first.

promag · 2023-06-20T18:11:54Z

Why I am not co-author?

Crypt-iQ · 2023-06-20T18:16:22Z

Why I am not co-author?

Happy to add you of course since you wrote it -- I couldn't find your email via git log to add it and I wasn't sure how to give credit w/o an email

jonatack · 2023-06-20T18:17:17Z

I'd suggest using @promag's commit authored by him (and then maybe/maybe not an additional commit if any substantive new changes).

Crypt-iQ · 2023-06-20T18:20:09Z

Can you advise on how to do that? I can close this in favor of another and let somebody take it over with proper commit attribution. I opened this PR since it wasn't addressed for a month

jonatack · 2023-06-20T18:21:05Z

git log for me has @promag's email as João Barbosa <joao.paulo.barbosa@gmail.com>

Would be something like git commit --amend --author="João Barbosa <joao.paulo.barbosa@gmail.com>"

jonatack · 2023-06-20T18:21:54Z

or git cherry-pick d41736b

Crypt-iQ · 2023-06-20T18:26:17Z

git log for me has @promag's email as João Barbosa joao.paulo.barbosa@gmail.com

I see, I grepped for promag instead of the name. Anyways, the commit attribution is fixed. Sorry about that @promag

or git cherry-pick d41736b

I didn't do this because that commit only fixes it for certain libevent versions

promag · 2023-06-20T18:51:03Z

@Crypt-iQ No worries mate, thanks for picking this work!

stickies-v

I agree with the approach of calling evhttp_connection_set_closecb regardless of whether EV_READ is enabled or disabled, which is currently the only difference between this PR and #27245.

With this PR:

In the case that EV_READ is enabled, evhttp_connection_set_closecb will be called as soon as the connection is closed, including in case of a remote client disconnection (especially useful during long-running requests as described here), which is the most efficient as we're not wasting server resources on completing the request that won't get served anyway.
In the case that EV_READ is disabled, evhttp_connection_set_closecb will not be called immediately in case of remote client disconnection (because we're not listening for it), but it will still be called when the server is shutting down, still resolving the issue in http: add evhttp_connection_set_closecb to avoid g_requests hang #27909 where bitcoind can't gracefully shutdown but potentially wasting resources on requests of which the connection has been closed.

However, I'm not sure the entire code block to enable EV_READ is safe, as per my comment. I'll continue digging into that but looking forward to hearing other views on this? If it's not safe, I think we need to leave this out and look for alternatives to monitoring remote client disconnections. A fix is suggested in azat/libevent@7d2e2a8, but I can't see it in the 2.2.1-alpha release yet. Even with EV_READ disabled, I still think calling evhttp_connection_set_closecb is a very worthwhile improvement.

src/httpserver.cpp

stickies-v · 2023-06-22T17:54:57Z

src/httpserver.cpp

+            // Enable EV_READ to detect remote disconnection meaning that the close
+            // callback set below is called.
+            bufferevent* bev = evhttp_connection_get_bufferevent(conn);
+            bufferevent_enable(bev, EV_READ);


Is this safe to do? Unless I misunderstand, this effectively disables the bugfix introduced in libevent/libevent@5ff8eb2, which means that if we get new data in our inbound buffer this will put the connection in an invalid state. I'm not sure yet how that would get triggered in practice, so I'm just going off what I read so far as I can't reproduce the race condition reported in #11593.

I wasn't able to reproduce using this #11593 (comment) and since you pointed out that the change wasn't merged into 2.2, I think it's better to err on the side of caution and not enable reading

I've updated the PR and removed this

fjahr · 2023-06-22T22:34:08Z

I opened this PR since it wasn't addressed for a month

Why didn't you ask me first if I am still working on this? A month isn't a long time in this project when you working on multiple things at the same time...

Typically people leave a review comment in a situation like this instead of opening a new PR.

fjahr · 2023-06-23T08:54:43Z

@Crypt-iQ There is no need to close this PR now since it got attention and review. I will close mine if you want to continue. But I wanted to clarify this for the future.

Crypt-iQ · 2023-06-23T08:59:45Z

Ok I reopened, it would be helpful if these things were made explicit (leave a comment) rather than implicit

Crypt-iQ · 2023-06-23T17:56:46Z

@Crypt-iQ There is no need to close this PR now since it got attention and review. I will close mine if you want to continue. But I wanted to clarify this for the future.

I think you can reopen your PR to include the bufferevent read enable. I've updated this PR to include a minimal fix for the issue (without the bufferevent read enable) because I'm still a little unclear on what exactly the bug fix was originally for and I can't reproduce it. Also happy to close this PR in favor of yours since this is now missing the read enable.

…ect version 79d343a http: update libevent workaround to correct version (stickies-v) Pull request description: The libevent bug described in libevent/libevent@5ff8eb2 was already patched in [release-2.1.9-beta](https://github.com/libevent/libevent/releases/tag/release-2.1.9-beta), with cherry-picked commits [5b40744d1581447f5b4496ee8d4807383e468e7a](libevent/libevent@5b40744) and [b25813800f97179b2355a7b4b3557e6a7f568df2](libevent/libevent@b258138). There should be no side-effects by re-applying the workaround on an already patched version of libevent (as is currently done in master for people running libevent between 2.1.9 and 2.1.12), but it is best to just set the correct version number to avoid confusion. This will prevent situations like e.g. in bitcoin/bitcoin#27909 (comment), where a reverse workaround was incorrectly applied to the wrong version range. ACKs for top commit: fanquake: ACK 79d343a Tree-SHA512: 56d2576411cf38e56d0976523fec951e032a48e35af293ed1ef3af820af940b26f779b9197baaed6d8b79bd1c7f7334646b9d73f80610d63cffbc955958ca8a0

stickies-v

Approach ACK 710985d

I think improving remote client disconnect detection is best left for a separate PR; this is a worthwhile improvement on its own and I think uncontroversial.

src/httpserver.cpp

stickies-v · 2023-06-28T12:36:13Z

src/httpserver.cpp

+            auto n{g_requests.erase(req)};
+            if (n == 1 && g_requests.empty()) g_requests_cv.notify_all();


I'm curious why we make notify_all() dependent on g_requests.empty() here, when in all other places we notify for all added and removed events?

Suggested change

auto n{g_requests.erase(req)};

if (n == 1 && g_requests.empty()) g_requests_cv.notify_all();

if (g_requests.erase(req)) g_requests_cv.notify_all();

I think because the only call to g_requests_cv.wait(...) has a predicate that waits for g_requests.empty(). I've removed the dependency in the latest patch

79d343a http: update libevent workaround to correct version (stickies-v) Pull request description: The libevent bug described in libevent/libevent@5ff8eb2 was already patched in [release-2.1.9-beta](https://github.com/libevent/libevent/releases/tag/release-2.1.9-beta), with cherry-picked commits [5b40744d1581447f5b4496ee8d4807383e468e7a](libevent/libevent@5b40744) and [b25813800f97179b2355a7b4b3557e6a7f568df2](libevent/libevent@b258138). There should be no side-effects by re-applying the workaround on an already patched version of libevent (as is currently done in master for people running libevent between 2.1.9 and 2.1.12), but it is best to just set the correct version number to avoid confusion. This will prevent situations like e.g. in bitcoin#27909 (comment), where a reverse workaround was incorrectly applied to the wrong version range. ACKs for top commit: fanquake: ACK 79d343a Tree-SHA512: 56d2576411cf38e56d0976523fec951e032a48e35af293ed1ef3af820af940b26f779b9197baaed6d8b79bd1c7f7334646b9d73f80610d63cffbc955958ca8a0

Prior to this patch, it was possible that if an RPC was interrupted before the reply was received, the success callback evhttp_request_set_on_complete_cb would never be called and g_requests wouldn't be cleaned up. When attempting to shutdown bitcoind, it would hang waiting for g_requests.empty(). Co-authored-by: Fabian Jahr <fjahr@protonmail.com>

Crypt-iQ · 2023-07-14T00:33:34Z

I've removed the assert(n == 1) in evhttp_request_set_on_complete_cb because of this comment: #27722 (comment). It appears that both evhttp_connection_set_closecb and evhttp_request_set_on_complete_cb can run which causes a crash. I wasn't able to reproduce it and still want to know what exactly happened, but I think removing the assert here is certainly better than having a crash.

Crypt-iQ · 2023-07-19T16:39:00Z

This assumes that a connection can never have more than 1 request, and I'm not sure if that's the case.

I'm also not sure. This PR body leads me to believe that it is possible to have more than 1 request per connection: libevent/libevent#592 (comment) , however I'm not sure whether it's possible for them to be interleaved and cause problems with your connection-based patch. Something I'll try to investigate.

Crypt-iQ · 2023-07-21T14:14:26Z

It is possible for a connection to have more than one request (see: libevent/libevent#1383 (comment)) or evhttp_associate_new_request_with_connection in the libevent code. It appears though that the requests are sequential so when one completes the next one can run. I modified the python script to test this, but it doesn't lead to a deadlock or crash because when the next request comes in, it uses the same evhttp_connection* and so the insert into g_http_conns is a no-op. The next request will also update the callback on the evhttp_connection* via evhttp_connection_set_closecb so that only one callback is run. When the connection is finally closed, it will find the connection & delete it. I think this means your patch works, but I'll continue to investigate a little more

python script

import http.client
import base64
from concurrent.futures import ThreadPoolExecutor
import time

host = "localhost"
port = 18443

auth_string = base64.b64encode('user:pass'.encode()).decode()

num_threads =  1
iterations_per_thread = 1

def send_requests(i):
    for _ in range(iterations_per_thread):
        conn = http.client.HTTPConnection(host, port)
        conn.request("POST", "/", headers={"Keep-Alive": "timeout=500, max=1000"})
        
        #response = conn.getresponse()
        #conn.close()

        time.sleep(3)

        conn.request("POST", "/", headers={"Keep-Alive": "timeout=500, max=1000"})

        time.sleep(100)

with ThreadPoolExecutor(max_workers=num_threads) as executor:
    executor.map(send_requests, range(num_threads))

Crypt-iQ · 2023-08-04T02:03:17Z

I haven't been able to cause a crash with the connection-based code. Given that the connection-based code is different than the crashing code here, do you want to open a new PR with the connection-based code @stickies-v ?

fanquake · 2023-09-15T09:12:40Z

What is that status of this? cc @stickies-v.

vasild · 2023-09-19T08:12:12Z

libevent/libevent#78 is relevant. Need to check how libevent was fixed to detect client close and make use of that in bitcoind.

stickies-v · 2023-09-19T10:32:41Z

Need to check how libevent was fixed to detect client close and make use of that in bitcoind.

As far as I could find, it's not yet fixed but it's also largely orthogonal, see my comment here and this open libevent issue. Detecting a remote disconnect would help save resources by not completing a request we'll no longer be serving, but isn't necessary to avoid the hangs on shutdown.

Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.

fanquake · 2023-10-02T09:37:44Z

Going to mark this as draft for now, given that #28551 is open.

Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.

…ase of remote client disconnection 68f23f5 http: bugfix: track closed connection (stickies-v) 084d037 http: log connection instead of request count (stickies-v) 41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v) Pull request description: #26742 significantly increased the http server shutdown speed, but also introduced a bug (#27722 - see bitcoin/bitcoin#27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`. This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (#27909, #27245, #19434) attempted to resolve this but [imo are fundamentally unsafe](bitcoin/bitcoin#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`. We don't need to keep track of open requests or connections, we just [need to ensure](bitcoin/bitcoin#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me. Fixes #27722 ACKs for top commit: vasild: ACK 68f23f5 theStack: ACK 68f23f5 Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6

Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used. Github-Pull: bitcoin#28551 Rebased-From: 41f9027

stickies-v · 2023-10-04T09:50:47Z

Superseded by #28551, can be closed now.

fanquake · 2023-10-04T09:51:58Z

Closing for now.

…emote client disconnection 68f23f5 http: bugfix: track closed connection (stickies-v) 084d037 http: log connection instead of request count (stickies-v) 41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v) Pull request description: bitcoin#26742 significantly increased the http server shutdown speed, but also introduced a bug (bitcoin#27722 - see bitcoin#27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`. This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (bitcoin#27909, bitcoin#27245, bitcoin#19434) attempted to resolve this but [imo are fundamentally unsafe](bitcoin#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`. We don't need to keep track of open requests or connections, we just [need to ensure](bitcoin#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me. Fixes bitcoin#27722 ACKs for top commit: vasild: ACK 68f23f5 theStack: ACK 68f23f5 Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6

Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin/bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used. Github-Pull: #28551 Rebased-From: 41f9027

Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin/bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.

Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin/bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used. Github-Pull: #28551 Rebased-From: 41f9027813f849a9fd6a1479bbb74b9037990c3c

DrahtBot added the RPC/REST/ZMQ label Jun 17, 2023

DrahtBot mentioned this pull request Jun 17, 2023

http: Detect remote disconnect - 2nd attempt #27245

Closed

stickies-v reviewed Jun 20, 2023

View reviewed changes

Crypt-iQ force-pushed the httpserver_fix_wo_fu branch 2 times, most recently from ebe2df3 to 1296038 Compare June 20, 2023 18:24

stickies-v reviewed Jun 22, 2023

View reviewed changes

Crypt-iQ closed this Jun 23, 2023

Crypt-iQ reopened this Jun 23, 2023

stickies-v mentioned this pull request Jun 23, 2023

http: update libevent workaround to correct version #27949

Merged

Crypt-iQ force-pushed the httpserver_fix_wo_fu branch from 1296038 to 710985d Compare June 23, 2023 17:53

stickies-v reviewed Jun 28, 2023

View reviewed changes

Crypt-iQ force-pushed the httpserver_fix_wo_fu branch from 710985d to d170e6c Compare July 14, 2023 00:30

DrahtBot added the CI failed label Sep 15, 2023

DrahtBot removed the CI failed label Sep 26, 2023

stickies-v mentioned this pull request Sep 29, 2023

http: bugfix: allow server shutdown in case of remote client disconnection #28551

Merged

fanquake removed this from the 25.1 milestone Oct 2, 2023

fanquake marked this pull request as draft October 2, 2023 09:37

fanquake closed this Oct 4, 2023

bitcoin locked and limited conversation to collaborators Oct 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

http: add evhttp_connection_set_closecb to avoid g_requests hang #27909

http: add evhttp_connection_set_closecb to avoid g_requests hang #27909

Crypt-iQ commented Jun 17, 2023 •

edited

Loading

DrahtBot commented Jun 17, 2023 •

edited

Loading

fanquake commented Jun 20, 2023

stickies-v left a comment

promag commented Jun 20, 2023

Crypt-iQ commented Jun 20, 2023

jonatack commented Jun 20, 2023

Crypt-iQ commented Jun 20, 2023 •

edited

Loading

jonatack commented Jun 20, 2023

jonatack commented Jun 20, 2023

Crypt-iQ commented Jun 20, 2023 •

edited

Loading

promag commented Jun 20, 2023

stickies-v left a comment •

edited

Loading

stickies-v Jun 22, 2023

Crypt-iQ Jun 22, 2023

Crypt-iQ Jun 23, 2023

fjahr commented Jun 22, 2023 •

edited

Loading

fjahr commented Jun 23, 2023

Crypt-iQ commented Jun 23, 2023

Crypt-iQ commented Jun 23, 2023

stickies-v left a comment

stickies-v Jun 28, 2023

Crypt-iQ Jul 14, 2023

Crypt-iQ commented Jul 14, 2023

Crypt-iQ commented Jul 19, 2023

Crypt-iQ commented Jul 21, 2023

Crypt-iQ commented Aug 4, 2023

fanquake commented Sep 15, 2023

vasild commented Sep 19, 2023

stickies-v commented Sep 19, 2023

fanquake commented Oct 2, 2023

stickies-v commented Oct 4, 2023

fanquake commented Oct 4, 2023

		auto n{g_requests.erase(req)};
		if (n == 1 && g_requests.empty()) g_requests_cv.notify_all();

	auto n{g_requests.erase(req)};
	if (n == 1 && g_requests.empty()) g_requests_cv.notify_all();
	if (g_requests.erase(req)) g_requests_cv.notify_all();

http: add evhttp_connection_set_closecb to avoid g_requests hang #27909

http: add evhttp_connection_set_closecb to avoid g_requests hang #27909

Conversation

Crypt-iQ commented Jun 17, 2023 • edited Loading

DrahtBot commented Jun 17, 2023 • edited Loading

Reviews

Conflicts

fanquake commented Jun 20, 2023

stickies-v left a comment

Choose a reason for hiding this comment

promag commented Jun 20, 2023

Crypt-iQ commented Jun 20, 2023

jonatack commented Jun 20, 2023

Crypt-iQ commented Jun 20, 2023 • edited Loading

jonatack commented Jun 20, 2023

jonatack commented Jun 20, 2023

Crypt-iQ commented Jun 20, 2023 • edited Loading

promag commented Jun 20, 2023

stickies-v left a comment • edited Loading

Choose a reason for hiding this comment

stickies-v Jun 22, 2023

Choose a reason for hiding this comment

Crypt-iQ Jun 22, 2023

Choose a reason for hiding this comment

Crypt-iQ Jun 23, 2023

Choose a reason for hiding this comment

fjahr commented Jun 22, 2023 • edited Loading

fjahr commented Jun 23, 2023

Crypt-iQ commented Jun 23, 2023

Crypt-iQ commented Jun 23, 2023

stickies-v left a comment

Choose a reason for hiding this comment

stickies-v Jun 28, 2023

Choose a reason for hiding this comment

Crypt-iQ Jul 14, 2023

Choose a reason for hiding this comment

Crypt-iQ commented Jul 14, 2023

Crypt-iQ commented Jul 19, 2023

Crypt-iQ commented Jul 21, 2023

Crypt-iQ commented Aug 4, 2023

fanquake commented Sep 15, 2023

vasild commented Sep 19, 2023

stickies-v commented Sep 19, 2023

fanquake commented Oct 2, 2023

stickies-v commented Oct 4, 2023

fanquake commented Oct 4, 2023

Crypt-iQ commented Jun 17, 2023 •

edited

Loading

DrahtBot commented Jun 17, 2023 •

edited

Loading

Crypt-iQ commented Jun 20, 2023 •

edited

Loading

Crypt-iQ commented Jun 20, 2023 •

edited

Loading

stickies-v left a comment •

edited

Loading

fjahr commented Jun 22, 2023 •

edited

Loading