-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: add evhttp_connection_set_closecb to avoid g_requests hang #27909
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
cc @fjahr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
Was able to reproduce the bug, and this fixes the issue for me (libevent 2.1.12, macOS 13.4).
Need to read up on the libevent workaround a bit more first.
Why I am not co-author? |
Happy to add you of course since you wrote it -- I couldn't find your email via |
I'd suggest using @promag's commit authored by him (and then maybe/maybe not an additional commit if any substantive new changes). |
Can you advise on how to do that? I can close this in favor of another and let somebody take it over with proper commit attribution. I opened this PR since it wasn't addressed for a month |
Would be something like |
or git cherry-pick d41736b |
ebe2df3
to
1296038
Compare
I see, I grepped for
I didn't do this because that commit only fixes it for certain libevent versions |
@Crypt-iQ No worries mate, thanks for picking this work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the approach of calling evhttp_connection_set_closecb
regardless of whether EV_READ
is enabled or disabled, which is currently the only difference between this PR and #27245.
With this PR:
-
In the case that
EV_READ
is enabled,evhttp_connection_set_closecb
will be called as soon as the connection is closed, including in case of a remote client disconnection (especially useful during long-running requests as described here), which is the most efficient as we're not wasting server resources on completing the request that won't get served anyway. -
In the case that
EV_READ
is disabled,evhttp_connection_set_closecb
will not be called immediately in case of remote client disconnection (because we're not listening for it), but it will still be called when the server is shutting down, still resolving the issue in http: add evhttp_connection_set_closecb to avoid g_requests hang #27909 where bitcoind can't gracefully shutdown but potentially wasting resources on requests of which the connection has been closed.
However, I'm not sure the entire code block to enable EV_READ
is safe, as per my comment. I'll continue digging into that but looking forward to hearing other views on this? If it's not safe, I think we need to leave this out and look for alternatives to monitoring remote client disconnections. A fix is suggested in azat/libevent@7d2e2a8, but I can't see it in the 2.2.1-alpha release yet. Even with EV_READ
disabled, I still think calling evhttp_connection_set_closecb
is a very worthwhile improvement.
src/httpserver.cpp
Outdated
// Enable EV_READ to detect remote disconnection meaning that the close | ||
// callback set below is called. | ||
bufferevent* bev = evhttp_connection_get_bufferevent(conn); | ||
bufferevent_enable(bev, EV_READ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe to do? Unless I misunderstand, this effectively disables the bugfix introduced in libevent/libevent@5ff8eb2, which means that if we get new data in our inbound buffer this will put the connection in an invalid state. I'm not sure yet how that would get triggered in practice, so I'm just going off what I read so far as I can't reproduce the race condition reported in #11593.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to reproduce using this #11593 (comment) and since you pointed out that the change wasn't merged into 2.2, I think it's better to err on the side of caution and not enable reading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the PR and removed this
Why didn't you ask me first if I am still working on this? A month isn't a long time in this project when you working on multiple things at the same time... Typically people leave a review comment in a situation like this instead of opening a new PR. |
@Crypt-iQ There is no need to close this PR now since it got attention and review. I will close mine if you want to continue. But I wanted to clarify this for the future. |
Ok I reopened, it would be helpful if these things were made explicit (leave a comment) rather than implicit |
1296038
to
710985d
Compare
I think you can reopen your PR to include the bufferevent read enable. I've updated this PR to include a minimal fix for the issue (without the bufferevent read enable) because I'm still a little unclear on what exactly the bug fix was originally for and I can't reproduce it. Also happy to close this PR in favor of yours since this is now missing the read enable. |
…ect version 79d343a http: update libevent workaround to correct version (stickies-v) Pull request description: The libevent bug described in libevent/libevent@5ff8eb2 was already patched in [release-2.1.9-beta](https://github.com/libevent/libevent/releases/tag/release-2.1.9-beta), with cherry-picked commits [5b40744d1581447f5b4496ee8d4807383e468e7a](libevent/libevent@5b40744) and [b25813800f97179b2355a7b4b3557e6a7f568df2](libevent/libevent@b258138). There should be no side-effects by re-applying the workaround on an already patched version of libevent (as is currently done in master for people running libevent between 2.1.9 and 2.1.12), but it is best to just set the correct version number to avoid confusion. This will prevent situations like e.g. in bitcoin/bitcoin#27909 (comment), where a reverse workaround was incorrectly applied to the wrong version range. ACKs for top commit: fanquake: ACK 79d343a Tree-SHA512: 56d2576411cf38e56d0976523fec951e032a48e35af293ed1ef3af820af940b26f779b9197baaed6d8b79bd1c7f7334646b9d73f80610d63cffbc955958ca8a0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 710985d
I think improving remote client disconnect detection is best left for a separate PR; this is a worthwhile improvement on its own and I think uncontroversial.
src/httpserver.cpp
Outdated
auto n{g_requests.erase(req)}; | ||
if (n == 1 && g_requests.empty()) g_requests_cv.notify_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we make notify_all()
dependent on g_requests.empty()
here, when in all other places we notify for all added and removed events?
auto n{g_requests.erase(req)}; | |
if (n == 1 && g_requests.empty()) g_requests_cv.notify_all(); | |
if (g_requests.erase(req)) g_requests_cv.notify_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because the only call to g_requests_cv.wait(...)
has a predicate that waits for g_requests.empty()
. I've removed the dependency in the latest patch
79d343a http: update libevent workaround to correct version (stickies-v) Pull request description: The libevent bug described in libevent/libevent@5ff8eb2 was already patched in [release-2.1.9-beta](https://github.com/libevent/libevent/releases/tag/release-2.1.9-beta), with cherry-picked commits [5b40744d1581447f5b4496ee8d4807383e468e7a](libevent/libevent@5b40744) and [b25813800f97179b2355a7b4b3557e6a7f568df2](libevent/libevent@b258138). There should be no side-effects by re-applying the workaround on an already patched version of libevent (as is currently done in master for people running libevent between 2.1.9 and 2.1.12), but it is best to just set the correct version number to avoid confusion. This will prevent situations like e.g. in bitcoin#27909 (comment), where a reverse workaround was incorrectly applied to the wrong version range. ACKs for top commit: fanquake: ACK 79d343a Tree-SHA512: 56d2576411cf38e56d0976523fec951e032a48e35af293ed1ef3af820af940b26f779b9197baaed6d8b79bd1c7f7334646b9d73f80610d63cffbc955958ca8a0
Prior to this patch, it was possible that if an RPC was interrupted before the reply was received, the success callback evhttp_request_set_on_complete_cb would never be called and g_requests wouldn't be cleaned up. When attempting to shutdown bitcoind, it would hang waiting for g_requests.empty(). Co-authored-by: Fabian Jahr <fjahr@protonmail.com>
710985d
to
d170e6c
Compare
I've removed the |
I'm also not sure. This PR body leads me to believe that it is possible to have more than 1 request per connection: libevent/libevent#592 (comment) , however I'm not sure whether it's possible for them to be interleaved and cause problems with your connection-based patch. Something I'll try to investigate. |
It is possible for a connection to have more than one request (see: libevent/libevent#1383 (comment)) or python scriptimport http.client
import base64
from concurrent.futures import ThreadPoolExecutor
import time
host = "localhost"
port = 18443
auth_string = base64.b64encode('user:pass'.encode()).decode()
num_threads = 1
iterations_per_thread = 1
def send_requests(i):
for _ in range(iterations_per_thread):
conn = http.client.HTTPConnection(host, port)
conn.request("POST", "/", headers={"Keep-Alive": "timeout=500, max=1000"})
#response = conn.getresponse()
#conn.close()
time.sleep(3)
conn.request("POST", "/", headers={"Keep-Alive": "timeout=500, max=1000"})
time.sleep(100)
with ThreadPoolExecutor(max_workers=num_threads) as executor:
executor.map(send_requests, range(num_threads)) |
I haven't been able to cause a crash with the connection-based code. Given that the connection-based code is different than the crashing code here, do you want to open a new PR with the connection-based code @stickies-v ? |
What is that status of this? cc @stickies-v. |
libevent/libevent#78 is relevant. Need to check how libevent was fixed to detect client close and make use of that in |
As far as I could find, it's not yet fixed but it's also largely orthogonal, see my comment here and this open libevent issue. Detecting a remote disconnect would help save resources by not completing a request we'll no longer be serving, but isn't necessary to avoid the hangs on shutdown. |
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Going to mark this as draft for now, given that #28551 is open. |
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
…ase of remote client disconnection 68f23f5 http: bugfix: track closed connection (stickies-v) 084d037 http: log connection instead of request count (stickies-v) 41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v) Pull request description: #26742 significantly increased the http server shutdown speed, but also introduced a bug (#27722 - see bitcoin/bitcoin#27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`. This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (#27909, #27245, #19434) attempted to resolve this but [imo are fundamentally unsafe](bitcoin/bitcoin#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`. We don't need to keep track of open requests or connections, we just [need to ensure](bitcoin/bitcoin#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me. Fixes #27722 ACKs for top commit: vasild: ACK 68f23f5 theStack: ACK 68f23f5 Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used. Github-Pull: bitcoin#28551 Rebased-From: 41f9027
Superseded by #28551, can be closed now. |
Closing for now. |
…emote client disconnection 68f23f5 http: bugfix: track closed connection (stickies-v) 084d037 http: log connection instead of request count (stickies-v) 41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v) Pull request description: bitcoin#26742 significantly increased the http server shutdown speed, but also introduced a bug (bitcoin#27722 - see bitcoin#27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`. This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (bitcoin#27909, bitcoin#27245, bitcoin#19434) attempted to resolve this but [imo are fundamentally unsafe](bitcoin#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`. We don't need to keep track of open requests or connections, we just [need to ensure](bitcoin#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me. Fixes bitcoin#27722 ACKs for top commit: vasild: ACK 68f23f5 theStack: ACK 68f23f5 Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin/bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used. Github-Pull: #28551 Rebased-From: 41f9027
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin/bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used.
Introduces and uses a HTTPRequestTracker class to keep track of how many HTTP requests are currently active, so we don't stop the server before they're all handled. This has two purposes: 1. In a next commit, allows us to untrack all requests associated with a connection without running into lifetime issues of the connection living longer than the request (see bitcoin/bitcoin#27909 (comment)) 2. Improve encapsulation by making the mutex and cv internal members, and exposing just the WaitUntilEmpty() method that can be safely used. Github-Pull: #28551 Rebased-From: 41f9027813f849a9fd6a1479bbb74b9037990c3c
Prior to this patch, it was possible that if an RPC was interrupted before the reply was received, the success callback evhttp_request_set_on_complete_cb would never be called and g_requests wouldn't be cleaned up. When attempting to shutdown bitcoind, it would hang waiting for g_requests.empty().
Fixes #27722
I am currently working on testing this with a libevent version of 2.2.1 to ensure that the patch also works for that version.