Fix RPC Webhook queue dropping #5163
base: develop
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #5163    +/-   ##
=========================================
+ Coverage     76.2%   77.8%   +1.7%
=========================================
  Files          760     783     +23
  Lines        61568   66674   +5106
  Branches      8126    8125      -1
=========================================
+ Hits         46909   51902   +4993
- Misses       14659   14772    +113
src/xrpld/net/detail/RPCCall.cpp
Outdated
// Wietse: used to be 10m, but which backend
// ever requires 10 minutes to respond?
// Lower = prevent stacking pending calls
I think this comment is more of a PR comment than something that needs to be in the code
src/xrpld/net/detail/RPCSub.cpp
Outdated
JLOG(j_.warn()) << "RPCCall::fromNetwork drop";
mDeque.pop_back();
}
// Wietse: we're not going to limit this, this is admin-port only, scale
Same here
src/xrpld/net/detail/RPCSub.cpp
Outdated
@@ -182,7 +186,11 @@ class RPCSubImp : public RPCSub
}

private:
enum { eventQueueMax = 32 };
// Wietse: we're not going to limit this, this is admin-port only, scale
Same here
Agree with @mvadari that the code comments should be removed. Also added a question.
@@ -78,12 +78,16 @@ class RPCSubImp : public RPCSub
{
std::lock_guard sl(mLock);

if (mDeque.size() >= eventQueueMax)
Unbounded queues are often trouble. In this particular case, because it's protected by the ADMIN role, perhaps this is acceptable.
What about increasing eventQueueMax to a more reasonable number instead of removing this block?
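The trade-off raised in the two comments above can be sketched as a bounded queue with a configurable limit. This is only an illustration: `BoundedEventQueue`, its members, and the choice to drop the oldest event when full are assumptions made for the sketch, not the rippled `RPCSubImp` implementation (the hunk above calls `mDeque.pop_back()` when the limit is hit).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a subscription's event queue; not rippled code.
class BoundedEventQueue
{
public:
    explicit BoundedEventQueue(std::size_t maxSize) : maxSize_(maxSize)
    {
        assert(maxSize_ > 0);  // a zero-length queue could never hold an event
    }

    // Enqueue an event. Returns true if an older event was dropped to make room.
    bool
    push(std::string event)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        bool dropped = false;
        if (queue_.size() >= maxSize_)
        {
            // Assumption for this sketch: discard the oldest pending event.
            queue_.pop_front();
            ++droppedCount_;
            dropped = true;
        }
        queue_.push_back(std::move(event));
        return dropped;
    }

    // Dequeue the oldest event into `out`. Returns false if the queue is empty.
    bool
    pop(std::string& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t
    size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

    // Total number of events discarded because the queue was full.
    std::size_t
    dropped() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return droppedCount_;
    }

private:
    mutable std::mutex mutex_;
    std::deque<std::string> queue_;
    std::size_t maxSize_;
    std::size_t droppedCount_ = 0;
};
```

With a limit of 2, pushing three events drops one and leaves the two most recent queued. Tracking a drop counter makes the loss visible to an operator, which a silent drop does not.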
@@ -1623,7 +1626,7 @@ fromNetwork( | |||
std::placeholders::_2, | |||
j), | |||
RPC_REPLY_MAX_BYTES, | |||
RPC_NOTIFY, | |||
RPC_WEBHOOK_TIMEOUT, |
Does this mean that a client gets disconnected after 30s of no activity? Is there at least ping-pong going on in the background to avoid disconnecting clients who are waiting for some subscription?
All this looks very different from Clio and I'm not familiar with how this works, so I just wanted to make sure I understand what the side effects of this change can be.
This means it's giving up after 30 seconds of trying to deliver the webhook. E.g. the recipient receives the HTTP call but starts some processing, and doesn't respond... After 30 seconds, just give up instead of keeping the outbound HTTP connection open.
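The "give up after a fixed time" behaviour described here can be sketched as a deadline-based wait. `waitForResponse`, `responseReady`, and the polling interval are hypothetical names and values chosen for illustration; the actual code in RPCCall.cpp passes a timeout to its HTTP client rather than polling.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of rippled: polls `responseReady` until it
// returns true or `timeout` has elapsed. Returns false when the deadline
// passes, at which point the caller would close the outbound connection.
inline bool
waitForResponse(
    std::function<bool()> const& responseReady,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds pollInterval = std::chrono::milliseconds(5))
{
    // Use a monotonic clock so wall-clock adjustments cannot extend the wait.
    auto const deadline = std::chrono::steady_clock::now() + timeout;
    while (!responseReady())
    {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;  // recipient never answered in time: give up
        std::this_thread::sleep_for(pollInterval);
    }
    return true;
}
```

A recipient that answers on the third poll returns `true` well inside a one-second deadline, while a recipient that never answers returns `false` once the deadline passes.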
Aha, got it. Is this code guaranteed to be used only with the webhook stuff, or does it affect any other calls?
@mvadari comments removed.
Problem:
When using subscribe at the admin RPC port to send webhooks for the transaction stream to a backend, on large(r) ledgers the endpoint was consistently receiving fewer HTTP POSTs with TX information than the number of transactions in the ledger. This resulted in some XamanWallet users, on larger ledgers, not always receiving their transaction push notifications.
Details
The admin command that POSTs RPC events to a URL had a hardcoded queue length of 32, which resulted in dropped TX notifications.
As this is an admin-only command, I stripped out the entire queue length check. If you are admin, you should know what you are doing. If your endpoint can't efficiently handle the TPS, that is your problem.
Also: a shorter timeout for outgoing RPC HTTP calls: it was 10 minutes PER REQUEST and is now 30 seconds (still too long, but 10 minutes is a guaranteed shit show if the calls keep hanging and stack up, especially since the 32 queue length for HTTP calls is now removed).
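To see why the timeout matters once the queue limit is gone, a rough upper bound on stacked-up calls is the arrival rate multiplied by how long each hanging call may stay open. The sketch below only does that arithmetic; `maxOutstandingCalls` is a hypothetical helper and the rate of 50 events per second is an assumed figure, not a measurement from this PR.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: worst-case number of outbound calls still open if
// every recipient hangs until the timeout (rate x time-in-system).
constexpr std::uint64_t
maxOutstandingCalls(std::uint64_t eventsPerSecond, std::chrono::seconds timeout)
{
    return eventsPerSecond * static_cast<std::uint64_t>(timeout.count());
}

// Assumed rate of 50 events/s, purely for illustration:
// a 10 minute timeout allows up to 30000 hanging calls,
// a 30 second timeout allows up to 1500.
static_assert(maxOutstandingCalls(50, std::chrono::minutes(10)) == 30000, "");
static_assert(maxOutstandingCalls(50, std::chrono::seconds(30)) == 1500, "");
```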
While dropping the queue length limit on sent WebHooks could be considered dangerous, it's guarded by admin-RPC port anyway:
rippled/src/xrpld/rpc/handlers/Subscribe.cpp
Line 51 in 63209c2
Finally: