Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate timeout in LNURL withdraw flow #1004

Merged
merged 3 commits into from
Jul 3, 2024

Conversation

andrei-21
Copy link
Contributor

We have an issue with withdrawing from lnbits, sometimes we get operation timed out, but after some time the payment usually arrives. The reason for that that the SDK has 30 seconds timeout for http requests, when the wallet submits an invoice, lnbits tries to pay it synchronously. If the payment takes more than 30 seconds the SDK fails the request and returns the error, but lnbits is still paying the invoice and likely will pay.
In that case the issue is on lnbits side (it should not wait for successful payment before responding to the http request), I reported them lnbits/withdraw#37. But I afraid the are not going to fix it soon and not every instance is going to upgrade quick.

Here is a proposal to propagate this timeout back to the client and let him decide if the request needs to be retried, tolerated, or failed.

@dangeross
Copy link
Collaborator

Sorry for the slow review @andrei-21, can you resolve the conflicts?

@roeierez
Copy link
Member

I am fine with the solution. We may also increase the timeout to 1 minute in this case which will also make the case work in most of the times.
Needs rebase.

Copy link
Member

@roeierez roeierez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrei-21 andrei-21 marked this pull request as ready for review June 19, 2024 16:44
data: LnUrlWithdrawSuccessData { invoice },
},
LnUrlCallbackStatus::ErrorStatus { data } => LnUrlWithdrawResult::ErrorStatus { data },
Ok(LnUrlCallbackStatus::ErrorStatus { data }) => LnUrlWithdrawResult::ErrorStatus { data },
Err(e) if e.to_string().contains("operation timed out") => LnUrlWithdrawResult::Timeout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is the best solution. It makes this conditional on the exact error string from lnbits, which can change anytime even with a simple HTTP server upgrade.

Perhaps a more robust approach is to make the REST client timeout configurable in the SDK config? Then clients that expect to interact with slower endpoints can bump it to 60s?

Or maybe we can just bump the default from 30 to 60s?

What do you guys think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not against merging this, just wondering if there's a better way.

Copy link
Member

@roeierez roeierez Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrei-21 since it is lnbit specific (which is also against the spec) I think @ok300 has a point.
What do you think? bumping to 60 seconds will solve this issue in a reasonable way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this message comes from reqwest, not from lnbits (nothing comes from lnbits, that is the point ;).
I see the concern with relying on the specific text, we can do better, by actually checking if it was timeout by using is_timeout().

Regarding increasing the timeout. The problem is not that specific to lnbits, timeout can happen with any server (e.g. due to a network issue). Increasing the timeout will improve the situation, but not solve it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout idea was a practical suggestion for what appears to be a temporary problem, namely that lnbits is not (yet) spec-conform in handling LNURL-withdraws.

The is_timeout idea makes sense.

A few thoughts on the details: The right place to handle it is probably get_and_log_response, which is where the reqwest error originates. This means the error it typically throws, ServiceConnectivityError, would need two variants, a Generic { err } and a Timeout. This would propagate all the way to get_parse_and_log_response and validate_lnurl_withdraw, where it can result in the LnUrlWithdrawResult::Timeout you added.

Copy link
Contributor

@ok300 ok300 Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrei-21 I have a general question about this.

AFAIK the timeout would not distinguish between

  • a connection issue (network, etc)
  • a server issue (down for maintenance, etc)
  • the synchronous payment issue (lnbits trying to pay the invoice, which takes >30s, so LNURL timeout)

In the PR its written

... a proposal to propagate this timeout back to the client and let him decide if the request needs to be retried, tolerated, or failed.

Isn't it a bad idea for the client to retry a timed-out Withdraw, not knowing the actual root cause of the timeout? Couldn't the 2nd, 3rd Withdraw call lead to a 2nd and 3rd actual withdrawal, and not just a "retry"?

Or am I missing something and there's a way to tell the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ok300

The right place to handle it is probably get_and_log_response

I agree.

AFAIK the timeout would not distinguish between

Maybe we can distinguish between connection timeout and (communication) timeout, but in general you are right. (I just noticed that the title of the PR is not correct.)

Isn't it a bad idea for the client to retry a timed-out Withdraw, not knowing the actual root cause of the timeout?

LNURL-w does not support any form of a safe retry. So in general it is dangerous to retry. In general the only safe thing to do is to show the problem to the user, but it results in a bad UX. Ignoring the error also is not bad since anyway the wallet must handle such situation (incoming payment can always fail).
Retry maybe also an option if the wallet knows more about a specific LNURL-w that it is a one-time withdraw.
Anyway as you can see the SDK cannot do better than just returning this error to the client.

I will move checking of specific network issue into get_and_log_response.

@andrei-21 andrei-21 changed the title Propagate connection timeout in LNURL withdraw flow Propagate timeout in LNURL withdraw flow Jul 2, 2024
@andrei-21
Copy link
Contributor Author

@ok300 please check out.

Copy link
Contributor

@ok300 ok300 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the PR @andrei-21

@roeierez roeierez merged commit 8c27e8b into breez:main Jul 3, 2024
9 checks passed
@andrei-21 andrei-21 deleted the feature/propagate-timeout branch July 3, 2024 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants