Propagate timeout in LNURL withdraw flow #1004

andrei-21 · 2024-06-13T10:30:15Z

We have an issue with withdrawing from lnbits: sometimes we get "operation timed out", but the payment usually arrives some time later. The reason is that the SDK has a 30-second timeout for HTTP requests, and when the wallet submits an invoice, lnbits tries to pay it synchronously. If the payment takes more than 30 seconds, the SDK fails the request and returns the error, but lnbits is still paying the invoice and will likely succeed.
In that case the issue is on the lnbits side (it should not wait for a successful payment before responding to the HTTP request). I reported it to them in lnbits/withdraw#37, but I am afraid they are not going to fix it soon, and not every instance is going to upgrade quickly.

Here is a proposal to propagate this timeout back to the client and let him decide if the request needs to be retried, tolerated, or failed.

dangeross · 2024-06-19T10:08:18Z

Sorry for the slow review @andrei-21, can you resolve the conflicts?

roeierez · 2024-06-19T10:57:54Z

I am fine with the solution. We may also increase the timeout to 1 minute in this case, which would make it work most of the time.
Needs rebase.

roeierez

LGTM

ok300 · 2024-06-20T11:11:13Z

libs/sdk-common/src/lnurl/specs/withdraw.rs

            data: LnUrlWithdrawSuccessData { invoice },
        },
-        LnUrlCallbackStatus::ErrorStatus { data } => LnUrlWithdrawResult::ErrorStatus { data },
+        Ok(LnUrlCallbackStatus::ErrorStatus { data }) => LnUrlWithdrawResult::ErrorStatus { data },
+        Err(e) if e.to_string().contains("operation timed out") => LnUrlWithdrawResult::Timeout {


I don't know if this is the best solution. It makes this conditional on the exact error string from lnbits, which can change at any time, even with a simple HTTP server upgrade.

Perhaps a more robust approach is to make the REST client timeout configurable in the SDK config? Then clients that expect to interact with slower endpoints can bump it to 60s?

Or maybe we can just bump the default from 30 to 60s?
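To make the two suggestions above concrete, here is a minimal sketch of a configurable REST timeout that keeps the current 30-second default. The struct `RestClientConfig`, the field `rest_timeout`, and the method `with_rest_timeout` are hypothetical names chosen for illustration; they are not taken from the SDK.

```rust
use std::time::Duration;

/// Hypothetical SDK-side setting. The names here are assumptions made for
/// this sketch, not existing fields of the SDK config.
#[derive(Debug, Clone, PartialEq)]
pub struct RestClientConfig {
    /// Upper bound for a single HTTP request.
    pub rest_timeout: Duration,
}

impl Default for RestClientConfig {
    fn default() -> Self {
        // 30 seconds is the current HTTP timeout mentioned in this PR.
        Self {
            rest_timeout: Duration::from_secs(30),
        }
    }
}

impl RestClientConfig {
    /// Builder-style override for clients that expect slower endpoints.
    pub fn with_rest_timeout(mut self, timeout: Duration) -> Self {
        self.rest_timeout = timeout;
        self
    }
}
```

A client talking to slow endpoints could then call `RestClientConfig::default().with_rest_timeout(Duration::from_secs(60))`, and the value would presumably be handed to the HTTP client when it is built (reqwest's `ClientBuilder::timeout` accepts a `Duration`).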

What do you guys think?

Not against merging this, just wondering if there's a better way.

@andrei-21 since it is lnbits-specific (and lnbits is also going against the spec here), I think @ok300 has a point.
What do you think? Would bumping to 60 seconds solve this issue in a reasonable way?

To be clear, this message comes from reqwest, not from lnbits (nothing comes from lnbits, that is the point ;).
I see the concern with relying on the specific text. We can do better by actually checking whether it was a timeout using is_timeout().
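The difference between matching on the message text and checking a structured flag can be sketched with the standard library alone. In this illustration `std::io::Error` and `ErrorKind::TimedOut` stand in for the reqwest error and its `is_timeout()` method, so the block is an analogy under that assumption rather than the code in this PR; both helper function names are hypothetical.

```rust
use std::io::{Error, ErrorKind};

/// Fragile: depends on the exact wording of the error message.
fn is_timeout_by_text(err: &Error) -> bool {
    err.to_string().contains("operation timed out")
}

/// Robust: inspects the structured error kind instead of the text.
fn is_timeout_by_kind(err: &Error) -> bool {
    err.kind() == ErrorKind::TimedOut
}
```

If a library upgrade rewords the message, the text-based check silently stops matching, while the kind-based check keeps working.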

Regarding increasing the timeout: the problem is not specific to lnbits. A timeout can happen with any server (e.g. due to a network issue). Increasing the timeout will improve the situation, but not solve it.

The timeout idea was a practical suggestion for what appears to be a temporary problem, namely that lnbits is not (yet) spec-compliant in handling LNURL-withdraws.

The is_timeout idea makes sense.

A few thoughts on the details: The right place to handle it is probably get_and_log_response, which is where the reqwest error originates. This means the error it typically returns, ServiceConnectivityError, would need two variants: a Generic { err } and a Timeout. The Timeout variant would then propagate all the way through get_parse_and_log_response to validate_lnurl_withdraw, where it can be mapped to the LnUrlWithdrawResult::Timeout you added.
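The propagation described above could look roughly like the sketch below. It is simplified so that it compiles on its own: `TransportError` is a stand-in for the reqwest error (only its "timed out" bit is modelled), and `fetch_response`, `validate_withdraw`, and `WithdrawOutcome` are hypothetical, reduced counterparts of get_and_log_response, validate_lnurl_withdraw, and LnUrlWithdrawResult. The real signatures in the SDK differ.

```rust
/// Stand-in for the transport error (for example a reqwest error); only the
/// "was it a timeout?" information matters for this sketch.
#[derive(Debug)]
struct TransportError {
    timed_out: bool,
    message: String,
}

/// The two variants suggested in the review comment.
#[derive(Debug, PartialEq)]
enum ServiceConnectivityError {
    Generic { err: String },
    Timeout,
}

impl From<TransportError> for ServiceConnectivityError {
    fn from(e: TransportError) -> Self {
        // The timeout is classified once, where the transport error originates.
        if e.timed_out {
            ServiceConnectivityError::Timeout
        } else {
            ServiceConnectivityError::Generic { err: e.message }
        }
    }
}

/// Reduced, hypothetical counterpart of the withdraw result type.
#[derive(Debug, PartialEq)]
enum WithdrawOutcome {
    Ok,
    Timeout,
    Failed { reason: String },
}

/// Plays the role of the lowest-level request helper.
fn fetch_response(raw: Result<String, TransportError>) -> Result<String, ServiceConnectivityError> {
    raw.map_err(ServiceConnectivityError::from)
}

/// Plays the role of the withdraw validation step at the top of the chain.
fn validate_withdraw(raw: Result<String, TransportError>) -> WithdrawOutcome {
    match fetch_response(raw) {
        Ok(_) => WithdrawOutcome::Ok,
        Err(ServiceConnectivityError::Timeout) => WithdrawOutcome::Timeout,
        Err(ServiceConnectivityError::Generic { err }) => WithdrawOutcome::Failed { reason: err },
    }
}
```

The point of the design is that only the lowest layer needs to know how a timeout is detected; every layer above it matches on an enum variant.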

@andrei-21 I have a general question about this.

AFAIK the timeout would not distinguish between

a connection issue (network, etc)

a server issue (down for maintenance, etc)

the synchronous payment issue (lnbits trying to pay the invoice, which takes >30s, so LNURL timeout)

The PR description says:

"... a proposal to propagate this timeout back to the client and let him decide if the request needs to be retried, tolerated, or failed."

Isn't it a bad idea for the client to retry a timed-out Withdraw, not knowing the actual root cause of the timeout? Couldn't the 2nd, 3rd Withdraw call lead to a 2nd and 3rd actual withdrawal, and not just a "retry"?

Or am I missing something and there's a way to tell the difference?

@ok300

"The right place to handle it is probably get_and_log_response"

I agree.

"AFAIK the timeout would not distinguish between"

Maybe we can distinguish between a connection timeout and a (communication) timeout, but in general you are right. (I just noticed that the title of the PR is not correct.)

"Isn't it a bad idea for the client to retry a timed-out Withdraw, not knowing the actual root cause of the timeout?"

LNURL-w does not support any form of safe retry, so in general it is dangerous to retry. The only generally safe thing to do is to show the problem to the user, but that results in bad UX. Ignoring the error is also not bad, since the wallet must handle such a situation anyway (an incoming payment can always fail).
Retrying may also be an option if the wallet knows more about the specific LNURL-w, e.g. that it is a one-time withdraw.
Anyway, as you can see, the SDK cannot do better than return this error to the client.

I will move the check for the specific network issue into get_and_log_response.
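The three client-side options discussed in this comment (show the problem, ignore it, retry only for a known one-time withdraw) can be sketched as a small decision function. Everything here is hypothetical and lives on the wallet side: `TimeoutPolicy`, `ClientAction`, and `on_withdraw_timeout` are illustrative names, and whether a given LNURL-w is one-time is assumed to be knowledge the wallet has from elsewhere.

```rust
/// How the wallet has chosen to treat a timed-out withdraw request.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeoutPolicy {
    /// Surface the unknown outcome to the user.
    ShowToUser,
    /// Do nothing special; the payment may still arrive.
    Ignore,
    /// Retry, but only when the withdraw is known to be one-time.
    RetryIfOneTime,
}

/// What the wallet does next.
#[derive(Debug, PartialEq)]
enum ClientAction {
    ReportUnknownOutcome,
    WaitForIncomingPayment,
    Retry,
}

/// Decide what to do after the SDK reports a withdraw timeout.
fn on_withdraw_timeout(policy: TimeoutPolicy, known_one_time: bool) -> ClientAction {
    match policy {
        TimeoutPolicy::ShowToUser => ClientAction::ReportUnknownOutcome,
        TimeoutPolicy::Ignore => ClientAction::WaitForIncomingPayment,
        // A blind retry could trigger a second withdrawal, so fall back to
        // reporting unless the link is known to be usable only once.
        TimeoutPolicy::RetryIfOneTime if known_one_time => ClientAction::Retry,
        TimeoutPolicy::RetryIfOneTime => ClientAction::ReportUnknownOutcome,
    }
}
```

The SDK itself stays out of this decision; it only reports that the outcome is unknown.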

andrei-21 · 2024-07-02T12:19:50Z

@ok300 please check it out.

ok300

Looks good, thanks for the PR @andrei-21

dangeross requested review from roeierez and dangeross June 19, 2024 10:08

andrei-21 force-pushed the feature/propagate-timeout branch from aeb4301 to f23e480 Compare June 19, 2024 14:18

roeierez approved these changes Jun 19, 2024

danielgranhao force-pushed the feature/propagate-timeout branch from 400a53b to 254b3bc Compare June 19, 2024 16:43

andrei-21 marked this pull request as ready for review June 19, 2024 16:44

danielgranhao force-pushed the feature/propagate-timeout branch from 254b3bc to 9386edb Compare June 20, 2024 08:59

dangeross approved these changes Jun 20, 2024

ok300 reviewed Jun 20, 2024

andrei-21 and others added 3 commits July 2, 2024 13:08

Propagate connection timeout in LNURL withdraw flow

d3d658f

Generate bindings

6b5c44a

Propagate reqwest error code

9c40a62

andrei-21 force-pushed the feature/propagate-timeout branch from 04725ce to b0e3e1f Compare July 2, 2024 12:18

andrei-21 changed the title ~~Propagate connection timeout in LNURL withdraw flow~~ Propagate timeout in LNURL withdraw flow Jul 2, 2024

ok300 approved these changes Jul 2, 2024

andrei-21 force-pushed the feature/propagate-timeout branch from b0e3e1f to 9c40a62 Compare July 2, 2024 16:00

roeierez merged commit 8c27e8b into breez:main Jul 3, 2024
9 checks passed

andrei-21 deleted the feature/propagate-timeout branch July 3, 2024 08:35
