paymod: Payments reimagined (part 01) #3753

cdecker · 2020-05-29T12:34:00Z

This is only a partial implementation of the new payment flow, but given the
looming release deadline and the sheer size of the change, I thought it's best
to get some code reviews in now

Design

The current payment flow is a multi-step process in the pay plugin (and some
others) that grew organically as logic and more features were added over the
time. This PR is my attempt to reimagine and consolidate the payment flow, to
consolidate some commonalities, better compartmentalize the existing addons,
and facilitate future modifications of the flow.

In order to get to this lofty goal I separated the logic into two separate
parts: the payment which consists of the core functionality required to
perform a payment, and payment_modifiers which have associated data and are
called at each step of the payment flow to allow them to modify the payment
behavior. The core payment flow consists of the following steps:

digraph {
  initialized -> got_route -> sendonion -> failed;
  sendonion -> success;
}

After each step is complete all modifiers registered with the flow will be
called, and they are allowed to modify their local data and the data of the
payment in order to fulfill their purpose. In particular they are allowed to
set a payment into any state, including two states (retry and split) that
are not part of the core flow. These are indications that the payment failed,
but was retried with different parameters or was split into multiple
sub-payments, e.g., in MPP.

This brings us to the structure of payments: payments and their sub-payments
are organized in a tree, with the inner nodes representing failed attempts
that have been retried in some form, and leafs representing the currently
active or final attempts. The root of the tree is the original attempt and
serves as a coordination point for all sub-payments, aggregating information
that is shared among all attempts such as a list of channel_hints used to
compute exclusions, and the invoice information that initiated the payment
flow.

This structure allows modifiers to implement almost arbitrary logic, while
keeping the core logic clean. Modifiers (both planned and implemented)
include:

final_cltv: the invoice can specify a final CLTV offset that just adds to
the route CLTV deltas, so calling after got_route we can adjust them to
match.
routehints: these change the destination for which we call getroute and
once in got_route we can append the current routehint to form complete
route.
waitblockheight: if a node reports that our CLTV is too low we can infer
the height we should be at and then wait to reach that height before
re-attempting.
shadow_route: since this is just an offset applied to the values we can
compute the shadow route and then adjust the amount passed to the
getroute call.
retry: if a payment fails we can just retry for a fixed number of times
with different routes.
split: this is the core functionality of MPP, splitting a payment into
multiple sub-payments if it fails.
keysend is just adjusting the TLV payload passed to createonion to
include the preimage.

There are certainly some suboptimal solutions, however I think this structure
can be used to better experiment and implement future extensions rather than
having to weave the functionality into the existing flow (keysend sending
functionality for example would not have any need for the routehints
modifier, but the rest is shared).

Testing

The functionality is implemented in parallel to the existing json_pay
infrastructure, but is planned to become a drop-in replacement, allowing us to
eventually remove the old implementation and just use the new one. To test the
new flow you can use lightning-cli paymod [args] instead of lightning-cli pay [args] though some modifiers are not implemented yet, and some are in the
next parts of this PR.

I am testing feature parity by changing pay.c to use json_paymod instead
of json_pay, allowing me to run the existing (unmodified) tests against the
new logic.

The ugly parts:

I'm calling these out because I am currently not addressing them and they
aren't a high priority, but we should clean these up eventually, and they may
cause reviewers to stumble a bit:

Modifiers need to call payment_continue(p) to advance the state-machine,
and I forgot that a couple of times. Something like we do with
command_still_pending and friends would be nice.
The RPC calls being bound to an originating command instance is rather
strange, even though it doesn't do much (you can pass NULL to it), since
we now potentially have multiple concurrent RPC calls in flight, if one of
them were to respond to the original cmd it'd break things. Making the
RPC interface independent of having a command instance would be a good
idea.
I started actually parsing the JSON-RPC request and responses into
structs to facilitate modifying them, however the functions and struct
definitions are spread all over the place. I'd like to consolidate them
under common/ eventually, so we can use them on both the lightningd and
the plugin side, and not have to replicate the extraction of data from JSON
responses all over the place.

Next up

I have most of the functionality implemented, and we're down to 12 failing
tests in my development branch. I'll publish the next part
as soon as it settles down, i.e., doesn't get changed anymore, and I'll defer
cleanups and renaming to a last cleanup part in order not to delay the overall
progress and release.

Given the very new and unproven nature I'd propose we release this for
keysend sending support directly, but gate the replacement of the pay
command behind either a !COMPAT_090 or the DEVELOPER flag. This would
minimize the impact to the majority of users, but still allow us to collect
experience from keysend and users that opted into testing it. However it'd
also mean that mpp-pay is a separate command or a developer-only command for
this release.

Changelog-None

plugins/libplugin-pay.h

plugins/libplugin-pay.c

niftynei · 2020-05-29T19:39:15Z

The modifier design is really nice and modular! very cool 😎

ZmnSCPxj · 2020-06-10T10:36:22Z

One thing I have been thinking of is to separate pathfinding into a separate module. What I mean is, there is a module within pay whose purpose is to call getroute repeatedly and cache the resulting routes, speculatively excluding channels/nodes it believes are high-risk (low capacity, historically unreliable, cannot be pinged, etc). This module would run continuously in the background for a particular payment until it reached some number of candidate routes, then sit idle until the number of valid candidate routes drops below some threshold, after which it would trigger again and search more routes.

Periodically the pay logic would inform the pathfinding module that some node/channel failed a payment of XX amount, and the pathfinding module would rebox routes that include that channel / node into lower-amount boxes.

This allows us to amortize the time we are waiting for HTLCs to be resolved or failed into (speculative) pathfnding, and there may be more efficient pathfinding algorithms that can generate multiple paths in a single run rather than just a single path,

So roughly, the interface between the pay logic and the pathfinder module would be:

new_pathfinding_module(destination, routehints) - construct a new pathfinding module for a particular target destination.
pathfinding_module_get(pathfinding_module, amount) - deliver a path to the destination, for the specified amount.
pathfinding_module_inform_failed(pathfiding_module, amount, node_or_channel) - tell the pathfinding module that the node or channel failed while delivering an amount.

This seems vastly different from your plan, though.

In any case, it would be possible to expose the above proposed pathfinding module as a set of commands provided by a built-in plugin (separate from pay). We would need to bound its lifetime (and pay has a timeout parameter anyway that it could pass as the max lifetime of the pathfinding module), and allow early destruction of the pathfinding module object in case of definite success. This would provide a relatively common framework for interacting with a plan to send funds to the destination.

cdecker · 2020-06-10T15:11:33Z

One thing I have been thinking of is to separate pathfinding into a separate module. What I mean is, there is a module within pay whose purpose is to call getroute repeatedly and cache the resulting routes, speculatively excluding channels/nodes it believes are high-risk (low capacity, historically unreliable, cannot be pinged, etc). This module would run continuously in the background for a particular payment until it reached some number of candidate routes, then sit idle until the number of valid candidate routes drops below some threshold, after which it would trigger again and search more routes.

That is indeed already possible with the payment modifiers as implemented in this PR. In fact just yesterday I implemented the first modifier that uses its own logic to compute a route, completely sidestepping the call to getroute: cdecker@d39462f. This modifier checks to see if we have a direct peer channel with sufficient capacity and overrides the route to use that channel.

The logic to compute the route can be arbitrary complex, can rely on other plugins or RPC calls, and can do arbitrary choices, and report metrics back to whatever mechanism it is using. And best of all we don't have to define yet another interface.

The idea behind the new payment flow is that it provides the bare minimum to get a successful payment through in the ideal case, but still allows enough flexibility to override/replace whatever doesn't work, and add reactions to deviations from the happy path.

rustyrussell

Ack a61bf50

Minor feedback only..

plugins/libplugin-pay.c

plugins/libplugin-pay.h

rustyrussell · 2020-06-18T07:33:15Z

plugins/libplugin-pay.c

+		p->step = PAYMENT_STEP_RETRY;
+	}
+
+	payment_continue(p);


I see what you mean about payment_continue() being easy to miss.

Perhaps this function should simply return, and the "payment_continue" should be in the caller?

Ah, I see payment_continue gets called from other callbacks, making this more difficult. OK.

Yes, it's pretty similar to the command_still_pending/command_fail/command_success construction, so I though it'd be nice to build something similar to that. I can do that on top of paymod-03 if you agree that it's a good idea.

plugins/pay.c

plugins/libplugin-pay.c

rustyrussell · 2020-06-18T07:48:29Z

plugins/libplugin-pay.c

+	if (p->parent == NULL && cmd == NULL) {
+		/* This is the tree root, but we already reported success or
+		 * failure, so noop. */
+		return command_still_pending(NULL);


Yech, that's ugly! I'm not convinced payment_finished should retrurn command_result...

The reason it is like this is that payment_finished encapsulates the logic to determine whether a command is pending or not. It is the place where we call command_fail and command_finished. Those two functions are marked as warn_unused_result, so I thought it easiest to hand them up and then silence the result.

We can of course silence it by doing something else, but I have no good ideas on how to do that.

After some more googling it seems like we can opt-out of the warn_unused_result using a cast to void (???):

(void)command_finished(cmd, ret);

Still not nice, but I added what they'd look like in the last few fixup!-commits (they clash with other things so I haven't reordered them just yet).

Hm, gcc (7.5) doesn't seem to like this trick, and still warns about the unused result.

No, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425 :(

wire/tlvstream.h

rustyrussell · 2020-06-18T07:53:19Z

plugins/libplugin-pay.c

+			cur->tlv_payload = tlv_tlv_payload_new(cr->hops);
+			tlvstream_set_tu64(
+			    &cur->tlv_payload->fields,
+			    TLV_TLV_PAYLOAD_AMT_TO_FORWARD,
+			    p->route[i + 1].amount.millisatoshis); /* Raw: TLV payload generation*/
+			tlvstream_set_tu32(&cur->tlv_payload->fields,
+					   TLV_TLV_PAYLOAD_OUTGOING_CLTV_VALUE,
+					   p->start_block +
+					       p->route[i + 1].delay);
+			tlvstream_set_short_channel_id(
+			    &cur->tlv_payload->fields,
+			    TLV_TLV_PAYLOAD_SHORT_CHANNEL_ID,
+			    &p->route[i + 1].channel_id);
 		}


Can we generate this? This is very unpretty... :(

I could move it into a separate function for starters, not truly generated, but should clean this up nicely.

cdecker · 2020-06-19T13:59:50Z

I pushed an update that attempts to fixup all the feedback. Here's the range-diff, no non-fixup commit was touched, so reviewing the fixups should be sufficient.

Not squashing just yet, since there is the "return command_result" change that needs to be acknowledged before I start working on the conflicts that reordering causes :-)

Ping @rustyrussell :-)

cdecker · 2020-06-22T16:26:00Z

Autosquashed and rebased on top of master

cdecker · 2020-06-25T15:50:31Z

Rebase on top of master required by a tiny conflict in test_pay.py:

 1:  5da37043e =  1:  6cc7a3f45 plugin: Add a data backend for the new payment state-machine
 2:  7f53f548b =  2:  57a48da7d paymod: Add base state-machine for payment flow
 3:  ada0fab08 =  3:  b23b31da5 paymod: Implement base state-machine transitions
 4:  3236f4d9b !  4:  1b6a98bbd paymod: Add a simple test-command to test the paymod state-machine
    @@ -158,8 +158,8 @@
     +++ b/tests/test_pay.py
     @@
      
    -     inv = invs[0]
    -     assert(inv['msatoshi_received'] >= amt)
    +     # l1 should still be alive afterwards.
    +     assert l1.rpc.getinfo()['id'] == l1id
     +
     +
     +@unittest.skipIf(not DEVELOPER, "paymod is only available to developers for now.")
 5:  1ac7a41e0 =  5:  3b10bf487 paymod: Generate type-safe accessor functions for modifier data
 6:  18e81ad28 =  6:  ad06cd95a paymod: Add a retry modifier that retries payments on failure
 7:  e09c5ab61 =  7:  a13d42a95 paymod: Implement bubbling results up the hierarchy of payments
 8:  e73d7ab6d =  8:  faa7f6c00 paymod: Implement getroute call
 9:  b0bb7987c =  9:  b432683e3 paymod: Add a `getinfo` call on payment_start to get the blockheight
10:  4047dea37 = 10:  0b18936b9 doc: Fix a documentation error in sendonion
11:  bb37a497e = 11:  f5bf73dff paymod: Implement legacy onion payload computation
12:  bd0a1a3d7 = 12:  8e68aadc6 paymod: Call `createonion` to generate the onion and associated data
13:  b91ac6782 = 13:  388e4d468 paymod: Parse createonion result and call sendonion
14:  20b12bf22 = 14:  4cd094937 paymod: Call waitsendpay on the sendonion we just queued
15:  06adaa6e5 = 15:  4e191d569 paymod: Parse and store the waitsendpay result
16:  947340a12 = 16:  76f824981 paymod: Collect and return results of a tree of partial payments
17:  50a69ddb4 = 17:  c144d88ce paymod: Use reasonable partids
18:  0f35f8ed2 = 18:  2bc1c8bd3 paymod: Maintain a list of current and past payments
19:  fcb46ade0 = 19:  20c22daab paymod: Do not reply to the same JSON-RPC command multiple times
20:  3112bd645 = 20:  eaed425c3 paymod: Do not pass cmd to sub-payments, plugin suffices
21:  6de4eaf02 = 21:  529faefcc tlv: Add generic getters and setters for tlvstream
22:  133b0d32c = 22:  9459e2d18 wiregen: Add enums for TLV types so we can call them by their name
23:  1787df06b = 23:  ace3b9e9f paymod: Implement TLV onion payload generation
24:  ae6d53e86 = 24:  0e9839d02 paymod: Move onion payload construction into its own function

The actual steps are mocked out, but we can already run through the various phases of a payment, and the modifiers should be called for each state.

This commit can be reverted/skipped once we have implemented all the logic and have feature parity with the normal `pay`. It's main purpose is to expose the unfinished functionality to test it, without completely breaking the existing `pay` command.

This should make it easy for JSON-RPC functions and modifiers to get the associated data for a given modifier name. Useful if a modifier needs to access its parent's modifier data, or in other functions that need to access modifier data, e.g., when passing destination pointers into the `param()` call.

This is likely a bit of overkill for this type of functionality, but it is a nice first use-case of how functionality can be compartmentalized into modifiers. If makes swapping retry mechanisms in and out really simple.

A payment is considered finished if it is in a final state (success or failure) or all its sub-payments are finished. If that's the case we notify `payment_finished` and bubble up through `payment_child_finished`, eventually bubbling up to the root, which can then report success of failure back to the RPC command that initiated the whole process.

This is necessary so we can build the absolute locktimes in the next step. For now just fetch the blockheight on each (sub-)payment, later we can reuse the root blockheight if it ends up using too much traffic.

Te `sendonion` docs where claiming that the `first_hop` needs to specify a `channel_id` whereas it should really be the `node_id` of the peer we are trying to contact.

This is just for testing for now, TLV payload computation will come next. We stage all the payloads in deserialized form so modifiers can modify them more easily and serialize them only before actually calling `createonion`.

After we gave each modifier a chance to have its say, we can now proceed to generate the onion that's going to be sent in the next step.

This is necessary so we can later aggregate across an entire tree of attempts and report success or failure to the RPC caller.

The status of what started as a simple JSON-RPC call is now spread across an entire tree of partial payments and payment attempts. So we collect the status in a single struct in order to report back success of failure.

We were just handwaving the partid generation, which broke some tests that expected the first payment attempt to always have partid=0, so here we just track the partids assigned in the payment tree, starting at 0.

We need to keep them around so we can inspect them later. We'll also need a background cleanup every once in a while to free some memory. More on that in a future commit.

We were passing the `cmd` instance around where `plugin` suffices (used to start RPC calls and logs).

Trying to rework the TLV streams to have a more homogenous interface to work with. This is by no means a complete implementation, just the groundwork that is going to be used by the wire code generator to generate the specific accessors, but it's enough so we can manipulate TLV streams in the onion and later just switch to the generated ones.

@niftynei

Suggested-by: Lisa Neigut <@niftynei> Signed-off-by: Christian Decker <@cdecker>

@rustyrussell

Suggested-by: Rusty Russell <@rustyrussell>

rustyrussell · 2020-06-30T20:32:12Z

Ack 3412a69

cdecker requested review from rustyrussell and niftynei May 29, 2020 12:35

cdecker force-pushed the paymod-01 branch 2 times, most recently from c009909 to 8bb9edc Compare May 29, 2020 13:19

niftynei reviewed May 29, 2020

View reviewed changes

cdecker force-pushed the paymod-01 branch from 8bb9edc to 22a2a0b Compare June 1, 2020 14:24

cdecker requested a review from niftynei June 1, 2020 14:25

cdecker force-pushed the paymod-01 branch 4 times, most recently from 20aa3fb to 71696c9 Compare June 5, 2020 16:54

cdecker force-pushed the paymod-01 branch from 71696c9 to dd4e95f Compare June 15, 2020 10:55

cdecker mentioned this pull request Jun 16, 2020

paymod: Feature parity, finally (part 03) #3773

Merged

cdecker force-pushed the paymod-01 branch 2 times, most recently from 34efe7e to dd4e95f Compare June 16, 2020 17:26

rustyrussell approved these changes Jun 18, 2020

View reviewed changes

cdecker force-pushed the paymod-01 branch from dd4e95f to e581e3c Compare June 19, 2020 13:57

cdecker force-pushed the paymod-01 branch from e581e3c to ae6d53e Compare June 22, 2020 15:44

cdecker force-pushed the paymod-01 branch from ae6d53e to 0e9839d Compare June 25, 2020 15:48

cdecker added 5 commits June 29, 2020 13:54

plugin: Add a data backend for the new payment state-machine

ee8d3e0

paymod: Add base state-machine for payment flow

ef0f003

paymod: Implement base state-machine transitions

45a58d0

The actual steps are mocked out, but we can already run through the various phases of a payment, and the modifiers should be called for each state.

cdecker added 19 commits June 29, 2020 13:54

paymod: Add a retry modifier that retries payments on failure

7fe59d5

This is likely a bit of overkill for this type of functionality, but it is a nice first use-case of how functionality can be compartmentalized into modifiers. If makes swapping retry mechanisms in and out really simple.

paymod: Implement getroute call

1326fe6

paymod: Add a getinfo call on payment_start to get the blockheight

12813a6

This is necessary so we can build the absolute locktimes in the next step. For now just fetch the blockheight on each (sub-)payment, later we can reuse the root blockheight if it ends up using too much traffic.

doc: Fix a documentation error in sendonion

47b4e53

Te `sendonion` docs where claiming that the `first_hop` needs to specify a `channel_id` whereas it should really be the `node_id` of the peer we are trying to contact.

paymod: Implement legacy onion payload computation

5edfa63

This is just for testing for now, TLV payload computation will come next. We stage all the payloads in deserialized form so modifiers can modify them more easily and serialize them only before actually calling `createonion`.

paymod: Call createonion to generate the onion and associated data

840cc8e

After we gave each modifier a chance to have its say, we can now proceed to generate the onion that's going to be sent in the next step.

paymod: Parse createonion result and call sendonion

014595d

paymod: Call waitsendpay on the sendonion we just queued

a91e2d9

paymod: Parse and store the waitsendpay result

1797d33

This is necessary so we can later aggregate across an entire tree of attempts and report success or failure to the RPC caller.

paymod: Collect and return results of a tree of partial payments

c59609c

The status of what started as a simple JSON-RPC call is now spread across an entire tree of partial payments and payment attempts. So we collect the status in a single struct in order to report back success of failure.

paymod: Use reasonable partids

3ac692c

We were just handwaving the partid generation, which broke some tests that expected the first payment attempt to always have partid=0, so here we just track the partids assigned in the payment tree, starting at 0.

paymod: Maintain a list of current and past payments

4b81e1b

We need to keep them around so we can inspect them later. We'll also need a background cleanup every once in a while to free some memory. More on that in a future commit.

paymod: Do not reply to the same JSON-RPC command multiple times

8167798

paymod: Do not pass cmd to sub-payments, plugin suffices

a908260

We were passing the `cmd` instance around where `plugin` suffices (used to start RPC calls and logs).

wiregen: Add enums for TLV types so we can call them by their name

aaa8779

Suggested-by: Lisa Neigut <@niftynei> Signed-off-by: Christian Decker <@cdecker>

paymod: Implement TLV onion payload generation

83f067b

paymod: Move onion payload construction into its own function

3412a69

Suggested-by: Rusty Russell <@rustyrussell>

cdecker force-pushed the paymod-01 branch from 0e9839d to 3412a69 Compare June 29, 2020 11:54

cdecker added this to the v0.9.0 milestone Jun 29, 2020

cdecker self-assigned this Jun 29, 2020

cdecker merged commit 6af5ab9 into master Jul 1, 2020

cdecker mentioned this pull request Jul 1, 2020

paymod: Slow but steady progress (part 02) #3760

Merged

rustyrussell deleted the paymod-01 branch August 15, 2022 00:43

paymod: Payments reimagined (part 01) #3753

paymod: Payments reimagined (part 01) #3753

Uh oh!

Conversation

cdecker commented May 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design

Testing

The ugly parts:

Next up

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

niftynei commented May 29, 2020

Uh oh!

ZmnSCPxj commented Jun 10, 2020

Uh oh!

cdecker commented Jun 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rustyrussell left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cdecker commented Jun 19, 2020

Uh oh!

cdecker commented Jun 22, 2020

Uh oh!

cdecker commented Jun 25, 2020

Uh oh!

rustyrussell commented Jun 30, 2020

Uh oh!

Uh oh!

cdecker commented May 29, 2020 •

edited

Loading

cdecker commented Jun 10, 2020 •

edited

Loading