Skip to content

paymod: Payments reimagined (part 01) #3753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jul 1, 2020
Merged

paymod: Payments reimagined (part 01) #3753

merged 24 commits into from
Jul 1, 2020

Conversation

cdecker
Copy link
Member

@cdecker cdecker commented May 29, 2020

This is only a partial implementation of the new payment flow, but given the
looming release deadline and the sheer size of the change, I thought it's best
to get some code reviews in now

Design

The current payment flow is a multi-step process in the pay plugin (and some
others) that grew organically as logic and more features were added over the
time. This PR is my attempt to reimagine and consolidate the payment flow, to
consolidate some commonalities, better compartmentalize the existing addons,
and facilitate future modifications of the flow.

In order to get to this lofty goal I separated the logic into two separate
parts: the payment which consists of the core functionality required to
perform a payment, and payment_modifiers which have associated data and are
called at each step of the payment flow to allow them to modify the payment
behavior. The core payment flow consists of the following steps:

base-payment-state-machine

digraph {
  initialized -> got_route -> sendonion -> failed;
  sendonion -> success;
}

After each step is complete all modifiers registered with the flow will be
called, and they are allowed to modify their local data and the data of the
payment in order to fulfill their purpose. In particular they are allowed to
set a payment into any state, including two states (retry and split) that
are not part of the core flow. These are indications that the payment failed,
but was retried with different parameters or was split into multiple
sub-payments, e.g., in MPP.

This brings us to the structure of payments: payments and their sub-payments
are organized in a tree, with the inner nodes representing failed attempts
that have been retried in some form, and leafs representing the currently
active or final attempts. The root of the tree is the original attempt and
serves as a coordination point for all sub-payments, aggregating information
that is shared among all attempts such as a list of channel_hints used to
compute exclusions, and the invoice information that initiated the payment
flow.

This structure allows modifiers to implement almost arbitrary logic, while
keeping the core logic clean. Modifiers (both planned and implemented)
include:

  • final_cltv: the invoice can specify a final CLTV offset that just adds to
    the route CLTV deltas, so calling after got_route we can adjust them to
    match.
  • routehints: these change the destination for which we call getroute and
    once in got_route we can append the current routehint to form complete
    route.
  • waitblockheight: if a node reports that our CLTV is too low we can infer
    the height we should be at and then wait to reach that height before
    re-attempting.
  • shadow_route: since this is just an offset applied to the values we can
    compute the shadow route and then adjust the amount passed to the
    getroute call.
  • retry: if a payment fails we can just retry for a fixed number of times
    with different routes.
  • split: this is the core functionality of MPP, splitting a payment into
    multiple sub-payments if it fails.
  • keysend is just adjusting the TLV payload passed to createonion to
    include the preimage.

There are certainly some suboptimal solutions, however I think this structure
can be used to better experiment and implement future extensions rather than
having to weave the functionality into the existing flow (keysend sending
functionality for example would not have any need for the routehints
modifier, but the rest is shared).

Testing

The functionality is implemented in parallel to the existing json_pay
infrastructure, but is planned to become a drop-in replacement, allowing us to
eventually remove the old implementation and just use the new one. To test the
new flow you can use lightning-cli paymod [args] instead of lightning-cli pay [args] though some modifiers are not implemented yet, and some are in the
next parts of this PR.

I am testing feature parity by changing pay.c to use json_paymod instead
of json_pay, allowing me to run the existing (unmodified) tests against the
new logic.

The ugly parts:

I'm calling these out because I am currently not addressing them and they
aren't a high priority, but we should clean these up eventually, and they may
cause reviewers to stumble a bit:

  • Modifiers need to call payment_continue(p) to advance the state-machine,
    and I forgot that a couple of times. Something like we do with
    command_still_pending and friends would be nice.
  • The RPC calls being bound to an originating command instance is rather
    strange, even though it doesn't do much (you can pass NULL to it), since
    we now potentially have multiple concurrent RPC calls in flight, if one of
    them were to respond to the original cmd it'd break things. Making the
    RPC interface independent of having a command instance would be a good
    idea.
  • I started actually parsing the JSON-RPC request and responses into
    structs to facilitate modifying them, however the functions and struct
    definitions are spread all over the place. I'd like to consolidate them
    under common/ eventually, so we can use them on both the lightningd and
    the plugin side, and not have to replicate the extraction of data from JSON
    responses all over the place.

Next up

I have most of the functionality implemented, and we're down to 12 failing
tests in my development branch. I'll publish the next part
as soon as it settles down, i.e., doesn't get changed anymore, and I'll defer
cleanups and renaming to a last cleanup part in order not to delay the overall
progress and release.

Given the very new and unproven nature I'd propose we release this for
keysend sending support directly, but gate the replacement of the pay
command behind either a !COMPAT_090 or the DEVELOPER flag. This would
minimize the impact to the majority of users, but still allow us to collect
experience from keysend and users that opted into testing it. However it'd
also mean that mpp-pay is a separate command or a developer-only command for
this release.

Changelog-None

@cdecker cdecker requested review from rustyrussell and niftynei May 29, 2020 12:35
@cdecker cdecker force-pushed the paymod-01 branch 2 times, most recently from c009909 to 8bb9edc Compare May 29, 2020 13:19
@niftynei
Copy link
Contributor

The modifier design is really nice and modular! very cool 😎

@cdecker cdecker requested a review from niftynei June 1, 2020 14:25
@cdecker cdecker force-pushed the paymod-01 branch 4 times, most recently from 20aa3fb to 71696c9 Compare June 5, 2020 16:54
@ZmnSCPxj
Copy link
Contributor

One thing I have been thinking of is to separate pathfinding into a separate module. What I mean is, there is a module within pay whose purpose is to call getroute repeatedly and cache the resulting routes, speculatively excluding channels/nodes it believes are high-risk (low capacity, historically unreliable, cannot be pinged, etc). This module would run continuously in the background for a particular payment until it reached some number of candidate routes, then sit idle until the number of valid candidate routes drops below some threshold, after which it would trigger again and search more routes.

Periodically the pay logic would inform the pathfinding module that some node/channel failed a payment of XX amount, and the pathfinding module would rebox routes that include that channel / node into lower-amount boxes.

This allows us to amortize the time we are waiting for HTLCs to be resolved or failed into (speculative) pathfnding, and there may be more efficient pathfinding algorithms that can generate multiple paths in a single run rather than just a single path,

So roughly, the interface between the pay logic and the pathfinder module would be:

  • new_pathfinding_module(destination, routehints) - construct a new pathfinding module for a particular target destination.
  • pathfinding_module_get(pathfinding_module, amount) - deliver a path to the destination, for the specified amount.
  • pathfinding_module_inform_failed(pathfiding_module, amount, node_or_channel) - tell the pathfinding module that the node or channel failed while delivering an amount.

This seems vastly different from your plan, though.

In any case, it would be possible to expose the above proposed pathfinding module as a set of commands provided by a built-in plugin (separate from pay). We would need to bound its lifetime (and pay has a timeout parameter anyway that it could pass as the max lifetime of the pathfinding module), and allow early destruction of the pathfinding module object in case of definite success. This would provide a relatively common framework for interacting with a plan to send funds to the destination.

@cdecker
Copy link
Member Author

cdecker commented Jun 10, 2020

One thing I have been thinking of is to separate pathfinding into a separate module. What I mean is, there is a module within pay whose purpose is to call getroute repeatedly and cache the resulting routes, speculatively excluding channels/nodes it believes are high-risk (low capacity, historically unreliable, cannot be pinged, etc). This module would run continuously in the background for a particular payment until it reached some number of candidate routes, then sit idle until the number of valid candidate routes drops below some threshold, after which it would trigger again and search more routes.

That is indeed already possible with the payment modifiers as implemented in this PR. In fact just yesterday I implemented the first modifier that uses its own logic to compute a route, completely sidestepping the call to getroute: cdecker@d39462f. This modifier checks to see if we have a direct peer channel with sufficient capacity and overrides the route to use that channel.

The logic to compute the route can be arbitrary complex, can rely on other plugins or RPC calls, and can do arbitrary choices, and report metrics back to whatever mechanism it is using. And best of all we don't have to define yet another interface.

The idea behind the new payment flow is that it provides the bare minimum to get a successful payment through in the ideal case, but still allows enough flexibility to override/replace whatever doesn't work, and add reactions to deviations from the happy path.

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack a61bf50

Minor feedback only..

p->step = PAYMENT_STEP_RETRY;
}

payment_continue(p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean about payment_continue() being easy to miss.

Perhaps this function should simply return, and the "payment_continue" should be in the caller?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see payment_continue gets called from other callbacks, making this more difficult. OK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's pretty similar to the command_still_pending/command_fail/command_success construction, so I though it'd be nice to build something similar to that. I can do that on top of paymod-03 if you agree that it's a good idea.

if (p->parent == NULL && cmd == NULL) {
/* This is the tree root, but we already reported success or
* failure, so noop. */
return command_still_pending(NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yech, that's ugly! I'm not convinced payment_finished should retrurn command_result...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it is like this is that payment_finished encapsulates the logic to determine whether a command is pending or not. It is the place where we call command_fail and command_finished. Those two functions are marked as warn_unused_result, so I thought it easiest to hand them up and then silence the result.

We can of course silence it by doing something else, but I have no good ideas on how to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more googling it seems like we can opt-out of the warn_unused_result using a cast to void (???):

(void)command_finished(cmd, ret);

Still not nice, but I added what they'd look like in the last few fixup!-commits (they clash with other things so I haven't reordered them just yet).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, gcc (7.5) doesn't seem to like this trick, and still warns about the unused result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 474 to 487
cur->tlv_payload = tlv_tlv_payload_new(cr->hops);
tlvstream_set_tu64(
&cur->tlv_payload->fields,
TLV_TLV_PAYLOAD_AMT_TO_FORWARD,
p->route[i + 1].amount.millisatoshis); /* Raw: TLV payload generation*/
tlvstream_set_tu32(&cur->tlv_payload->fields,
TLV_TLV_PAYLOAD_OUTGOING_CLTV_VALUE,
p->start_block +
p->route[i + 1].delay);
tlvstream_set_short_channel_id(
&cur->tlv_payload->fields,
TLV_TLV_PAYLOAD_SHORT_CHANNEL_ID,
&p->route[i + 1].channel_id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generate this? This is very unpretty... :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move it into a separate function for starters, not truly generated, but should clean this up nicely.

@cdecker
Copy link
Member Author

cdecker commented Jun 19, 2020

I pushed an update that attempts to fixup all the feedback. Here's the range-diff, no non-fixup commit was touched, so reviewing the fixups should be sufficient.

Not squashing just yet, since there is the "return command_result" change that needs to be acknowledged before I start working on the conflicts that reordering causes :-)

Ping @rustyrussell :-)

@cdecker
Copy link
Member Author

cdecker commented Jun 22, 2020

Autosquashed and rebased on top of master

@cdecker
Copy link
Member Author

cdecker commented Jun 25, 2020

Rebase on top of master required by a tiny conflict in test_pay.py:

 1:  5da37043e =  1:  6cc7a3f45 plugin: Add a data backend for the new payment state-machine
 2:  7f53f548b =  2:  57a48da7d paymod: Add base state-machine for payment flow
 3:  ada0fab08 =  3:  b23b31da5 paymod: Implement base state-machine transitions
 4:  3236f4d9b !  4:  1b6a98bbd paymod: Add a simple test-command to test the paymod state-machine
    @@ -158,8 +158,8 @@
     +++ b/tests/test_pay.py
     @@
      
    -     inv = invs[0]
    -     assert(inv['msatoshi_received'] >= amt)
    +     # l1 should still be alive afterwards.
    +     assert l1.rpc.getinfo()['id'] == l1id
     +
     +
     +@unittest.skipIf(not DEVELOPER, "paymod is only available to developers for now.")
 5:  1ac7a41e0 =  5:  3b10bf487 paymod: Generate type-safe accessor functions for modifier data
 6:  18e81ad28 =  6:  ad06cd95a paymod: Add a retry modifier that retries payments on failure
 7:  e09c5ab61 =  7:  a13d42a95 paymod: Implement bubbling results up the hierarchy of payments
 8:  e73d7ab6d =  8:  faa7f6c00 paymod: Implement getroute call
 9:  b0bb7987c =  9:  b432683e3 paymod: Add a `getinfo` call on payment_start to get the blockheight
10:  4047dea37 = 10:  0b18936b9 doc: Fix a documentation error in sendonion
11:  bb37a497e = 11:  f5bf73dff paymod: Implement legacy onion payload computation
12:  bd0a1a3d7 = 12:  8e68aadc6 paymod: Call `createonion` to generate the onion and associated data
13:  b91ac6782 = 13:  388e4d468 paymod: Parse createonion result and call sendonion
14:  20b12bf22 = 14:  4cd094937 paymod: Call waitsendpay on the sendonion we just queued
15:  06adaa6e5 = 15:  4e191d569 paymod: Parse and store the waitsendpay result
16:  947340a12 = 16:  76f824981 paymod: Collect and return results of a tree of partial payments
17:  50a69ddb4 = 17:  c144d88ce paymod: Use reasonable partids
18:  0f35f8ed2 = 18:  2bc1c8bd3 paymod: Maintain a list of current and past payments
19:  fcb46ade0 = 19:  20c22daab paymod: Do not reply to the same JSON-RPC command multiple times
20:  3112bd645 = 20:  eaed425c3 paymod: Do not pass cmd to sub-payments, plugin suffices
21:  6de4eaf02 = 21:  529faefcc tlv: Add generic getters and setters for tlvstream
22:  133b0d32c = 22:  9459e2d18 wiregen: Add enums for TLV types so we can call them by their name
23:  1787df06b = 23:  ace3b9e9f paymod: Implement TLV onion payload generation
24:  ae6d53e86 = 24:  0e9839d02 paymod: Move onion payload construction into its own function

cdecker added 5 commits June 29, 2020 13:54
The actual steps are mocked out, but we can already run through the various
phases of a payment, and the modifiers should be called for each state.
This commit can be reverted/skipped once we have implemented all the logic and
have feature parity with the normal `pay`. It's main purpose is to expose the
unfinished functionality to test it, without completely breaking the existing
`pay` command.
This should make it easy for JSON-RPC functions and modifiers to get the
associated data for a given modifier name. Useful if a modifier needs to
access its parent's modifier data, or in other functions that need to access
modifier data, e.g., when passing destination pointers into the `param()`
call.
cdecker added 19 commits June 29, 2020 13:54
This is likely a bit of overkill for this type of functionality, but it is a
nice first use-case of how functionality can be compartmentalized into
modifiers. If makes swapping retry mechanisms in and out really simple.
A payment is considered finished if it is in a final state (success or
failure) or all its sub-payments are finished. If that's the case we notify
`payment_finished` and bubble up through `payment_child_finished`, eventually
bubbling up to the root, which can then report success of failure back to the
RPC command that initiated the whole process.
This is necessary so we can build the absolute locktimes in the next step. For
now just fetch the blockheight on each (sub-)payment, later we can reuse the
root blockheight if it ends up using too much traffic.
Te `sendonion` docs where claiming that the `first_hop` needs to specify a
`channel_id` whereas it should really be the `node_id` of the peer we are
trying to contact.
This is just for testing for now, TLV payload computation will come next. We
stage all the payloads in deserialized form so modifiers can modify them more
easily and serialize them only before actually calling `createonion`.
After we gave each modifier a chance to have its say, we can now proceed to
generate the onion that's going to be sent in the next step.
This is necessary so we can later aggregate across an entire tree of attempts
and report success or failure to the RPC caller.
The status of what started as a simple JSON-RPC call is now spread across an
entire tree of partial payments and payment attempts. So we collect the status
in a single struct in order to report back success of failure.
We were just handwaving the partid generation, which broke some tests that
expected the first payment attempt to always have partid=0, so here we just
track the partids assigned in the payment tree, starting at 0.
We need to keep them around so we can inspect them later. We'll also need a
background cleanup every once in a while to free some memory. More on that in
a future commit.
We were passing the `cmd` instance around where `plugin` suffices (used to
start RPC calls and logs).
Trying to rework the TLV streams to have a more homogenous interface to work
with. This is by no means a complete implementation, just the groundwork that
is going to be used by the wire code generator to generate the specific
accessors, but it's enough so we can manipulate TLV streams in the onion and
later just switch to the generated ones.
Suggested-by: Lisa Neigut <@niftynei>
Signed-off-by: Christian Decker <@cdecker>
@cdecker cdecker added this to the v0.9.0 milestone Jun 29, 2020
@cdecker cdecker self-assigned this Jun 29, 2020
@rustyrussell
Copy link
Contributor

Ack 3412a69

@cdecker cdecker merged commit 6af5ab9 into master Jul 1, 2020
@rustyrussell rustyrussell deleted the paymod-01 branch August 15, 2022 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants