Trampoline onion format (Feature 56/57) #836
I like it, but I'm not sure I understand the purpose of the payment_secret except inside the internal onion for the final node?
04-onion-routing.md
- MUST use a different `session_key` for the `trampoline_onion_packet` and the `onion_packet`
- MUST include the `trampoline_onion_packet` tlv in the _last_ hop's payload of the `onion_packet`
- MUST include the invoice's `payment_secret` in the _last_ hop's payload of the `trampoline_onion_packet`
- MUST generate a different `payment_secret` to use in the outer onion
Why include payment_secret at all in the outer onion?
It's for the case where trampoline nodes aggregate incoming MPP and then re-split differently to reach the next trampoline node.
This ensures they use the normal MPP validation code and no intermediate nodes can cheat. They could just rely on the `total_amount` to verify they receive everything, but I like the fact that it works just like normal payments.
Ah, you need total_msat from payment_data. But payment_secret here doesn't help much: the aggregating node can't know what the correct value is (and I can't see where in the spec you say it should be checked...).
An intermediate can probe by sending its own partial payment with some random payment_secret, so you can't really even say "they must all be the same...".
It's true that in the invoice-based flow the recipient generated the `payment_secret` and can check the correctness of the value. In the trampoline flow, it's the trampoline sender who generates one, and the recipient blindly accepts and just uses it to bundle together HTLCs from the same set.
But I think it still has some value because it does ensure that HTLCs from two different nodes will not end up being bundled together (and mess up the payment). If we have the following payment (with trampoline route `Alice -> T1 -> T2 -> Bob`):
Alice -----> ... -----> T1 -----> I1 -----> I2 -----> T2 -----> ... -----> Bob
`T1` generated a `payment_secret` to send HTLCs to `T2`. `I2` is trying to interfere by generating his own (trampoline) payment to `T2`. They will have a different `payment_secret`, so they won't interfere with the ones `T1` sent.
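To illustrate the bundling argument above, here is a minimal sketch of how a receiving trampoline node might group incoming HTLCs into sets. The `IncomingHtlc` type and the `complete_sets` helper are hypothetical names made up for this example, and keying sets on `(payment_hash, payment_secret)` is an assumption about how an implementation could do it, not something the spec text in this thread mandates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomingHtlc:
    # Hypothetical, simplified view of an HTLC received by a trampoline node.
    payment_hash: bytes
    payment_secret: bytes
    amount_msat: int
    total_msat: int


def complete_sets(htlcs):
    """Group HTLCs by (payment_hash, payment_secret) and return complete sets.

    A set is considered complete when all of its parts agree on total_msat
    and their amounts add up to at least that total.
    """
    sets = defaultdict(list)
    for htlc in htlcs:
        sets[(htlc.payment_hash, htlc.payment_secret)].append(htlc)
    complete = {}
    for key, parts in sets.items():
        totals = {part.total_msat for part in parts}
        received = sum(part.amount_msat for part in parts)
        if len(totals) == 1 and received >= parts[0].total_msat:
            complete[key] = parts
    return complete
```

With this grouping, HTLCs that `I2` injects with its own `payment_secret` land in a separate, incomplete set and cannot be counted towards the set that `T1` is sending, even when they share the same payment hash.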
I believe the spec should make a recommendation for CLTV and fee budgets that senders of trampoline onions should use for each trampoline to reduce the usage of `temporary_trampoline_failure`.
Also it seems reasonable to just reply with a single `fee_budget_msat` instead of `[u32:fee_base_msat]` and `[u32:fee_proportional_millionths]` as those fields seem not to make sense in the context of trampolines.
Trampolines could announce the `fee_budget_msat` and `cltv_budget` (in case you like the new name) that they believe to be sufficient in their node announcements?
04-onion-routing.md
This error usually indicates that routes were found but failed because of
temporary failures at intermediate hops.

1. type: NODE|25 (`trampoline_fee_expiry_insufficient`)
Do we have a recommendation that should be used initially by the sender for CLTV and fee budgets?
I understand from your comment in reply to @ecdsa at #829 (comment) that you suggest to have a trial and error mechanism with feedback from the trampoline node. I am worried that if more than 1 trampoline node is involved, this might create a lot of unnecessary round trips, as the sender would have to learn the fee and cltv budget for the first trampoline and then for the second one and so on. In the meantime the trampoline onion would always be cancelled in such round trips.
Maybe a recommendation for the sender to take the median CLTV for paths of up to `n` hops in length, and a fee budget for (multi)paths accordingly, to start with might be helpful? I understand that especially the fee estimation might be tricky but I would prefer to start somewhere.
I also wonder if it makes sense for trampoline nodes to replicate the `base_fee_msat` and `fee_proportional_millionths` mechanism, as trampolines have a total fee budget that they will allocate to deliver the payment and earn from what they have not used. Or did you plan that if a trampoline pays too much total `base_fee_msat` but stays far below the `fee_proportional_millionths` that you would send `temporary_trampoline_failure`?
> I believe the spec should make a recommendation for CLTV and fee budgets that senders of trampoline onions should use for each trampoline to reduce the usage of temporary_trampoline_failure.

The spec cannot do that, this is entirely up to node operators and implementations and we must let the market decide on good values here (which will likely change more often than the specification does).

> Also it seems reasonable to just reply with a single fee_budget_msat instead of * [u32:fee_base_msat] and [u32:fee_proportional_millionths] as those fields seem not to make sense in the context of trampolines.

I don't see why they don't make sense? Base and proportional fees do apply to trampoline nodes, and it's more consistent that way. Trampoline nodes derive values for base and proportional fees by walking the graph outwards and adding the values of individual edges.
Overall your whole comment is addressed by what will come in a second step. My first proposal for trampoline (more than 2 years ago, see #654) did contain a mechanism to gossip trampoline fees and cltv. I decided to drop this part for now, because the most common way of using trampoline doesn't really need it, so it can be added later.
The simplest path to trampoline is to have the recipient include trampoline nodes in their invoices. It's the recipient's responsibility to compute the fees that each trampoline node will need to reach them (which is easy to do by just looking at a small subset of the graph) and include those fees in the invoice. Then the sender will either:
- directly send to those trampoline nodes (if the sender is able to compute a route to them): in that case there should never be a trampoline fee / cltv failure
- pick a first trampoline node close to them: in that case they do need to guess what fees and cltv they should give that first trampoline node to allow it to reach the second trampoline node. This may require a retry, but it requires only one, which is efficient enough (the second trampoline node should never require a fee / cltv update because the recipient computed the correct values)
My goal is to work on the gossip part once this first PR is accepted, and after getting some real-world feedback on what works well and what doesn't work in practice.
> The spec cannot do that, this is entirely up to node operators and implementations and we must let the market decide on good values here (which will likely change more often than the specification does).

What I meant was the idea that if, for example, the cheapest path between two trampolines is of 3 hops, the sender using these two trampoline nodes should include at least the fees that would be charged on such a path. Similarly with the `cltv delay`, which MUST at least meet that of the 3 channels. While this is a perfectly reasonable recommendation I just realized that it kind of defeats the purpose of Trampolines as the sender node wants to outsource the route computation. Also trampolines might utilize unannounced channels. So yes I agree it does not make sense to add a recommendation to the spec.
> Also it seems reasonable to just reply with a single fee_budget_msat instead of * [u32:fee_base_msat] and [u32:fee_proportional_millionths] as those fields seem not to make sense in the context of trampolines.

> I don't see why they don't make sense? Base and proportional fees do apply to trampoline nodes, and it's more consistent that way. Trampoline nodes derive values for base and proportional fees by walking the graph outwards and adding the values of individual edges.

Let's say I want to pay 200k sats and I add 50 sats `base_fee` and a feerate of `1000`. This means that the trampoline node can charge a total of `250` sats. What happens if the trampoline node delivers the payment on channels charging a total of `220` sats for the fee rate and `20` sats for their `base_fee`? I think the trampoline node would have to pay a total of `240` sats and stay below the total `fee_budget`. It did however spend more on the `fee_rate`. I think in such a situation the node should process the payment and be happy it earned `10` sats. This is similar to forwarding of regular onions where the difference of outgoing HTLC and incoming HTLC can be considered a fee budget for the routing node and needs to be smaller or equal to the result of the fee formula with `fee_rate` and `base_fee`.
If the above goal to forward the payment is the case, then why require two separate fees? It seems the final question is only whether the trampoline was able to deliver to the next trampoline while staying within the total fee budget. Also the base fee might complicate things in case the payment amount is larger and the trampoline node has to use MPP. If there was just a single fee rate for the trampoline and the underlying channels the allocation seems much more straightforward as one would effectively have a unit cost but I guess this is more a technical detail.
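The arithmetic in this example can be checked with a short sketch. The helper names `fee_budget_msat` and `within_budget` are hypothetical, and the feerate of `1000` is assumed to be expressed in millionths, as for `fee_proportional_millionths`.

```python
def fee_budget_msat(amount_msat, fee_base_msat, fee_proportional_millionths):
    """Total fee allowed by the usual base + proportional formula."""
    return fee_base_msat + amount_msat * fee_proportional_millionths // 1_000_000


def within_budget(budget_msat, spent_msat):
    """The trampoline only needs its total routing cost to fit the total budget."""
    return spent_msat <= budget_msat


amount_msat = 200_000 * 1000                            # 200k sats
budget = fee_budget_msat(amount_msat, 50 * 1000, 1000)  # 250 sats, in msat
spent = 220 * 1000 + 20 * 1000                          # 240 sats paid downstream
earned = budget - spent                                 # 10 sats kept by the trampoline
```

Here the proportional part spent downstream (220 sats) exceeds the proportional part of the budget (200 sats), yet `within_budget(budget, spent)` still holds because only the total is compared.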
> Overall your whole comment is addressed by what will come in a second step. My first proposal for trampoline (more than 2 years ago, see #654) did contain a mechanism to gossip trampoline fees and cltv. I decided to drop this part for now, because the most common way of using trampoline doesn't really need it, so it can be added later.
> [...]
> My goal is to work on the gossip part once this first PR is accepted, and after getting some real-world feedback on what works well and what doesn't work in practice.

It seems to me that we will end up having gossip for trampolines in the end anyway, which is why I elaborated above on the `fee` and `cltv` budget. I understand why you want to do the upgrade in smaller steps. I will check out #654 and come back to the gossip questions.
Regarding the base and proportional fee, I agree with @lightning-developer that it doesn't make sense to provide both. It makes sense to have both when we don't know the amount that will be paid and want to provide a formula that works for any amount, but here we're in the context of a specific payment, we know the amount and we can compute the exact fee.
The reason I'm currently providing both is explicitly for that: because it should be used for future payments of unknown amounts.
We don't have a gossip broadcast mechanism yet, but since we need to have a failure message, I believe it makes sense to make it compatible with future gossip. Whenever a payer receives such an error, they should store `fee-base` and `fee-proportional`, and use those values for their future payments through that trampoline node. It may be outdated by then (because we don't have yet a mechanism to receive new fees, but we will in the future), but it may still be accurate and win us a round-trip.
You should think of it as complementary to gossip (and in our case, a first, incomplete version of trampoline gossip). That's why I really believe it should be a formula and not a value that works for only one payment, because the formula is a superset of the single amount, so it's always superior, isn't it?
That would only work if the trampoline node uses the same fee structure regardless of the path used. I don't think it makes sense to advertise trampoline fees before we know what route to use, what if the best route is more expensive than the fee we've advertised? Do we refuse to route the payment? Do we increase the fees for everyone to make this case less frequent?
My initial, imperfect idea was that trampoline nodes would do a BFS to compute how much it would cost them to reach any node that is at most N hops away (in terms of `fee-base` and `fee-proportional`). Then they would choose some statistical model to gossip values that would work most of the time while staying competitive. Sometimes the result would be that they receive more fees than they need, sometimes less; then it's an implementation choice to choose whether you sometimes route for less than your budget and make up for it with other payments, or send back an error asking for more fees (at a risk of not being chosen to route the retry).
Note that even returning a `fee_budget` isn't guaranteed to work: that is only an estimation run by the trampoline node, but until the payment is actually tried, we can't know whether liquidity issues down the path will force us to raise the fees.
This is clearly an area where a lot more research would be useful, this is not an easy problem to solve as we work on very incomplete information. I assumed that it would provide the least friction to stick with the existing fee model, but that could be wrong.
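As a rough illustration of the BFS idea described in this comment, the sketch below walks a channel graph outwards from a trampoline node and accumulates fees along the first (fewest-hops) path found to each node. The `fees_within` function and the adjacency-list graph layout are hypothetical, and simply summing `fee_proportional_millionths` across edges is a simplification: it ignores the compounding of fees on fees and any liquidity constraints.

```python
from collections import deque


def fees_within(graph, source, max_hops):
    """Return {node: (fee_base_msat, fee_proportional_millionths)} for every
    node reachable from source in at most max_hops hops.

    graph maps a node to a list of (neighbor, fee_base_msat,
    fee_proportional_millionths) edges. Fees are summed along the first
    path discovered by the breadth-first search.
    """
    costs = {source: (0, 0)}
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        base, proportional = costs[node]
        for neighbor, edge_base, edge_proportional in graph.get(node, []):
            if neighbor not in costs:
                costs[neighbor] = (base + edge_base, proportional + edge_proportional)
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del costs[source]
    return costs
```

A node could then feed the resulting values into whatever statistical model it prefers (for instance a high percentile) to pick the fees it advertises; that choice is left open in the discussion above.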
I have a proposal to make this more future-proof. What about putting a tlv stream inside this error, that could contain either a `fee_budget` tlv (that contains the exact fee that should be used for this payment) or a `fees` tlv (that contains `fee_base` and `fee_proportional`)? Then it's up to implementations to choose whether they include only one of these or both of them, and we can very easily deprecate one field (or both) later when we have better real-world feedback on what fee estimation works for trampoline nodes?
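A minimal sketch of what such a tlv stream could look like on the wire is shown below. The tlv type numbers (`1` and `3`), the `u64` encoding of the budget, and the function names are all assumptions made for this example; the thread does not define them. Only types and lengths below `0xfd` are handled, so each fits in a single byte.

```python
import struct

FEE_BUDGET_TYPE = 1  # hypothetical tlv type, not assigned by the spec
FEES_TYPE = 3        # hypothetical tlv type, not assigned by the spec


def tlv_record(tlv_type, value):
    """Encode one record as type, length, value (single-byte type and length only)."""
    if tlv_type >= 0xFD or len(value) >= 0xFD:
        raise ValueError("this sketch only supports single-byte type and length")
    return bytes([tlv_type, len(value)]) + value


def encode_failure_tlvs(fee_budget_msat=None, fees=None):
    """Build the stream; either field, both, or neither may be present."""
    stream = b""
    if fee_budget_msat is not None:
        stream += tlv_record(FEE_BUDGET_TYPE, struct.pack(">Q", fee_budget_msat))
    if fees is not None:
        fee_base_msat, fee_proportional_millionths = fees
        stream += tlv_record(FEES_TYPE, struct.pack(">II", fee_base_msat, fee_proportional_millionths))
    return stream


def decode_failure_tlvs(stream):
    """Parse the stream back into a dict, skipping record types it does not know."""
    result = {}
    offset = 0
    while offset < len(stream):
        tlv_type, length = stream[offset], stream[offset + 1]
        value = stream[offset + 2 : offset + 2 + length]
        offset += 2 + length
        if tlv_type == FEE_BUDGET_TYPE:
            result["fee_budget_msat"] = struct.unpack(">Q", value)[0]
        elif tlv_type == FEES_TYPE:
            result["fees"] = struct.unpack(">II", value)
    return result
```

Because each field is its own record, a sender of the error can include only the budget, only the formula, or both, and a field can later be dropped without changing the framing.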
We update the trampoline feature to match the official specification from lightning/bolts#836. We remove support for the previous version of trampoline, which means that when paying nodes that use the experimental version, we will use the trampoline-to-non-trampoline flow instead. Similarly, when older nodes pay updated nodes, they won't understand the new trampoline feature bit and will use the trampoline-to-non-trampoline flow. We update the trampoline-to-non-trampoline flow to remove the unused trampoline payload in the onion, which saves some space. Note that we don't want to officially specify this scenario, as it leaks some data about the recipient to the trampoline node. We rather wait for nodes to either support trampoline or blinded paths, which fixes this issue.
Trampoline routing uses layered onions to trustlessly and privately offload the calculation of parts of a payment route to remote trampoline nodes. A normal onion contains a smaller onion for the last hop of the route, and that smaller onion contains routing information about the next trampoline hop. Intermediate trampoline nodes "fill the gap" by finding a route to the next trampoline node, and sending it the peeled trampoline onion, until that reaches the final destination.
@arik-so @valentinewallace I have added a draft spec of trampoline payments to blinded paths in 296ce21. It is probably a bit confusing because it depends on Bolt 12 types that are defined in the offers PR, and I should more explicitly spell out the requirement of which tlvs are included where, but combined with the discussions we had in this comment, we should be able to understand each other. I'm particularly looking for feedback on the shared secret extension I'm using for the recipient trampoline payload to include an ECDH with the [...]
Once we have a good enough rough consensus, I'll spend some time rebasing this PR, I'll re-write the requirements in a way that is similar to what was done in #1181 and more precise than what I've currently done, and I'll finalize the test vector.
Awesome, thank you so much! Will update corresponding test vectors ASAP!
We previously supported having multiple channels with our peer, because we didn't yet support splicing. Now that we support splicing, we always have at most one active channel with our peer. This lets us simplify greatly the outgoing payment state machine: payments are always made with a single outgoing HTLC instead of potentially multiple HTLCs (MPP). We don't need any kind of path-finding: we simply need to check the balance of our active channel, if any. We may introduce support for connecting to multiple peers in the future. When that happens, we will still have a single active channel per peer, but we may allow splitting outgoing payments across our peers. We will need to re-work the outgoing payment state machine when this happens, but it is too early to support this now anyway. This refactoring makes it easier to create payment onion, by creating the trampoline onion *and* the outer onion in the same function call. This will make it simpler to migrate to the version of trampoline that is currently specified in lightning/bolts#836 where some fields will be included in the payment onion instead of the trampoline onion.
When paying a Bolt 12 invoice, the payer may use a trampoline node to relay that payment. The payer simply includes some of the blinded paths in the onion payload for the trampoline node, who will relay to those blinded paths. The trampoline node doesn't learn anything about the final recipient. We only support using a single trampoline node, because we must provide the blinded paths in the outer onion, instead of the trampoline onion. If we included them in the trampoline onion, the trampoline node would not have enough space in the outer onion to correctly relay the payment. If the recipient supports trampoline and the `invoice_request` contains the trampoline feature bit, the recipient may set it in its invoice. In that case, the sender can include a trampoline onion to provide custom TLVs to the recipient. We prevent the trampoline node from replacing that onion with one that it created by using a shared secret created from the `invoice_request` to authenticate that onion. Note that this commit depends on Bolt 12: it references a few types that are introduced in various commits related to Bolt 12 (e.g. `blinded_path` and `blinded_payinfo`), which can be confusing since Bolt 12 spec is still in progress. I will clean this up once Bolt 12 is finalized.
296ce21 to b4405d8
@arik-so I slightly changed the last commit:
There is one point that is worth discussing here: do we want to include an extension for Bolt 11 to introduce trampoline routing hints? If we want to be able to make non-blinded trampoline payments between wallets that have only private channels (e.g. Alice -> LSPA -> ... -> LSPB -> Bob), I see two possible options.
The first one is that we don't introduce any new routing hint, but if the invoice contains the (optional) trampoline feature bit, Alice assumes that nodes included in Bob's routing hints support trampoline and uses them as such. In the example above, Alice would use LSPA as her trampoline node and LSPB as Bob's trampoline node, and assume that LSPB is able to route to Bob. This option is nice because it doesn't increase QR code size, but when receiving wallets provide multi-hop routing hints, Alice may get that heuristic wrong. In practice I don't think this is an issue: in the worst case Alice could make several attempts where she assumes that different nodes in the routing hints support trampoline, until one works.
The second option is that we introduce a trampoline routing hint field to Bolt 11 invoices, that is very similar to routing hints but doesn't include a [...]
Thoughts?
On our end we only care about trampoline with blinded path destinations, really. I don't see much reason to care about trying to add a second trampoline hop of the recipient's LSP when they aren't using blinding, if the sender wants privacy from their trampoline hop(s), they should just add more trampoline hops!
We update our trampoline payments to blinded paths to match the official specification from lightning/bolts#836. The blinded paths and recipient features are included in the trampoline onion, which potentially allows using multiple trampoline hops. That was already what we were doing with experimental TLVs, so we simply update the TLV values to match the spec values.
We add the ability to pay recipients that support trampoline *and* blinded paths. We include the blinded path data in the trampoline payloads for each node inside the blinded path. This doesn't reveal unnecessary information to the trampoline node: this is specified in detail in lightning/bolts#836.
This PR details the onion construction and requirements for supporting nodes. I advise readers to also have a look at #829 which gives a more high-level view of the different components, how they interact, and provides nice diagrams that help understand the low-level details.