draft: HTLC Endorsement to Mitigate Channel Jamming #1071

Contributor

@carlaKC carlaKC commented Apr 28, 2023

This PR introduces an endorsed TLV to update_add_htlc as a way for nodes to indicate whether they expect a HTLC to resolve "honestly". Nodes are advised to allocate a limited portion of their outbound liquidity and slots to HTLCs that are not endorsed by peers that they consider to have high reputation.

Opening early for discussion on structure, not ready for review - discussions around recommendations for local reputation scoring are ongoing.

Slides for the visually-minded here

@carlaKC carlaKC force-pushed the jamming-endorsement-specification branch from 591e524 to db043e9 Compare May 11, 2023 20:25
@carlaKC carlaKC force-pushed the jamming-endorsement-specification branch 3 times, most recently from ec6eb65 to a7075f7 Compare May 15, 2023 14:57

### Rationale
If a HTLC is endorsed by a peer they have signaled that they expect the HTLC
to resolve honestly, so will be held accountable for the manner in which they
Collaborator

Reading this again now, it makes me think of https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-February/003842.html. But that was of course for the sender.

@@ -1407,6 +1438,88 @@ The _origin node_:
- MAY use the data specified in the various failure types for debugging
purposes.

## Recommendations for Reputation
Collaborator

This section is helpful, makes it much more concrete what to think of when talking about a reputation system in the context of a routing node.

@carlaKC carlaKC force-pushed the jamming-endorsement-specification branch from a7075f7 to 6e221f8 Compare May 16, 2023 18:29
for `resolution_time` incurred.
- `fees`: the fees paid by a forwarded HTLC (as described in [BOLT #7](07-routing-gossip.md#htlc-fees),
equal to 0 if the HTLC was not fulfilled).
- `opportunity_cost`: `ceil ( (resolution_time - resolution_period) / resolution_period) * fees`

For a worst-case resolution_time that is far larger than resolution_period, I can create a big enough opportunity_cost (and so a big enough negative effect on reputation) along a route.

Given that LN payments unfold over a long route, if I control the source and destination of a payment I can decide how long to hold the payment.

Given this layout

  • A -- B -- C -- D
  • I control A & D
  • B and C are some very big routing nodes

We consider the past relationship of B and C to be very good (i.e. when B forwards to C then they will be endorsed, because they have accumulated a lot of effective_fees over the window of interest).

I can follow these steps:

  • I make a few good payments from A to D until I notice that endorsed is turned on (effectively meaning that B now endorses me)
  • Given access to the endorsed slots, I now create one or multiple payments from A to D that upon success would pay a big amount of fees.
  • After I receive the endorsed HTLCs (that correspond to the above payments) on D I hold on to them for as long as possible by not releasing the preimage.
  • Just before the CLTV timeout I fail the payment.
  • A -- B, B -- C, C -- D all damage their local reputation by fees - opportunity_cost, where opportunity_cost can be a big enough multiple of fees.

If the above flow is feasible, then a node that just earned the endorsed flag by their peer can now cause them reputation damage by far greater than what it cost them (in fees) to earn that flag.
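
To make the size of the penalty concrete, here is a minimal sketch of the `opportunity_cost` formula quoted above, using integer ceiling division. The function name and the example numbers (a 60-second `resolution_period`, a 1,000 msat fee, a two-hour hold) are illustrative assumptions, and the sketch assumes `fees` is non-zero for the HTLC being scored.

```python
def opportunity_cost(resolution_time: int, resolution_period: int, fees: int) -> int:
    """ceil((resolution_time - resolution_period) / resolution_period) * fees.

    Times are in seconds and fees in msat. This follows the quoted formula
    literally; it is not taken from any implementation.
    """
    if resolution_period <= 0:
        raise ValueError("resolution_period must be positive")
    excess = resolution_time - resolution_period
    # Integer ceiling division: -(-a // b) == ceil(a / b) for b > 0.
    periods = -(-excess // resolution_period)
    return periods * fees


# Illustrative numbers only: 60 s period, 1,000 msat fee.
fast = opportunity_cost(resolution_time=10, resolution_period=60, fees=1_000)
held = opportunity_cost(resolution_time=2 * 60 * 60, resolution_period=60, fees=1_000)
print(fast, held)
```

With these assumed numbers the fast HTLC incurs no opportunity cost, while the two-hour hold is charged 119 times the fee.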

@GeorgeTsagk GeorgeTsagk May 18, 2023

To mitigate this possible attack, perhaps opportunity_cost formula can have an upper limit?

Something that can still allow it to create damage on effective_fees (in order to maintain the earn slow / lose fast attribute) but not large enough to cause significant damage on other links further into the route, links with much stronger reputation relationships.
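
A rough sketch of what such an upper limit could look like follows. The `max_periods` parameter is a hypothetical knob introduced only for illustration; neither its name nor any particular value comes from the proposal.

```python
def capped_opportunity_cost(
    resolution_time: int,
    resolution_period: int,
    fees: int,
    max_periods: int,
) -> int:
    """opportunity_cost with the number of charged periods capped.

    `max_periods` is a hypothetical cap on how many resolution periods
    a single HTLC can be charged for.
    """
    if resolution_period <= 0:
        raise ValueError("resolution_period must be positive")
    excess = resolution_time - resolution_period
    periods = -(-excess // resolution_period)  # integer ceiling division
    return min(periods, max_periods) * fees


# A two-hour hold is charged for at most 10 periods instead of 119.
print(capped_opportunity_cost(7_200, 60, 1_000, max_periods=10))
```

The cap keeps a slow HTLC more expensive than a fast one while bounding the damage a single hold can do.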

@carlaKC carlaKC force-pushed the jamming-endorsement-specification branch from 2600b6f to 9b97e28 Compare June 16, 2023 19:02
Collaborator

@rustyrussell rustyrussell left a comment

I think this can be reduced to two variables per (outgoing) channel: reputation and exhaustion cost.

So, if incoming endorsed and outgoing reputation is greater than exhaustion cost of channel*:

  • outgoing endorsed = 1
    Otherwise:
  • outgoing endorsed = 0

If outgoing endorsed is 0:

  • reduce the effective max_htlc_value_in_flight_msat and max_accepted_htlcs of the outgoing channel by 50% for purposes of this htlc.
    Otherwise:
  • reduce outgoing reputation by fee * ( cltv_expiry - current block height) * 600,
  • Record the start time of the HTLC.

When the HTLC is resolved:

  • Record the end time of the HTLC.
  • If outgoing endorsed was 0:
    • If the HTLC was successful, and the end time - start time was less than 60 seconds
      • Increase the outgoing reputation by 50% of the htlc fee
  • Otherwise (endorsed = 1):
    • Increase the reputation by fee * ( cltv_expiry - current block height) * 600
    • If the end time - start time < 60 seconds and the HTLC was successful
      • Increase reputation by ((end time - start time - 60) / 60) * fee

Every 1 days (or X blocks?)

  • Reduce all outgoing reputations by 1% (? depends on how long you're aiming for, see below)

  • I didn't see this clearly spelled out in the draft, but on the call you used these terms. Ideally, it's "how much money did this channel make in the last max-cltv-delta-allowed blocks", but practically it's probably a decaying average:
  • If HTLC is successful:
    • Add fee to exhaustion cost of outgoing channel
  • Every block:
    • Multiply the exhaustion cost of the outgoing channel by 1 - (1 - n)^(1/n) where n is the max cltv-delta you allow.
    • (ChatGPT tells me that's how you calculate the exp decay factor, but I haven't run tests to check...)
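
The forwarding decision and the daily decay described above can be sketched roughly as follows. The class and method names are hypothetical, the sketch deliberately leaves out the per-HTLC reputation adjustments on add and resolve, and the 50% and 1% figures are simply the ones floated in this comment.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OutgoingChannel:
    reputation: float                   # running reputation for this channel
    exhaustion_cost: float              # decaying estimate of recent fee income
    max_htlc_value_in_flight_msat: int
    max_accepted_htlcs: int

    def endorse_outgoing(self, incoming_endorsed: bool) -> bool:
        # Endorse only if the incoming HTLC was endorsed and reputation
        # exceeds the channel's exhaustion cost.
        return incoming_endorsed and self.reputation > self.exhaustion_cost

    def effective_limits(self, outgoing_endorsed: bool) -> Tuple[int, int]:
        # Unendorsed HTLCs see only 50% of the liquidity and slots.
        if outgoing_endorsed:
            return self.max_htlc_value_in_flight_msat, self.max_accepted_htlcs
        return self.max_htlc_value_in_flight_msat // 2, self.max_accepted_htlcs // 2

    def daily_decay(self, rate: float = 0.01) -> None:
        # "Reduce all outgoing reputations by 1%" once per day.
        self.reputation *= 1.0 - rate


chan = OutgoingChannel(
    reputation=200.0,
    exhaustion_cost=100.0,
    max_htlc_value_in_flight_msat=1_000_000,
    max_accepted_htlcs=483,
)
print(chan.endorse_outgoing(True), chan.effective_limits(False))
```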

`max_accepted_htlcs`.
- MUST choose `unknown_allocation_liquidity` <= the remote channel peer's
`max_htlc_value_in_flight_msat`.
- If `endorsed` is set to 1 in the incoming `update_add_htlc` AND the HTLC
Collaborator

We should probably explicitly allow (and ignore!) the other bits for future use. So this should be:

"If endorsed is non-zero in the incoming..."

Is there a particular reason the endorsement is defined as a binary value, rather than a real number [0, 1], which may be more aptly called a reputation_weight or confidence_score?

While nodes may choose to use simpler implementations to begin with (i.e. binary signals and bucketed HTLCs), encoding more information in this field would allow for more precise reputation algorithms to develop independently over time without requiring a protocol change. For example, consider a long path constructed by a malicious actor:

A -> B -> … -> Y -> Z

Using only a binary signal, HTLC endorsement might get propagated through every hop. But if this is any real number, the confidence score can naturally decline and the attacker stands to lose the most reputation.
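
As a rough illustration of how a real-valued score could decline along a path, the sketch below assumes each hop multiplies the incoming confidence by its own trust in the upstream peer. That multiplication rule and the function names are assumptions made for this example, not something the proposal specifies.

```python
from typing import List


def propagate_confidence(incoming: float, trust_in_upstream: float) -> float:
    """Outgoing confidence under an assumed multiplicative rule."""
    for value in (incoming, trust_in_upstream):
        if not 0.0 <= value <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
    return incoming * trust_in_upstream


def confidence_along_path(initial: float, per_hop_trust: List[float]) -> List[float]:
    """Confidence attached to the HTLC after each hop on the path."""
    values = []
    confidence = initial
    for trust in per_hop_trust:
        confidence = propagate_confidence(confidence, trust)
        values.append(confidence)
    return values


# Three hops that each only half-trust their upstream peer.
print(confidence_along_path(1.0, [0.5, 0.5, 0.5]))
```

Under this assumed rule a binary signal would stay at 1 across all three hops, whereas the real-valued score has dropped to 0.125.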

Comment on lines 1448 to 1452
Peers build reputation by forwarding successful HTLCs that resolve quickly, and
lose reputation if they endorse failing or slow-resolving HTLCs. Reputation is
only _negatively_ affected if an endorsed HTLC resolves undesirably, to hold
nodes accountable for their endorsement signal while still allowing them to
forward unendorsed HTLCs that they are not certain about.
Collaborator

I'm not sure this statement is true, because of the following scenario (let me know if I'm missing the point entirely):

  • A wants to send a payment to D: A -> B -> C -> D
  • the A -> B and B -> C channels are empty: A and B both endorse the HTLC
  • the C -> D channel is full (because of payments coming from unrelated nodes, e.g. E -> C -> D)
  • when C receives the HTLC, even if it's fully endorsed and reputable, C has to fail the payment
  • when B receives the failure, B will then decrease A's reputation

Is the negative impact on A an issue? Is it something that can be abused? Can we do something about it (should we)?

A quick failure will not harm your reputation.
Only slow-resolving endorsed HTLCs can harm your reputation.

Collaborator

Ok, I believe that's one of the main differences between @thomash-acinq's proposal and this one, it's hard to evaluate which is the right choice.

Contributor

@Crypt-iQ Crypt-iQ left a comment

Some comments/questions:

  • if I'm an honest new node and the network is jammed, I think you can have a trust-based pay-for-endorsement scheme so they could send payments. I don't think it needs to be described here though
  • if I have the topology A---B---C it seems like C can grief A's reputation with B?:
    • A and B are both honest, C is malicious
    • A sends an endorsed payment through B to C
    • C holds the payment for as long as possible and then fails it back
    • B punishes A for sending this endorsed payment that turned out to be jam-like

@bshramin

Hi, I find this method of mitigating HTLC jamming quite interesting, however, I have one question.
What will be the impact of using a reputation system on local channels on centralization of the network?

Previously a node could achieve a higher payment success rate if it had more channels to more nodes in the network and it would possibly achieve more privacy if it utilized different nodes to route its payments through.
However, once this reputation system is implemented, could it in some cases incentivize nodes to open and use fewer channels (in the extreme case only one) in order to build more reputation on them and therefore achieve a higher payment success rate, especially nodes that don't forward many HTLCs?

which capture the fees that it paid and the opportunity cost that holding it
for `resolution_time` incurred.
- `fees`: the fees paid by a forwarded HTLC (as described in [BOLT #7](07-routing-gossip.md#htlc-fees),
equal to 0 if the HTLC was not fulfilled).

I think I'm not understanding something. If fees are equal to zero for unfulfilled HTLCs, then it means that opportunity_cost is also zero. Does this mean failed HTLCs won't result in losing reputation?

For zero fees a default ppm will be assumed (100ppm?) in order to bypass this.

Also, if we're talking about a fast-failing HTLC, then even if we have fees the loss is going to be zero (as can be seen from the opportunity_cost formula).

Is that for when the payment is fulfilled but fees sent by the sender are zero? Or for the case mentioned here equal to 0 if the HTLC was not fulfilled?

@thomash-acinq

A lot of the discussion revolves around the specific reputation scheme proposed here; however, I don't think that this should be part of the BOLTs, which only describe rules for communication between peers. While it is crucial to find a good way to compute reputation, this topic is already discussed elsewhere (mailing list, meetings). We should focus here on the actual spec change: a way to signal to the next node how confident we are that this HTLC will succeed.
Different peers could even compute reputation differently as long as we agree that an endorsed value of 0 means that we have a low confidence that the HTLC will succeed and 1 means that we have a higher confidence it will succeed.

The questions that need answering here are:

  • Do we agree that it's a good idea to transmit some information about our own assessment of the HTLC to the next peer?
  • How much do we want to transmit? Just one bit as suggested here or more?

I personally think that it is useful to transmit our confidence to the next peer and that the more precision we give, the more useful it is. However too much precision could be a privacy leak (if you receive two HTLCs with the same confidence, it probably means that they followed the same path and came from the same sender) so I think that having 8 confidence buckets (3 bits of information) would be a good compromise.
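
A minimal sketch of the bucketing idea, assuming a node's internal confidence is a float in [0, 1] that is mapped onto equal-width buckets before being sent. The function name and the equal-width mapping are assumptions for illustration; only the "8 buckets, 3 bits" figure comes from the comment above.

```python
def to_bucket(confidence: float, bits: int = 3) -> int:
    """Map a confidence in [0, 1] onto one of 2**bits equal-width buckets."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    buckets = 1 << bits
    # A confidence of exactly 1.0 would land in bucket `buckets`; clamp it
    # into the top bucket instead.
    return min(buckets - 1, int(confidence * buckets))


print(to_bucket(0.0), to_bucket(0.5), to_bucket(1.0))
```

Two HTLCs whose internal scores differ slightly (say 0.51 and 0.60) land in the same bucket, which is the coarsening that limits how much the value reveals.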

Contributor

@ProofOfKeags ProofOfKeags left a comment

While I think that resource bucketing can make sense as an MVP for how to interpret the endorsement mechanic laid out in BOLT2, I find myself resistant to this being in the main BOLT sections. Even with the designation of "MAY", I think this is better suited to be an extension BOLT or perhaps even a BLIP.

The reason for this is that you state in the proposal that reputation is a local phenomenon. Each node not only gets to make a decision for how to measure reputation and how to update the priors based on activity, but also probably ought to be free to select among a nigh infinite number of slot/liquidity allocation strategies between endorsed and unendorsed HTLCs.

In a prior conversation you had explained that the endorsement mechanic requires a strategy for how that endorsement can be used to mitigate jamming to demonstrate the utility of endorsements at all. I agree with this assessment, but I still find the particular strategy in the proposal to be lacking (more on that below). However, there's nothing intrinsically broken about the resource bucketing strategy you present, it just is probably far more rudimentary than a mature deployment of this would look like.

Further, because this decision is ultimately a local one and the notion of reputation is also local, no matter what strategy you present here, even if it's one that all of us are delighted by, we should expect nodes on the network to experiment and deploy their own solutions. Because they can, and because better solutions to this problem will yield better risk-adjusted returns, we can also expect that a portion of these strategies will be proprietary as well.

Considering all of these factors, I think it is more appropriate to consider the resource allocation strategy as a recommendation, and should probably be placed into an appendix of some kind, be it an extension BOLT, a BLIP or otherwise. I could be misunderstanding the scope and responsibilities of the main BOLTs but if I was trying to bootstrap another implementation, I would be required to understand the endorsement specification to be compatible with the rest of the network, but I would have no need whatsoever to implement the exact reputation and resource allocation strategies to remain compatible with the network.

With the more organizational critiques out of the way, I will motivate where I'm coming from with my exact concerns with resource bucketing in general. All resources in economics have marginal value, meaning that each additional unit of that resource you consume costs more than the last one (in real terms). The resources you've identified here (slots and liquidity) are no different. As a result, the "real cost" of allocating the last slot or sat of liquidity is greater than the first. There is no way to set the parameters described in this section to accurately model this phenomenon. What if I want to have the required reputation for forwarding increase as my available resources decreases?
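
To illustrate the kind of policy that question describes, here is a hypothetical threshold that rises as free slots run out. The inverse-of-free-fraction curve, the function name, and the parameters are all assumptions chosen for simplicity; nothing like this appears in the proposal.

```python
def required_reputation(base_threshold: float, slots_in_use: int, total_slots: int) -> float:
    """Reputation needed to take the next slot, growing as slots fill up.

    Hypothetical curve: base_threshold divided by the fraction of slots
    that are still free.
    """
    if total_slots <= 0 or not 0 <= slots_in_use <= total_slots:
        raise ValueError("invalid slot counts")
    free_fraction = (total_slots - slots_in_use) / total_slots
    if free_fraction == 0:
        return float("inf")  # no slots left, nothing can be admitted
    return base_threshold / free_fraction


print(required_reputation(100.0, 0, 10), required_reputation(100.0, 5, 10))
```

The same shape could be applied to liquidity instead of slots; a fixed two-bucket split cannot express this kind of continuous pricing.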

Ok, that's the cost/risk side of the equation, but what about the benefit/reward?

As a routing node, every decision to accept an HTLC is essentially granting an option on a liquidity trade that trades liquidity on the downstream link for liquidity on the upstream link, with a probable increase in the total liquidity (taken as fees). The jamming problem represents a risk in being able to execute that trade to completion. In any scenario where we are taking risks for potential benefits, it makes little sense to analyze the risks (which this proposal does with the endorsement mechanic and reputation recommendations) without also considering the potential benefits. This proposal ignores the potential benefits of forwarding an HTLC (chiefly the fees).

It can make sense for me as a node operator to let a node with lower reputation offer an HTLC forward with a large fee, when I'd be hesitant to do so at a lower fee. Similar to the way that higher interest rates are charged for borrowers with lower credit scores, we need not deny a forwarding request simply because the upstream link doesn't have the reputation we'd want.

So to summarize my criticisms of the resource bucketing strategy, it comes down to two things: 1. It does not account for the continuously variable nature of the costs of offering the slots/sats, 2. It does not account for the potential benefits of forwarding the HTLC. That said, I don't think it's reasonable to require an airtight algorithm that takes these things into account for the endorsement mechanic to be a useful improvement to the status quo. I also don't have any issue with large swaths of the network deploying this strategy and seeing if it improves jamming incidence rates. Despite its incompleteness in modeling the incentives of the operator, it may be a dramatic improvement over today, I don't know. However, because of this incompleteness, I don't think it should be in the part of the spec that I view to be required for interop with the rest of the network.

@thomash-acinq

While I think that resource bucketing can make sense as an MVP for how to interpret the endorsement mechanic laid out in BOLT2, I find myself resistant to this being in the main BOLT sections. Even with the designation of "MAY", I think this is better suited to be an extension BOLT or perhaps even a BLIP.

Agreed.

It can make sense for me as a node operator to let a node with lower reputation offer an HTLC forward with a large fee, when I'd be hesitant to do so at a lower fee. Similar to the way that higher interest rates are charged for borrowers with lower credit scores, we need not deny a forwarding request simply because the upstream link doesn't have the reputation we'd want.

That's very dangerous as an attacker can trivially exploit this: they just need to offer very high fees to compensate for their bad reputation (it doesn't cost them anything because they don't intend to actually pay the fees, they will just fail the HTLC).

So to summarize my criticisms of the resource bucketing strategy, it comes down to two things: 1. It does not account for the continuously variable nature of the costs of offering the slots/sats,

That's only a limitation of this specific algorithm to assign reputation, which as you said should not be part of the spec. However even when using a continuous reputation scheme, the binary endorsement forces you to discretize to 0 or 1. That's why I'm suggesting to replace the binary endorsement with a confidence value on 3 bits. A fully continuous value could be a privacy leak but I think that 3 bits is a good balance between the 1 bit of this proposal and a fully continuous value.

@ProofOfKeags
Contributor

ProofOfKeags commented Jul 5, 2023

That's very dangerous as an attacker can trivially exploit this: they just need to offer very high fees to compensate for their bad reputation (it doesn't cost them anything because they don't intend to actually pay the fees, they will just fail the HTLC).

This is far from a trivial exploit. It is already the case that the attacker has no way to know what their reputation is with respect to their peers. For them to be able to exploit it, they would need to know what your threshold for endorsement is, which isn't a publicly knowable thing. Additionally, even while offering high fees for offered HTLCs does not guarantee the loss of those sats, it is still a capital outlay requirement that can reduce the reach of these attacks as well as the attacker's bandwidth to accomplish them. That said, I'd imagine the reduction in effectiveness of the attack as a result of this increased cost is probably marginal at best, but this was also not suggested as a security scheme; I was simply pointing out that we cannot ignore the reward side of the incentive scheme when considering a node operator's interests.

That's only a limitation of this specific algorithm to assign reputation, which as you said should not be part of the spec. However even when using a continuous reputation scheme, the binary endorsement forces you to discretize to 0 or 1.

I actually think that this is a good thing. By forcing nodes to make a decision between 0 or 1 at the protocol level, you force the inputs to that decision to be a private matter, which ultimately it is. The node operator can either choose to tie its reputation to an HTLC or not.

That's why I'm suggesting to replace the binary endorsement with a confidence value on 3 bits. A fully continuous value could be a privacy leak but I think that 3 bits is a good balance between the 1 bit of this proposal and a fully continuous value.

I think that this convolutes things in a way that conceals the real dynamic in play. It is not the role of the endorser to "proxy" the reputation of its peers. The role of the endorser is to tie its own reputation to the HTLC it is offering. It is hard to understand how else to interpret the endorsement mechanic if it is allowed to have any more than 1 bit of signaling. Let's say we have 3 bits as you suggest, what happens if I endorse it to a level of 001 (000 being lowest and 111 being highest), and then the HTLC fails? What if the HTLC succeeds? What is my peer even trying to tell me when it gives a "partial endorsement"? The other issue with a continuous value is that it can basically be used as a measurement for how close to the payment source you are. Why would I endorse someone else's HTLC at a higher level than the upstream link did? Why wouldn't I ever endorse my own HTLC as 111?

Ultimately I believe the forced discretization of the endorsement is a good thing. In fact I believe that simply specifying that and having some discussion and recommendations around possible ways of interpreting endorsement (or non-endorsement), is enough for this proposal to be self-justifying and complete. I believe that the specifics of how to measure reputation and how to allocate HTLC slots/sats based off of reputation is beyond the scope of what this specification should offer.

Very often when we provide libraries we may also provide code examples to demonstrate how to use it, and I believe the resource bucketing scheme and ideas on how to measure and update reputation should not be viewed as anything more significant than a spec level code example. Compliance with these suggested schemes is neither enforceable nor can we expect nodes to adopt the same behaviors, so it really ought to be considered as a demo use of that endorsement bit.

Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Concept ACK 9b97e28

The problem is so big that a deep evaluation of the solution is very time-consuming for people who do not think about it daily (at least for me), so I like the approach of starting with the peer's reputation and seeing how the network reacts to this.

So, I will ack the approach and start to write some code for this in cln and then evaluate some real data. I guess the more difficult part here is the reputation algorithm

P.S': Some small nits found while reading have been reported.
P.S'': I agree that the reputation should be separate from the BOLTs. I have started the lnmetrics.rfc for this particular reason.

HTLC resolution time is assessed relative to a threshold that the node
considers to be a reasonable amount of time for a HTLC to resolve:
- `resolution_period`: the amount of time a HTLC is allowed to resolve in that
is classified as "good" behavior, expressed in seconds (default: 60 seconds).
Contributor

we should mention why 60 seconds, iirc this is the default mpp timeout! We should report it there

successful, fast resolving HTLCs during the `resolution_time` the HTLC was
locked in the channel.

For every resolved incoming HLTC a peer has forwarded through a node, its
Contributor

@vincenzopalazzo vincenzopalazzo Jul 9, 2023

Suggested change
For every resolved incoming HLTC a peer has forwarded through a node, its
For every resolved incoming HTLC a peer has forwarded through a node, its

nit: there are a few others around the doc

@@ -995,7 +995,10 @@ is destined, is described in [BOLT #4](04-onion-routing.md).
1. type: 0 (`blinding_point`)
2. data:
* [`point`:`blinding`]

1. type: 1 (`endorsed`)

Suggested change
1. type: 1 (`endorsed`)
1. type: 3 (`endorsed`)

Can we move this to a new optional/required pair (type 2/3)?

Contributor

@morehouse morehouse left a comment

This is good work, and clearly a lot of thought has been put into the reputation algorithm.

I've found a couple weaknesses in the current design, which hopefully we can fix to make the mitigation more robust.

Hodling for fun and profit

By design, reputation is costly to build but easily lost. In particular, HTLCs that take longer than 90s to resolve will decrease reputations of any nodes that endorsed them. And for every additional 90s it takes to resolve the HTLC, reputations are decreased further.

So any node on the network that receives regular HTLC traffic can hodl HTLCs to destroy reputations of the upstream nodes.

Attack scenarios

Routing nodes, merchants, and LSPs on the network can exploit this weakness to destroy reputations of their competitors, essentially for free. Once reputations have been sufficiently destroyed, the competitors' channels can then also be jammed for ~0 cost.

A simple attack scenario could look like this:

  1. EviLSP and HonestLSP are LSPs competing with each other. Both LSPs run lightning nodes that are well connected with the rest of the network, and the LSPs also have a direct channel with each other.
  2. EviLSP starts to hodl all high-value HTLCs coming from HonestLSP. Just before the HTLCs approach their expiry, EviLSP forwards them on to the next node.
  3. As HTLCs that have been in flight for hours start to settle, HonestLSP rapidly slashes the reputation scores of all its upstream channel peers.
  4. EviLSP uses another lightning node to jam all of HonestLSP's channels.
  5. In followup PR, EviLSP claims their node had a temporary glitch causing delayed processing but that everything is fine now and at least their service is working better than HonestLSP. HonestLSP's users start switching to EviLSP.

Alternatively, EviLSP could be sneakier:

  1. EviLSP occasionally hodls HTLCs forwarded to them from HonestLSP. The hodl frequency and duration is set high enough to have a negative influence in the reputation algorithm but low enough to not raise HonestLSP's suspicion.
  2. After a few days or weeks, HonestLSP has slowly decreased the reputation scores of its upstream channel peers and no longer allows those peers to access its privileged slots.
  3. EviLSP uses another lightning node to jam all of HonestLSP's channels.

Mitigation

I haven't come up with any great ideas to mitigate this weakness. Hopefully we can get more people thinking about this problem and potential solutions.

Reputation multiplier effect

Because in-flight risk is calculated separately for each pair of incoming and outgoing channels, an attacker can exploit network topology to cause more jamming damage than they paid for while gaining reputation. See inline comment for more details.

Mitigation

If in-flight risk is calculated per incoming channel only (ignoring the outgoing channel), or simply per upstream node (which makes sense when multiple channels exist between two nodes), then the multiplier effect disappears.
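
The difference between the two accounting choices can be sketched with a toy model. Here each bucket admits HTLCs until a fixed `allowance` of in-flight risk is used up; that admission rule, the function name, and the numbers are assumptions for illustration and do not reproduce the proposal's actual in-flight risk computation.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def admitted_risk(
    htlcs: Iterable[Tuple[str, str, int]],
    allowance: int,
    per_pair: bool,
) -> int:
    """Total risk admitted when each bucket may hold at most `allowance`.

    htlcs: (incoming_channel, outgoing_channel, risk) tuples.
    per_pair: bucket by (incoming, outgoing) if True, by incoming only if False.
    """
    totals = defaultdict(int)
    admitted = 0
    for incoming, outgoing, risk in htlcs:
        key = (incoming, outgoing) if per_pair else incoming
        if totals[key] + risk <= allowance:
            totals[key] += risk
            admitted += risk
    return admitted


# One attacker channel fanning out over three outgoing channels.
htlcs = [("attacker", "out1", 100), ("attacker", "out2", 100), ("attacker", "out3", 100)]
per_pair_total = admitted_risk(htlcs, allowance=100, per_pair=True)
per_incoming_total = admitted_risk(htlcs, allowance=100, per_pair=False)
print(per_pair_total, per_incoming_total)
```

In this toy model the per-pair accounting admits three times the allowance from a single incoming channel, while per-incoming accounting admits it once.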

carlaKC added 2 commits August 8, 2024 14:00
Add an endorsement field to allow nodes to signal whether a HTLC is
expected to resolve quickly or is unknown to the forwarder. The
addition of this field allows for the introduction of local reputation
tracking that still allows new and unknown entrants access to resources.
@carlaKC carlaKC force-pushed the jamming-endorsement-specification branch from 39f3c99 to 80dba2c Compare August 8, 2024 18:02
@PurpleTimez

Hodling for fun and profit

By design, reputation is costly to build but easily lost. In particular, HTLCs that take longer than 90s to resolve will decrease reputations of any nodes that endorsed them. And for every additional 90s it takes to resolve the HTLC, reputations are decreased further.

So any node on the network that receives regular HTLC traffic can hodl HTLCs to destroy reputations of the upstream nodes.

Attack scenarios

Routing nodes, merchants, and LSPs on the network can exploit this weakness to destroy reputations of their competitors, essentially for free. Once reputations have been sufficiently destroyed, the competitors' channels can then also be jammed for ~0 cost.

A simple attack scenario could look like this:

  1. EviLSP and HonestLSP are LSPs competing with each other. Both LSPs run lightning nodes that are well connected with the rest of the network, and the LSPs also have a direct channel with each other.
  2. EviLSP starts to hodl all high-value HTLCs coming from HonestLSP. Just before the HTLCs approach their expiry, EviLSP forwards them on to the next node.
  3. As HTLCs that have been in flight for hours start to settle, HonestLSP rapidly slashes the reputation scores of all its upstream channel peers.
  4. EviLSP uses another lightning node to jam all of HonestLSP's channels.
  5. In followup PR, EviLSP claims their node had a temporary glitch causing delayed processing but that everything is fine now and at least their service is working better than HonestLSP. HonestLSP's users start switching to EviLSP.

Alternatively, EviLSP could be sneakier:

  1. EviLSP occasionally hodls HTLCs forwarded to them from HonestLSP. The hodl frequency and duration is set high enough to have a negative influence in the reputation algorithm but low enough to not raise HonestLSP's suspicion.
  2. After a few days or weeks, HonestLSP has slowly decreased the reputation scores of its upstream channel peers and no longer allows those peers to access its privileged slots.
  3. EviLSP uses another lightning node to jam all of HonestLSP's channels.

Mitigation

I haven't come up with any great ideas to mitigate this weakness. Hopefully we can get more people thinking about this problem and potential solutions.

FYI - I believe this attack vector for a 3-party topology of lightning nodes (HonestLSP <-> EviLSP <-> upstream peers) and
onliness equivocation have already been mentioned in the context of channel jamming discussions.

See the email thread "Hold fee rates as DoS protection (channel spamming and jamming)", where long-delay applications such as atomic onchain / offchain swaps (e.g. lightning loops) are mentioned, and where a time-independent hold fee rate has already been suggested as a mitigation.

Reputation multiplier effect

Because in-flight risk is calculated separately for each pair of incoming and outgoing channels, an attacker can exploit network topology to cause more jamming damage than they paid for while gaining reputation. See inline comment for more details.

Mitigation

If in-flight risk is calculated per incoming channel only (ignoring the outgoing channel), or simply per upstream node (which makes sense when multiple channels exist between two nodes), then the multiplier effect disappears.

I believe the downsides of aggregating reputation across N incoming channels, whether or not they're associated with a single lightning node, have already been considered in the past, in both the controlled and uncontrolled scenarios.

See the digest post "Channel Jamming" documentation on bitcoin-problems made by one of the authors of the "Unjamming Lightning" paper, from which I believe this draft is partially inspired.

I think an even sneakier "hodling for fun and profit" style of exploitation leverages multi-path payments and the fact that gossiped htlc_minimum_msat values are associated with each of an LSP's incoming routing channels.

We define the following parameters:
* `resolution_period`: the amount of time a HTLC is allowed to resolve in that
classifies as "good" behavior, expressed in seconds. The recommended default
is 90 seconds (given that the protocol allows for a 60 second MPP timeout).

I think it would be useful to additionally define the clock used to tick the resolution_period seconds, since all the opportunity_cost computations are scaled on it.

E.g. the Epoch, which is defined on unix systems as "1970-01-01 00:00:00 +0000 (UTC)". It's not a perfect clock, though it's better than nothing.

Comment to be re-evaluated in light of finding here: #1071 (comment)

@carlaKC
Contributor Author

carlaKC commented Aug 26, 2024

Thanks for the detailed review + writeup @morehouse 🙏

Hodling for fun and profit

This is possible because our reputation algorithm only accounts for the risk of an incoming peer jamming our outgoing channel, which is clearly insufficient to cover downstream attacks like this. Pretty nasty when the attacker doesn't need to send any payments themselves. It seems reasonable to reverse this logic to consider reputation in both directions when we receive a HTLC:

  • Reputation of the incoming link vs the value of the outgoing link (this is what we currently do)
  • Reputation of the outgoing link vs the value of the incoming link (add this to account for downstream direction)
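
A rough sketch of what the two-directional check could look like. The comparison `reputation > value` is only a placeholder, and the function and parameter names are hypothetical; how reputation is actually weighed against link value is not specified in this comment.

```python
def sufficient_reputation(reputation: int, link_value: int) -> bool:
    """Placeholder threshold (assumption): reputation must exceed the value at risk."""
    return reputation > link_value

def should_endorse(incoming_reputation: int, outgoing_value: int,
                   outgoing_reputation: int, incoming_value: int) -> bool:
    # Current behavior: can the incoming peer be trusted with the outgoing link?
    upstream_ok = sufficient_reputation(incoming_reputation, outgoing_value)
    # Proposed addition: can the outgoing peer be trusted with the incoming link?
    downstream_ok = sufficient_reputation(outgoing_reputation, incoming_value)
    return upstream_ok and downstream_ok
```

Under this sketch, a downstream peer that hodls HTLCs loses its own standing against the incoming link, rather than only damaging the upstream peer's score.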

We're currently looking into this and running some experiments, aiming to give some more meaningful analysis of how bi-directional reputation works for attacks like this (be)for(e) the summit.

A few things that came up while discussing this attack which are tempting but probably not helpful:

  1. Attributable errors/latency aware routing: even if we lived in a world where senders penalize nodes that hold HTLCs for too long, the attacker can still be successful if they can fool many senders just once. I think it's reasonable to expect that zero fee / well positioned channels would be able to "sink" honest traffic, so we can't rely on pathfinding to avoid this.
  2. Monetary solutions: we're primarily dealing with slow jamming attacks here, because the attacker isn't in charge of payments being dispatched (so all they can do is hold them for a long time). My instinct is that any monetary solution would price out honest users far before we compensate the node targeted.

Reputation multiplier effect

Addressed in latest push (spec was outdated), thanks for flagging!

@PurpleTimez PurpleTimez left a comment

Reviewed at 80dba2c, including lrc's implem at e87ae62.

It's a dense proposal, and overall quite solid.

even if we lived in a world where senders penalize nodes that hold HTLCs for too long, the attacker can still be successful if they can fool many senders just once.

+1, No global clock in lightning to get reliable attributable errors / latency aware routing.

My instinct is that any monetary solution would price out honest users far before we compensate the node targeted.

On-chain fees paid by honest users to open chan could be used as part of the target node compensation. But it's a more complicated story...

by high reputation nodes.

Sequence:
* The `update_add_htlc` is sent by an upstream peer.

The update_add_htlc is sent by an upstream peer.

Minor - If the receiving peer is also the HTLC recipient, the reputation
algorithm could halt here. Unless the HTLC preimage is unknown to the recipient?
Normally, keysend payments are the only HTLCs of that type.

I don't know if the "Local Reputation" subsection should mention
that the reputation algorithm MAY NOT be run if the payment is final and there is
no slow jamming risk. It could still be worthwhile to score up the reputation of the
sending peer.

* The corresponding outgoing HTLC (if present) will be forwarded with
`endorsed` set to `1`.
* Otherwise:
* The HTLC will be limited to the remaining "general" slots and liquidity,

"* The HTLC will be limited to the remaining "general" slots and liquidity,
and will be failed if there are no resources remaining in this bucket."

While immediate failure is an option for the local routing node, an economically
rational one can instead halt processing, wait for the local resources
allocated to downstream channels to free up, and then process the non-endorsed
HTLC.

No interactivity with the upstream peer is needed after the commitment_signed
has been exchanged. The "halting time", as measured by the local routing node, can
be subtracted from the CLTV delta difference between the outgoing_cltv_value and
the current chain height.

In a world where there is economic competition among routing nodes, why
would the receiving peer reject a HTLC for free? It can be more rational to
bet and pocket the fee_base_msat and fee_proportional_millionths. By rejecting, you
might hand high-quality economic traffic to another routing peer that can fulfill
this HTLC forwarding request.

In particular, if the peer does not have sufficient local reputation but not
all of the protected_slot_count slots are occupied, it makes no economic sense to mark
an endorsed or non-endorsed HTLC as "general".

One suggestion could be to document that a receiving peer can support some
clawback, where a HTLC can be upgraded from "general" to protected_slot_count, as
an implementation policy. After reading the "Resource Bucketing" subsection and lrc's
addHTLC, I don't see that mentioned or implemented, if it's a worthy concern.

no need for use of protected resources as channels are not saturated during
regular operation. Should the network come under attack, honest nodes that
have built up reputation over time will still be able to utilize protected
resources to process payments in the network.

Should the network come under attack, honest nodes that
have built up reputation over time will still be able to utilize protected
resources to process payments in the network.

Minor - There could be a succinct mention of how the local resource conservation
system behaves for honest routing nodes bootstrapping their HTLC forwarding in the
network. Such nodes might not have accumulated enough reputation during a steady
state to leverage it in the face of a slow jamming attack. It would be worth making sure
the system works smoothly for marginal peers added to the network topology.

This matters especially if some negative event happens deeper in the stack of
the routing nodes (e.g. a cloud data center being taken down by a tsunami). Some
segments of the channel topology could have to be substituted within a short period
of time. The whole graph, as seen by concatenating node_announcement and
channel_announcement, may not be a bijection with the infrastructure.

resources to and signal endorsement of a HTLC on the outgoing channel. Nodes MAY
use any metric of their choosing to classify a peer as having sufficient
reputation, though a poor choice of reputation scoring metric may affect their
reputation with their downstream peers.

Nodes MAY use any metric of their choosing to classify a peer as having sufficient
reputation, though a poor choice of reputation scoring metric may affect their
reputation with their downstream peers.

For implementers, maybe a few examples of scoring metrics could be given:

  • reliability: the uptime of the sending peer in the channel
  • success rate: success rates versus number of failures
  • profit / volume: based on earned fees and total amount moved through the channel
  • utility: that one is blurry... opportunity cost? e.g. if off-chain fees have been paid to open the chan

See that presentation about "Lightning network topology, its creation and maintenance",
from which the above metrics are inspired.

Maybe it could be added in this document or in a blip. I know there is the idea of what
is "observable within the protocol" described later in this subsection, yet even for
the forwarding fees, this is a hard problem. The upstream peer has no visibility into
the local routing node's difference between amount_msat and amt_to_forward, if
onion encryption holds.

is granted. This algorithm uses forwarding fees to measure damage, as this value
is observable within the protocol. It is reasonable to expect an adversary to
"about turn" - to behave perfectly to build up reputation, then alter their
behavior to abuse it. For this reason, in-flight HTLCs have a temporary

For this reason, in-flight HTLCs have a temporary negative impact on reputation until
they are resolved.

Minor - I don't get this sentence, or the idea behind it.

Let's consider a simple slow jamming attack, where an attacker forwards a bunch of spammy
HTLCs which are never resolved successfully, starting from a blank state where no reputation
has been accumulated in the reputationTracker.

At the moment the update_add_htlc is received, the receiving peer has no information
on how the HTLC will be resolved, whether by a success or a failure.
Marking that in-flight HTLC as having a negative impact on the channel or peer reputation
could lead to rejecting another concurrent in-flight HTLC, and the fees it would have earned.
The final state of those 2 in-flight HTLCs could be a success. However, at the time of processing
between upstream and downstream, the target node has no means to predict the HTLC resolution
in a deterministic fashion.

Unless it is suggested that a target node should limit the max number of in-flight HTLCs
originating from a single upstream peer? I don't see that idea mentioned in either
the "Resource Bucketing" subsection or the "Local Reputation" subsection. It could be
interesting as some kind of implementation policy to limit worst-case damage from a
single peer, e.g. one that has built up a high IncomingReputation and
then suddenly engages in slow jamming of the target node.

Maybe this could be described in a blip or another document as an implementation policy.

remote peer's `max_accepted_htlcs`).
* `protected_liquidity_portion`: defines the portion of liquidity that is
reserved for endorsed HTLCs from peers with sufficient reputation (default:
0.5).

  • protected_liquidity_portion: defines the portion of liquidity that is
    reserved for endorsed HTLCs from peers with sufficient reputation (default: 0.5).

Minor minor - The base of the 0.5 could be made precise: whether it is the funding utxo
amount or counted from the htlc_minimum_msat, and with or without the channel reserve.

* SHOULD reduce the remote peer's `max_accepted_htlcs` by
`protected_slot_count` for the purposes of the proposed HTLC.
* SHOULD reduce the `max_htlc_value_in_flight` by
`protected_liquidity_portion` * `max_htlc_value_in_flight`.
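
The two reductions above can be sketched as follows. This is only an illustration: the function name is hypothetical, amounts are assumed to be in millisatoshis, and rounding the protected liquidity down is an assumption that the text does not specify.

```python
import math

def general_bucket_limits(max_accepted_htlcs: int,
                          max_htlc_value_in_flight_msat: int,
                          protected_slot_count: int,
                          protected_liquidity_portion: float = 0.5):
    """Return (slots, liquidity_msat) left for HTLCs that go in the general bucket."""
    # Slots: reduce max_accepted_htlcs by protected_slot_count.
    general_slots = max_accepted_htlcs - protected_slot_count
    # Liquidity: reduce max_htlc_value_in_flight by the protected portion.
    # Rounding down here is an assumption, not taken from the spec text.
    protected_msat = math.floor(
        protected_liquidity_portion * max_htlc_value_in_flight_msat
    )
    general_liquidity_msat = max_htlc_value_in_flight_msat - protected_msat
    return general_slots, general_liquidity_msat

# Example: 483 slots with 241 protected, and half of 1_000_000_000 msat protected.
limits = general_bucket_limits(483, 1_000_000_000, 241)
```

A proposed HTLC that does not qualify for protected resources would then be checked against `limits` instead of the channel's full limits.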

  • SHOULD reduce the max_htlc_value_in_flight by
    protected_liquidity_portion * max_htlc_value_in_flight.

See the comment above on HTLC on-the-fly upgrade from "general" to "protected". This
could be the kind of situation where a high-value HTLC paying good routing fees
is rejected from forwarding on the outgoing channel because the HTLC's outgoing_htlc_value
falls just above protected_liquidity_portion * max_htlc_value_in_flight.

Of course, it all depends on whether there is a high volume of traffic going through
the target node, such that this traffic should probabilistically soon occupy the
protected_slot_liquidity, or whether it's more economically interesting to take
the risk of making an exemption.

One could suggest that this part would be better described in another
document or a blip, with implementations experimenting with it. This
could be too "rigid" for low-volume forwarding nodes and the parameters too
"flexible" for high-volume, topologically well-connected forwarding nodes.

Rolling windows specified in this write up may be implemented as a decaying
average to minimize the amount of data that needs to be stored per-channel. In
flight HTLCs can be accounted for separately to this calculation, as the node
will already have data for these HTLCs available.
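
One way to read "decaying average" here is an exponentially decayed running total, which needs only a single value and a timestamp per channel. The sketch below is an assumption about how that could look: the class name and the `half_life_seconds` parameter are hypothetical, and the mapping from a rolling window to a half-life is left open.

```python
class DecayingTotal:
    """Exponentially decayed running total: one float and one timestamp of state."""

    def __init__(self, half_life_seconds: float):
        self.half_life_seconds = half_life_seconds
        self.value = 0.0
        self.last_update = None

    def _decay(self, now: float) -> None:
        # Halve the stored value for every half-life that has elapsed.
        if self.last_update is not None:
            elapsed = max(0.0, now - self.last_update)
            self.value *= 0.5 ** (elapsed / self.half_life_seconds)
        self.last_update = now

    def add(self, amount: float, now: float) -> None:
        self._decay(now)
        self.value += amount

    def get(self, now: float) -> float:
        self._decay(now)
        return self.value

# Example: fees earned on a channel, with a (hypothetical) one-week half-life.
week = 7 * 24 * 60 * 60
revenue = DecayingTotal(half_life_seconds=week)
revenue.add(100.0, now=0)
after_one_week = revenue.get(now=week)  # half of the original 100.0 remains
```

Compared to storing every HTLC in the window, this trades exact window boundaries for constant storage; old contributions fade out gradually instead of dropping off at a cutoff.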

Rolling windows specified in this write up may be implemented as a decaying
average to minimize the amount of data that needs to be stored per-channel. In
flight HTLCs can be accounted for separately to this calculation, as the node
will already have data for these HTLCs available.

Minor - I believe the overall "Local Resource Conservation" proposal would benefit from
having the implementation notes consolidated in their own document or blip, including
some magic values that are referenced in other subsections (e.g. the 10 for the
incoming_channel_multiplier defining the rolling window).

I have no strong thoughts on the decaying average itself, but there could be implementation
alternatives, such as taking all the HTLC data points since the channel opening and
periodically re-evaluating their score according to the on-chain fees one
can see in blocks, or the total HTLC forwarding traffic that has gone through
the target node. Just ideas, more to note the range of rolling window algorithms
that could be experimented with.

### Bootstrapping Outgoing Channel Revenue
New channels with no revenue history:
* MAY choose not to endorse any HTLCs in their first two weeks of operation
to establish baseline revenue.

New channels with no revenue history:

  • MAY choose not to endorse any HTLCs in their first two weeks of operation
    to establish baseline revenue.

Minor - It would deserve its own blip, especially if some nodes try alternative
bootstrapping ideas, e.g. modulating the no-endorsement period as a function of
the peer's number of channel_announcements.

when assessing reputation.
* MAY consider `outgoing_channel_revenue` for all channels with the outgoing
peer, but SHOULD take care to [bootstrap](#bootstrapping-outgoing-channel-revenue)
new channels so they do not lower the reputation threshold for existing ones.

  • MAY consider incoming_channel_revenue across all channels with the peer
    when assessing reputation.
  • MAY consider outgoing_channel_revenue for all channels with the outgoing
    peer, but SHOULD take care to bootstrap

Minor - ....Hmmmmmm, it could be interesting to adopt the communalized reputation assessment
over many channels from an upstream peer based on HTLC timing. Given they're lower and
upper bounded by the height_added and resolution_window, a set of slow-jamming
HTLCs might have to fit within the same window.

Especially, it could be useful to prevent sudden spikes of slow-jamming HTLCs from
occupying liquidity / slots, when those slow-jamming HTLCs are triggered a few hops
deep in the graph, while not downgrading the forwarding of the upstream peers
the rest of the time. It could be a thing.

@PurpleTimez

PurpleTimez commented Nov 22, 2024

This is the re-explanation of the local reputation downgrade attack among lightning peers in a point-to-point networking topology, which has been previously raised in comments here and here, which I think are the main serious ones.

Resolution_Period Drift Attack

Let's say there is the following lightning network topology among all HTLC routing nodes.


        Alice <----> Bob <----> Caroll <----> Dave

Both Bob and Caroll support the HTLC opportunity_cost formula to evaluate the forwarding of HTLCs:

  • opportunity_cost: ceil((resolution_time - resolution_period) / resolution_period) * fees

In this setting, Bob and Caroll, the intermediary nodes, are configured with the following "Local Resource Conservation" parameters to evaluate the opportunity_cost of a HTLC resolution:

  • resolution_period: 90 seconds

In this attack, the resolution_time is targeted to maliciously drift beyond the resolution_period that Bob and Caroll are configured with.
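
For reference, the opportunity_cost formula above can be sketched as a literal transcription, assuming whole seconds and integer fees. The function name is only illustrative, and any clamping or special-casing of fast resolutions is outside what is quoted here.

```python
def opportunity_cost(resolution_time: int, resolution_period: int, fees: int) -> int:
    """ceil((resolution_time - resolution_period) / resolution_period) * fees"""
    excess = resolution_time - resolution_period
    # Integer ceiling division, avoiding floating point rounding.
    periods = -(-excess // resolution_period)
    return periods * fees

# With resolution_period = 90 seconds and fees = 1000:
cost_just_late = opportunity_cost(91, 90, 1000)    # 1 period  -> 1000
cost_two_periods = opportunity_cost(181, 90, 1000)  # 2 periods -> 2000
```

Every additional 90 seconds of drift adds another multiple of the HTLC's fees to the cost charged against the peer's reputation.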

Alice and Dave are in collusion and they forward a HTLC over the Bob-Caroll link. Alice is a well-scored peer by Bob, and Dave is a well-scored peer by Caroll, from past fast-settlement HTLC traffic.

Alice forwards a HTLC to Dave. Once Dave receives the HTLC, resolution is withheld until the resolution_period on Bob's forwarding channel to Caroll is reached on the wall clock. This HTLC withholding can be done indistinguishably until min_final_cltv_expiry_delta is reached, which is 18 blocks by default, or 180 min at average block time.

Once the resolution_period threshold is reached, Bob will start to downgrade the local reputation of Caroll, even though Dave is the HTLC withholder and final payee at the origin of the period "drift".
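
As a rough worked example of the scale involved, assume Dave withholds for the full 18 blocks at an average of 600 seconds per block, and that Bob applies the opportunity_cost formula with a 90 second resolution_period:

$$
\text{resolution\_time} \approx 18 \times 600\,\text{s} = 10800\,\text{s}
$$

$$
\text{opportunity\_cost} = \left\lceil \frac{10800 - 90}{90} \right\rceil \times \text{fees} = 119 \times \text{fees}
$$

So under these assumptions a single withheld HTLC is charged against Caroll at Bob as roughly 119 times the fees it would have earned.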

There is very likely a variant of this attack, even stealthier, where Dave withholds the HTLC until the last second, so as to fall under the 90 second period on the Caroll-Dave link.

There is no strong economic cost to trigger this attack, just the on-chain fees to open channels with Bob and Caroll, and assuming those mitigations are deployed, the cost of a node's local reputation will very likely be inversely proportional to the centrality of the channel vertices in the lightning topology of edges.

One Solution: Payment Path Opportunity Cost Transitivity

The most straightforward solution would be to ensure that resolution_period values are agreed at the link level so that they are similar along the payment route, e.g. as a new BOLT2 channel parameter that lightning peers would announce in open_channel / accept_channel. This would start to make the local reputation resource transitive across the lightning topology.

I don't think it's bullet-proof in the face of fine-grained timing-based variants of this attack. Though the dynamic scoring algorithms developed here could also be worthwhile in the face of distributed denial-of-service attacks targeting base-layer full-nodes.

I cannot think of other robust solutions, given that routed lightning HTLCs always play out among a set of three or more parties, A <-> B <-> C, where the channel topology is established under trust-minimized assumptions.
