
WIP: Trampoline forwarding #3711


Open · wants to merge 9 commits into base: main from arik/trampoline/forwarding

Conversation


@arik-so arik-so commented Apr 7, 2025

Forward Trampoline onions. Currently a work in progress, still missing these cleanups:

  • New HTLCSource variant for storing the session_priv instead of throwing them away
  • Tests for forwarding failures
  • Additional tests for failures that can occur both when forwarding and receiving Trampoline onions
  • Determine skimmable amount, charging a routing fee


ldk-reviews-bot commented Apr 7, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@arik-so arik-so force-pushed the arik/trampoline/forwarding branch from 3ad2b6b to 4070974 Compare April 7, 2025 08:42

@joostjager joostjager left a comment


Skimmed the PR. Change set definitely much shorter than expected.

What I didn't really see is the interpretation of the result of the Trampoline forward, and the potential retry when it fails?

previously_failed_blinded_path_idxs: vec![],
},
final_value_msat: outgoing_amount,
max_total_routing_fee_msat: Some(incoming_amount - outgoing_amount),
Contributor:

Shouldn't the trampoline node itself earn something too?

Contributor Author:

yes, figuring out how to determine a good amount to skim

Contributor:

I think it should just include the fee on the local outgoing channel in the total fee during pathfinding? Normally that is not necessary, but this is a payment that is really a forward.

Contributor Author:

should be addressed now
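The fee-budget arithmetic discussed in this thread can be sketched as follows. This is a hedged illustration, not LDK's actual API: the function name is hypothetical, and it simply shows that the routing-fee budget is the incoming/outgoing difference minus whatever the Trampoline node keeps for itself.

```rust
// Illustrative sketch: the fee budget handed to pathfinding is the headroom
// the sender gave us, minus the local skim. `checked_sub` surfaces the case
// where the sender left insufficient fee headroom.
fn trampoline_fee_budget_msat(
    incoming_amount_msat: u64,
    outgoing_amount_msat: u64,
    local_skim_msat: u64,
) -> Option<u64> {
    incoming_amount_msat
        .checked_sub(outgoing_amount_msat)?
        .checked_sub(local_skim_msat)
}

fn main() {
    // 10_000 msat headroom, of which we keep 1_000 for ourselves.
    assert_eq!(trampoline_fee_budget_msat(1_010_000, 1_000_000, 1_000), Some(9_000));
    // Not enough headroom: fail rather than forward at a loss.
    assert_eq!(trampoline_fee_budget_msat(1_000_000, 1_000_000, 1_000), None);
}
```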

@@ -2460,3 +2460,165 @@ fn test_trampoline_forward_rejection() {
expect_payment_failed_conditions(&nodes[0], payment_hash, false, payment_failed_conditions);
}
}

#[test]
fn test_unblinded_trampoline_forward() {
Contributor:

Interesting that you do an unblinded forward. I know it has to be supported, but it isn't what an LDK sender would do right?

Contributor Author:

In principle there's nothing preventing us from placing an unblinded Trampoline hop before a blinded one. But the reason I'm testing these first is less cryptography overhead.

Contributor Author:

We do now test both.

// applying non-strict forwarding.
// The channel with the least amount of outbound liquidity will be used to maximize the
// probability of being able to successfully forward a subsequent HTLC.
let maybe_optimal_channel = peer_state.channel_by_id.values_mut()
Contributor:

I'd not expect this code to be repeated, because it also applies to non-trampoline forwards? I do realize this is draft.

Contributor Author:

I ended up needing to move this into the else clause, resulting in potentially even more repetition, but also some minor differences. I'm now looking into deduplication.
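The non-strict forwarding selection quoted above can be sketched in isolation. This is a simplified stand-in (the `ChannelInfo` struct and field names are illustrative, not LDK's), showing the policy the code comment describes: among usable channels, prefer the one with the least outbound liquidity.

```rust
#[derive(Debug, PartialEq)]
struct ChannelInfo {
    scid: u64,
    next_outbound_htlc_limit_msat: u64,
}

// Non-strict forwarding: among channels to the next peer that can carry the
// HTLC, pick the one with the *least* outbound liquidity, keeping
// better-funded channels free for future, larger HTLCs.
fn select_forwarding_channel(channels: &[ChannelInfo], amount_msat: u64) -> Option<&ChannelInfo> {
    channels
        .iter()
        .filter(|c| c.next_outbound_htlc_limit_msat >= amount_msat)
        .min_by_key(|c| c.next_outbound_htlc_limit_msat)
}

fn main() {
    let channels = vec![
        ChannelInfo { scid: 1, next_outbound_htlc_limit_msat: 50_000 },
        ChannelInfo { scid: 2, next_outbound_htlc_limit_msat: 10_000 },
        ChannelInfo { scid: 3, next_outbound_htlc_limit_msat: 5_000 },
    ];
    // scid 3 cannot fit the HTLC; of the rest, scid 2 has the least liquidity.
    assert_eq!(select_forwarding_channel(&channels, 8_000).map(|c| c.scid), Some(2));
}
```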

let outgoing_amount = next_packet_details.outgoing_amt_msat;
let outgoing_cltv = next_packet_details.outgoing_cltv_value;

let routes = self.router.find_route(
Contributor:

Is there any risk in this potentially long running operation blocking important processes?

Contributor Author:

this should be running in the background, so I don't believe so

Contributor:

It is running in the background processor I think, but it could still be blocking the processing of other events? Or is execution really suspended here while other event handling proceeds?

Contributor Author:

To break this down a bit more precisely, are you suggesting creating an entirely separate loop for that altogether, independent of the PendingHTLCForward processing loop, and pushing an event for e.g. TrampolineForwardPathFound?

InFlightHtlcs::new()
).unwrap();

inter_trampoline_path = routes.paths.first().map(|path| path.hops.clone());
Contributor:

No path found case to be handled I think?

Contributor Author:

yeah, ironing that out :)

Contributor Author:

You should hopefully find the test coverage to be satisfactory now :)
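The no-path case flagged in this thread could be handled along these lines. The types here are drastically simplified stand-ins for LDK's `Route`/`Path`, and the error code is illustrative; the point is only that both "router errored" and "router returned zero paths" fail the HTLC backwards explicitly instead of leaving `None` to propagate.

```rust
// Simplified stand-ins for the router's output types.
struct Path { hops: Vec<u64> }
struct Route { paths: Vec<Path> }

// Instead of `routes.paths.first().map(..)` silently yielding `None`, surface
// an explicit error for the caller to fail the HTLC backwards with.
fn pick_inter_trampoline_path(
    route: Result<Route, ()>,
) -> Result<Vec<u64>, (&'static str, u16)> {
    match route {
        Ok(route) => match route.paths.into_iter().next() {
            Some(path) => Ok(path.hops),
            None => Err(("Router returned a route with no paths.", 0x2000 | 25)),
        },
        Err(()) => Err(("Failed to find a route to the next Trampoline hop.", 0x2000 | 25)),
    }
}

fn main() {
    let route = Route { paths: vec![Path { hops: vec![1, 2, 3] }] };
    assert_eq!(pick_inter_trampoline_path(Ok(route)), Ok(vec![1, 2, 3]));
    assert!(pick_inter_trampoline_path(Err(())).is_err());
}
```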

@arik-so arik-so force-pushed the arik/trampoline/forwarding branch 2 times, most recently from 0931218 to 92fa368 Compare April 7, 2025 20:57
@arik-so arik-so mentioned this pull request Apr 7, 2025
@@ -4353,8 +4355,6 @@ where
// we don't have the channel here.
return Err(("Refusing to forward over real channel SCID as our counterparty requested.", 0x4000 | 10));
Contributor:

Do we perform this check for trampoline forwards, once we know the outgoing channel?

Contributor Author (@arik-so, Apr 10, 2025):

currently we do not, given that we take the channel immediately from the pathfinding logic

match trampoline_outgoing_scid {
Some(scid) => scid,
None => {
return Err(("Cannot forward by Node ID without SCID.", 0x4000 | 10));
Contributor:

nit: add debug_assert(false) if this should be unreachable

Contributor Author:

currently inapplicable with the logic reshuffle, but if we reimplement a can_forward call post-pathfinding, it will become applicable again

Comment on lines 189 to 190
/// Path to send the packet to the next hop
hops: Vec<RouteHop>
Contributor:

Does this lock us into using only 1 path for trampoline forwards? Seems like that could impact payment reliability

Contributor Author:

Determined offline to be fine pending implementation of retry logic.

Contributor:

I thought in our offline discussion we said we want to get the persistence correct now, though? Also MPP seems applicable for the first try as well, not just retries. Let me know what I'm missing there.

Contributor Author:

Updated the persistence for MPP scenarios.

@arik-so arik-so force-pushed the arik/trampoline/forwarding branch 4 times, most recently from c3f5089 to e0a2a9f Compare April 10, 2025 06:19
Comment on lines +492 to +500
/// We couldn't forward to the next Trampoline node. That may happen if we cannot find a route,
/// or if the route we found didn't work out
FailedTrampolineForward {
/// The node ID of the next Trampoline hop we tried forwarding to
requested_next_node_id: PublicKey,
/// The channel we tried forwarding over, if we have settled on one
forward_scid: Option<u64>,
},
Contributor:

This is all about to change a lot with #3700 so may want to use an existing variant instead

Contributor Author:

Should we wait until #3700 lands? Or I can coordinate with Carla? It seems the existing variants really don't map nicely, but I won't die on that hill.

Contributor:

Yeah it might be a good idea to coordinate with her, I think 3700 should be landing soon.


@arik-so arik-so force-pushed the arik/trampoline/forwarding branch from e0a2a9f to 4c6f1ef Compare April 10, 2025 18:21
@joostjager joostjager left a comment

No review was requested yet, but had a quick look and dropped a few comments.


let replacement_onion = {
// create a substitute onion where the last Trampoline hop is an unblinded receive, which we
// (deliberately) do not support out of the box, therefore necessitating this workaround
Contributor:

Couldn't this be a more casual failure, just no liquidity or something like that?

I am also wondering whether all the trampoline failure cases are covered currently. So there is blind and non-blinded, but also failures directly from a first trampoline node, from a second trampoline, and from a normal node in between the first and second.

Contributor Author:

Sure, we can create a more casual failure. Regarding the forwarding failure cases, hopefully they should be much better covered now, though the sky's the limit.

Contributor Author (@arik-so, Apr 16, 2025):

Do you have a preference for what the more casual failure should be exactly? A channel with insufficient balance? I'm trying to think of scenarios where the pathfinding wouldn't fail, but the actual sending would, which are also easy to simulate.

max_path_count: 1,
max_path_length: MAX_PATH_LENGTH_ESTIMATE / 2,
max_channel_saturation_power_of_half: 2,
previously_failed_channels: vec![],
Contributor:

Not needed for retries?

Contributor:

@joostjager could you clarify that? It seems like we should track channels that failed us in the past for retries specifically

Contributor (@joostjager, Apr 11, 2025):

Yes, that's what I mean. Then we need info from previous routing attempts here. But I suppose that'll be added in a follow up that does retries?

I am wondering though if postponing the retries to a later PR might require redoing some of the logic that is added in this one.

Contributor:

Ah sorry, yeah when we get a failure we should insert the channel ID in here for retries.

I am wondering though if postponing the retries to a later PR might require redoing some of the logic that is added in this one.

I'm also wondering about this, going to try to take a closer look at that on the next pass.

do_pass_along_path(args);

if success {
claim_payment(&nodes[0], &[&nodes[1], &nodes[2], &nodes[4], &nodes[3]], payment_preimage);
Contributor:

It is probably a good idea to cover the trampoline routing retry somewhere too?

}
}
()
},
Contributor:

This is a lot of code added. Is it possible to do some decomposition in methods?

I would also suggest to add the maximum amount of comments in all of this code to explain as much as possible.

@valentinewallace valentinewallace left a comment

Re-request review when you want another look :)

@@ -7067,6 +7091,61 @@ where
failed_next_destination: destination,
}, None));
},
HTLCSource::TrampolineForward { previous_hop_data, incoming_trampoline_shared_secret, .. } => {
Contributor:

Could we expand the commit message to talk about the behavior being added here?

Contributor Author:

yup, updated the commit message. Perhaps a comment might be worthwhile, too

previous_hop_data: HTLCPreviousHopData,
incoming_trampoline_shared_secret: [u8; 32],
hops: Vec<RouteHop>,
session_priv: SecretKey,
Contributor:

Could you add a comment explaining this field?

Contributor Author:

done

push_trampoline_forwarding_failure(format!("Could not route to next Trampoline hop {next_node_id} via forwarding channel {outgoing_scid}"), htlc_source, Some(outgoing_scid), 0x2000 | 25, Vec::new());
continue;
}
}
Contributor:

Is it possible to call self.can_forward_htlc here, now that we have the outgoing scid?

Contributor Author:

should we still do this immediately after pathfinding?


@arik-so arik-so force-pushed the arik/trampoline/forwarding branch from 4c6f1ef to 2abfc25 Compare April 12, 2025 02:50

codecov bot commented Apr 12, 2025

Codecov Report

Attention: Patch coverage is 88.63965% with 157 lines in your changes missing coverage. Please review.

Project coverage is 89.17%. Comparing base (481e5c7) to head (688ad3f).

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 70.92% 111 Missing and 5 partials ⚠️
lightning/src/ln/onion_payment.rs 57.53% 25 Missing and 6 partials ⚠️
lightning/src/ln/blinded_payment_tests.rs 99.20% 7 Missing ⚠️
lightning/src/chain/channelmonitor.rs 33.33% 2 Missing ⚠️
lightning/src/ln/onion_utils.rs 95.45% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3711      +/-   ##
==========================================
+ Coverage   89.13%   89.17%   +0.04%     
==========================================
  Files         156      156              
  Lines      123477   124824    +1347     
  Branches   123477   124824    +1347     
==========================================
+ Hits       110056   111308    +1252     
- Misses      10752    10828      +76     
- Partials     2669     2688      +19     


@arik-so arik-so force-pushed the arik/trampoline/forwarding branch 6 times, most recently from dfd0be7 to 688ad3f Compare April 16, 2025 08:41
@arik-so arik-so marked this pull request as ready for review April 16, 2025 08:42

arik-so commented Apr 16, 2025

Hey guys, I rephrased the commits, addressed what I think were most of your comments, and would like to squash a large portion of the fixup commits that ended up in here.

@joostjager joostjager left a comment

Only reviewed the first commit and posting what I have. Need to gather a lot more context to properly review the rest of the PR.

fee_msat: amt_msat,
cltv_expiry_delta: 24,
fee_msat: 0,
cltv_expiry_delta: 72,
Contributor:

Shouldn't the delta be 48 too then if it is the final hop?

@@ -2270,6 +2269,8 @@ fn test_trampoline_unblinded_receive() {
route_params: None,
};

// outer 56
Contributor:

Outer 56?

let args = if underpay {
args.with_payment_preimage(payment_preimage)
.without_claimable_event()
.expect_failure(HTLCDestination::FailedPayment { payment_hash })
Contributor:

Isn't it now waiting on more mpp shards to get to 2*amt?

}
{
let payment_failed_conditions = PaymentFailedConditions::new()
.expected_htlc_error_data(0x2000 | 26, &[0; 0]);
Contributor:

Isn't the spec saying failure data is u32,u32,u16 ?
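The `(u32, u32, u16)` layout the reviewer refers to comes from the Trampoline spec proposal's fee/expiry-insufficient failure data (base fee, proportional fee, CLTV delta). A sketch of serializing that 10-byte payload, illustrative only and not LDK's actual serialization code:

```rust
// Failure data per the Trampoline spec proposal: u32 fee_base_msat,
// u32 fee_proportional_millionths, u16 cltv_expiry_delta, big-endian,
// 10 bytes total.
fn trampoline_failure_data(
    fee_base_msat: u32,
    fee_proportional_millionths: u32,
    cltv_expiry_delta: u16,
) -> Vec<u8> {
    let mut data = Vec::with_capacity(10);
    data.extend_from_slice(&fee_base_msat.to_be_bytes());
    data.extend_from_slice(&fee_proportional_millionths.to_be_bytes());
    data.extend_from_slice(&cltv_expiry_delta.to_be_bytes());
    data
}

fn main() {
    let data = trampoline_failure_data(1_000, 100, 144);
    assert_eq!(data.len(), 10);
    assert_eq!(data, vec![0, 0, 3, 232, 0, 0, 0, 100, 0, 144]);
}
```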

InboundHTLCErr {
msg: "Underflow calculating outbound amount or CLTV value for Trampoline forward",
err_code: 0x2000 | 26,
err_data: Vec::new(),
Contributor:

No err data like specced in bolt?

@@ -148,6 +166,15 @@ pub(super) fn create_fwd_pending_htlc_info(
err_data: vec![0; 32],
}
})?;
check_trampoline_sanity(outer_hop_data, outgoing_cltv_value, amt_to_forward).map_err(|()| {
Contributor:

naming consistency: check_blinded_forward vs check_blinded_forward_constraints vs check_trampoline_sanity

trampoline_hop_data: msgs::InboundOnionReceivePayload {
payment_data, keysend_preimage, custom_tlvs, sender_intended_htlc_amt_msat,
cltv_expiry_height, payment_metadata, ..
}, ..
} =>
} => {
check_trampoline_sanity(outer_hop_data, cltv_expiry_height, sender_intended_htlc_amt_msat).map_err(|()| {
Contributor:

For the final hop, shouldn't it be equal? I know it is not in the spec like that.

@@ -577,7 +624,34 @@ where
outgoing_cltv_value
})
}
onion_utils::Hop::TrampolineForward { next_trampoline_hop_data: msgs::InboundTrampolineForwardPayload { amt_to_forward, outgoing_cltv_value, next_trampoline }, trampoline_shared_secret, incoming_trampoline_public_key, .. } => {
onion_utils::Hop::TrampolineForward { ref outer_hop_data, next_trampoline_hop_data: msgs::InboundTrampolineForwardPayload { amt_to_forward, outgoing_cltv_value, next_trampoline }, outer_shared_secret, trampoline_shared_secret, incoming_trampoline_public_key, .. } => {
if let Err(()) = check_trampoline_sanity(outer_hop_data, outgoing_cltv_value, amt_to_forward) {
Contributor:

Why does sanity need to be checked both here and in create_fwd_pending_htlc_info ?

@@ -1902,7 +1904,7 @@ where
&hop_data.trampoline_packet.hop_data,
hop_data.trampoline_packet.hmac,
Some(payment_hash),
(blinding_point, node_signer),
(hop_data.current_path_key, node_signer),
Contributor:

Is this an unrelated fix?

@@ -1557,6 +1557,8 @@ impl HTLCFailReason {
else if failure_code == 21 { debug_assert!(data.is_empty()) }
else if failure_code == 22 | PERM { debug_assert!(data.len() <= 11) }
else if failure_code == 23 { debug_assert!(data.is_empty()) }
else if failure_code == 25 | NODE { debug_assert!(data.is_empty()) }
else if failure_code == 26 | NODE { debug_assert!(data.is_empty()) }
Contributor:

Only 25 is empty according to spec?

@joostjager joostjager self-requested a review April 16, 2025 11:33
@valentinewallace valentinewallace left a comment

Getting some comments out since I think a rebase is coming in a bit

Comment on lines 64 to 77
fn check_trampoline_sanity(outer_hop_data: &msgs::InboundTrampolineEntrypointPayload, trampoline_cltv_value: u32, trampoline_amount: u64) -> Result<(), ()> {
if outer_hop_data.outgoing_cltv_value < trampoline_cltv_value {
return Err(());
}
if outer_hop_data.multipath_trampoline_data.as_ref().map_or(outer_hop_data.amt_to_forward, |mtd| mtd.total_msat) < trampoline_amount {
Contributor:

nit: wrap at 100chars

Comment on lines 134 to 135
onion_utils::Hop::TrampolineForward { ref outer_hop_data, next_trampoline_hop_data, next_trampoline_hop_hmac, new_trampoline_packet_bytes, trampoline_shared_secret, .. } => {
check_trampoline_sanity(outer_hop_data, next_trampoline_hop_data.outgoing_cltv_value, next_trampoline_hop_data.amt_to_forward).map_err(|()| {
Contributor:

nit: wrap at 100chars and below

/// Uniquely describes an HTLC by its source. Just the guaranteed-unique subset of [`HTLCSource`].
pub(crate) enum SentHTLCId {
PreviousHopData { short_channel_id: u64, htlc_id: u64 },
OutboundRoute { session_priv: [u8; SECRET_KEY_SIZE] },
TrampolineForward { session_priv: [u8; SECRET_KEY_SIZE], previous_hop_data: Vec<PreviousHopIdData> }
Contributor:

s/previous_hop_data/previous_hop_ids since there are multiple now?

@@ -624,10 +624,17 @@ impl Readable for InterceptId {
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub(crate) struct PreviousHopIdData {
Contributor:

nit: does the Data suffix add anything? May be nicer to remove

// todo: what do we want to do with this given we do not wish to propagate it directly?
let _decoded_onion_failure = onion_error.decode_onion_failure(&self.secp_ctx, &self.logger, &source);

for current_hop_data in previous_hop_data {
Contributor:

Could we use upstream_hop_data or just hop_data? Kinda confusing to say "for current hop in previous hops { .. }".

Comment on lines 7124 to 7423
let failure = match blinded_failure {
Some(BlindedFailure::FromIntroductionNode) => {
let blinded_onion_error = HTLCFailReason::reason(INVALID_ONION_BLINDING, vec![0; 32]);
let err_packet = blinded_onion_error.get_encrypted_failure_packet(
&incoming_packet_shared_secret, &Some(incoming_trampoline_shared_secret.clone())
);
HTLCForwardInfo::FailHTLC { htlc_id, err_packet }
},
Some(BlindedFailure::FromBlindedNode) => {
HTLCForwardInfo::FailMalformedHTLC {
htlc_id,
failure_code: INVALID_ONION_BLINDING,
sha256_of_onion: [0; 32]
}
},
None => {
let err_code = 0x2000 | 25;
let err_packet = HTLCFailReason::reason(err_code, Vec::new())
.get_encrypted_failure_packet(&incoming_packet_shared_secret, &Some(incoming_trampoline_shared_secret.clone()));
HTLCForwardInfo::FailHTLC { htlc_id, err_packet }
}
};

push_forward_event = self.decode_update_add_htlcs.lock().unwrap().is_empty();
let mut forward_htlcs = self.forward_htlcs.lock().unwrap();
push_forward_event &= forward_htlcs.is_empty();

match forward_htlcs.entry(short_channel_id) {
hash_map::Entry::Occupied(mut entry) => {
entry.get_mut().push(failure);
},
hash_map::Entry::Vacant(entry) => {
entry.insert(vec!(failure));
}
}

mem::drop(forward_htlcs);

let mut pending_events = self.pending_events.lock().unwrap();
pending_events.push_back((events::Event::HTLCHandlingFailed {
prev_channel_id: channel_id,
failed_next_destination: destination.clone(),
}, None));
Contributor:

Is it possible to DRY this with the HTLCSource::PreviousHopData handling above?

@arik-so arik-so force-pushed the arik/trampoline/forwarding branch 2 times, most recently from bbc9414 to 8817662 Compare April 17, 2025 07:53
arik-so added 9 commits April 17, 2025 11:05
Ensure that the Trampoline onion's amount and CLTV values do not exceed
the limitations imposed by the outer onion.
To process errors returned from downstream nodes when forwarding between
Trampoline nodes, we need to store information beyond what's available
in `HTLCPreviousHopData`, such as the used hops and the newly generated
outer onion's session_priv.

To that end, we add a new variant to `HTLCSource` in this commit, which
also future-proofs mapping an outbound forward to an incoming MPP.
The previously existing `HTLCDestination` variants do not map nicely to the
failure event of a Trampoline forward, so we introduce a new variant to
fill the gap.
In this commit, we expand our forwarding logic to do ad-hoc pathfinding
to subsequent Trampoline nodes, covering both blinded and unblinded
scenarios. We allow the forwards to use MPP, but we do not yet implement
inbound Trampoline MPP handling, nor retry logic.

We further modify our error propagation logic to handle Trampoline
forward HTLC failures eagerly, i.e. when we receive a failure from the
downstream node, we immediately fail our inbound MPP components.
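The eager failure propagation described in the last commit message can be sketched as follows. Types and names here are illustrative stand-ins (mirroring the `PreviousHopIdData` idea discussed above), and the failure code is a placeholder: one downstream failure fans out into one back-failure per inbound MPP component.

```rust
// Illustrative stand-in for the per-component inbound HTLC identifier.
#[derive(Debug, Clone, PartialEq)]
struct PreviousHopId {
    short_channel_id: u64,
    htlc_id: u64,
}

// A Trampoline forward may be funded by several inbound MPP components, so a
// single downstream failure produces one back-failure per inbound HTLC.
fn fail_inbound_components(previous_hops: &[PreviousHopId]) -> Vec<(PreviousHopId, u16)> {
    const TEMPORARY_TRAMPOLINE_FAILURE: u16 = 0x2000 | 25; // illustrative code
    previous_hops
        .iter()
        .map(|hop| (hop.clone(), TEMPORARY_TRAMPOLINE_FAILURE))
        .collect()
}

fn main() {
    let hops = vec![
        PreviousHopId { short_channel_id: 42, htlc_id: 0 },
        PreviousHopId { short_channel_id: 43, htlc_id: 7 },
    ];
    let failures = fail_inbound_components(&hops);
    assert_eq!(failures.len(), 2);
    assert!(failures.iter().all(|(_, code)| *code == (0x2000 | 25)));
}
```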
@arik-so arik-so force-pushed the arik/trampoline/forwarding branch from 8817662 to 0b86a98 Compare April 17, 2025 19:26
let mut recipient_features = Bolt11InvoiceFeatures::empty();
recipient_features.set_basic_mpp_optional();

let route = match self.router.find_route(
Contributor:

One question that came to my mind is to what extent pathfinding here is a DoS vector. It's probably quite easy to keep trampoline nodes busy searching?

@valentinewallace (Contributor):

Thinking about the current approach in the last commit where we find a route --

I think things may be cleaner if we move a lot of the handling of trampoline forwards to the outbound_payments module. In this approach, we would have a similar method to the existing send_payment_* methods there that creates route parameters, registers a PendingOutboundPayment::Retryable that's a trampoline, and forwards via the internal method find_route_and_send_payment.

The benefit of this approach is that retries will be handled automatically and it keeps us from adding a bunch of code to process_pending_htlc_forwards/adds more encapsulation in general.

I think some changes needed for this would be:

  • when we're processing a trampoline forward in process_pending_htlc_forwards, we'll call into the outbound payments module to handle the pathfinding and the forward, and if it fails then fail the trampoline HTLC backwards
  • modify ChannelManager's send_payment_along_path to construct the correct HTLC source and payment onion, since this method is called from outbound_payments
  • if an outbound trampoline forward fails, when we notice this in fail_htlc_backwards_internal we'll call another new method PendingOutboundPayments::trampoline_htlc_failed(..) similar to the fail_htlc method in that module, which will either retry for us or tell us to fail the HTLC(s) backwards
  • will need to make sure we generate the right events/don't generate PaymentFailed for failed trampolines

Thoughts on this alternate approach?
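The control flow of the alternative proposed above could be sketched like this. Everything here is a hypothetical stand-in (`OutboundPayments`, `send_trampoline_forward`, the return strings), not LDK's actual API; it only shows the shape of the proposal: the forwarding loop delegates pathfinding, sending, and retries to the outbound-payments machinery, and only a terminal failure is failed backwards.

```rust
// Hypothetical stand-in for the outbound_payments module.
struct OutboundPayments;

impl OutboundPayments {
    // Stand-in for registering a retryable Trampoline outbound and sending
    // along the first route found; retries would live behind this call.
    fn send_trampoline_forward(&self, amount_msat: u64, fee_budget_msat: u64) -> Result<(), ()> {
        let _ = amount_msat;
        if fee_budget_msat == 0 { Err(()) } else { Ok(()) }
    }
}

// Stand-in for the trampoline branch of process_pending_htlc_forwards.
fn process_trampoline_forward(
    outbounds: &OutboundPayments,
    amount_msat: u64,
    fee_budget_msat: u64,
) -> &'static str {
    match outbounds.send_trampoline_forward(amount_msat, fee_budget_msat) {
        // Retries are now driven by the outbound-payments machinery.
        Ok(()) => "forward in flight",
        // Terminal failure: fail the inbound HTLC(s) backwards.
        Err(()) => "fail backwards",
    }
}

fn main() {
    let outbounds = OutboundPayments;
    assert_eq!(process_trampoline_forward(&outbounds, 1_000_000, 9_000), "forward in flight");
    assert_eq!(process_trampoline_forward(&outbounds, 1_000_000, 0), "fail backwards");
}
```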
