No next lookup via gossipd #3547
Conversation
= tal_dup_talarr(cbdata, u8, failmsg_needs_update);
subd_req(cbdata, hin->key.channel->peer->ld->gossip,
         take(towire_gossip_get_stripped_cupdate(NULL, failmsg_scid)),
         -1, 0, failmsg_update_reply, cbdata);
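For context on how the two halves of this request fit together, here is a minimal, hedged sketch of what the failmsg_update_reply callback could look like. It only reuses wire helpers quoted elsewhere in this review (fromwire_gossip_get_stripped_cupdate_reply, towire_temporary_channel_failure, towire_temporary_node_failure); the cbdata struct name and the final hand-off helper are assumptions for illustration, not the PR's actual code.

/* Hedged sketch of the reply side of the subd_req() above; not the
 * PR's actual implementation.  struct failmsg_update_cbdata and
 * resolve_failmsg() are hypothetical names. */
static void failmsg_update_reply(struct subd *gossipd, const u8 *msg,
				 const int *fds,
				 struct failmsg_update_cbdata *cbdata)
{
	u8 *stripped_update;
	u8 *failmsg;

	/* gossipd may have no channel_update for us (e.g. unknown or
	 * just-closed channel); in that case keep the generic error. */
	if (!fromwire_gossip_get_stripped_cupdate_reply(tmpctx, msg,
							&stripped_update)
	    || !tal_count(stripped_update))
		failmsg = towire_temporary_node_failure(tmpctx);
	else
		failmsg = towire_temporary_channel_failure(tmpctx,
							   stripped_update);

	/* Now actually fail the incoming HTLC with the finished message
	 * (the PR does this by building a failed_htlc for channeld). */
	resolve_failmsg(cbdata->hin, failmsg);
}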
Seems like your gossip daemon doesn't exist in all cases? Failing test_restart_many_payments:
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.095Z DEBUG 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199-chan#3: HTLC in 1 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: FATAL SIGNAL 11 (version v0.8.1-36-g03045e1-modded)
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: common/daemon.c:44 (send_backtrace) 0x5634d3e4c1b0
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: common/daemon.c:52 (crashdump) 0x5634d3e4c200
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fb9d3ca4f1f
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/subd.c:750 (subd_send_msg) 0x5634d3e43132
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/subd.c:770 (subd_req_) 0x5634d3e431f6
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:268 (local_fail_in_htlc_needs_update) 0x5634d3e35438
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:296 (fail_out_htlc) 0x5634d3e3558d
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:516 (destroy_hout_subd_died) 0x5634d3e35c86
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x5634d3eb2472
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:402 (del_tree) 0x5634d3eb2961
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:486 (tal_free) 0x5634d3eb2ced
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/peer_htlcs.c:2107 (free_htlcs) 0x5634d3e3a85a
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:520 (shutdown_subdaemons) 0x5634d3e1a7a1
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:940 (main) 0x5634d3e1b217
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fb9d3c87b96
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x5634d3e01b09
DEBUG:root:lightningd-1: 2020-02-25T06:08:23.108Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff
DEBUG:root:Received response for stop call: {'error': 'Connection to RPC server lost.'}
----------------------------- Captured stderr call -----------------------------
lightningd: FATAL SIGNAL 11 (version v0.8.1-36-g03045e1-modded)
0x5634d3e4c15a send_backtrace
common/daemon.c:39
0x5634d3e4c200 crashdump
common/daemon.c:52
0x7fb9d3ca4f1f ???
???:0
0x5634d3e43132 subd_send_msg
lightningd/subd.c:750
0x5634d3e431f6 subd_req_
lightningd/subd.c:770
0x5634d3e35438 local_fail_in_htlc_needs_update
lightningd/peer_htlcs.c:268
0x5634d3e3558d fail_out_htlc
lightningd/peer_htlcs.c:296
0x5634d3e35c86 destroy_hout_subd_died
lightningd/peer_htlcs.c:516
0x5634d3eb2472 notify
ccan/ccan/tal/tal.c:240
0x5634d3eb2961 del_tree
ccan/ccan/tal/tal.c:402
0x5634d3eb2ced tal_free
ccan/ccan/tal/tal.c:486
0x5634d3e3a85a free_htlcs
lightningd/peer_htlcs.c:2107
0x5634d3e1a7a1 shutdown_subdaemons
lightningd/lightningd.c:520
0x5634d3e1b217 main
lightningd/lightningd.c:940
0x7fb9d3c87b96 ???
???:0
0x5634d3e01b09 ???
???:0
0xffffffffffffffff ???
???:0
Ah, I think this is best addressed by freeing HTLCs and peer daemons first, then gossipd. Thanks!
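To illustrate the ordering being proposed, here is a sketch only: the real shutdown_subdaemons() in lightningd/lightningd.c differs in detail, and free_htlcs()'s exact signature is assumed here.

/* Sketch: free HTLCs and per-peer daemons while gossipd still exists,
 * so the destructor chain in the backtrace above
 * (destroy_hout_subd_died -> fail_out_htlc -> subd_req to gossipd)
 * never talks to an already-freed gossip daemon. */
static void shutdown_subdaemons(struct lightningd *ld)
{
	/* 1. Fail any in-flight HTLCs first (signature assumed). */
	free_htlcs(ld, NULL);

	/* 2. Tear down per-channel daemons (channeld, onchaind, ...). */
	/* ... peer/channel teardown elided ... */

	/* 3. Only now free gossipd. */
	ld->gossip = tal_free(ld->gossip);
}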
if (!topology_synced(out->peer->ld->topology)) {
        log_info(out->log, "Attempt to send HTLC but still syncing"
                 " with bitcoin network");
        return towire_temporary_channel_failure(ctx,
                                                out->stripped_update);
        return towire_temporary_node_failure(ctx);
I think this change is causing test_lightningd_still_loading to fail with the wrong error message?
<class 'AssertionError'>
Pattern 'TEMPORARY_NODE_FAILURE' not found in "RPC call failed: method: sendpay, payload: {'route': [{'msatoshi': 1000, 'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'delay': 5, 'channel': '1x1x1'}], 'payment_hash': '55a61a634cd0dbfdad2461676acae93774a0576ff4e5d61cb6728b3078744af1'}, error: {'code': 204, 'message': 'failed: WIRE_TEMPORARY_CHANNEL_FAILURE (First peer not ready)', 'data': {'erring_index': 0, 'failcode': 4103, 'failcodename': 'WIRE_TEMPORARY_CHANNEL_FAILURE', 'erring_node': '0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518', 'erring_channel': '1x1x1', 'erring_direction': 1}}"
[<TracebackEntry /home/travis/build/ElementsProject/lightning/tests/test_misc.py:189>]
test_lightningd_still_loading failed; it passed 0 out of the required 1 times.
Huh, no. That error is complaining that it's expecting a temp_node_fail, but it's getting a temp_chan_fail. And it doesn't happen for me here...
Ah, it does under VALGRIND. Turns out test is flaky, fixing...
couple of test failures; otherwise concept ACK
Force-pushed from 55b75cf to 7439faf.
…use it.

The idea is that gossipd can give us the cupdate we need for an error, and we wire things up so that we ask for it (async) just before we send the error to the subdaemon.

I tried many other things, but they were all too high-risk.

1. We need to ask gossipd every time, since it produces these lazily (in particular, it doesn't actually generate an offline update unless the channel is used).

2. We can't do async calls in random places, since we'll end up with an HTLC in limbo. What if another path tries to fail it at the same time?

3. This allows us to use a temporary_node_failure error, and upgrade it when gossipd replies. This doesn't change any existing assumptions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of saving a stripped_update, we use the new local_fail_in_htlc_needs_update.

One minor change: we return the more correct towire_temporary_channel_failure when the node is still syncing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Force-pushed from 7439faf to 46eaef9.
Rebased, addressed @niftynei feedback.
Even without optimization, it's faster to walk all the channels than ping another daemon and wait for the response.

Changelog-Changed: Forwarding messages is now much faster (less inter-daemon traffic)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
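A rough sketch of the kind of purely-local lookup this enables. List and field names (ld->peers, peer->channels, channel->scid) follow lightningd's general layout but are assumptions here, not the PR's exact code.

/* Sketch: find the outgoing channel for a forward by walking our own
 * channels instead of round-tripping to gossipd. */
static struct channel *find_forward_channel(struct lightningd *ld,
					    const struct short_channel_id *scid)
{
	struct peer *peer;
	struct channel *channel;

	list_for_each(&ld->peers, peer, list) {
		list_for_each(&peer->channels, channel, list) {
			/* Unconfirmed channels may have no scid yet. */
			if (channel->scid
			    && short_channel_id_eq(channel->scid, scid))
				return channel;
		}
	}
	return NULL;
}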
Force-pushed from 46eaef9 to b17cd23.
Restarted due to a flaky test.
Just a minor clarification, otherwise LGTM
ACK b17cd23
 * but it's v. unlikely */
if (!fromwire_gossip_get_stripped_cupdate_reply(msg, msg,
                                                &stripped_update)
    || !tal_count(stripped_update)) {
Aren't we using tal_bytelen for u8*?
It's a line-ball, IMHO. Especially here, where it will actually (due to our explicit fromwire_ generation policy) be NULL instead of empty.
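For readers unfamiliar with ccan/tal: for a u8 array the two calls agree, and both treat the NULL that generated fromwire_ code hands back for an absent field as length 0. A tiny illustrative fragment (assumes ccan/tal/tal.h and assert.h):

u8 *stripped_update = NULL;
/* Both are 0 for NULL, so the !tal_count() check above also covers
 * the "fromwire_ gave us NULL" case. */
assert(tal_count(stripped_update) == 0);
assert(tal_bytelen(stripped_update) == 0);

stripped_update = tal_arr(NULL, u8, 5);
assert(tal_count(stripped_update) == 5);   /* element count */
assert(tal_bytelen(stripped_update) == 5); /* bytes; same, since sizeof(u8) == 1 */
tal_free(stripped_update);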
subd_send_msg(hin->key.channel->owner,
              take(towire_channel_fail_htlc(NULL, failed_htlc)));
tell_channeld_htlc_failed(hin, failed_htlc);
Nice simplification 👍
(Based on #3546, so ignore first three commits).
Every time we need to forward, we ask gossipd for where we should forward it. That's unnecessary, and slow and weird. But that also gives us a chance to get the latest channel_update, in case we need to send an error.
This does a local lookup, and if we need an update it starts with a temporary_node_failure then asks gossipd so it can make the real error. This is robust against crashing and other corner cases (such as channeld manipulating the failed HTLC) since there's never an HTLC in limbo.