Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel Graph Fetching from newer lightning nodes #615

Closed
TheBlueMatt opened this issue May 3, 2020 · 13 comments
Closed

Channel Graph Fetching from newer lightning nodes #615

TheBlueMatt opened this issue May 3, 2020 · 13 comments
Milestone

Comments

@TheBlueMatt
Copy link
Collaborator

Newer versions of some lightning nodes (at least lnd, but not c-lightning) don't bother supporting sending the full routing table anymore in response to the initial_routing_sync bit. We should figure out what the minimal set of channel queries is to be able to tell such nodes to send us the full routing table at startup without taking zlib dependency. Long-term I'd probably prefer that we help get the sync stuff to a more robust redesign (see eg lightning/bolts#584 (comment) ) but certainly in the mean time we should at least be able to fetch the db.

@TheBlueMatt TheBlueMatt added this to the 0.1 milestone May 3, 2020
@naumenkogs
Copy link
Contributor

Related to #57, but seems more up-to-date.

@naumenkogs
Copy link
Contributor

naumenkogs commented Jul 20, 2020

I'm planning to pick this up. To be clear, the plan is to use gossip_queries in a smart way, and avoiding zlib compression?

@TheBlueMatt
Copy link
Collaborator Author

Yep, gossip_queries kinda suck, but if we have to use then just to get the graph, we have to... still, ideally we don't implement more of the complexity than we have to.

@bmancini55
Copy link
Contributor

I would like to help if you all want the assistance. I have experience implementing gossip_queries in the node-lightning project and would be happy to share that experience or work on the implementation if its not too late.

@TheBlueMatt
Copy link
Collaborator Author

Cool! I don't believe @naumenkogs is working on this, at least based on his PR's, but I'll let him correct me if not. As for where, you should check out peer_handler.rs, at least where it handles initial_routing_sync on the init message (https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/peer_handler.rs#L589 and https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/peer_handler.rs#L717). If you need to interface with the network graph to get information, do so through the RoutingMessageHandler trait (https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/msgs.rs#L600-L622).

@naumenkogs
Copy link
Contributor

I was planning, but haven't started working on this issue yet, so please go ahead :)

@bmancini55
Copy link
Contributor

I'm still getting familiar with the codebase but wanted to share some thoughts and experiences on how we might tackle this.

At a high level, a minimalist implementation should only support gossip_queries and not bother with gossip_queries_ex. We will need to modify feature negotiation during init to signal gossip_queries. Once negotiated, init_routing_sync is ignored and they will not gossip to us until we send a gossip_timestamp_filter.

To keep things simple with gossip_timestamp_filter we can assume that we want all messages from the peer and should set the filter to be as broad as possible. If we set first_timestamp to 0, it is possible that we receive a routing table dump. In practice, this does not appear to be a reliable way to backfill routing table information, which would be nice.

When using query_channel_range, I have been requesting the entire block range when a node has been offline for more than a trivial amount of time. This ensures that channel_update and node_announcement messages aren't missed and we don't have to wait for the updates to propagate via relay if/when those are periodically updated.

Prior to a clarifications of the spec, reply_channel_range spanning multiple messages used inconsistent formats. The clarification helps establish the meaning of the full_information flag and enables proper sequencing of multipart replies. IMO, we should focus on implementing the current specification and ignore inconsistencies in the past.

Not supporting zlib presents a challenge in that there is no way to dictate the encoding sent in reply_channel_range. From observation:

  • LND sends plaintext responses (though this may change)
  • eclair sends zlib responses
  • c-lightning sends zlib responses

If we connect to c-lightning or eclair node, we are out of luck and will need to fail the sync and try with a different peer. It might be best to push for a new feature flag in the spec to signal zlib support.

In my implementation, I blackboxed the sync process. The two main benefits are:

  • we get success/failure outcome for the aggregate of the process and can try from a different peer if there is a problem
  • we can disable relay while we are performing a sync so we don't fill the buffer with old messages

I'll send a follow up with some thoughts on actual implementation as I get more familiar with the code. Thanks!

@TheBlueMatt
Copy link
Collaborator Author

It might be best to push for a new feature flag in the spec to signal zlib support.

Definitely. Am I correct, then, in my understanding that there is no reliable way to fetch the routing table from both an lnd node and a c-lightning node without taking a zlib dependency or without playing some game where we guess the remote node type and reconnect with different flags until things work?

I have been requesting the entire block range when a node has been offline for more than a trivial amount of time.

Regarding using block filters, does this imply that we would miss any channel announcements which were only broadcast a while (eg lets say a day) after the block where the channel was opened if we set anything other than a "give me everything" filter? Its totally reasonable for nodes to be offline when their commitment transaction confirms and only broadcast the channel announcement much later, which seems to imply other nodes would miss it due to requesting a filter of "all the stuff since I was last online, as determined by when the channel confirmed, not when the channel_announcement was broadcast"?

In my implementation, I blackboxed the sync process. The two main benefits are:

I'm not sure what you mean by this - are you implying that the RoutingMessageHandler has some ability to query the PeerHandler and learn about things across different peers, or are you saying this is just for testing? Maybe a demo PR would help.

@bmancini55
Copy link
Contributor

Definitely. Am I correct, then, in my understanding that there is no reliable way to fetch the routing table from both an lnd node and a c-lightning node without taking a zlib dependency or without playing some game where we guess the remote node type and reconnect with different flags until things work?

That is my understanding as well.

Regarding using block filters, does this imply that we would miss any channel announcements which were only broadcast a while (eg lets say a day) after the block where the channel was opened if we set anything other than a "give me everything" filter? Its totally reasonable for nodes to be offline when their commitment transaction confirms and only broadcast the channel announcement much later, which seems to imply other nodes would miss it due to requesting a filter of "all the stuff since I was last online, as determined by when the channel confirmed, not when the channel_announcement was broadcast"?

You are correct, if your time horizon prior to your last known block is too short you may miss a chan_ann message. For example:

  • Alice and Bob are both online at height 0
  • Alice broadcasts a funding tx and goes offline
  • The tx gets confirmed at height 100
  • Bob goes offline around block 400
  • Alice comes online around block 500 and broadcasts the chan_ann
  • Bob comes back online around block 700

Bob believes he has a complete view until he went offline at 400, but if his time horizon is short (say 1 days worth), he will construct his query for block 256+ and miss Alice's channel that is attached to block 100.

This might not be that common an issue since Alice's counterparty may fail the channel if she fails to send her funding_locked message in a reasonable amount of time. But it's still an issue.

In can also be resolved if Alice is active and sends periodic chan_upd messages. When Bob's node sees the chan_upd for an unknown channel he can use query_short_channel_ids to ask for the chan_ann + chan_upd + node_ann messages.

Another reason I've been fetching the full range is a related problem. Because query_channel_range is based on the block height of the channel and chan_upd and node_ann messages are timestamp based, you may also miss updates for older channels. For instance:

  • Alice has a channel from block 0
  • Bob has knowledge of this channel
  • Bob goes offline at block 500
  • Alice issues a chan_upd or updates her node and has new features and broadcasts a node_ann around block 600
  • Bob comes online at block 800
  • Bob syncs from his last known point of 500 less some horizon
  • Bob misses the chan_upd and node_ann messages because they are from a channel belonging to a block prior to the query range.

Doing a full sync is expensive, but it at least guarantees that we have all information that the peer has.

I'm not sure what you mean by this - are you implying that the RoutingMessageHandler has some ability to query the PeerHandler and learn about things across different peers, or are you saying this is just for testing? Maybe a demo PR would help.

Sorry for the confusion, sort of a aside on the implementation side of things. I was referring to the implementation in node-lightning which has a different architecture.

@TheBlueMatt
Copy link
Collaborator Author

Doing a full sync is expensive, but it at least guarantees that we have all information that the peer has.

RIght, sounds like we pretty much need to request a full sync always until we get a set-reconciliation protocol to replace the existing sync stuff.

Sorry for the confusion, ...

Ah, ok, looking forward to seeing your work here :)

@bmancini55
Copy link
Contributor

BTW, I created this issue in the lightning-rfc requesting zlib support negotiation: lightning/bolts#811

@TheBlueMatt
Copy link
Collaborator Author

I believe #1015 is the only thing left here.

@TheBlueMatt
Copy link
Collaborator Author

Compression got yanked from the spec, this is now complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
@TheBlueMatt @bmancini55 @naumenkogs and others