Decouple channel management from host session management #2801

Merged
merged 87 commits into from
Sep 28, 2021


eddyashton
Member

This PR includes the final pieces from #2775:

  • Node-to-node sessions on the host are not managed by the enclave. Instead, the host creates them on demand, and they should be cullable when idle, independently of the enclave's channel key management.
  • The key exchange protocol is rewritten to make explicit what is serialised in each message, with a few functional tweaks. The core idea is that we now obey a new init message in more cases, regardless of the channel's current state, and try to proceed with that exchange.
  • The ChannelManager implementation is separated from its interface, and only the manager (not the internal Channels) is exposed by the API and used by the tests.
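A minimal C++ sketch of the first point, assuming a host-side table keyed by node ID and a caller-supplied millisecond clock. `HostSessionTable`, `get_or_create` and `cull_idle` are hypothetical names, not CCF's API. Sessions appear on first use and are culled purely on idleness, without consulting any enclave channel state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical host-side table of node-to-node sessions. Sessions are created
// on demand and culled purely on idleness; nothing here touches channel keys,
// which stay in the enclave.
class HostSessionTable
{
public:
  using NodeId = std::string;
  using Millis = uint64_t; // caller-supplied clock, assumed monotonic

  explicit HostSessionTable(Millis timeout) : idle_timeout(timeout) {}

  // Return the session id for peer, creating a session if none exists.
  // Every use refreshes the session's last-activity time.
  uint64_t get_or_create(const NodeId& peer, Millis now)
  {
    auto it = sessions.find(peer);
    if (it == sessions.end())
    {
      it = sessions.emplace(peer, Session{next_session_id++, now}).first;
    }
    it->second.last_active = now;
    return it->second.id;
  }

  // Drop every session idle for at least idle_timeout; return how many.
  std::size_t cull_idle(Millis now)
  {
    std::size_t culled = 0;
    for (auto it = sessions.begin(); it != sessions.end();)
    {
      assert(now >= it->second.last_active); // clock must not go backwards
      if (now - it->second.last_active >= idle_timeout)
      {
        it = sessions.erase(it);
        ++culled;
      }
      else
      {
        ++it;
      }
    }
    return culled;
  }

  std::size_t size() const
  {
    return sessions.size();
  }

private:
  struct Session
  {
    uint64_t id;
    Millis last_active;
  };

  Millis idle_timeout;
  uint64_t next_session_id = 1;
  std::unordered_map<NodeId, Session> sessions;
};
```

Because culling only forgets the host's session, a later `get_or_create` for the same peer simply opens a new one; whether the enclave channel needs re-keying is a separate decision.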
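The second and third points can be sketched together, as a hypothetical illustration rather than the code in this PR: callers only hold a `ChannelManager`, while per-peer state lives inside `ChannelManagerImpl`. Only `WAITING_FOR_FINAL` is taken from the node logs in this thread; the other status names, the `INIT`/`RESPONSE`/`FINAL` message names and the node-ID tie-break for simultaneous inits are assumptions. The `INIT` case shows the idea of obeying a new init whatever the current state.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

using NodeId = std::string;

// Illustrative statuses; only WAITING_FOR_FINAL appears in the logs.
enum class ChannelStatus { INACTIVE, INITIATED, WAITING_FOR_FINAL, ESTABLISHED };
enum class KeyExchangeMsg { INIT, RESPONSE, FINAL };

// Public interface: the only type callers (and tests) see.
class ChannelManager
{
public:
  virtual ~ChannelManager() = default;
  // Begin a key exchange with peer; returns the message to send.
  virtual KeyExchangeMsg initiate(const NodeId& peer) = 0;
  // Process a key-exchange message from peer; returns a reply, if any.
  virtual std::optional<KeyExchangeMsg> recv(const NodeId& peer, KeyExchangeMsg msg) = 0;
  virtual ChannelStatus status(const NodeId& peer) const = 0;
};

// Implementation: per-peer channel state is private to this class.
class ChannelManagerImpl : public ChannelManager
{
public:
  explicit ChannelManagerImpl(NodeId self) : self(std::move(self)) {}

  KeyExchangeMsg initiate(const NodeId& peer) override
  {
    channels[peer] = ChannelStatus::INITIATED;
    return KeyExchangeMsg::INIT;
  }

  std::optional<KeyExchangeMsg> recv(const NodeId& peer, KeyExchangeMsg msg) override
  {
    assert(peer != self); // a node never keys a channel with itself
    ChannelStatus& s = channels[peer]; // unknown peers start INACTIVE
    switch (msg)
    {
      case KeyExchangeMsg::INIT:
        // Obey a fresh init from any state (even ESTABLISHED, to re-key),
        // except when our own init is in flight and we win the
        // hypothetical tie-break on node ID.
        if (s == ChannelStatus::INITIATED && self < peer)
          return std::nullopt;
        s = ChannelStatus::WAITING_FOR_FINAL;
        return KeyExchangeMsg::RESPONSE;
      case KeyExchangeMsg::RESPONSE:
        if (s != ChannelStatus::INITIATED)
          return std::nullopt; // stale response: ignore
        s = ChannelStatus::ESTABLISHED;
        return KeyExchangeMsg::FINAL;
      case KeyExchangeMsg::FINAL:
        if (s == ChannelStatus::WAITING_FOR_FINAL)
          s = ChannelStatus::ESTABLISHED;
        return std::nullopt;
    }
    return std::nullopt;
  }

  ChannelStatus status(const NodeId& peer) const override
  {
    auto it = channels.find(peer);
    return it == channels.end() ? ChannelStatus::INACTIVE : it->second;
  }

private:
  NodeId self;
  std::map<NodeId, ChannelStatus> channels;
};
```

In this sketch the tie-break matters: if every init were obeyed unconditionally, two nodes that initiate at the same moment would both end up waiting for a final that neither sends.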

@achamayou
Member

I haven't managed to reproduce exactly what's happening in CI, but I've hit a run where one of the new nodes fails to catch up, seemingly because it never successfully establishes a channel with the current primary (one of the old nodes).

After peer certificate verification fails, it stays stuck in WAITING_FOR_FINAL, seemingly forever, and Raft drops every subsequent message from that peer:

2021-08-10T16:58:53.252032Z -0.006 0   [info ] ../src/node/node_state.h:657         | Node n[03b2170c2ad208e6d3a895e37d45c539cd89e0e644549328997e9aaf0aa98187] is waiting for votes of members to be trusted
2021-08-10T16:58:57.301365Z -0.073 0   [info ] ../src/consensus/aft/raft.h:3352     | Added raft node n[7914f7a2941bbf6c0ebcf8c45b165992687c641c5625888d962817e1bd717fc1] (127.168.43.206:40165)
2021-08-10T16:58:57.301419Z -0.073 0   [info ] ../src/consensus/aft/raft.h:3352     | Added raft node n[7a02f4becd3615bf6f92f0c5e0c6a5137cd53002a91210e5301c71c3fc4f5850] (127.109.68.148:37233)
2021-08-10T16:58:57.301442Z -0.073 0   [info ] ../src/consensus/aft/raft.h:3352     | Added raft node n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] (127.20.34.178:38167)
2021-08-10T16:58:57.301463Z -0.073 0   [info ] ../src/consensus/aft/raft.h:3352     | Added raft node n[d3e5d133b737de6c691c7d79864a5052ab5c483f1335e7ca2f68f4e6e246f85f] (127.114.210.171:42657)
2021-08-10T16:58:57.301515Z        100 [info ] ../src/host/ledger.h:822             | Setting last known/commit index to 131
2021-08-10T16:58:57.301535Z -0.073 0   [info ] ../src/consensus/aft/raft.h:2866     | Becoming follower n[03b2170c2ad208e6d3a895e37d45c539cd89e0e644549328997e9aaf0aa98187]: 2
2021-08-10T16:58:57.307910Z -0.079 0   [info ] ../src/node/node_state.h:626         | Joiner successfully resumed from snapshot at seqno 131 and view 2
2021-08-10T16:58:57.312170Z -0.084 0   [info ] ../src/node/node_state.h:1591        | Network TLS connections now accepted
2021-08-10T16:58:57.313251Z -0.085 0   [info ] ../src/node/node_state.h:647         | Node has now joined the network as node n[03b2170c2ad208e6d3a895e37d45c539cd89e0e644549328997e9aaf0aa98187]: all domains
2021-08-10T16:58:57.331681Z -0.009 0   [info ] ../src/node/channels.h:902           | Resetting channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2]
2021-08-10T16:58:57.402321Z -0.004 0   [info ] ../src/node/channels.h:902           | Resetting channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2]
2021-08-10T16:58:57.408753Z -0.011 0   [fail ] ../src/node/channels.h:531           | <- n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] (WAITING_FOR_FINAL): Peer certificate verification failed
2021-08-10T16:58:57.408818Z -0.001 0   [info ] ../src/node/channels.h:814           | Node channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] cannot receive authenticated message: not established, status=WAITING_FOR_FINAL
2021-08-10T16:58:57.409902Z -0.002 0   [info ] ../src/consensus/aft/raft.h:867      | Dropped invalid message from n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2]
2021-08-10T16:58:57.492951Z -0.001 0   [info ] ../src/node/channels.h:814           | Node channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] cannot receive authenticated message: not established, status=WAITING_FOR_FINAL
2021-08-10T16:58:57.492992Z -0.001 0   [info ] ../src/consensus/aft/raft.h:867      | Dropped invalid message from n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2]
2021-08-10T16:58:57.661553Z -0.001 0   [info ] ../src/node/channels.h:814           | Node channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] cannot receive authenticated message: not established, status=WAITING_FOR_FINAL
2021-08-10T16:58:57.661602Z -0.001 0   [info ] ../src/consensus/aft/raft.h:867      | Dropped invalid message from n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2]
2021-08-10T16:58:57.801621Z -0.056 0   [info ] ../src/node/channels.h:814           | Node channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] cannot receive authenticated message: not established, status=WAITING_FOR_FINAL
2021-08-10T16:58:57.801679Z -0.056 0   [info ] ../src/consensus/aft/raft.h:867      | Dropped invalid message from n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2]
2021-08-10T16:58:57.917641Z -0.001 0   [info ] ../src/node/channels.h:814           | Node channel with n[a5085d99316d81c177c5fab55986dbb7dfc5c7113b9e8dafe84bd52f98cd0ea2] cannot receive authenticated message: not established, status=WAITING_FOR_FINAL
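One reading of the log above, reduced to a hypothetical C++ model of the joiner's side of this one channel (the type and member names are illustrative, not CCF's): once the peer's final fails certificate verification, the status stays `WAITING_FOR_FINAL`, and every authenticated message is dropped until a new init and a verifiable final arrive. Nothing local moves the channel forward, which would match "seemingly forever".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, reduced model: the joiner has already answered the peer's
// init, so the channel starts in WAITING_FOR_FINAL.
enum class Status { WAITING_FOR_FINAL, ESTABLISHED };

struct ResponderChannel
{
  Status status = Status::WAITING_FOR_FINAL;
  std::size_t dropped = 0; // authenticated messages rejected so far

  // A final whose peer certificate fails verification is rejected and,
  // as in the log, leaves the status unchanged.
  void recv_final(bool peer_cert_ok)
  {
    if (status == Status::WAITING_FOR_FINAL && peer_cert_ok)
      status = Status::ESTABLISHED;
  }

  // A fresh init from the peer restarts the exchange.
  void recv_init()
  {
    status = Status::WAITING_FOR_FINAL;
  }

  // Authenticated (e.g. Raft) messages are only accepted once established.
  bool recv_authenticated()
  {
    if (status != Status::ESTABLISHED)
    {
      ++dropped;
      return false;
    }
    return true;
  }
};
```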

3 participants