Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Do not merge] Changes around channel management #2775

Closed
wants to merge 51 commits into from

Conversation

eddyashton
Copy link
Member

Opening this mega-PR to get CI testing, as the previously problematic tests are now passing locally. This includes many non-critical refactors/reworkings of adjacent bits of code, so I'll factor those into separate PRs.

A brief summary of the changes, as I remember them:

  • NodeToNodeChannelManager takes some of node_state.h, and some of channels.h, and is the sole point for external systems to send/receive (they don't open/close/manage individual channels)
  • Host-side node-to-node session management isn't at all exposed to the enclave. It doesn't distinguish inbound/outbound sessions, and doesn't try to (asynchronously) open/close sessions. Instead the enclave deals purely with NodeIDs. I believe that either the node already knows the IP of its recipient (it has been seen in the KV), in which case it has sent an associate message to the host and from then on the host can remember the ID -> IP mapping, or else it is a response to an inbound message, in which case the host still has a single associate session. Then the host can create TCP sessions on-demand, re-creating them if they were closed by networking issues, etc, and just becomes another step in the best-effort-but-faulty network stack (WIP)
  • Consistent, explicit serialisation of node_outbound messages over the ringbuffer. We previously benefitted from the fact that writing multiple values to the ringbuffer would blit them in a known format, which happened to match how we later read them with serialized::read (partly on the host, partly inside the receiving enclave). We now gives this explicit types like any other message, distinguishing the host-readable bits from the payload. There's a reserialisation penalty here but I think we can minimise it, and there's an associated fix to use ByteRanges on these types that may mitigate its effect in practice
  • Consistent logging format, with NodeID and current-state prefix, for all channel logging

And the core goal:

  • Re-examine the key exchange state machine and parsing logic to avoid deadlocked states

@ghost
Copy link

ghost commented Jul 7, 2021

eda_channels_1@29086 aka 20210714.9 vs main ewma over 20 builds from 28649 to 29064

Click to see table
build_id build_number tpcc_sgx_cft^ tpcc_sgx_cft_mem tpcc_sgx_bft^ tpcc_sgx_bft_mem ls_sgx_cft^ ls_sgx_cft_mem ls_jwt_sgx_cft^ ls_jwt_sgx_cft_mem ls_js_sgx_cft^ ls_js_sgx_cft_mem ls_full_js_sgx_cft^ ls_full_js_sgx_cft_mem ls_js_jwt_sgx_cft^ ls_js_jwt_sgx_cft_mem CHAMP put (/s)^ CHAMP get (/s)^
28649 20210708.2 6519.56 9.21478e+07 2321.55 2.74862e+08 22771.9 1.74367e+07 4048.09 1.45532e+07 2446.59 9.31027e+06 2458.48 9.31027e+06 1805.77 8.2617e+06 1.32728e+06 3.62478e+07
28654 20210708.3 6574.64 9.18856e+07 2799.76 2.17977e+08 21660.4 1.76989e+07 4470.06 1.45532e+07 2702.58 9.31027e+06 2569.31 9.57242e+06 1739.07 1.03588e+07 1.32961e+06 3.62478e+07
28682 20210708.11 6609.37 9.00506e+07 2845.73 2.10899e+08 23098 1.71746e+07 4277.1 1.45532e+07 2638.9 9.57242e+06 2513.73 1.27181e+07 1749.26 8.52384e+06 1.36971e+06 3.69675e+07
28709 20210708.19 6421.77 9.24099e+07 2658.41 2.85872e+08 21230.9 1.71746e+07 4052.77 1.48153e+07 2522.67 1.37667e+07 2441.23 9.31027e+06 1727.95 8.52384e+06 1.35039e+06 3.61837e+07
28722 20210708.23 6607.24 9.18856e+07 2819.96 2.0854e+08 22096.2 1.71746e+07 4212.29 1.48153e+07 2565.35 9.83456e+06 2548.29 9.57242e+06 1676.23 8.52384e+06 1.34119e+06 3.62478e+07
28740 20210709.2 6624.79 9.08371e+07 2688.61 2.42618e+08 22051.8 1.74367e+07 4139.21 1.48153e+07 2651.49 9.83456e+06 2421.44 1.00967e+07 1779.62 9.83456e+06 1.34471e+06 3.61199e+07
28744 20210709.3 6305.95 9.29342e+07 3218.48 1.68694e+08 23213.2 1.76989e+07 4504.05 1.48153e+07 2589.01 9.31027e+06 2373.99 9.31027e+06 1757.08 8.52384e+06 1.3236e+06 3.59298e+07
28791 20210709.14 6694.46 9.16235e+07 2716.75 2.15618e+08 19978.6 1.71746e+07 4173.24 1.50774e+07 2592.79 1.35046e+07 2507.92 9.83456e+06 1773.38 8.2617e+06 1.32103e+06 3.62478e+07
28822 20210709.20 6621.18 9.21478e+07 3228.35 1.79704e+08 21578.1 1.82232e+07 4450.77 1.48153e+07 2657.35 9.57242e+06 2564.62 9.31027e+06 1814.03 8.2617e+06 1.34817e+06 3.61837e+07
28832 20210710.1 6465.35 9.13613e+07 2779.56 2.39997e+08 20900.7 1.74367e+07 4409.77 1.50774e+07 2448.72 1.35046e+07 2525.38 1.00967e+07 1714.06 1.08831e+07 1.32163e+06 3.62478e+07
28841 20210712.2 6305.13 9.24099e+07 2914.58 2.11161e+08 19914.1 1.71746e+07 4223.94 1.4291e+07 2611.13 9.57242e+06 2361.08 9.04813e+06 1747.77 8.52384e+06 1.36261e+06 3.62478e+07
28853 20210712.4 6604.09 9.21478e+07 2652.4 2.17715e+08 21031.8 1.71746e+07 4230.47 1.50774e+07 2447.93 9.83456e+06 2518.8 1.29803e+07 1739.46 8.2617e+06 1.34551e+06 3.62478e+07
28869 20210712.8 6414.96 9.05749e+07 2861.45 2.19812e+08 22324.3 1.71746e+07 4424.13 1.48153e+07 2699.44 9.57242e+06 2523.75 9.57242e+06 1774.27 8.2617e+06 1.34392e+06 3.59298e+07
28892 20210712.14 6314.58 9.16235e+07 2874.54 2.10899e+08 20587.2 1.74367e+07 4164.2 1.48153e+07 2659.64 1.00967e+07 2583.47 1.27181e+07 1770.76 1.03588e+07 1.34067e+06 3.62478e+07
28947 20210712.26 6080.63 9.31964e+07 2776.23 2.16142e+08 22268.2 1.74367e+07 4509.82 1.50774e+07 2682.1 9.83456e+06 2538.74 9.31027e+06 1756.38 8.52384e+06 1.34941e+06 3.63121e+07
28952 20210713.2 6299.76 9.24099e+07 2547.23 2.42618e+08 21849.5 1.71746e+07 4242.07 1.48153e+07 2493.74 9.57242e+06 2536.28 9.83456e+06 1800.45 8.52384e+06 1.34684e+06 3.62478e+07
28971 20210713.6 6252.4 9.18856e+07 2705.26 2.38424e+08 23328.1 1.74367e+07 4576.91 1.48153e+07 2579.34 9.57242e+06 2492.22 9.31027e+06 1753.52 8.78598e+06 1.3448e+06 3.62478e+07
29002 20210713.16 6386.27 9.13613e+07 2409.68 2.39735e+08 21401.9 1.71746e+07 4412.55 1.45532e+07 2622.41 9.57242e+06 2548.55 9.31027e+06 1661.32 8.52384e+06 1.34852e+06 3.61837e+07
29042 20210713.26 6165.51 9.08371e+07 2470.55 2.43667e+08 23148.9 1.74367e+07 4434.99 1.45532e+07 2629.48 9.31027e+06 2422.36 9.31027e+06 1714.39 8.78598e+06 1.33455e+06 3.61837e+07
29064 20210714.2 6720.43 9.29342e+07 3010.72 2.0854e+08 21531.2 1.74367e+07 4441.3 1.45532e+07 2591.27 9.57242e+06 2538.24 9.31027e+06 1729.62 8.52384e+06 1.34595e+06 3.61199e+07

images

@eddyashton
Copy link
Member Author

2 more thoughts:

@eddyashton
Copy link
Member Author

Split up into other PRs: #2777, #2780, #2783, #2801.

@eddyashton eddyashton closed this Jul 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant