context: add simple rtt estimator #140
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
+ Coverage   58.31%   58.49%   +0.17%
==========================================
  Files          32       33       +1
  Lines        2262     2279      +17
==========================================
+ Hits         1319     1333      +14
- Misses        859      861       +2
- Partials       84       85       +1
After b3c1c3b the only source of block time inaccuracy left is PrepareRequest propagation. In general it's nondeterministic and can vary from node to node and from round to round, but in practice it's relatively stable and can be averaged out over some number of samples. While a backup node can't know when a request was sent to it, it can measure what happens to its own request when it's primary, since healthy nodes are expected to respond immediately, making request/response pretty similar to a ping/pong roundtrip. This data can't be perfect either (messages can travel different ways from A to B and from B to A on a P2P network), but in practice it's good enough, and it's much better than having no data at all.

Notice that this timer adjustment scheme uses the local timer only; we inevitably trust our local clock, so we don't need to mess with any view of time other nodes have (including the one from the block header).

It can be attacked by a malicious peer or node delaying messages, but as we expect 2F+1 nodes to be correct ones, this is enough to collect the proper number of responses and move on; a seriously delayed response will be outdated by the time it arrives. And if some intermediary can delay messages to all nodes for an arbitrary time on our P2P network, we can't have any reliable timing anyway.

To me it fixes as much of neo-project/neo#2918 as possible.

Signed-off-by: Roman Khimov <roman@nspcc.ru>
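For illustration, here's a minimal sketch of the idea in Go (the language of this library). This is not the code from this PR: the names, the 70-sample window (loosely matching the warm-up mentioned below) and the half-roundtrip timer adjustment are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// rttSamples is an assumed averaging window, loosely based on the
// "70 blocks are needed" warm-up observation below.
const rttSamples = 70

// rttEstimator keeps a simple moving average of request/response
// roundtrips observed by this node while it was primary.
type rttEstimator struct {
	samples [rttSamples]time.Duration
	idx, n  int
}

// addTime records one sample: the time between sending our
// PrepareRequest and receiving a PrepareResponse from a peer.
func (r *rttEstimator) addTime(d time.Duration) {
	r.samples[r.idx] = d
	r.idx = (r.idx + 1) % rttSamples
	if r.n < rttSamples {
		r.n++
	}
}

// avg returns the averaged roundtrip time, zero until any data arrives.
func (r *rttEstimator) avg() time.Duration {
	if r.n == 0 {
		return 0
	}
	var sum time.Duration
	for _, s := range r.samples[:r.n] {
		sum += s
	}
	return sum / time.Duration(r.n)
}

// blockTimer illustrates the adjustment: fire the timer earlier by half
// of the averaged roundtrip, approximating one-way PrepareRequest
// propagation. Only the local clock is involved, no timestamps from
// other nodes are trusted.
func blockTimer(target time.Duration, r *rttEstimator) time.Duration {
	if d := target - r.avg()/2; d > 0 {
		return d
	}
	return 0
}

func main() {
	var r rttEstimator
	for _, ms := range []int{200, 240, 210} {
		r.addTime(time.Duration(ms) * time.Millisecond)
	}
	fmt.Println(blockTimer(5*time.Second, &r)) // ≈4.89s instead of 5s
}
```

Both the samples and the adjustment come from the node's own clock, which is what keeps the scheme independent of peers' notions of time: the worst a peer can do is delay its responses, and with 2F+1 correct nodes those are simply not the responses we end up waiting for.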
Synthetic tests relative to #56 (which is part of the master branch now). A single node is not affected by this change, so it's omitted.

4 nodes, 5s block time, no delays: nothing to talk about.

The most interesting case: 5s block time, 200ms delay with a 1000 RPS load (much lower than the maximum). Adjustments are clearly seen and they don't come to full effect yet (70 blocks are needed), but TPS is the same (as expected).

Now pushing this network to its limits: 5s block time, 200ms delay and no RPS cap. Suddenly we're regaining some of the TPS lost in #56 (with better block time).

Making it faster with a 1s block time and no delays, it's slightly better.

Overall it's just better or about the same in every synthetic test.
And testnet: a real network scattered all over the globe, 6 countries for 7 CNs, 1s target block time with 200-300ms RTT. This also includes data for #56 in isolation (but only on two nodes out of seven!), since the original here is pre-lastBlock change. Almost no transactions, a 1000-block sample: we're deviating by ~5% on average with RTT at 20-30% of the block time, good enough for me.
We need to publish an article.