
Cache up-to-date consensus payloads #949

Closed
wants to merge 104 commits

Conversation

vncoelho
Member

@vncoelho vncoelho commented Jul 23, 2019

closes #788

Check this out, @erikzhang, @shargon, @jsolman and @igormcoelho. An interesting speed up for consensus payloads.

The mechanism was an idea given by SPCC guys during their GO implementation (@fabwa, Anatoly Bogatyrev and Evgeniy Stratonikov).

@vncoelho vncoelho changed the title First draft for speeding up consensus with future payloads Speeding up consensus with future payloads Jul 23, 2019
@vncoelho vncoelho changed the title Speeding up consensus with future payloads Speeding up consensus with up-to-date payloads Jul 23, 2019
@codecov-io

codecov-io commented Jul 23, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@0a7cba7). Click here to learn what that means.
The diff coverage is 24.52%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #949   +/-   ##
=========================================
  Coverage          ?   64.83%           
=========================================
  Files             ?      199           
  Lines             ?    13700           
  Branches          ?        0           
=========================================
  Hits              ?     8883           
  Misses            ?     4817           
  Partials          ?        0
| Impacted Files | Coverage Δ |
| --- | --- |
| neo/Consensus/ConsensusService.cs | 13.74% <4.22%> (ø) |
| neo/Consensus/ConsensusContext.cs | 63.77% <65.71%> (ø) |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a7cba7...fb27884. Read the comment docs.

@erikzhang erikzhang added the Low-Priority Issues with lower priority label Jul 24, 2019
@vncoelho vncoelho marked this pull request as ready for review July 26, 2019 15:04
@vncoelho
Member Author

vncoelho commented Jul 26, 2019

@erikzhang, even marked as Low Priority, I still believe it is important.
@neo-project/core, take your time to review it when you have an extra time.

The more efficiently the NEO3 consensus operates, the better the experience we are going to have.

This PR adds a simple mechanism that caches payloads that were previously discarded. They are still useful, however, and can surely improve consensus performance.
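To make the idea concrete, here is a minimal sketch of such a caching mechanism, assuming illustrative names (`Payload`, `PayloadCache`, `on_payload`, `on_height_changed` are hypothetical, not the actual neo `ConsensusService` API): payloads arriving for a future block height are kept instead of discarded, and replayed once the node reaches that height.

```python
# Hypothetical sketch of the caching idea (illustrative names, not the
# actual neo ConsensusService API): consensus payloads that arrive for a
# future block height are cached rather than discarded, then replayed
# once the node advances to that height.
from dataclasses import dataclass


@dataclass
class Payload:
    height: int
    data: str = ""


class PayloadCache:
    def __init__(self):
        # height -> payloads received early for that height
        self.cached = {}

    def on_payload(self, payload, current_height, process):
        if payload.height == current_height:
            process(payload)            # up to date: handle immediately
        elif payload.height > current_height:
            # future payload: keep it for later instead of discarding it
            self.cached.setdefault(payload.height, []).append(payload)
        # payloads for past heights are stale and simply dropped

    def on_height_changed(self, new_height, process):
        # replay everything that arrived early for the new height
        for payload in self.cached.pop(new_height, []):
            process(payload)
        # forget anything at or below the new height
        self.cached = {h: p for h, p in self.cached.items() if h > new_height}
```

The trade-off is purely local memory: a payload that would otherwise be re-requested (or lost) over P2P is already available when the node catches up.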

@vncoelho
Member Author

Improved the mechanism so that it only tries to load payloads from the current height.

TODO: With this last change it becomes better to just set each used payload to null.

@shargon, how do you set a foreach variable to null inside the loop?
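For context on the question above: in C# the `foreach` iteration variable is read-only, so a consumed payload cannot be nulled out through it directly; the usual workaround is to iterate by index (or over a snapshot of keys) and clear the underlying collection slot. A minimal Python sketch of the same pattern, with illustrative data:

```python
# Clearing entries while iterating: iterate by index rather than by
# element, since rebinding the loop variable would not modify the list.
payloads = ["prep_request", "prep_response", "commit"]
processed = []

for i in range(len(payloads)):
    consumed = payloads[i]
    processed.append(consumed)   # stand-in for "process this payload"
    payloads[i] = None           # release the reference for this slot
```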

Contributor

@igormcoelho igormcoelho left a comment


This is a very good idea. Brother, I think that Reset's second parameter could default to true, and only H+1 future messages should be collected. Beyond one height ahead I don't see any advantage, right?

@vncoelho
Member Author

In the first implementation I set the default to true, but in the end I preferred to make it more explicit, since we only have 2 calls.

H+1 Preparations, H+1 Commits? Something like that?
That also makes sense, brother.
However, we are only dealing with local RAM allocation; I believe it is better to keep everything we can for now, no?
Since the payload has already arrived.

lock9
lock9 previously requested changes Aug 4, 2019
Contributor

@lock9 lock9 left a comment


Hi @vncoelho, we need UTs to ensure this is working properly (we can't approve PRs without testing). Any chance you can add a few?
Also, it would be good to have some evidence that this improves consensus performance. What exactly is this change going to improve?

@vncoelho
Member Author

vncoelho commented Aug 8, 2019

@lock9, I will not be able to add UTs to this right now, and it is not my priority at the moment. But I think it is a good thing to be done, and I would surely revise and review any PR related to this with pleasure.

@vncoelho
Member Author

vncoelho commented Jan 6, 2020

@shargon @erikzhang, I think that this is ready to merge.
In fact, @cloud8little's experiments show slightly better performance for 15 s and 5 s.

However, I believe that, statistically speaking, we are only going to see real gains when there are network delays, which is the case on the mainnet and testnet.

The changes are straightforward.
The trade-off in local processing power is minor compared to the benefits for mainnet operation, even if the performance gain is small (which I believe it will not be once we scale the system and account for delays).

@erikzhang
Member

How can we get a result of 14.9140625 seconds? I think there must be something wrong with the test.

@vncoelho
Member Author

vncoelho commented Jan 8, 2020

It can happen, @erikzhang, in particular, if you consider PR #1345.

@erikzhang
Member

How can it happen? I don't understand.

@vncoelho vncoelho changed the title Speeding up consensus with up-to-date payloads Cache up-to-date consensus payloads Jan 8, 2020
@vncoelho
Member Author

vncoelho commented Jan 8, 2020

I believe that the following cases may happen:

  • If any RecoveryMessage is received by the Primary before it has sent its PrepareRequest, the PrepareRequest will be triggered even without a proper timeout:

        if (!context.RequestSentOrReceived)
        {
            ConsensusPayload prepareRequestPayload = message.GetPrepareRequestPayload(context, payload);
            if (prepareRequestPayload != null)
            {
                totalPrepReq = 1;
                if (ReverifyAndProcessPayload(prepareRequestPayload)) validPrepReq++;
            }
            else if (context.IsPrimary)
                SendPrepareRequest();
        }

  • We are talking about an average advance of 0.185 s = 185 ms per block; over a total of 128 blocks that is 23680 ms ≈ 23.7 s of total advance in a ~32-minute run. If we are on different machines I believe this is expected; however, if the test was run on a privnet, maybe there was an error as you suspected, @erikzhang. Since this PR helps consensus cache the next payloads, maybe the behavior improved a little and we are now seeing this effect. I am not sure as well.
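The arithmetic above can be checked directly (a sanity check on the numbers in this thread, not part of the PR itself):

```python
# Sanity check: an average advance of 185 ms per block, accumulated over
# 128 blocks, gives the total advance quoted above.
blocks = 128
advance_ms_per_block = 185

total_advance_ms = blocks * advance_ms_per_block
total_advance_s = total_advance_ms / 1000

print(total_advance_ms, "ms =", total_advance_s, "s")  # 23680 ms = 23.68 s
```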

Anyway, the focus of this PR is just to cache the next payloads that have already been sent via P2P. I see no reason to discard them. Perhaps the name of the PR should be changed; it does not strictly need to be a speed-up.

@erikzhang
Member

Anyway, the focus of this PR is just to cache the next payloads that were already sent via P2P. I do not see reason to discard them.

@vncoelho You are right. But I think this PR must solve a problem. So, what problem does this PR solve? In my opinion, it provides a caching mechanism that reduces the probability of packet loss. Then, if we can find evidence in the consensus log that our current consensus mechanism often switches views due to packet loss, I will think this PR is useful. But now, I don't think it's worth merging.

@vncoelho
Member Author

vncoelho commented Jan 8, 2020

@erikzhang, in fact, we had an internal discussion yesterday about closing (and not opening new) PRs (features) related to Consensus and dBFT 2.0.

The idea is to merge a whole package for dBFT 3.0; we are on our way to writing the paper and plan to start the implementation by the middle of this year.

There are some requirements for the safety and properties of pBFT, and there are other important things we need to solve to fix "workarounds", such as the use of FailedNodes flags.

This PR, in particular, is very simple; there is no bad trade-off in merging it, since it only involves local processing power without any possible scalability issue.

@cloud8little
Contributor

@vncoelho I can help run more tests comparing different TPS settings on different machines. Let's see how it behaves.

@vncoelho
Member Author

Sounds great, @cloud8little!

@cloud8little
Contributor

cloud8little commented Jan 14, 2020

@vncoelho I've retested on 4 consensus nodes located in different countries; here are the detailed results. There is no difference for 15000 ms, a slight improvement for 5000/800 ms, and an obvious improvement for 400 ms. The experiment ran with no transactions, since due to issue #1410 I can't send massive numbers of txs at the moment.

| Node | OS | Location | CPU | Memory | Bandwidth | Disk |
| --- | --- | --- | --- | --- | --- | --- |
| n1 | Ubuntu 18.04 | Tokyo | 4 | 8G | 5Mbps | 20G |
| n2 | CentOS 7.4 | America | 2 | 8G | 6Mbps | 20G |
| n3 | Win Server 2016 | Beijing | 2 | 8G | 7Mbps | 20G |
| n4 | Ubuntu 18.04 | India | 2 | 8G | 8Mbps | 20G |

neo-cli: master 51cd29fbe21abb9e1f17f64e5c6d21bc7decbbb9
neo: master ab4830c
neo-vm: master be2ac36bf35a3033d828e0ba0630d390599c487d

baseline

| MillisecondsPerBlock | StartTime | EndTime | Blocks | Duration | Secs/block | Avg secs/block |
| --- | --- | --- | --- | --- | --- | --- |
| 15000 | 12:07:48 | 12:38:20 | 116 | 0:30:32 | 15.64655172 | 16.08189655 |
| 15000 | 12:06:44 | 12:38:20 | 116 | 0:31:36 | 16.34482759 | |
| 15000 | 12:07:38 | 12:38:22 | 116 | 0:30:45 | 15.90517241 | |
| 15000 | 12:06:34 | 12:38:20 | 116 | 0:31:46 | 16.43103448 | |
| 5000 | 14:32:57 | 15:03:12 | 300 | 0:30:16 | 6.053333333 | 6.0925 |
| 5000 | 14:32:36 | 15:03:12 | 300 | 0:30:37 | 6.123333333 | |
| 5000 | 14:32:52 | 15:03:12 | 300 | 0:30:20 | 6.066666667 | |
| 5000 | 14:32:34 | 15:03:12 | 300 | 0:30:38 | 6.126666667 | |
| 800 | 15:50:00 | 16:20:26 | 1562 | 0:30:26 | 1.169654289 | 1.169494238 |
| 800 | 15:50:00 | 16:20:26 | 1562 | 0:30:26 | 1.169014085 | |
| 800 | 15:50:00 | 16:20:27 | 1562 | 0:30:27 | 1.169654289 | |
| 800 | 15:50:00 | 16:20:26 | 1562 | 0:30:27 | 1.169654289 | |
| 400 | 23:29:00 | 23:59:59 | 1988 | 0:30:58 | 0.930080483 | 0.93221831 |
| 400 | 23:29:01 | 23:59:59 | 1988 | 0:30:58 | 0.930080483 | |
| 400 | 23:29:01 | 23:59:59 | 1988 | 0:30:58 | 0.934607646 | |
| 400 | 23:29:01 | 23:59:58 | 1988 | 0:30:57 | 0.934104628 | |

pr949

| MillisecondsPerBlock | StartTime | EndTime | Blocks | Duration | Secs/block | Avg secs/block |
| --- | --- | --- | --- | --- | --- | --- |
| 15000 | 10:45:29 | 11:17:10 | 115 | 0:31:41 | 16.53043478 | 16.40652174 |
| 15000 | 10:45:28 | 11:17:10 | 115 | 0:31:43 | 16.54782609 | |
| 15000 | 10:46:31 | 11:17:11 | 115 | 0:30:39 | 15.99130435 | |
| 15000 | 10:45:28 | 11:17:12 | 115 | 0:31:44 | 16.55652174 | |
| 5000 | 0:08:11 | 0:39:04 | 331 | 0:30:53 | 5.598187311 | 5.592900302 |
| 5000 | 0:08:14 | 0:39:04 | 331 | 0:30:50 | 5.589123867 | |
| 5000 | 0:08:08 | 0:39:04 | 331 | 0:30:56 | 5.607250755 | |
| 5000 | 0:08:18 | 0:39:04 | 331 | 0:30:46 | 5.577039275 | |
| 800 | 14:18:33 | 14:49:00 | 1573 | 0:30:27 | 1.161474889 | 1.158773045 |
| 800 | 14:18:36 | 14:49:00 | 1573 | 0:30:24 | 1.159567705 | |
| 800 | 14:18:42 | 14:49:00 | 1573 | 0:30:18 | 1.155753338 | |
| 800 | 14:18:38 | 14:49:00 | 1573 | 0:30:22 | 1.158296249 | |
| 400 | 11:46:53 | 12:17:01 | 2413 | 0:30:08 | 0.749274762 | 0.749274762 |
| 400 | 11:46:53 | 12:17:01 | 2413 | 0:30:08 | 0.749274762 | |
| 400 | 11:46:53 | 12:17:01 | 2413 | 0:30:08 | 0.749274762 | |
| 400 | 11:46:53 | 12:17:01 | 2413 | 0:30:08 | 0.749274762 | |
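The per-run averages in the tables above follow directly from the start/end times and block count. A small sketch of the computation (illustrative, not the actual test harness used by @cloud8little):

```python
from datetime import datetime


def secs_per_block(start, end, blocks):
    """Average seconds per block over one run (times as HH:MM:SS strings,
    assumed to fall within the same day)."""
    fmt = "%H:%M:%S"
    duration = (datetime.strptime(end, fmt)
                - datetime.strptime(start, fmt)).total_seconds()
    return duration / blocks


# One pr949 run at 400 ms per block: 2413 blocks in 0:30:08.
avg = secs_per_block("11:46:53", "12:17:01", 2413)
print(round(avg, 9))  # matches the table: ~0.749274762
```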

@vncoelho
Member Author

Great experiments, @cloud8little.

I believe there is no statistically significant difference, but it is very good to see these results.
It is incredible to see how 400 ms worked even in a scenario with nodes in different locations and potentially large delays.

Perhaps if the PrepareRequest carried more txs, or were larger, we could detect more gains from avoiding the loss of this payload.
In addition, on the real mainnet or testnet we do not have direct communication with the CNs; payload packages thus take longer routes through the network graph, with more uncertainty, which would surely reinforce the benefits of this PR.

@vncoelho
Member Author

@cloud8little, @superboyiii, @shargon, this is not a big change, but now that we have more features for testing with txs, could you test whether this change improves performance when the network is under high load? Perhaps the bigger the PrepareRequest, the more efficient it will be.

@erikzhang
Member

erikzhang commented May 18, 2022

Since the consensus module has been moved to neo-modules, I will close this first.

@erikzhang erikzhang closed this May 18, 2022
@erikzhang erikzhang deleted the speed-up-consensus-with-future-payloads branch May 18, 2022 22:45
Labels
Consensus Module - Changes that affect the consensus protocol or internal verification logic Enhancement Type - Changes that may affect performance, usability or add new features to existing modules.
Development

Successfully merging this pull request may close these issues.

Consensus optimization for next block - Caching payloads
9 participants