Fix/Improve syncing (2x): Adding ping mechanism to TaskManager for replacing StartHeight and PendingKnownHashes strategy #1024
Conversation
…e max start height
Could you make some unit tests for this?
@yongjiema, it looks like a reasonable change in my opinion, since nodes are not going to request blocks all the time, but based on the maximum `StartHeight`.
I think that this will spam the network a little bit, don't you think?
Could you set a variable with the percentage of nodes (sessions) that will be asked for `getblocks` in such a case?
Otherwise, I am thinking that the node will start to request `getblocks` from all its sessions until it reconnects to some other session at a higher `StartHeight`.
Maybe a random number: something like a 20% chance of requesting `getblocks` from a given session if such a case happens:

```csharp
|| (Blockchain.Singleton.Height < Blockchain.Singleton.HeaderHeight && Blockchain.Singleton.Height >= maxStartHeight)
   && Random.Rand() <= maximumPercentageForRequestingBlocksToAGivenSessionInCaseWeAreOnTop
```
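Read as a sketch, the suggestion amounts to something like the following (the method, the `maxStartHeight` argument, and the 20% constant are illustrative assumptions, not code from this PR):

```csharp
using System;

// Hypothetical sketch of the suggested probabilistic gating.
class BlockRequestGate
{
    static readonly Random rand = new Random();
    const double RequestProbabilityWhenOnTop = 0.2; // assumed 20% threshold

    // heightLocal/headerHeight stand in for Blockchain.Singleton.Height
    // and Blockchain.Singleton.HeaderHeight in the real node.
    public static bool ShouldRequestBlocks(uint heightLocal, uint headerHeight, uint maxStartHeight)
    {
        // Clearly behind the best advertised StartHeight: always request.
        if (heightLocal < maxStartHeight)
            return true;

        // Headers say more chain exists, but every session advertises a
        // StartHeight at or below ours: ask only ~20% of sessions, so the
        // node does not spam getblocks to all of them at once.
        if (heightLocal < headerHeight)
            return rand.NextDouble() <= RequestProbabilityWhenOnTop;

        return false;
    }
}
```

The point of the random gate is that, once the node is already at the best advertised `StartHeight`, only a fraction of sessions receive a `getblocks` request in each round instead of all of them.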
On the other hand, @yongjiema, if we have a disconnection mechanism, in which:
@yongjiema, ah, there is also another possibility.
In such a case, you ping nodes!
The ping will return the `LastBlockIndex`, which will, consequently, trigger the block requests naturally, right?
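As a rough illustration of that idea (type and member names here are assumptions for the sketch, not necessarily the PR's exact API): the ping carries the sender's best block index, so peers learn each other's height continuously instead of relying on the `StartHeight` sent once in the version handshake.

```csharp
// Illustrative payload: a ping that advertises the sender's chain height.
public class PingPayload
{
    public uint LastBlockIndex; // sender's current best block height
    public uint Timestamp;
    public uint Nonce;
}

public class RemoteSession
{
    public uint LastBlockIndex; // last height advertised by this peer
}

public static class SyncHandler
{
    // On receiving a ping (or the pong reply), record the peer's height;
    // the TaskManager can then request blocks from any session whose
    // LastBlockIndex exceeds the local chain height.
    public static void OnPing(RemoteSession session, PingPayload payload, uint localHeight)
    {
        session.LastBlockIndex = payload.LastBlockIndex;
        if (payload.LastBlockIndex > localHeight)
            RequestBlocks(session, localHeight + 1); // hypothetical trigger
    }

    static void RequestBlocks(RemoteSession session, uint fromHeight)
    {
        /* send getblocks starting at fromHeight (placeholder) */
    }
}
```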
@yongjiema, I feel good improvements coming. Nice job.
Can you make some UTs? How did you test this?
Tested with a local build, as we don't have basic UTs for this.
Mannnn, @yongjiema, I just took a look at it right now, and this new design looks good.
You have added several good ideas since the last time I saw the code. Great job.
What do you think, @erikzhang?
I think that I agree with this; let me check with more time, a couple more days. But the changes look precise.
During this time, @lock9, I will also try to think about UTs, but I am not sure we are going to push for them here for NEO2. It would be nice, but the change may reach a point where it is clear enough for us.
But I agree that, at least for NEO3, we should try to test basic cases of messages arriving.
@erikzhang Could you have a look at this? I think this could help fix the consensus getting stuck.
There are two "requested changes".
@yongjiema I can still get blocks stuck at a lower height with this fix. The following is the environment I set up: 1 seed node, 4 consensus nodes, sending NEP-5 transactions to the seed node continuously, about 3 txs per second.
[Log screenshots attached for the consensus node, node2, and node3.]
Here is part of the consensus log, showing that every block contains 21 txs.
@cloud8little Can you try sending 180 txs per minute? Ideally the interval is greater than 30 secs. https://github.com/neo-project/neo/blob/master/neo/Network/P2P/TaskManager.cs#L23
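For readers without the link at hand: the referenced line is a timing constant in `TaskManager`. A guess at its shape, based only on the "30 secs" remark above (the name and value are assumptions, not verified against the linked revision):

```csharp
using System;

class TaskManagerTimingSketch
{
    // Assumed shape of the constant at TaskManager.cs#L23: the task
    // manager wakes on this interval to time out stale tasks and, with
    // this PR, to ping sessions for a fresh LastBlockIndex.
    static readonly TimeSpan TimerInterval = TimeSpan.FromSeconds(30);
}
```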
180 txs per minute can be handled by this fix. To correct my numbers a bit: the TPS yesterday should have been around 12 tx/sec. I tried again today and it works well; at 9.6~12 txs per second, it can recover to the best height in several minutes, with the mempool full with 50 thousand txs. Good to merge this PR.
Not applicable for now. However, we still need to double-check whether someone can spam the node with `pendingKnownHashes`; ping @yongjiema.
@yongjiema, what do you think about some node just sending random hashes that will enter the `pendingKnownHashes`?
These will be removed later, once the items are persisted to the blockchain, instead of directly on insertion (neo/neo/Network/P2P/ProtocolHandler.cs, line 42 in 35e1d17).
I see, @yongjiema, but isn't this a problem?
Currently we cannot stop someone who wants to spam: they can generate different transactions that pass verification and send them to the network, and the previous solution could run out of memory.
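This memory concern is the motivation for bounding `pendingKnownHashes`. A minimal sketch of a capped, FIFO-evicting set, assuming that is the chosen mitigation (the class name and eviction policy are illustrative, not the PR's actual types):

```csharp
using System.Collections.Generic;

// Capped set: once full, the oldest entry is evicted, so a spammer can
// churn the cache but never grow the node's memory unboundedly.
public class BoundedHashSet<T>
{
    private readonly int maxCount;
    private readonly LinkedList<T> order = new LinkedList<T>();
    private readonly HashSet<T> set = new HashSet<T>();

    public BoundedHashSet(int maxCount) { this.maxCount = maxCount; }

    public bool Add(T item)
    {
        if (!set.Add(item)) return false;   // already pending: ignore
        order.AddLast(item);
        if (set.Count > maxCount)           // full: evict the oldest hash
        {
            set.Remove(order.First.Value);
            order.RemoveFirst();
        }
        return true;
    }

    public bool Contains(T item) => set.Contains(item);
}
```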
@yongjiema, I am not sure. I mean:
Could you split the PR into two different ones ASAP?
I am going to approve part I, related to the ping and the use of `LastBlockIndex`. We need to move forward, and this ping is surely a good thing. The `StartHeight` looks bad to me.
Then we discuss the other PR, with just the `PendingKnownHashesCollection`, namely part II.
Could you do that for both master2x and 3x?
Maybe you can just remove the `PendingKnownHashesCollection` and open the new PR with it.
Leave the opened PRs with part I.
@vncoelho Can you think more? It will not resolve the sync-stuck issue without it.
Perfect, @yongjiema, I am currently testing it under some scenarios. It is now available online here: https://neocompiler.io/#!/ecolab/cnnodesinfo
@lock9, UTs for this PR will not be possible right now. I now almost completely understand the change and am in favor of it. Currently, we are performing some tests to be more sure about its possible improvements.
Here we go, @yongjiema.
The syncing process is considerably improved.
It is like turning water into wine. Congratulations!
UTs are not possible right now; we would need to first design them for the P2P classes that are being modified. Even for NEO3, perhaps, the UTs will come later.
@superboyiii @cloud8little, as far as I can see, you both also agree with the change. I dismissed @lock9's requested change because the UTs are not feasible for now, even for master3x. @erikzhang, from my tests here, I have never seen the network like this! Let's merge this PR. I should apologize for the long time we took to see this improvement (but, perhaps, we also improved it a little bit since the idea draft), @yongjiema. You were very precise about the cause.
omg, this PR is magic, @yongjiema. If @jsolman takes a look, he will probably be proud of it.
Could you add some comments explaining the meaning of `pendingKnownHashes`?
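Based on how the mechanism is described earlier in this thread, the requested comment might read roughly like this (a sketch, not the PR's actual documentation; the field type is assumed):

```csharp
// pendingKnownHashes: inventory hashes this node has already requested
// from a peer but has not yet seen persisted. Tracking them prevents the
// TaskManager from re-requesting the same item from several sessions;
// entries are removed once the item is persisted to the blockchain,
// rather than immediately on insertion.
private readonly HashSet<UInt256> pendingKnownHashes = new HashSet<UInt256>();
```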
Hi @vncoelho, I can't approve it because I currently don't know how to properly test this. If you can guide me on how to test it, I may be able to approve it.
@lock9, I am not going to guide you through those steps right now. It took many months and a great part of the energy we try to put into NEO to motivate the kind of initiative that resulted in this PR and this evolution of knowledge. It is taking us time and effort to understand it and test it, and surely a similar process happened for @yongjiema. Right now, I need to focus on other aspects that I and those around me consider important. Meanwhile, feel free to design the UTs for this PR, or leave it open for as long as you think it should be.