
Fix/Improve syncing (2x): Adding ping mechanism to TaskManager for replacing StartHeight and PendingKnownHashes strategy #1024

Merged (12 commits) on Nov 25, 2019

Conversation

@yongjiema (Contributor)

No description provided.

@yongjiema yongjiema changed the title Try to get blocks if the current height is bigger than or equal to the max start height Try to get blocks if the current height is bigger than or equal to the max start height (2x) Aug 13, 2019
@shargon (Member) commented Aug 13, 2019

Could you make some unit tests for this?

@vncoelho (Member) left a comment

@yongjiema, it looks like a reasonable change in my opinion, since nodes are not going to request blocks all the time, but only based on the maximum StartHeight.

@vncoelho (Member) left a comment

@yongjiema,

I think that this will spam the network a little bit. Don't you think so?

Could you set a variable with the percentage of nodes (sessions) that will be asked for getblocks in such a case?

Otherwise, I am thinking that the node will start to request getblocks from all its sessions until it reconnects to some other session at a higher startheight.

Maybe a random number:

  • something like a 20% chance of requesting getblocks from a given session if such a case happens (see the sketch below): || (Blockchain.Singleton.Height < Blockchain.Singleton.HeaderHeight && Blockchain.Singleton.Height >= maxStartHeight) && Random.Rand() <= maximumPercentageForRequestingBlocksToAGivenSessionInCaseWeAreOnTop
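A minimal sketch of that probabilistic gate, assuming hypothetical names (`ChanceOfRequestingWhenOnTop`, the 20% constant, and `ShouldRequestBlocks` are illustrative, not from this PR):

```csharp
// Hypothetical sketch of the probabilistic getblocks gate suggested above.
private const double ChanceOfRequestingWhenOnTop = 0.2; // ~20% of sessions asked
private static readonly Random rand = new Random();

private bool ShouldRequestBlocks(uint maxStartHeight)
{
    // Normal case: we are clearly behind the best-known start height.
    if (Blockchain.Singleton.Height < maxStartHeight) return true;

    // "On top" case: ask a given session only with some probability,
    // so we do not send getblocks to every session at once.
    return Blockchain.Singleton.Height < Blockchain.Singleton.HeaderHeight
        && Blockchain.Singleton.Height >= maxStartHeight
        && rand.NextDouble() <= ChanceOfRequestingWhenOnTop;
}
```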

@vncoelho (Member)

On the other hand, @yongjiema, what if we had a disconnection mechanism, in which:

  • nodes monitor the requests of each of their sessions
  • if a node detects that a connected session is demanding a lot of the same information, it disconnects from it, since that is, theoretically, a spamming node/friend (a sketch of this idea follows the list).
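A rough sketch of that monitor, with entirely hypothetical names and threshold (none of this is from the PR; UInt256 stands in for neo's hash type):

```csharp
using System.Collections.Generic;

// Hypothetical per-session spam monitor: count how often a session requests
// the same item and flag the session for disconnection past a threshold.
class SessionRequestMonitor
{
    private readonly Dictionary<UInt256, int> requestCounts = new Dictionary<UInt256, int>();
    private const int MaxRepeatedRequests = 100; // illustrative threshold

    // Called whenever the session asks us for a piece of data.
    // Returns false once the peer looks like a spammer, so the caller
    // can disconnect from it.
    public bool OnDataRequested(UInt256 hash)
    {
        requestCounts.TryGetValue(hash, out int count);
        requestCounts[hash] = ++count;
        return count <= MaxRepeatedRequests;
    }
}
```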

@vncoelho (Member) left a comment

@yongjiema aaaaaaa, there is also another possibility.

In such a case, you ping the nodes!
The ping will return the LastBlockIndex, which will, consequently, release the block requests naturally, right?
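A simplified sketch of how that ping loop could drive syncing (the method names `SendPing` and `RequestBlocks` and the `sessions` collection are illustrative; the real PR wires this through the TaskManager and RemoteNode actors):

```csharp
// Sketch: periodically ping peers; each pong carries the peer's current
// LastBlockIndex, so block requests follow live heights instead of the
// StartHeight captured once at handshake time.
private void OnTimer()
{
    foreach (RemoteNode session in sessions)
        session.SendPing(Blockchain.Singleton.Height); // hypothetical helper
}

private void OnPongReceived(RemoteNode session, uint lastBlockIndex)
{
    session.LastBlockIndex = lastBlockIndex;
    if (lastBlockIndex > Blockchain.Singleton.Height)
        RequestBlocks(session); // the peer is ahead of us: sync from it
}
```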

@yongjiema yongjiema changed the title Try to get blocks if the current height is bigger than or equal to the max start height (2x) Improve syncing (2x) Aug 21, 2019
@vncoelho (Member)

@yongjiema, I feel good improvements coming. Nice job.
We just need to be sure we want them for master2x, but surely for master3x.

lock9 previously requested changes Sep 10, 2019

@lock9 (Contributor) left a comment

Can you make some UTs? How did you test this?

@yongjiema (Contributor, Author)

Tested with a local build, as we don't have the basic UTs.

@vncoelho (Member) left a comment

Mannnn, @yongjiema, I just took a look at it right now; this new design looks good.

You added several good ideas since the last time I saw the code. Great job.
What do you think, @erikzhang?

I think I agree with this; let me check with more time, another couple of days. But the changes look precise.
During this time, @lock9, I will also try to think about UTs, but I am not sure we are going to push for them here for NEO2. It would be nice, but the change may reach a point where it is clearly enough for us.
But I agree that, at least for NEO3, we should try to test the basic cases of messages arriving.

@superboyiii (Member)

@erikzhang Could you have a look at this? I think this could help fix the consensus getting stuck.

@erikzhang (Member)

There are two "requested changes".

@cloud8little (Contributor)

@yongjiema I can still get blocks stuck at a lower height with this fix. The following is the environment I set up: 1 seed node and 4 consensus nodes, sending NEP-5 transactions to the seed node continuously, about 3 txs per second.
The best height was 57749, while the seed node was stuck at 57726 for over 10 minutes.

seed node:

neo> show pool
total: 45233, verified: 31545, unverified: 13688

consensus node:
node1:

show pool
total: 27259, verified: 304, unverified: 26955

node2:

show pool
total: 27479, verified: 1976, unverified: 25503

node3:

show pool
total: 27667, verified: 1959, unverified: 25708

here is part of the consensus log, showing that every block contains 21 txs.
9329:[16:44:40.049] relay block: height=57695 hash=0x58531450235011caa9a44a73edae03dd652f1c8b0a97bbded414907914e6ed5a tx=21
9330:[16:44:40.326] persist block: height=57695 hash=0x58531450235011caa9a44a73edae03dd652f1c8b0a97bbded414907914e6ed5a tx=21
9339:[16:44:56.210] relay block: height=57696 hash=0xdaf61e192c40b7fc9ccddfdb0e4e1fa96f94edeecf4c818ab7e8f2867a934790 tx=21
9340:[16:44:57.084] persist block: height=57696 hash=0xdaf61e192c40b7fc9ccddfdb0e4e1fa96f94edeecf4c818ab7e8f2867a934790 tx=21
9342:[16:45:11.449] OnPrepareRequestReceived: height=57697 view=0 index=1 tx=21
9348:[16:45:11.989] relay block: height=57697 hash=0xe8281a7361c8e2d4c9bf3dd14527788592567fc82ab49b6ad3262a204e508382 tx=21
9349:[16:45:12.183] persist block: height=57697 hash=0xe8281a7361c8e2d4c9bf3dd14527788592567fc82ab49b6ad3262a204e508382 tx=21
9357:[16:45:42.381] OnPrepareRequestReceived: height=57698 view=1 index=1 tx=21
9371:[16:45:43.462] relay block: height=57698 hash=0xd233b9ac764c672170d5f1b7c4d9d2daeb90cf4006d8dc08efdde4c8733a759b tx=21
9372:[16:45:43.791] persist block: height=57698 hash=0xd233b9ac764c672170d5f1b7c4d9d2daeb90cf4006d8dc08efdde4c8733a759b tx=21
9374:[16:45:58.709] OnPrepareRequestReceived: height=57699 view=0 index=3 tx=21
9380:[16:45:59.545] relay block: height=57699 hash=0xd3062287dd601b51310cbb999fdf9ceb8442919c9ec6b5c869fd6ff74e92ef02 tx=21
9381:[16:45:59.872] persist block: height=57699 hash=0xd3062287dd601b51310cbb999fdf9ceb8442919c9ec6b5c869fd6ff74e92ef02 tx=21

@yongjiema (Contributor, Author)

@cloud8little Can you try sending 180 txs per minute? Ideally the interval is greater than 30 secs.

https://github.com/neo-project/neo/blob/master/neo/Network/P2P/TaskManager.cs#L23

@cloud8little (Contributor)

@cloud8little Can you try sending 180 txs per minute? Ideally the interval is greater than 30 secs.

https://github.com/neo-project/neo/blob/master/neo/Network/P2P/TaskManager.cs#L23

180 txs per minute can be handled by this fix. To correct a bit: TPS yesterday was actually around 12 tx/sec. Today I tried again and it works well; at 9.6~12 txs per second the node can recover to the best height in several minutes, with the mempool full with 50 thousand txs.

Good to merge this PR.

neo/Network/P2P/ProtocolHandler.cs (review thread resolved)
neo/Network/P2P/ProtocolHandler.cs (review thread resolved)
neo/Network/P2P/RemoteNode.cs (review thread resolved)
@vncoelho dismissed their stale review October 30, 2019 12:02

Not applicable for now. However, we still need to double check whether someone can spam the node with pending hashes; ping @yongjiema.

@vncoelho (Member)

@yongjiema, what do you think about some node just sending random hashes that will enter the pendingKnownHashes?

@yongjiema (Contributor, Author)

@yongjiema, what do you think about some node just sending random hashes that will enter the pendingKnownHashes?

These will be removed later, once the item is persisted to the blockchain, instead of being inserted directly into knownHashes as before.

private static readonly TimeSpan PendingTimeout = TimeSpan.FromMinutes(1);
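That timeout constant is quoted from the PR; the surrounding bookkeeping below is an assumed sketch of how such a collection could work (the class shape and method names are illustrative, not the PR's exact code, and UInt256 stands in for neo's hash type):

```csharp
using System;
using System.Collections.Generic;

// Sketch: hashes we have been told about but not yet persisted, stamped with
// the time they were added and evicted once they are older than PendingTimeout.
class PendingKnownHashes
{
    private static readonly TimeSpan PendingTimeout = TimeSpan.FromMinutes(1);
    private readonly List<(UInt256 Hash, DateTime Time)> items = new List<(UInt256, DateTime)>();

    public void Add(UInt256 hash) => items.Add((hash, DateTime.UtcNow));

    // Remove a hash when the item is actually persisted (and only then
    // promote it to knownHashes).
    public bool Remove(UInt256 hash) => items.RemoveAll(p => p.Hash.Equals(hash)) > 0;

    // Called periodically: drop entries older than the timeout so random
    // spam hashes cannot accumulate without bound.
    public void EvictExpired()
    {
        DateTime cutoff = DateTime.UtcNow - PendingTimeout;
        items.RemoveAll(p => p.Time <= cutoff);
    }
}
```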

@vncoelho (Member) commented Nov 1, 2019

I see, @yongjiema, but isn't this a problem?
Because they will be removed after, let's say, 1 minute... and then some node could spam again.

@yongjiema (Contributor, Author)

I see, @yongjiema, but isn't this a problem?
Because they will be removed after, let's say, 1 minute... and then some node could spam again.

Currently we cannot stop someone who wants to spam: they can generate many different transactions that pass verification and send them to the network, and the previous solution could run out of memory.

@vncoelho (Member) commented Nov 10, 2019

@yongjiema, I am not sure.
This is a dangerous change to pendingHashes... aheuahuea
But I like it.

I mean:

  1. Nowadays, we add directly to knownHashes, which is only removed when the capacity of the FifoSet is reached! This can take some time. However, as you said, someone could just produce random hashes and spam, right?
  2. With the new pendingHashes, someone can periodically send the same invalid hashes and also mount the same attack as before, generating random hashes and growing the PendingKnownHashesCollection to even millions of items (worse than before). A sketch contrasting the two eviction policies follows this list.
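For reference, a minimal capacity-evicting set in the spirit of the FifoSet mentioned in point 1 (an illustrative re-implementation, not neo's actual FIFOSet): its eviction is capacity-based, whereas the pendingKnownHashes sketched earlier evicts by age.

```csharp
using System.Collections.Generic;

// Illustrative FIFO set: membership checks are O(1) and the oldest entry is
// evicted only once capacity is reached, so spam entries can linger a while.
class FifoSet<T>
{
    private readonly int capacity;
    private readonly Queue<T> order = new Queue<T>();
    private readonly HashSet<T> items = new HashSet<T>();

    public FifoSet(int capacity) => this.capacity = capacity;

    public bool Add(T item)
    {
        if (!items.Add(item)) return false;      // already known
        order.Enqueue(item);
        if (order.Count > capacity)
            items.Remove(order.Dequeue());       // evict the oldest entry
        return true;
    }

    public bool Contains(T item) => items.Contains(item);
}
```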

@vncoelho (Member) left a comment

@yongjiema,

Could you split this PR into two different ones asap?
I am going to approve part I, related to the ping and the use of LastBlockIndex. We need to move forward, and this ping is surely a good thing. The StartHeight looks bad to me.

Then we can discuss the other PR, with just the `PendingKnownHashesCollection`, namely part II.

Could you do that for both master2x and 3x?

Maybe you can just remove the PendingKnownHashesCollection here and open the new PR with it.
Leave the open PRs with part I.

@vncoelho vncoelho changed the title Improve syncing (2x) Improve syncing (2x): Adding ping mechanism to TaskManager for replacing StartHeight Nov 10, 2019
@yongjiema (Contributor, Author)

@vncoelho Can you think about it some more? For me, it will not resolve the sync-stuck issue without the PendingKnownHashesCollection.

@vncoelho (Member)

Perfect, @yongjiema, I am currently testing it under some scenarios. It is now available online here: https://neocompiler.io/#!/ecolab/cnnodesinfo

@vncoelho vncoelho requested a review from lock9 November 13, 2019 03:28
@vncoelho (Member) commented Nov 13, 2019

@lock9, UTs for this PR will not be possible right now.
However, it would be great if you could also try some tests.
When you have some time, please double check the PR and your review.

I now almost completely understand the change and am in favor of it. Currently, we are performing some tests to be more sure about its possible improvements.

vncoelho previously approved these changes Nov 13, 2019

@vncoelho (Member) left a comment

Here we go, @yongjiema.
The syncing process is considerably improved.
It is like turning water into wine. Congratulations!

@vncoelho dismissed lock9's stale review November 13, 2019 14:19

UTs are not possible right now; we would need to first design them for the P2P classes that are being modified. Even for NEO3, perhaps, the UTs will come later.

@vncoelho (Member) commented Nov 13, 2019

@superboyiii @cloud8little, as far as I can see, you both also agree with the change.

I dismissed @lock9's requested change because the UTs are not feasible for now, even for master3x.

@erikzhang, from my tests here, I have never seen the network like this!
It is a really incredible job and deserves good recognition, my friend!
I discussed the attack vectors with @yongjiema, and from his explanations I understood that it does not make the P2P less reliable.

Let's merge this PR. I should apologize for the long time we took to see this improvement (but, perhaps, we also improved it a little bit since the idea draft), @yongjiema. You were very precise about the cause.

@vncoelho vncoelho changed the title Improve syncing (2x): Adding ping mechanism to TaskManager for replacing StartHeight Fix/Improve syncing (2x): Adding ping mechanism to TaskManager for replacing StartHeight and PendingKnownHashes strategy Nov 13, 2019
@vncoelho (Member) commented Nov 16, 2019

omg, this PR is magic, @yongjiema; if @jsolman takes a look he will probably be proud of it.
We are having a great experience on the Neocompiler Eco Shared Privatnet.

@shargon (Member) left a comment

Could you add some comments explaining the meaning of pendingKnownHashes?

@lock9 (Contributor) left a comment

Hi @vncoelho, I can't approve it because I currently don't know how to properly test this. If you can guide me on how to test it, I may be able to approve it.

@vncoelho (Member)

@lock9, I am not going to guide you through these steps right now.
This is not the first time I have seen this situation.
In addition, I think it is part of the learning as well.

It took many months, and a great part of the energy we try to put into NEO, to motivate the kind of initiative that resulted in this PR and this evolution of knowledge. It is taking us time and effort to try to understand it and to test it, and surely a similar process happened for @yongjiema.

Right now, I need to focus on other aspects that I, and the people around me, consider to be important. Meanwhile, feel free to design the UTs for this PR, or leave it open for as long as you think it should be.
