Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak on TaskManager due to continuous growth of knownHashes (2.x) #865

Merged
merged 11 commits into from
Jun 26, 2019

Conversation

vncoelho
Copy link
Member

This may have something related with node stopping syncing and being lost with huge sizes of knownHashes hashset.

This implementation is a workaround, FIFOSet used in ProtocolHandler would not be so good here.

The bad trade-off I see would be when we call knownHashes.ExceptWith(payload.Hashes) that we would need to iterate the dictionary and remove the first known value.

@vncoelho vncoelho changed the title First draft with knownhashes as dictionary Fix memory leak on TaskManager due to continuous growth of knownHashes Jun 25, 2019
@vncoelho vncoelho changed the title Fix memory leak on TaskManager due to continuous growth of knownHashes Fix memory leak on TaskManager due to continuous growth of knownHashes (2.x) Jun 25, 2019
@vncoelho vncoelho added the port-to-3.x Feature or PR must be ported to Neo 3.x branch label Jun 25, 2019
@igormcoelho
Copy link
Contributor

Whats expected growth of this hashset ? Isnt it a better approach having large bloom filter here?

@vncoelho
Copy link
Member Author

vncoelho commented Jun 25, 2019

The bloom filter was already suggested by you in some previous discussions.

In fact, Igor, I thought that we had already solved this leak, but then I realized it was the leak on ProtocolHandler.

The size of this hashset becomes enormous! Even privatenet it was easily reaching 10K+ in a short period of time (I do not exactly remember the number, but the experiments were empirically good! aheuaheuahuea).

@vncoelho vncoelho marked this pull request as ready for review June 25, 2019 16:16
@vncoelho
Copy link
Member Author

root@7a9e514116c8:/# /opt/start_node.sh 
[17:00:18.870] OnStart
NEO-CLI Version: 2.10.2.0

[17:00:18.898] initialize: height=20 view=0 index=1 role=Backup
[17:00:20.913] timeout: height=20 view=0
[17:00:20.936] Skip requesting change view to nv=1 because nc=0 nf=3
knownHashes.Count:1
knownHashes.Count:2
knownHashes.Count:3
knownHashes.Count:4
knownHashes.Count:5
knownHashes.Count:6
knownHashes.Count:7
knownHashes.Count:8
knownHashes.Count:9
knownHashes.Count:10
knownHashes.Count:11
knownHashes.Count:12
knownHashes.Count:13
knownHashes.Count:14
knownHashes.Count:15
[17:00:21.955] persist block: height=20 hash=0x772f4dd1d62ab3e098643d71c3cf5d35e7e499a1bcf186a48180c37cc3663e4d tx=1
[17:00:21.956] initialize: height=25 view=0 index=1 role=Primary
[17:00:21.956] persist block: height=21 hash=0x106f3fb2e805722039f9bea43bf80e7505b45e9ff59dcf1f220356478dc146d1 tx=1
[17:00:21.956] initialize: height=25 view=0 index=1 role=Primary
[17:00:21.956] persist block: height=22 hash=0xced098a809c7b041fb85e14a4480fad9fc9a9ae41dbb1cf966d4932871d5cd3e tx=1
[17:00:21.956] initialize: height=25 view=0 index=1 role=Primary
[17:00:21.956] persist block: height=23 hash=0xcf666653b22ca93b6b75bf02a6b1b31265f55e9a1ba3798e91a563fe23ffa0f7 tx=1
[17:00:21.957] initialize: height=25 view=0 index=1 role=Primary
[17:00:21.959] persist block: height=24 hash=0x632f85e5118fbe67551bb275276b1930d0fcb9b16119b04129dc2a8288018ed7 tx=1
[17:00:21.960] initialize: height=25 view=0 index=1 role=Primary
knownHashes.Count:16
knownHashes.Count:17
[17:00:22.486] OnChangeViewReceived: height=25 view=0 index=0 nv=1
[17:00:22.489] OnChangeViewReceived: height=25 view=0 index=3 nv=1
[17:00:22.966] timeout: height=25 view=0
[17:00:22.975] send prepare request: height=25 view=0
knownHashes.Count:18
knownHashes.Count:19
knownHashes.Count:20
knownHashes.Count:21
knownHashes.Count:22
knownHashes.Count:23
knownHashes.Count:24
knownHashes.Count:25
[17:00:23.133] OnChangeViewReceived: height=25 view=0 index=2 nv=1
knownHashes.Count:26
[17:00:23.142] changeview: view=1 primary=02a7bc55fe8684e0119768d104ba30795bdcc86619e864add26156723ed185cd62
[17:00:23.142] initialize: height=25 view=1 index=1 role=Backup
[17:00:23.146] OnPrepareResponseReceived: height=25 view=1 index=2
knownHashes.Count:27
knownHashes.Count:28
[17:00:23.156] OnPrepareRequestReceived: height=25 view=1 index=0 tx=1
[17:00:23.176] send prepare response
[17:00:23.180] send commit

3s without any transaction.

After 2min 1K has been reached.

@vncoelho
Copy link
Member Author

vncoelho commented Jun 25, 2019

With a great probability, this is reaching 1M+ after weeks, @KickSeason.

@vncoelho
Copy link
Member Author

Tested and verified! I believe it is working nice now :D

neo/IO/Caching/FIFOSet.cs Outdated Show resolved Hide resolved
@shargon shargon requested a review from erikzhang June 26, 2019 08:52
erikzhang
erikzhang previously approved these changes Jun 26, 2019
@erikzhang erikzhang dismissed their stale review June 26, 2019 09:53

IEnumerable

@shargon
Copy link
Member

shargon commented Jun 26, 2019

I need it for

hashes.ExceptWith(knownHashes);

@vncoelho
Copy link
Member Author

vncoelho commented Jun 26, 2019

Much simpler now, thanks @shargon and @erikzhang.
Maybe just remove the comment before setting the Fifo.

@vncoelho vncoelho merged commit d993e4d into master-2.x Jun 26, 2019
@vncoelho vncoelho deleted the fix-leak-knownhashes-TaskManager branch June 26, 2019 11:17
@igormcoelho
Copy link
Contributor

Congratulations! Nice move!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
port-to-3.x Feature or PR must be ported to Neo 3.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants