Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write synchronously in LevelDBSnapshot.Commit and prevent data loss. #680

Closed
wants to merge 2 commits into from

Conversation

jsolman
Copy link
Contributor

@jsolman jsolman commented Apr 3, 2019

I have been puzzled for a while on an issue where syncing MainNet continued to throw exceptions when I sync the chain using machines with high performance cpu and lower disk performance. I discovered the batch write in commit was not being performed synchronously, which seems to lead to the LevelDB current snapshot for the block being created with missing data, which potentially leads to data loss upon persisting the next block. Usually this surfaces as some exception when trying to persist the block.

Update: I believe the root cause now also includes the Blockchain running Import messages on different OS threads.

This fixes part of the problem to fix to the root cause issue I reported way back here as what I thought was a neo-vm issue at the time: neo-project/neo-vm/issues/53

@jsolman jsolman requested a review from erikzhang April 3, 2019 12:12
@jsolman jsolman added bug Used to tag confirmed bugs critical Issues (bugs) that need to be fixed ASAP labels Apr 3, 2019
@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

I’m not entirely sure this has eliminated the issue I have been seeing yet. Testing is still ongoing.

@jsolman jsolman removed the critical Issues (bugs) that need to be fixed ASAP label Apr 3, 2019
@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

I think Blockchain may also need to use a PinnedDispatcher: https://getakka.net/articles/actors/dispatchers.html#pinneddispatcher

The updated ImportBlocks plugin issues imports in separate messages, and the Blockchain needs to execute them on the same OS thread for safety.

@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

Actually, I believe using the PinnedDispatcher for the Blockchain is more important than synchronizing the batched writes. We probably don’t have to synchronize the batch writes after all.

@erikzhang
Copy link
Member

Doing so will significantly reduce performance and is not necessary.

@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

I agree, I will close this and re-open another one with the PinnedDispatcher for the Blockchain. That will fix the issue I think.

@jsolman jsolman closed this Apr 3, 2019
@erikzhang
Copy link
Member

What is the reason for using PinnedDispatcher?

@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

It seems if the Blockchain gets a second import message and runs on a different thread it may be possible for the blockchain snapshot that is obtained to not be the latest snapshot of the LevelDB database. This is what my tests are indicating. If the same OS thread is used such as if calling ImportBlocks once with an Iterable that will get all blocks in one call as the existing Plugin does, no problem is encountered during syncing.

@jsolman jsolman deleted the FixDatabaseConsistency branch April 3, 2019 16:50
@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

The Blockchain.currentSnapshot will be right because it uses interlocked exchange from the same thread, but the new one obtained during OnPersist could be wrong possibly, because it obtains it again from the DB possibly on a different OS thread.

@jsolman
Copy link
Contributor Author

jsolman commented Apr 3, 2019

The actor’s state itself should be guaranteed to have memory visibility if the thread changes, but things outside the actor like in LevelDB may not have consistent memory visibility if the thread has changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Used to tag confirmed bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants