bitmonerod: segfaults on (probably) corrupt lmdb blockchain data #898
Comments
Yes, save a copy of the LMDB data file please. I probably won't get to look at it any time soon though. Your backtrace appears to be from a non-debug build; can you get a trace from a debug build?
Can you also check, in frame #6, print m_height?
That's kind of what I expected. This says that it never read any valid block count from the DB when first opening it. I think some earlier function must have failed, before reaching here, and we didn't catch the error code.
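To illustrate the kind of check that seems to be missing, here is a minimal sketch of reading the block count at open time and refusing to continue when the read fails. It does not reflect the actual bitmonerod code: `ReadFn`, `DbOpenError`, and `load_height` are hypothetical names, and the only assumption borrowed from LMDB is its C-style convention of returning 0 on success and a nonzero code on failure.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a storage call that follows the C convention of
// returning 0 on success and a nonzero error code on failure.
using ReadFn = int (*)(std::uint64_t* out);

// Hypothetical error type raised when the DB cannot be opened cleanly.
struct DbOpenError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Read the block count once at open time. Any nonzero return code is turned
// into an exception, so the caller never continues with an unset height.
std::uint64_t load_height(ReadFn read_block_count) {
    std::uint64_t height = 0;
    const int rc = read_block_count(&height);
    if (rc != 0) {
        throw DbOpenError("failed to read block count from DB, rc=" +
                          std::to_string(rc));
    }
    return height;
}
```

A caller could catch `DbOpenError` at startup, log that the blockchain data looks unreadable, and exit with a nonzero status instead of dereferencing state that was never initialized.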
We should think about a way to toggle from the default "--db_sync_mode fastest:async:1000" back down to "--db_sync_mode safe" after the daemon gets fully sync'd. After the daemon has caught up to the network, we know that new blocks will commit only about once every 2 minutes, so running in fully synchronous mode won't generate a lot of disk flushes.
I definitely agree with switching to safe mode once synced, but there is another case to consider. You already have gigabytes of blockchain downloaded but are offline for a time. When you come back online you are syncing again, and corruption there means you lose your whole DB. I think any unsafe DB modes should only be used on initial sync, or if specified as a non-default (can be used by advanced users to speed up later partial syncs).
Yeah, definitely unsafe modes should only be used if specified explicitly. For your intermediate case, I think we could use NOMETASYNC by itself. That is still synchronous, but unlike full sync mode which does 2 fsyncs per commit, it only does 1 fsync per commit. In this case, a crash cannot lose integrity, but it could lose the last committed txn. It's a compromise setting; faster than fully sync'd mode with a 1 txn possible loss.
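The policy proposed in the comment above could be sketched roughly as follows. This is not the daemon's implementation: `SyncMode`, `choose_sync_mode`, and `fsyncs_per_commit` are hypothetical names. The 2 and 1 fsync counts come from the comment above; the 0 for the async mode is an assumption that the commit itself does not wait for a flush.

```cpp
#include <optional>

// Hypothetical model of the three durability levels discussed in this thread.
enum class SyncMode {
    Safe,        // fully synchronous
    NoMetaSync,  // synchronous data, deferred meta page flush
    Fastest      // asynchronous, only when explicitly requested
};

// Pick a mode: an explicit user choice always wins. Otherwise use the
// compromise mode while catching up and the safe mode once synced.
SyncMode choose_sync_mode(std::optional<SyncMode> user_choice, bool caught_up) {
    if (user_choice) {
        return *user_choice;
    }
    return caught_up ? SyncMode::Safe : SyncMode::NoMetaSync;
}

// fsyncs issued per commit in each mode. The 0 for Fastest is an assumption.
int fsyncs_per_commit(SyncMode mode) {
    switch (mode) {
        case SyncMode::Safe:       return 2;
        case SyncMode::NoMetaSync: return 1;
        case SyncMode::Fastest:    return 0;
    }
    return 2;  // unreachable for valid enum values; fall back to the safe count
}
```

With no explicit option, `choose_sync_mode(std::nullopt, false)` yields the NOMETASYNC-style mode during catch-up and `choose_sync_mode(std::nullopt, true)` yields the safe mode afterwards, so an unsafe mode is never selected by default.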
Losing any number of transactions is okay here, as long as there is no corruption. I guess if the failure case loses one, then we also want batching of blocks during a bulk sync to maximize performance safely (may already occur; I'm not sure).
Bulk syncing batches 200 blocks at a time.
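As a rough worked example of what the 200-block batch size means for flush counts, the sketch below computes how many commits a catch-up needs. `commits_needed` is a hypothetical helper, not a function from the codebase, and it assumes one DB commit per batch.

```cpp
#include <cassert>
#include <cstdint>

// Number of commits needed to store `blocks` blocks when each commit holds
// up to `batch` blocks (rounded up). Assumes one commit per batch.
std::uint64_t commits_needed(std::uint64_t blocks, std::uint64_t batch = 200) {
    assert(batch > 0);  // a zero batch size would never make progress
    return (blocks + batch - 1) / batch;
}
```

Under that assumption, catching up 1,000,000 blocks takes 5,000 commits. At 1 fsync per commit with NOMETASYNC that is 5,000 fsyncs instead of 10,000 in fully synchronous mode, and losing the last committed txn after a crash would cost at most one batch of 200 blocks to re-download.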
My box had some unexpected unclean hard shutdowns due to hardware problems.
Now bitmonerod fails to start due to this segfault. The blockchain data probably was not closed or written to disk cleanly.
Expected behavior: upon encountering corruption in the blockchain DB on disk, bitmonerod should report it without crashing.
I have the LMDB data and the core dump for this, @hyc, if you want them.
Monero 'Hydrogen Helix' (v0.9.4.0-18dd507