Data lost on crash #740

Closed
fpessolano opened this issue Mar 15, 2019 · 15 comments

Comments

@fpessolano

I am using the default options (so SyncWrites is true). If my server dies normally and the database is closed, all data is saved to disk. If the server crashes (or I interrupt it) in a non-recoverable manner, no data is written to disk.
Is there a way to force data to be immediately written to disk so that none is lost in such a crash?
Thanks

@jarifibrahim
Contributor

@fpessolano Badger writes all values (creates/updates) to the vlog first and then processes them. In case of a server crash, the values will already be in the vlog, so Badger can recover from the failure the next time it starts.

@mangalaman93 Please correct me if I'm wrong.
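
For reference, a minimal sketch of how this looks with the badger v1.x API (the directory path is a placeholder); the value-log replay happens inside badger.Open, so recovery needs no extra step:

package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	// badger v1.x: DefaultOptions is a struct value, and SyncWrites
	// defaults to true, so each committed write is synced to the vlog.
	opts := badger.DefaultOptions
	opts.Dir = "/tmp/badger"      // placeholder
	opts.ValueDir = "/tmp/badger" // placeholder

	// Open replays the value log, recovering entries that were
	// committed before a previous crash.
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}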

@mangalaman93
Contributor

@jarifibrahim Looks correct. It also depends on whether the read or write call completed before the server crashed; if not, the writes may not be visible on disk.

@manishrjain
Contributor

There's an open bounty for data loss: #601.

@fpessolano
Author

Well, I have just added recover handling for various crashes as well as system-signal interrupts, since I noticed that data was not being saved to disk.
There is also a note that this is the normal behavior, as data is written out only when tablemax is reached, meaning that if a crash recovery is missed, the data will be lost. Or at least that is how I read it.

@manishrjain
Contributor

manishrjain commented Mar 18, 2019

The data, as @jarifibrahim mentioned, gets written to the value log first and then makes its way to the LSM tree. On a crash, the value log gets replayed. You can test this by accessing the data via the APIs; it should be there. Feel free to reopen the issue if you can't find it.

@fpessolano
Author

fpessolano commented Mar 18, 2019 via email

@manishrjain
Contributor

If you can share a code snippet, we can try to identify your issue. Most likely it lies in how you're writing or accessing data, rather than in Badger itself.

@fpessolano
Author

fpessolano commented Mar 18, 2019

Sure, the read snippet is quite basic:

r = make([]byte, lb)
err = db.View(func(txn *badger.Txn) error {
	item, err := txn.Get(id)
	if err != nil {
		return err
	}
	val, err := item.ValueCopy(nil)
	if err != nil {
		return err
	}
	copy(r, val)
	return nil
})

And the write snippet as well:

err = db.Update(func(txn *badger.Txn) error {
	var err error
	if ttl {
		err = txn.SetWithTTL(id, a, currentTTL)
	} else {
		err = txn.Set(id, a)
	}
	return err
})

Some parameters are from the (not copied) enclosing code, of course...

@richp10

richp10 commented Apr 2, 2019

I only started looking at Badger today, but I am not finding it resilient to the program crashing. With simple code similar to the above, my app works well... but if I crash my app (or just stop it from GoLand) the database does not work when I next start it.

In effect I have lost my entire database because Badger does not recover from the crash. The only way I can get things working again is to stop, delete the database, and restart.

Any data in the database is effectively lost because the app will not restart.

Log shows the following:
All 0 tables opened in 0s
Replaying file id: 0 at offset: 0
"Value log truncate required to run DB. This might result in data loss"

Each attempt to access the database then blocks. The problem is that it blocks in the WaitForMark function in watermark.go, so any process trying to access the db freezes.

Crashing without calling .Close() should not leave the database unable to recover so completely when it restarts.

@manishrjain
Contributor

You need to turn on the Truncate flag in the default options, so Badger can truncate the value log slightly to bring it back to the last point up to which all entries are correctly checksummed.
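
A minimal sketch of setting that flag with the v1.x options (paths are placeholders):

opts := badger.DefaultOptions
opts.Dir = "/tmp/badger"
opts.ValueDir = "/tmp/badger"
// Let Badger truncate the value log back to the last correctly
// checksummed entry instead of refusing to open after a crash.
opts.Truncate = true
db, err := badger.Open(opts)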

@richp10

richp10 commented Apr 2, 2019

Amazing, that seems to work perfectly. So perfectly that I would have thought it should be the default. I just searched the main page and I don't think this is mentioned anywhere prominent!

Anyhow, thanks so much for the pointer. It takes Badger from a bit disappointing to probably the best solution for my app!

@fpessolano
Author

Badger is pretty good; my apologies for needing @manishrjain to point out the solution to my original post.

@fpessolano
Author

Sorry to reopen this case, but on a production database we had a hard crash (the electricity went out) and we lost data. Truncate and SyncWrites are both true; how can we prevent this from happening again?
Thanks

PS: as this is deployed, we have to stick to version 1.5.5

@manishrjain
Contributor

So perfectly that I would have thought it should be the default

It used to be the default, but we heard a complaint from the IPFS folks. They wanted an option to not truncate and to handle it manually.

Truncate and SyncWrites are both true; how can we prevent this from happening again?

That shouldn't be the case for any write which has completed successfully. Unless the disk itself got corrupted?
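
In other words (a sketch; key and value are placeholders): with SyncWrites on, a write only counts as durable once Update returns nil:

err := db.Update(func(txn *badger.Txn) error {
	return txn.Set([]byte("key"), []byte("value"))
})
if err == nil {
	// With SyncWrites = true, reaching this point means the entry has
	// been synced to the value log; a crash after this should not lose
	// it. A crash before Update returns may drop the write.
}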

@fpessolano
Author

I opened a new issue with all the settings I use, just in case. We are also exploring whether the client's disk got corrupted.
