
Badger opens data directory without any error when a vlog file contains broken value data #1049

Closed
nordicdyno opened this issue Sep 22, 2019 · 6 comments · Fixed by #1052
Labels
area/data-loss Issues related to data loss or corruption. kind/enhancement Something could be better. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.

Comments


nordicdyno commented Sep 22, 2019

What version of Go are you using (go version)?

$ go version
1.13

What version of Badger are you using?

  • v1.6
  • v2.0.0-rc2

Does this issue reproduce with the latest master?

yes

What are the hardware specifications of the machine (RAM, OS, Disk)?

MacBook Pro (15-inch, 2018), 2.2 GHz i7, 16 GB RAM, APPLE SSD AP0256M

What did you do?

I corrupted data in a vlog file in the data directory (to emulate real-world hardware issues), and Badger just starts with this directory without any error.

What did you expect to see?

Error on database open.

What did you see instead?

Badger works without any error if the corrupted data is in the value bodies.

Extra

More info and a reproducible test are here:
https://github.com/nordicdyno/badger-spoiler

I'm not sure, but it's probably related to #601.

@jarifibrahim
Contributor

@nordicdyno The test you're referring to corrupts a bit in the middle of the file. This is an unlikely scenario. AFAIK, a file cannot be corrupted in the middle; it can only be corrupted from the end.

If the corruption were at the end, Badger would have recognized it. We perform checksum verification of the vlog entries after a specific point (called the head pointer).
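The per-entry checksum idea under discussion can be sketched in plain Go. This is a simplified illustration of CRC32 (Castagnoli, the polynomial Badger uses) verification, not Badger's actual vlog entry encoding:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Simplified sketch: append a CRC32 (Castagnoli) checksum to each entry,
// and verify it on read. Not Badger's real wire format.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

func encodeEntry(payload []byte) []byte {
	out := make([]byte, len(payload)+4)
	copy(out, payload)
	binary.BigEndian.PutUint32(out[len(payload):], crc32.Checksum(payload, castagnoli))
	return out
}

func verifyEntry(entry []byte) bool {
	if len(entry) < 4 {
		return false
	}
	n := len(entry) - 4
	want := binary.BigEndian.Uint32(entry[n:])
	return crc32.Checksum(entry[:n], castagnoli) == want
}

func main() {
	e := encodeEntry([]byte("key=value"))
	fmt.Println(verifyEntry(e)) // true

	e[2] ^= 0x01 // flip a bit in the middle, like the repro test does
	fmt.Println(verifyEntry(e)) // false: mid-file corruption is detectable
}
```

The point of contention is not whether such a checksum exists, but whether it is verified for entries that sit before the head pointer.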

@nordicdyno
Author

I disagree. It's a very likely scenario if we're talking about disk corruption after Badger stops*. I don't see any reason why it could NOT happen. And the problem is that we, as Badger users, never find out about it.

* I don't want to speak about probabilities here, but the real world can be a messy place.

@nordicdyno
Author

Why not just add a checksum for every item in the vlog and check it against the replayed values on start? (or something like that)

@jarifibrahim jarifibrahim added kind/enhancement Something could be better. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it. labels Sep 24, 2019
@jarifibrahim jarifibrahim self-assigned this Sep 24, 2019
@jarifibrahim
Contributor

This should be fixed by #1052 .

Thanks for writing a test @nordicdyno :)

@jarifibrahim jarifibrahim added the area/data-loss Issues related to data loss or corruption. label Sep 24, 2019

afiskon commented Sep 25, 2019

For the record, this might not be a problem if the data is corrupted before the checkpoint (meaning the record that marks everything before it as synced), since such a record will not be used anyway during the recovery process after a restart. If a vlog entry is used during recovery, then I agree with @nordicdyno that its checksum should be checked. This should be the default behavior.

While working on PostgreSQL I had the unpleasant experience of recovering data from databases/backups in which random entries were corrupted. Sadly, this sometimes really happens in the real world due to hardware failures.

@jarifibrahim
Contributor

For the record, this might not be a problem if the data is corrupted before the checkpoint (meaning the record that marks everything before it as synced), since such a record will not be used anyway during the recovery process after a restart.

Even if a record before the checkpoint (head) is corrupted, we'll have the same problem: the record might not be replayed, but if the value is stored in the vlog then the user might get corrupted data. Badger stores the value in either the SST or the value log, depending on the value of ValueThreshold.

badger/options.go

Lines 268 to 278 in 64eb1df

// WithValueThreshold returns a new Options value with ValueThreshold set to the given value.
//
// ValueThreshold sets the threshold used to decide whether a value is stored directly in the LSM
// tree or separately in the value log files.
//
// The default value of ValueThreshold is 32, but LSMOnlyOptions sets it to 65500.
func (opt Options) WithValueThreshold(val int) Options {
	opt.ValueThreshold = val
	return opt
}
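To illustrate the placement decision this option controls, here is a minimal sketch; `storedInLSM` is a hypothetical helper, and the exact boundary comparison Badger performs may differ:

```go
package main

import "fmt"

// Sketch of the decision ValueThreshold controls: small values are stored
// inline in the LSM tree (SST files), larger ones go to the value log.
// The default threshold is 32 bytes; LSMOnlyOptions raises it to 65500 so
// nearly all values stay in the LSM tree.
// NOTE: storedInLSM is a hypothetical helper for illustration; Badger's
// actual comparison may use a strict inequality.
func storedInLSM(value []byte, valueThreshold int) bool {
	return len(value) <= valueThreshold
}

func main() {
	fmt.Println(storedInLSM([]byte("tiny"), 32))    // true: inline in SST
	fmt.Println(storedInLSM(make([]byte, 100), 32)) // false: value log
}
```

This is why vlog corruption can bite even without replay: any read of a large value goes back to the vlog file.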

If a vlog entry is used during recovery, then I agree with @nordicdyno that its checksum should be checked. This should be the default behavior.

We already verify the checksum when we replay the entries. See:

badger/value.go

Lines 382 to 391 in e3b5652

if _, err := io.ReadFull(reader, crcBuf[:]); err != nil {
	if err == io.EOF {
		err = errTruncate
	}
	return nil, err
}
crc := y.BytesToU32(crcBuf[:])
if crc != tee.Sum32() {
	return nil, errTruncate
}
