Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Startup crash trying to read key from corrupted data file [JIRA: RIAK-1819] #212

Closed
engelsanchez opened this issue May 19, 2015 · 5 comments

Comments

@engelsanchez
Copy link
Contributor

On startup, when a hintfile is not available or does not pass the CRC check, Bitcask loads keys directly from the data file. When it does, it is not validating each record in the data file against its CRC. This changed as part of an optimization introduced last year that tried to skip over values when reading the data file only to read keys. Unfortunately this means that it can not use the CRC to verify the validity of the key or value, and may end up with crash errors like the one below. Since this is only an optimization, and it only applies when hintfiles are not available, I propose removing it to handle corruption properly.

It may be worth to also protect the key conversion code with a try/catch to also protect from loading a key in an unhandle format. This could happen as part of a downgrade or similar situation.

[error] <0.730.0> scan_key_files: error function_clause @ [
{riak_kv_bitcask_backend,key_transform_to_1,[<<>>],[{file,"src/riak_kv_bitcask_back
end.erl"},{line,99}]},{bitcask,'-scan_key_files/5-fun-0-',7,[{file,"src/bitcask.erl
"},{line,1188}]},{bitcask_fileops,fold_keys_int_loop,5,[{file,"src/bitcask_fileops.
erl"},{line,595}]},{bitcask_fileops,fold_file_loop,8,[{file,"src/bitcask_fileops.er
l"},{line,720}]},{bitcask_fileops,fold_keys_loop,4,[{file,"src/bitcask_fileops.erl"
},{line,575}]},{bitcask,scan_key_files,5,[{file,"src/bitcask.erl"},{line,1196}]},{b
itcask,init_keydir_scan_key_files,4,[{file,"src/bitcask.erl"},{line,1289}]},{bitcas
k,init_keydir,4,[{file,"src/bitcask.erl"},{line,1241}]}]
@krestenkrab
Copy link

Which version of Riak introduced this issue? We have a Riak installation on some apparently very unreliable hardware, where this has bitten us a couple of times over the last month. I'd like to downgrade that setup to a "pre-optimization" version.

@engelsanchez
Copy link
Contributor Author

@krestenkrab It seems I was wrong about the origin of the lack of CRC checks when only reading keys from data files instead of hintfiles. The optimization I was thinking of introduced variable chunk sizes while reading the file, but the code all the way back to pre 1.4 was already skipping the value and not computing the CRC:

https://github.com/basho/bitcask/blob/1.5.0/src/bitcask_fileops.erl#L370

So we have been vulnerable to corruption in that scenario for a long time apparently.

@evanmcc
Copy link
Contributor

evanmcc commented Jun 15, 2015 via email

@engelsanchez
Copy link
Contributor Author

This was fixed in #214, included in Bitcask 1.7.2 and shipped with Riak 2.0.6. It will ship in the upcoming 2.1 point release.

@oleksiyk
Copy link

I got the same error when one of the Riak 2.1.1 nodes went down (just Riak server went down, not the server itself) and now I can't start Riak with the following error:

2016-02-28 12:41:57.134 [warning] <0.4207.0> Hintfile '/var/lib/riak/fs_chunks/422465317040964124793252646957050560369293000704/5.bitcask.hint' invalid
2016-02-28 12:41:57.164 [error] <0.4207.0> scan_key_files: error function_clause @ [{riak_kv_bitcask_backend,key_transform_to_1,[<<>>],[{file,"src/riak_kv_bitcask_backend.erl"},{line,99}]},{bitcask,'-scan_key_files/5-fun-0-',7,[{file,"src/bitcask.erl"},{line,1188}]},{bitcask_fileops,fold_keys_int_loop,5,[{file,"src/bitcask_fileops.erl"},{line,595}]},{bitcask_fileops,fold_file_loop,8,[{file,"src/bitcask_fileops.erl"},{line,720}]},{bitcask_fileops,fold_keys_loop,4,[{file,"src/bitcask_fileops.erl"},{line,575}]},{bitcask,scan_key_files,5,[{file,"src/bitcask.erl"},{line,1196}]},{bitcask,init_keydir_scan_key_files,4,[{file,"src/bitcask.erl"},{line,1289}]},{bitcask,init_keydir,4,[{file,"src/bitcask.erl"},{line,1241}]}]

I've updated Riak to 2.1.3 on the failed node and it started successfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants