Race makes opening Bitcask dir impossible #187

Closed
engelsanchez opened this issue Sep 5, 2014 · 2 comments

@engelsanchez
Contributor

I have reproduced a problem where Bitcask gets stuck, unable to re-open a cask in a certain directory.

On open, a keydir object is first created but not marked as ready. The data files are then scanned to populate it, after which it is marked ready (bitcask.erl#L1248) and everything works. However, if the scan errors out, execution takes the branch at bitcask.erl#L1244 instead. That branch never marks the keydir as ready and leaves it registered in its not-ready state. A later call to open on the same directory finds this existing keydir, sees that it is not ready, and waits for it to finish loading (bitcask.erl#L1252), eventually timing out.

When the scan fails, the newly created keydir should probably be released, so that a subsequent open can start over instead of waiting on it.
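
The failure mode can be reproduced with a small toy model. The sketch below is a Python simulation, not Bitcask's actual Erlang code or its keydir NIF; `KeydirRegistry`, `open_cask`, `ScanError` and the `release_on_error` flag are all hypothetical names. It assumes a shared per-directory keydir table, a loader that marks the keydir ready only after a successful scan, and later openers that wait for readiness with a timeout. With `release_on_error=False` it reproduces the stuck keydir; with `release_on_error=True` it applies the release suggested above.

```python
import threading


class ScanError(Exception):
    """Hypothetical error raised when a data directory cannot be scanned."""


class Keydir:
    def __init__(self):
        # Set only once the scan has fully populated the keydir.
        self.ready = threading.Event()


class KeydirRegistry:
    """Toy stand-in for the shared per-directory keydir table (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._keydirs = {}

    def get_or_create(self, dirname):
        # Returns (keydir, created); created=True means the caller must populate it.
        with self._lock:
            kd = self._keydirs.get(dirname)
            if kd is not None:
                return kd, False
            kd = Keydir()
            self._keydirs[dirname] = kd
            return kd, True

    def release(self, dirname, kd):
        # Drop the entry only if it is still the keydir this caller created.
        with self._lock:
            if self._keydirs.get(dirname) is kd:
                del self._keydirs[dirname]


def open_cask(registry, dirname, scan, release_on_error, wait_timeout=0.1):
    kd, created = registry.get_or_create(dirname)
    if not created:
        # Another opener created this keydir: wait for it to become ready.
        if not kd.ready.wait(wait_timeout):
            return ("error", "keydir_wait_timeout")
        return ("ok", kd)
    try:
        scan(dirname, kd)
    except ScanError as exc:
        if release_on_error:
            # Fix: discard the half-built keydir so the next open starts fresh.
            registry.release(dirname, kd)
        # Without the fix, the not-ready keydir stays registered forever.
        return ("error", str(exc))
    kd.ready.set()
    return ("ok", kd)


def make_scan(fail_times):
    # Hypothetical scan that fails the first `fail_times` calls, then succeeds.
    calls = {"n": 0}

    def scan(dirname, kd):
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise ScanError("einval")

    return scan


def outcome(result):
    return result[0] if result[0] == "ok" else result[1]


def demo(release_on_error):
    registry = KeydirRegistry()
    scan = make_scan(fail_times=1)
    first = open_cask(registry, "/data/cask", scan, release_on_error)
    second = open_cask(registry, "/data/cask", scan, release_on_error)
    return outcome(first), outcome(second)


if __name__ == "__main__":
    print("without release:", demo(False))
    print("with release:   ", demo(True))
```

With the release in place, the second open builds a fresh keydir instead of waiting on the abandoned one, so a single transient scan failure no longer makes the directory permanently unopenable.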

Separately, the fact that the scan fails at all may point to a different bug. What has been observed is that the function that lists the files in a directory, bitcask_fileops:list_dir/1, returns {error, einval}. bitcask_fileops:data_file_tstamps/1 does not handle that return value, and this is the error that leaves the keydir stuck. Note that list_dir/1 tries to avoid a call to the file server by calling the efile port directly, which may be part of the reason. I'm currently investigating the exact sequence of events that leads to this.
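
One way to handle the `{error, einval}` case is to propagate the error from the directory listing instead of letting it fall through unmatched. The sketch below models this in Python, with tagged tuples standing in for Erlang's `{ok, Files}` / `{error, Reason}` returns. The directory lister is passed in as a plain function, and the `<tstamp>.bitcask.data` file-name pattern is an assumption about Bitcask's data files. It illustrates the idea only and is not taken from Bitcask's code.

```python
import re

# Assumed naming scheme for data files: "<tstamp>.bitcask.data".
DATA_FILE_RE = re.compile(r"^(\d+)\.bitcask\.data$")


def data_file_tstamps(list_dir, dirname):
    """Return ("ok", [(tstamp, filename), ...]) sorted by tstamp, or ("error", reason)."""
    result = list_dir(dirname)
    if result[0] != "ok":
        # Hand the listing error (e.g. ("error", "einval")) back to the caller
        # instead of crashing on an unexpected return value.
        return result
    tstamps = []
    for name in result[1]:
        match = DATA_FILE_RE.match(name)
        if match:
            tstamps.append((int(match.group(1)), name))
    tstamps.sort()
    return ("ok", tstamps)


# Hypothetical listers used to exercise both paths.
def ok_lister(dirname):
    return ("ok", ["2.bitcask.data", "1.bitcask.data", "1.bitcask.hint", "bitcask.write.lock"])


def bad_lister(dirname):
    return ("error", "einval")


if __name__ == "__main__":
    print(data_file_tstamps(ok_lister, "/data/cask"))
    print(data_file_tstamps(bad_lister, "/data/cask"))
```

A caller that receives `("error", reason)` can then fail the open cleanly. Combined with releasing the not-ready keydir, the error surfaces once instead of wedging the directory.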

engelsanchez self-assigned this Sep 5, 2014
engelsanchez added this to the 2.0.1 milestone Sep 5, 2014
@engelsanchez
Contributor Author

The underlying cause of the scan failure has been filed separately as #188. We should fix both sides: scans shouldn't fail, and if one does, it shouldn't leave behind an unusable keydir.

@engelsanchez
Contributor Author

Fixed in the 1.7 branch by #190. A separate issue will track merging all fixes for the Riak 2.0.1 release into the develop branch.
