Bitcask may get stuck due to caching port #188

Closed
engelsanchez opened this issue Sep 5, 2014 · 5 comments

@engelsanchez
Contributor

There is code in Bitcask to list files in a directory that tries to avoid serializing on the file server by caching an efile port in the process dictionary and using it to list directories when needed. The problem is that the port may go away. The process that opened it would be notified if that happened, but nothing ties that notification back to Bitcask so that it releases the cached port. From that point on, calls to list directory contents (say, to open a bitcask) would fail forever.

The caching function is bitcask_fileops:get_efile_port/0, used by bitcask_fileops:list_dir/1.

I believe the best thing to do at this point is to use a regular directory listing operation instead. The original file server serialization problem was caused by merges piling up on the Riak side when things got slow. The Bitcask backend code now avoids issuing a new merge request until the last one has finished, which should prevent that from happening again.

@engelsanchez
Contributor Author

I have confirmed how this happened to me: I was running in the Erlang shell and by chance made a typo at just the right moment in the sequence of commands. The typo raised an exception that the shell didn't catch. The shell fakes a lot of things to make the shell process seem persistent. In fact, the process died, and a replacement was made to look like the same process, with the old process dictionary still in place. Since the original process that opened the "efile" port and cached it in the process dictionary was dead, the port went away with it, leaving a stale port in the cache.
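The failure above rests on a general Erlang rule: a port is closed when the process that owns it terminates, while any other copy of the port identifier remains a valid term. A minimal sketch, assuming OTP 19 or later (for port monitors) and a Unix-like system where "cat" can be started through a port; the module and function names are hypothetical and this is not Bitcask code. A short-lived process opens a port, hands it back, and exits; the caller still holds the identifier, but erlang:port_info/1 reports it as closed.

```erlang
-module(stale_port_demo).
-export([stale_port/0]).

%% Hypothetical demo, not Bitcask code. Assumes a Unix-like system
%% where the external program "cat" can be started through a port.
stale_port() ->
    Parent = self(),
    %% A short-lived owner: it opens a port, hands the identifier to
    %% Parent, and then returns, which terminates the process.
    Owner = spawn(fun() ->
                          OwnedPort = open_port({spawn, "cat"}, [binary]),
                          Parent ! {opened, self(), OwnedPort}
                  end),
    Port = receive {opened, Owner, P} -> P end,
    %% Wait until the runtime has actually closed the port. If it is
    %% already gone, the 'DOWN' message arrives immediately (noproc).
    Ref = erlang:monitor(port, Port),
    receive {'DOWN', Ref, port, Port, _Reason} -> ok end,
    %% Port is still a port identifier, but it no longer refers to an
    %% open port, so erlang:port_info(Port) now returns undefined.
    Port.
```

This mirrors the shell case: the cached identifier outlives its owner, so anything that keeps using it fails.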

This is very unlikely to happen in Riak, since making that port go away would require something catastrophic. Still, it might be a good idea to do away with this hack.

@engelsanchez
Contributor Author

A fix for this issue is in #189.

@slfritchie
Contributor

Hrm, I disagree with Bitcask using any operation that gets redirected to file_server_2: it sucks, plain and simple. Any other OTP app can send a spew of requests to it, then Bitcask's requests get stuck behind it, and we're back in the same serialized pickle.

I agree that noodling around in the shell can cause problems with the port "cache" in the process dictionary. However, AFAIK, any exception that invalidates the cached port will also kill the owning vnode process, so the problem effectively doesn't happen. Having said that, #189 is a good bit of paranoia.

@slfritchie
Contributor

#189 has been merged, closing

@engelsanchez
Contributor Author

Agreed. In the end I just detected the stale cached port and reopened it. This will likely only help people playing with Bitcask in a shell, but paranoia is good.
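Detecting a stale cached port and reopening it can be sketched as a small helper. This is a minimal illustration, not the code from #189: the module name, get_port/2, and the OpenFun argument are hypothetical. The sketch relies only on erlang:port_info/1 returning undefined for a port that is no longer open.

```erlang
-module(port_cache_sketch).
-export([get_port/2]).

%% Hypothetical helper, not the actual Bitcask fix.
%% Returns a live port cached in the process dictionary under Key,
%% opening (or reopening) it with OpenFun() when needed.
get_port(Key, OpenFun) ->
    case get(Key) of
        undefined ->
            %% Nothing cached yet.
            open_and_cache(Key, OpenFun);
        Port ->
            case erlang:port_info(Port) of
                undefined ->
                    %% Stale: the port was closed (for example, its
                    %% owner process died), so replace it.
                    open_and_cache(Key, OpenFun);
                _Info ->
                    Port
            end
    end.

%% Open a new port, owned by the calling process, and cache it.
open_and_cache(Key, OpenFun) ->
    Port = OpenFun(),
    put(Key, Port),
    Port.
```

In the shell scenario described earlier, the replacement process would find the dead port in its dictionary, get undefined from erlang:port_info/1, and open a fresh port that it owns.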
