Bitcask may get stuck due to caching port #188
Comments
I have confirmed that this happened to me because I was running in the Erlang shell and happened to make a typo at just the right moment in the sequence of commands. The typo caused an exception, which the shell didn't catch. The shell fakes a lot of things to make the shell process seem persistent. In fact, the process died and the shell made it look as if it had stayed around. Since the original process that opened the "efile" port and cached it in the process dictionary died, the port went away, leaving us in this state. This is very unlikely to happen in Riak, as making that port go away would require something catastrophic. Still, it might be a good idea to do away with this hack anyway.
A fix for this issue is in #189.
Hrm, I disagree with Bitcask using any operation that gets redirected to file_server_2: it sucks, plain and simple. Any other OTP app can send a spew of requests to it, then Bitcask's requests get stuck behind them, and we're back in the same serialized pickle. I agree that noodling around in the shell can cause problems with the port "cache" in the process dictionary. However, AFAIK, any exception that invalidates the cached port will also kill the owning vnode process, so the problem effectively doesn't happen. Having said that, #189 is a good bit of paranoia.
#189 has been merged; closing.
Agree. In the end I just detected the stale cached port and reloaded it. This will likely only help people playing with Bitcask in a shell, but paranoia is good.
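For readers who want to see the shape of "detect the stale cached port and reload", here is a minimal sketch. It is not the code from #189: the module name `port_cache_sketch`, the function `get_cached_port/2`, and the `OpenFun` argument are all hypothetical, and liveness is checked with `erlang:port_info/1`, which returns `undefined` for a port that is no longer open.

```erlang
-module(port_cache_sketch).
-export([get_cached_port/2]).

%% Hypothetical helper, not Bitcask's actual code.
%% Returns a live port cached under Key in the process dictionary.
%% OpenFun is a zero-arity fun that opens and returns a new port.
get_cached_port(Key, OpenFun) ->
    case get(Key) of
        undefined ->
            %% Nothing cached yet: open a port and remember it.
            open_and_cache(Key, OpenFun);
        Port ->
            case erlang:port_info(Port) of
                undefined ->
                    %% The cached port is closed (for example, its owner
                    %% died). Drop the stale entry and open a fresh one.
                    erase(Key),
                    open_and_cache(Key, OpenFun);
                _Info ->
                    Port
            end
    end.

open_and_cache(Key, OpenFun) ->
    Port = OpenFun(),
    put(Key, Port),
    Port.
```

A caller would pass whatever opens the port it needs, for example `port_cache_sketch:get_cached_port(my_port, fun() -> open_port({spawn, "cat"}, [binary]) end)` on a Unix-like system. Checking liveness on every lookup costs one BIF call, which is cheap next to a request queued behind the file server.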
There is code in Bitcask to list files in a directory that tries to avoid serializing on the file server by caching an efile port in the process dictionary, which it uses to list directories when needed. The problem is that the port may go away. The process that opened it would get a notification if it happened, but nothing would tie that back to Bitcask to release the cached port. After that point, calls to list directory contents to, say, open a bitcask, would fail forever.
The caching function is bitcask_fileops:get_efile_port/0, used by bitcask_fileops:list_dir/1.
I believe the best thing to do at this point is to use a regular directory listing operation here. The initial file server serialization problem was caused by merges piling up on the Riak side when things got slow. The Bitcask backend code will now avoid issuing merge requests until the last one has finished, which should prevent that from happening again.
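As a rough illustration of the "regular directory listing" alternative, the sketch below uses the standard `file:list_dir/1`, with no port cached in the process dictionary. The module name `list_dir_sketch` and the function `list_full_paths/1` are hypothetical and not part of Bitcask; sorting and joining the paths are only there to make the result convenient.

```erlang
-module(list_dir_sketch).
-export([list_full_paths/1]).

%% Hypothetical helper, not Bitcask's actual code.
%% Lists Dir with the standard file module and returns sorted full paths,
%% or passes the error tuple through unchanged.
list_full_paths(Dir) ->
    case file:list_dir(Dir) of
        {ok, Names} ->
            {ok, [filename:join(Dir, Name) || Name <- lists:sort(Names)]};
        {error, _Reason} = Error ->
            Error
    end.
```

The trade-off is the one raised in the comments: there is no cached state to go stale, but the call can end up queued behind other file server requests.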