kvs_watch() can miss values on stalled read #812
This problem stems from the design of the get handler, which watch leverages. When the get handler must make an RPC to load some data into its cache in order to continue a walk through the namespace, it simply stores the original get request on the wait queue of the cache entry, and when the load RPC returns, the wait queue is "run", restarting the queued request. The request is restarted from the beginning, walking from the current root. If the root hash has changed in the meantime, the walk starts from the new root, and ultimately the newest value is returned to the caller. This is fine for get, which is effectively a "get the latest value of key" request. watch is handled just like get, except that the watch request is also stored in a list of active watchers that is "run" on each commit. As long as the watch request can complete its walk without stalling, this results in a watch response for every commit that changes the watched key. However, if the walk stalls and another commit occurs in between, the watch restarts from the beginning at the new root hash and can return a newer value, skipping the intermediate one. The fix could be either:
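The race can be reproduced in a toy model. The sketch below is a hypothetical Python simulation, not flux-core code: `ToyKVS`, `make_watch_handler`, and `replay` are invented names, and each commit simply appends a new namespace snapshot as its "root". It also assumes a watcher has at most one walk in flight, so a commit that lands while the walk is parked on the wait queue does not start a second walk.

```python
from collections import deque


class ToyKVS:
    """Hypothetical stand-in for the KVS module: every commit yields a new root."""

    def __init__(self):
        self.roots = [{}]          # roots[i] is the namespace after commit i
        self.wait_queue = deque()  # requests parked until a load RPC returns

    def current_root(self):
        return len(self.roots) - 1

    def commit(self, key, value):
        snapshot = dict(self.roots[-1])
        snapshot[key] = value
        self.roots.append(snapshot)

    def load_complete(self):
        # "Run" the wait queue: each parked request is re-invoked from the top.
        while self.wait_queue:
            handler = self.wait_queue.popleft()
            handler()


def make_watch_handler(kvs, key, responses, stalls):
    """Watch handler that, like get, restarts its walk from scratch."""

    def handler():
        root = kvs.current_root()  # no saved walk context: always the newest root
        if stalls:                 # walk needs an entry that is not cached yet
            stalls.pop()
            kvs.wait_queue.append(handler)
            return
        responses.append(kvs.roots[root][key])

    return handler


def replay(stall_first_walk):
    """Replay two commits to key "a" and return the values the watcher sees."""
    kvs = ToyKVS()
    responses = []
    stalls = [True] if stall_first_walk else []
    watch = make_watch_handler(kvs, "a", responses, stalls)

    kvs.commit("a", 1)
    watch()                    # commit 1 runs the watcher
    kvs.commit("a", 2)
    if not kvs.wait_queue:     # assumption: one walk in flight per watcher
        watch()                # commit 2 runs the watcher only if it is idle
    kvs.load_complete()        # the load RPC returns after commit 2
    return responses
```

`replay(False)` returns `[1, 2]`, one response per commit. `replay(True)` returns `[2]`: the stalled walk is restarted at the root produced by commit 2, so value 1 is never delivered to the watcher.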
Currently, a message handler has no way of knowing whether it was called from the reactor or from the "waitlist" abstraction. If there were a way to "annotate" the request message with this stored walk context and retrieve it later without breaking the "constness" of the message argument, this would be easy to implement. Another possibility is to rework the coproc support to work in conjunction with a generic wait queue abstraction. Any reactor handler could call
Is this still an issue? In any event, moving this to the 0.10.0 milestone.
I think this is fixed. All the way back in #1066, I refactored walks/lookups so they no longer start at the beginning; a lookup now resumes wherever it last was in the walk. So this was effectively fixed via the approach suggested up above. I'm going to close this. Edit: For documentation purposes, the key patch is 30d8a12.
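The resume-where-it-left-off approach can be sketched in the same toy style. This is a hypothetical Python model, not the code from #1066 or 30d8a12: `WalkContext`, `Stall`, `resume_walk`, `run_to_completion`, and blob names such as "root1" are invented for illustration. The walk's progress (the root it began from, the position in the key path, and the blob currently being examined) lives in a context object kept with the request, so a stalled lookup continues under its original root even after a newer root exists.

```python
# Hypothetical content-addressed store: blob name -> directory or value.
STORE = {
    "root1": {"a": "dir1"}, "dir1": {"b": "val1"}, "val1": 1,  # after commit 1
    "root2": {"a": "dir2"}, "dir2": {"b": "val2"}, "val2": 2,  # after commit 2
}


class Stall(Exception):
    """Raised when the walk needs a blob that is not in the cache yet."""

    def __init__(self, missing_ref):
        super().__init__(missing_ref)
        self.missing_ref = missing_ref


class WalkContext:
    """Saved lookup state, kept with the request across stalls."""

    def __init__(self, root_ref, key):
        self.root_ref = root_ref          # root captured when the walk began
        self.components = key.split(".")  # e.g. "a.b" -> ["a", "b"]
        self.depth = 0                    # next path component to resolve
        self.ref = root_ref               # blob currently being examined


def resume_walk(ctx, cache):
    """Advance ctx as far as the cache allows, raising Stall on a miss."""
    while ctx.depth < len(ctx.components):
        if ctx.ref not in cache:
            raise Stall(ctx.ref)
        ctx.ref = cache[ctx.ref][ctx.components[ctx.depth]]
        ctx.depth += 1
    if ctx.ref not in cache:
        raise Stall(ctx.ref)
    return cache[ctx.ref]


def run_to_completion(ctx, cache):
    """Simulate each load RPC returning, then resume (not restart) the walk."""
    while True:
        try:
            return resume_walk(ctx, cache)
        except Stall as stall:
            cache[stall.missing_ref] = STORE[stall.missing_ref]


def demo():
    cache = {"root1": STORE["root1"]}  # dir1 and val1 are not cached yet
    ctx = WalkContext("root1", "a.b")  # watcher runs after commit 1
    try:
        resume_walk(ctx, cache)        # stalls on "dir1"
    except Stall:
        pass
    current_root = "root2"             # commit 2 lands before the load returns
    resumed = run_to_completion(ctx, cache)
    restarted = run_to_completion(WalkContext(current_root, "a.b"), cache)
    return resumed, restarted
```

`demo()` returns `(1, 2)`: the resumed walk still reports value 1 from root1, while a walk restarted from scratch at root2 would have reported 2. Whether watch responses are also delivered in commit order is outside what this sketch models.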
If a kvs watch request stalls while traversing the namespace to read the new value, the key could change again between the stall and the walk's eventual restart. The watch request will then return the more recent value, and the "watcher" (the client calling kvs_watch()) will have missed the original value.