improve kvs watch efficiency #810
Conversation
Current coverage is 75.00% (diff: 88.78%)

              master    #810    diff
  ==========================================
  Files          145     145
  Lines        24911   24992     +81
  Methods          0       0
  Messages         0       0
  Branches         0       0
  ==========================================
  + Hits       18666   18744     +78
  - Misses      6245    6248      +3
  Partials         0       0
This seems very nice! My very uninformed first observation is that moving the […]
Thanks @grondo - that's the sort of early feedback I was looking for. I think you're probably spot-on. I wish there was a way to map watch notifications onto events, and then one would simply subscribe to the keys of interest. I suppose we could try the easy thing and generate a globally distributed event per key instead of packing the list into the setroot event. An event with no payload only takes up a few bytes...
Would deleted keys be included in either the kvs.setroot keys list, or the global per-key events? I'm thinking of a purge of 1000 job entries under a single commit. If a basedir is unlinked from the kvs, I'm assuming that an event would only be generated for the unlinked directory itself, not for each key beneath it. Hm, does […]?
An unlink would be listed once in the keys list (or have one event), and you are right, there would be no list entry/event for a key contained within an unlinked directory. I hadn't thought about that. I had only thought about the opposite, where a change to a key implies that its parent directories also change. I've been pondering the volume of events in this proposed change off and on this morning and haven't really come up with a good idea to limit them to an acceptable level while keeping the design simple. The way change notifications are distributed now is pretty focused. To answer your final question, yes.
Problem: a kvs.watch notification is not sent for every update of a watched key if two or more commits operating on the key are merged by the kvs for performance. In that case, a watcher is only notified once for the single, merged commit. This may be a problem for some use cases; for example, a finite state machine may use the kvs to communicate state by committing each change to a watched key. Watchers cannot follow the complete state if intermediate steps are lost.

Solution: check commits before merging them to see if they change the same keys to different values, and abort the merge if so. This check does not apply to directories modified as a side effect of a commit; only the fully qualified keys supplied in the commit messages are checked for duplicates.
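A minimal sketch of that check, using hypothetical commit/op types (the real kvs commit structures are more involved):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of one commit operation: a fully
 * qualified key and the serialized value it stores (NULL = unlink). */
struct op {
    const char *key;
    const char *value;
};

struct commit {
    struct op *ops;
    int nops;
};

/* Return false if merging 'a' and 'b' would hide an intermediate value
 * from watchers, i.e. both commits set the same fully qualified key to
 * different values.  Directories changed as a side effect are not
 * represented here, matching the rule described above. */
static bool mergeable (const struct commit *a, const struct commit *b)
{
    for (int i = 0; i < a->nops; i++) {
        for (int j = 0; j < b->nops; j++) {
            const char *va = a->ops[i].value;
            const char *vb = b->ops[j].value;
            if (strcmp (a->ops[i].key, b->ops[j].key) != 0)
                continue;
            if ((va == NULL) != (vb == NULL)
                || (va && vb && strcmp (va, vb) != 0))
                return false;  /* same key, different value: abort merge */
        }
    }
    return true;
}
```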
Maybe the best approach for this PR is to carry through with the original plan, only "filter" the setroot events, forwarding them point to point to watchers. It will still shift the burden of looking up changed keys onto the client, but the number of client wakeups will be on the same order as before - or doubled, I guess, because each changed key requires a follow-up lookup. The one rpc, multiple responses pattern still bugs me. I thought I might try an idea here like the stream idea from #271, where a client opens a "channel" to a service in order to receive messages on it. The kvs module could use this channel to forward filtered setroot events point to point. Possibly this idea could later be generalized to allow routing of any type of message point to point through the TBON (like a tunnel).
Who would filter setroot events? The kvs module on each rank? So the client wouldn't have to subscribe to all kvs.setroot events?
Exactly.
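Roughly, the per-rank filtering might look like the following sketch, with hypothetical watcher bookkeeping (flux-core's real module plumbing is elided):

```c
#include <string.h>

/* Hypothetical record of one watcher registered with this rank's kvs
 * module via a watch rpc. */
struct watcher {
    char *key;              /* fully qualified key being watched */
    void *route;            /* opaque handle for point-to-point replies */
    struct watcher *next;
};

/* Called on each rank when a kvs.setroot event arrives: only watchers
 * whose key appears in the event's modified-key list get a response,
 * so clients never subscribe to kvs.setroot themselves. */
static void filter_setroot (struct watcher *watchers,
                            const char **keys, int nkeys,
                            void (*notify) (void *route, const char *key))
{
    for (struct watcher *w = watchers; w != NULL; w = w->next) {
        for (int i = 0; i < nkeys; i++) {
            if (!strcmp (w->key, keys[i])) {
                notify (w->route, w->key);  /* forward point to point */
                break;
            }
        }
    }
}
```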
The kvs watch-mt test starts N threads watching the same key, then modifies the key monotonically M times. Now that watch is guaranteed to get a callback on every key change, this test can be tightened up to verify monotonicity of the watched key. Add a run of the test on rank 1 to ensure a "slave" KVS behaves the same as the "master".
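For illustration, the tightened check could live in the watch callback; this assumes a kvs_watch_int()-style callback whose signature is approximated here, not copied from the test:

```c
#include <assert.h>
#include <stdio.h>

/* Per-thread test state: last value observed for the watched key. */
struct state {
    int last;
};

/* Watch callback in the style used by watch-mt (signature approximated).
 * Since every key change is now guaranteed to produce a callback, the
 * observed values must be strictly monotonic. */
static int watch_cb (const char *key, int val, void *arg, int errnum)
{
    struct state *s = arg;

    if (errnum != 0) {
        fprintf (stderr, "%s: error %d\n", key, errnum);
        return -1;
    }
    assert (val > s->last);  /* an out-of-order update would fail here */
    s->last = val;
    return 0;
}
```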
Include an array of (de-duplicated) keys that were modified by the commit in the kvs.setroot event. This prepares the way for more efficient kvs_watch() handling as described in flux-framework#803. Update test for kvs.setroot codec.
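For illustration only, building such a payload might look like this (the field names and the use of jansson are guesses, not the actual kvs.setroot codec):

```c
#include <jansson.h>

/* Hypothetical kvs.setroot event payload: the new root reference plus
 * the de-duplicated list of keys modified by the commit. */
static json_t *setroot_payload_example (void)
{
    return json_pack ("{s:i, s:s, s:[s,s]}",
                      "rootseq", 42,              /* root sequence number */
                      "rootdir", "sha1-abc123",   /* new root reference */
                      "keys", "a.b.c", "a.b.d");  /* modified keys */
}
```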
Allow an optional root treeobj to accompany a kvs.get request. If present, key will be looked up relative to this directory. It is an EINVAL error if the treeobj is invalid or not a directory reference. Update test for kvs.get codec.
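A hypothetical request of that shape (field names assumed; see the actual kvs.get codec):

```c
#include <jansson.h>

/* Hypothetical kvs.get request carrying an optional root treeobj: when
 * present, "key" is resolved relative to this directory reference
 * rather than the current root; an invalid or non-directory treeobj
 * yields EINVAL.  Field names are assumptions for illustration. */
static json_t *getat_request_example (void)
{
    return json_pack ("{s:s, s:s}",
                      "key", "a.b.c",
                      "rootdir", "sha1-abc123");  /* directory reference */
}
```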
Add a new function kvs_getat() which is like kvs_get() but has a new "treeobj" argument that specifies the specific root (or other directory) "snapshot" that the key will be looked up in.
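A usage sketch under assumed prototypes (approximated from the description above; the real declarations in kvs.h may differ):

```c
#include <stdlib.h>

typedef struct flux_handle flux_t;  /* stand-in for the real handle type */

/* Assumed prototypes, approximated from this PR; not copied from kvs.h. */
extern char *kvs_get_treeobj (flux_t *h, const char *key);
extern int kvs_getat (flux_t *h, const char *treeobj,
                      const char *key, char **json_str);

/* Read two keys from the same snapshot of the namespace: capture the
 * root treeobj once, then resolve both keys against it, giving the
 * atomic traversal suggested in issue #64. */
static int read_pair_atomically (flux_t *h, char **val1, char **val2)
{
    char *root = kvs_get_treeobj (h, ".");  /* snapshot current root */
    int rc = -1;

    if (!root)
        return -1;
    if (kvs_getat (h, root, "a.b", val1) < 0
        || kvs_getat (h, root, "a.c", val2) < 0)
        goto done;
    rc = 0;  /* both values came from the same root: consistent view */
done:
    free (root);
    return rc;
}
```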
Exercise flux-kvs getat via t1000-kvs-basic.t sharness test.
Closing this PR and will resubmit a PR for […]. I've opened a couple of issues, #812 and #813, on some […]. The performance issue is maybe a lower priority and could be returned to when we switch to the RFC 11 "Treeobj" metadata format.
This PR is a speculative attempt to redesign the kvs watch machinery to make it more efficient and scalable as discussed in #803. So far:

- the kvs.setroot event sends out a list of keys modified by the commit
- kvs_getat(), which is like kvs_get() but with an additional treeobj argument (think of it like a snapshot-relative kvs_get(), as suggested in "kvsdir_* API should provide atomicity in namespace traversal" #64)
- flux kvs getat treeobj key, and sharness tests

What I was thinking of doing, roughly, is removing the watch machinery from the kvs module entirely, and instead have the client kvs_watch() implementation:

- subscribe to kvs.setroot events and check each event's key list for the watched key
- when the key is listed, kvs_getat() on the root treeobj from the event and call the user callback (see the sketch below)

I haven't quite figured out how to synchronize the first key value with the event-based updates. We might need to expose the root sequence number in the dirent so we can correlate a dirent obtained from kvs_get_treeobj (".") with one obtained from the setroot event.

There might be a bit more latency getting updates this way, since the setroot event only triggers the sequence to obtain the next item; there has to be an RTT with the local kvs module before the new value can be passed to the user callback. I think we can hide all that so it can remain asynchronous (reactor driven), e.g. the setroot event handler sends an rpc and registers a completion, then the completion calls the user callback.
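A minimal sketch of that client-side flow, using hypothetical helper names (the real event decoding and rpc plumbing are elided):

```c
#include <string.h>

/* Hypothetical client-side watch state; names here are illustrative
 * and not part of the actual flux API. */
struct watch {
    const char *key;                        /* key being watched */
    void (*cb) (const char *key, const char *val, void *arg);
    void *arg;
};

/* Hypothetical async helper: ask the local kvs module to resolve 'key'
 * against root treeobj 'root'; its completion runs getat_continuation(). */
extern void send_getat_rpc (const char *root, const char *key,
                            struct watch *w);

/* Step 1: kvs.setroot event handler.  'keys'/'nkeys' is the modified-key
 * list and 'root' the new root treeobj carried by the event. */
static void setroot_cb (struct watch *w, const char *root,
                        const char **keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++) {
        if (!strcmp (keys[i], w->key)) {
            send_getat_rpc (root, w->key, w);  /* stay reactor-driven */
            break;
        }
    }
}

/* Step 2: rpc completion delivers the value as looked up at the event's
 * root treeobj, and hands it to the user's watch callback. */
static void getat_continuation (struct watch *w, const char *val)
{
    w->cb (w->key, val, w->arg);
}
```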
Work in progress, but thought I'd post for early feedback.