
improve kvs watch efficiency #810

Closed
garlick wants to merge 8 commits into flux-framework:master from garlick:kvs_watch
Conversation

@garlick (Member) commented Sep 15, 2016

This PR is a speculative attempt to redesign the kvs watch machinery to make it more efficient and scalable as discussed in #803. So far:

  • the kvs.setroot event sends out a list of keys modified by the commit
  • a new function kvs_getat(), which is like kvs_get() but with an additional treeobj argument (think of it as a snapshot-relative kvs_get(), as suggested in #64, "kvsdir_* API should provide atomicity in namespace traversal"); a usage sketch follows the list
  • a command line front end, flux kvs getat treeobj key, plus sharness tests
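A minimal sketch of how the new pieces might fit together, assuming a kvs_getat() signature modeled on kvs_get() with a leading treeobj argument. The kvs_get_treeobj() signature, the handle type, and the resource.hosts key are illustrative assumptions, not part of this PR:

    /* Sketch only: signatures are assumed from the description above,
     * and "resource.hosts" is a made-up key.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <flux/core.h>

    int lookup_at_snapshot (flux_t *h)
    {
        char *treeobj = NULL;   /* JSON dirent describing the snapshot root */
        char *json_str = NULL;  /* JSON-encoded value of the key */

        /* Capture the current root directory as a treeobj "snapshot". */
        if (kvs_get_treeobj (h, ".", &treeobj) < 0)
            return -1;
        /* Resolve the key relative to that snapshot, not the live root. */
        if (kvs_getat (h, treeobj, "resource.hosts", &json_str) < 0) {
            free (treeobj);
            return -1;
        }
        printf ("%s\n", json_str);
        free (json_str);
        free (treeobj);
        return 0;
    }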

What I was thinking of doing, roughly, is removing the watch machinery from the kvs module entirely, and instead having the client kvs_watch() implementation:

  • subscribe to kvs.setroot events
  • get current value of key (call user callback)
  • as each event comes in, if watched key is in the list, call kvs_getat() on the root treeobj from the event (call user callback)

I haven't quite figured out how to synchronize the first key value with the event-based updates. We might need to expose the root sequence number in the dirent so we can correlate a dirent obtained from kvs_get_treeobj (".") with one obtained from the setroot event.

There might be a bit more latency getting updates this way, since the setroot event only triggers the sequence to obtain the next item; there has to be an RTT with the local kvs module before the new value can be passed to the user callback. I think we can hide all of that so it remains asynchronous (reactor driven), e.g. the setroot event handler sends an rpc and registers a completion, and the completion calls the user callback.
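To make the proposed flow concrete, here is an illustrative sketch of the client-side handling, under the assumption that the setroot event carries the new root treeobj plus the list of modified keys. The watcher struct and issue_getat_rpc() are placeholders for whatever plumbing libkvs would actually use, not real API:

    #include <string.h>
    #include <stdbool.h>

    struct watcher {
        const char *key;    /* key registered with kvs_watch() */
        void (*cb) (const char *key, const char *json_str, void *arg);
        void *arg;
    };

    /* Placeholder: send kvs.get (with treeobj) to the local kvs module and
     * register a continuation that calls cb (key, value, arg) when the
     * response arrives, so the sequence stays reactor driven.
     */
    void issue_getat_rpc (const char *root_treeobj, const char *key,
                          void (*cb) (const char *, const char *, void *),
                          void *arg);

    /* Return true if 'key' appears in the keys list carried by a
     * kvs.setroot event.
     */
    static bool setroot_touches_key (const char **keys, int nkeys,
                                     const char *key)
    {
        for (int i = 0; i < nkeys; i++)
            if (!strcmp (keys[i], key))
                return true;
        return false;
    }

    /* Called from the kvs.setroot event handler with the decoded payload:
     * the new root treeobj and the list of keys modified by the commit.
     */
    static void on_setroot (struct watcher *w, const char *root_treeobj,
                            const char **keys, int nkeys)
    {
        if (setroot_touches_key (keys, nkeys, w->key))
            issue_getat_rpc (root_treeobj, w->key, w->cb, w->arg);
    }

The glob case mentioned below would simply swap the strcmp() for an fnmatch()-style match.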

Work in progress, but thought I'd post for early feedback.

@garlick added the review label Sep 15, 2016
@codecov-io commented Sep 15, 2016

Current coverage is 75.00% (diff: 88.78%)

Merging #810 into master will increase coverage by 0.06%

@@             master       #810   diff @@
==========================================
  Files           145        145          
  Lines         24911      24992    +81   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          18666      18744    +78   
- Misses         6245       6248     +3   
  Partials          0          0          
Diff Coverage by File
   82%  src/modules/kvs/kvs.c
   90%  src/cmd/flux-kvs.c
   92%  src/modules/kvs/libkvs.c
  100%  src/modules/kvs/proto.c
  100%  src/modules/kvs/json_dirent.c

Last update b532e12...a1620c3

@coveralls: Coverage increased (+0.1%) to 75.372% when pulling d1913e6 on garlick:kvs_watch into b532e12 on flux-framework:master.

@grondo (Contributor) commented Sep 15, 2016

This seems very nice!

My very uninformed first observation is that moving kvs_watch into the handle implementation will require every handle using kvs_watch to receive and parse every kvs.setroot event, searching for one key (or a glob) in the event's keys array (if I've understood correctly). This seems like a win for handles that watch many keys, but handles (modules, api users) that only watch a single key will see much more activity (a wakeup for every kvs.setroot event)?

The kvs_getat() interface is really cool! I can't even conceive of all the things this could be used for...

@garlick (Member, Author) commented Sep 15, 2016

Thanks @grondo - that's the sort of early feedback I was looking for. I think you're probably spot-on.

I wish there was a way to map watch notifications onto events, and then one would simply subscribe to the keys of interest.

I suppose we could try the easy thing and generate a globally distributed event per key instead of packing the list into the setroot event. An event with no payload only takes up a few bytes...
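For illustration only, the subscriber side of that per-key alternative might look something like the sketch below; the kvs.key.<name> topic convention is invented here and no such events exist:

    #include <stdio.h>
    #include <errno.h>
    #include <flux/core.h>

    /* Subscribe to a hypothetical payload-free event that would be
     * published for each key modified by a commit.  Topic naming is
     * invented for this sketch.
     */
    static int subscribe_key_event (flux_t *h, const char *key)
    {
        char topic[512];
        int n = snprintf (topic, sizeof (topic), "kvs.key.%s", key);
        if (n < 0 || n >= (int)sizeof (topic)) {
            errno = EINVAL;
            return -1;
        }
        return flux_event_subscribe (h, topic);
    }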

@grondo (Contributor) commented Sep 15, 2016

Would deleted keys be included in either the kvs.setroot keys list or the global per-key events? I'm thinking of a purge of 1000 job entries under a single commit. If a basedir is unlinked from the kvs, I'm assuming an event would only be generated for basedir, and not for every path under basedir, and therefore a watcher of the basedir.a.b.c key would have to infer that the key of interest is deleted and invoke callbacks? I assume this would require an annotation in the event that the key was deleted (so the watcher would know to match on the base directory of keys, not the literal key path itself). (I could be missing something obvious though.)

Hm, does kvs_watch callback fire in the current code if a key is deleted via unlink of a parent directory?

@garlick (Member, Author) commented Sep 15, 2016

An unlink would be listed once in the keys list (or have one event), and you are right, there would be no list entry/event for a key contained within an unlinked directory. I hadn't thought about that. I had only thought about the opposite case, where a change to a key implies that its parent directories also change.

I've been pondering the volume of events in this proposed change off and on this morning and haven't really come up with a good idea to limit them to an acceptable level while keeping the design simple. The way change notifications are distributed now is pretty focused.

To answer your final question: yes, a kvs_watch callback should fire now if the key's parent is unlinked. The way watched keys are monitored now is that the kvs module (on every rank) executes a full get for every watched key after every commit and compares the result with the previous one.
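A rough sketch of that per-commit scan; the watch list and the lookup callback here are simplified stand-ins for illustration, not the actual kvs module internals:

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    struct watch {
        char *key;
        char *prev;         /* last value delivered to the watcher */
        void (*notify) (const char *key, const char *val, void *arg);
        void *arg;
        struct watch *next;
    };

    /* After each commit: re-read every watched key against the new root
     * ('lookup' stands in for a full get) and notify only on change.
     */
    static void check_watchers (struct watch *head,
                                char *(*lookup) (const char *key, void *ctx),
                                void *ctx)
    {
        for (struct watch *w = head; w != NULL; w = w->next) {
            char *val = lookup (w->key, ctx);
            bool changed = (val == NULL) != (w->prev == NULL)
                        || (val && w->prev && strcmp (val, w->prev) != 0);
            if (changed) {
                w->notify (w->key, val, w->arg);
                free (w->prev);
                w->prev = val ? strdup (val) : NULL;
            }
            free (val);
        }
    }

As described above, this scan runs on every rank after every commit, whether or not the watched keys actually changed, which is the overhead this PR aims to reduce.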

Problem: a kvs.watch notification is not sent for every
update of a watched key if two or more commits operating
on the key are merged by the kvs for performance.  In
that case, a watcher is only notified once for the single,
merged commit.

This may be a problem for some use cases, for example
a finite state machine may use the kvs to communicate
state by committing each change to a watched key.  Watchers
cannot follow the complete state if intermediate steps are lost.

Solution: check commits before merging them to see if they
change the same keys to different values, and abort the merge
if so.  This check does not apply to directories modified
as a side effect of a commit; only the fully qualified keys
supplied in the commit messages are checked for duplicates.
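A condensed sketch of the check described in that commit message; the op representation is simplified for illustration (the real commit ops are JSON dirent operations):

    #include <stdbool.h>
    #include <string.h>

    struct op {
        const char *key;      /* fully qualified key named in the commit */
        const char *value;    /* encoded value, or NULL for an unlink */
    };

    static bool values_equal (const char *a, const char *b)
    {
        if (a == NULL || b == NULL)
            return a == b;
        return strcmp (a, b) == 0;
    }

    /* Return true if it is safe to merge the two pending commits, i.e.
     * they do not set the same key to different values.
     */
    static bool commits_mergeable (const struct op *a, int na,
                                   const struct op *b, int nb)
    {
        for (int i = 0; i < na; i++)
            for (int j = 0; j < nb; j++)
                if (!strcmp (a[i].key, b[j].key)
                    && !values_equal (a[i].value, b[j].value))
                    return false;   /* same key, different value: no merge */
        return true;
    }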
@coveralls: Coverage increased (+0.07%) to 75.316% when pulling a1620c3 on garlick:kvs_watch into b532e12 on flux-framework:master.

@garlick (Member, Author) commented Sep 16, 2016

Maybe the best approach for this PR is to carry through with the original plan, but "filter" the setroot events, forwarding them point to point to watchers. It will still shift the burden of looking up changed keys onto the client, but the number of client wakeups will be on the same order as before, or doubled I guess, because each changed key requires a kvs_getat() to pick up the new value. The overhead in the kvs module will be much reduced, especially for keys that don't change often.

The one-rpc, multiple-responses pattern still bugs me. I thought I might try an idea here like the stream idea from #271, where the client opens a "channel" to a service in order to receive messages on it. The kvs module could use this channel to forward filtered setroot events point to point. Possibly this idea could later be generalized to allow routing of any type of message point to point through the TBON (like a tunnel).

@grondo (Contributor) commented Sep 16, 2016

Who would filter setroot events? The kvs module on each rank? So the client wouldn't have to subscribe to all kvs.setroot events, but it would still get one message per kvs.setroot event and have to filter keys? Is this actually the same proposal as earlier, just with no global event?

@garlick (Member, Author) commented Sep 16, 2016

Exactly.

The kvs watch-mt test starts N threads watching the same
key, then modifies the key monotonically M times.  Now that
watch is guaranteed to get a callback on every key change, this
test can be tightened up to verify monotonicity of the watched key.

Add a run of the test on rank 1 to ensure a "slave" KVS behaves
the same as the "master".
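The tightened assertion might amount to something like the following inside each watch callback; names and types here are illustrative, not the actual test code:

    #include <assert.h>

    struct watch_state {
        int last;   /* value seen in the previous callback; -1 initially */
    };

    /* With a callback guaranteed for every change, each new value must be
     * strictly greater than the last one seen, or a change was lost.
     */
    static void check_monotonic (int value, struct watch_state *s)
    {
        assert (value > s->last);
        s->last = value;
    }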
Include an array of (de-duplicated) keys that were modified
by the commit in the kvs.setroot event.  This prepares the
way for more efficient kvs_watch() handling as described in flux-framework#803.

Update test for kvs.setroot codec.
Allow an optional root treeobj to accompany a kvs.get
request.  If present, key will be looked up relative to
this directory.  It is an EINVAL error if the treeobj
is invalid or not a directory reference.

Update test for kvs.get codec.
Add a new function kvs_getat() which is like kvs_get()
but has a new "treeobj" argument that specifies the root
(or other directory) "snapshot" in which the key will be
looked up.
Exercise flux-kvs getat via t1000-kvs-basic.t sharness test.
@coveralls: Coverage increased (+0.06%) to 75.308% when pulling 261ee47 on garlick:kvs_watch into b532e12 on flux-framework:master.

@garlick (Member, Author) commented Sep 21, 2016

Closing this PR and will resubmit a PR for kvs_getat().

I've opened a couple of issues #812 and #813 on some kvs_watch() correctness problems and will try to solve those in another PR.

The performance issue is perhaps a lower priority and could be revisited when we switch to the RFC 11 "Treeobj" metadata format.
