
libkz: avoid blocking RPCs in reactor callbacks #1411

Merged
merged 10 commits into flux-framework:master from garlick:libkz_async on Apr 2, 2018

Conversation

@garlick (Member) commented Apr 2, 2018

Here are some changes to libkz that defer the kz_ready_f callback until data is actually available, so it doesn't have to be fetched with a synchronous RPC from the callback. I hope this will solve the excessive re-queue calls noted in #1406.

There is some other unrelated cleanup too - just what I needed to fix up to keep my sanity in this old code.

@grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on kz_get() in the user callback since only the first call will be non-blocking.
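
To make the intended callback pattern concrete, here is a minimal sketch of a readiness handler that reads exactly once per notification instead of looping until EAGAIN. It is an illustration only, not the actual wrexecd or lua binding code: the output handling is just stdio, the header path is a guess at the in-tree location, and the assumption that kz_get() returns the byte count (0 at EOF, -1 with errno set otherwise) is taken from this discussion.

```c
/* Sketch only -- not the actual wrexecd/lua binding code.
 * Assumes kz_get (kz, &datap) returns the number of bytes read,
 * 0 at EOF, or -1 with errno set (EAGAIN when no data is ready yet).
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include "src/common/libkz/kz.h"   /* kz_t, kz_get(); path is an assumption */

static void output_ready_cb (kz_t *kz, void *arg)
{
    char *data = NULL;
    int len = kz_get (kz, &data);  /* data was already fetched by libkz */

    (void)arg;
    if (len < 0) {
        if (errno != EAGAIN)       /* a real error, not "nothing yet" */
            perror ("kz_get");
        return;                    /* do not loop; wait for the next callback */
    }
    if (len == 0) {                /* EOF on this stream */
        fprintf (stderr, "stream closed\n");
        return;
    }
    fwrite (data, 1, len, stdout); /* forward exactly one chunk */
    free (data);
}
```

The key point is the early return: after this PR the library invokes the handler again once the next block is actually available, so draining kz_get() in a loop is unnecessary.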

garlick added 10 commits April 1, 2018 12:53
Problem: kz_destroy() can alter the errno value.

Save/restore errno at the top/bottom of this function.

Also avoid redundant checks for non-NULL in calls
to destructors that do that internally.
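
In sketch form, the save/restore pattern this commit describes looks like the following; the struct and field names are invented for illustration, and only the errno handling (plus the free(NULL)-is-a-no-op point) is what matters.

```c
#include <errno.h>
#include <stdlib.h>

struct thing {
    char *buffer;
};

/* Free resources without clobbering the errno value the caller may be
 * inspecting; free() and other cleanup calls are allowed to change errno.
 */
void thing_destroy (struct thing *t)
{
    if (t) {
        int saved_errno = errno;   /* save on entry */
        free (t->buffer);          /* no NULL check needed: free(NULL) is a no-op */
        free (t);
        errno = saved_errno;       /* restore on exit */
    }
}
```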
Problem: keys are computed using dynamically allocated
memory.

Use a buffer in the kz handle to build keys, which are
of a predictable size for any given kz handle.
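
A rough sketch of that change, with invented field names and an assumed key format; the point is only that the key buffer lives in the handle and is sized once, so no per-key allocation is needed.

```c
#include <stdio.h>

#define NAME_MAX_LEN 64
#define KEY_BUF_SIZE (NAME_MAX_LEN + 16)   /* name + "." + sequence digits */

struct handle {
    char name[NAME_MAX_LEN];   /* stream name, e.g. "stdout" */
    char key[KEY_BUF_SIZE];    /* reused for every key built on this handle */
};

/* Build the key for block 'seq' in the handle's buffer; no malloc per call. */
static const char *handle_key (struct handle *h, int seq)
{
    (void)snprintf (h->key, sizeof (h->key), "%s.%06d", h->name, seq);
    return h->key;
}
```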
Problem: logic for KZ_FLAGS_TRUNC is a bit dubious, and
it appears that truncation should be the default behavior
when opening an existing kz stream for writing anyway.

Drop this flag and associated logic.

Update sharness test and 'kzutil', used only for testing.

Problem: the KZ_FLAGS_NOEXIST flag is always used, so the code for handling open when that flag is not supplied is never exercised.

Drop the code and the flag and update users.

Problem: if the user calls kz_set_ready_cb() with a NULL handler, the internal KVS watch callback is not disabled.

Add logic to unwatch the kz directory if the user registers
a NULL handler.
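
Roughly, the added logic looks like the following sketch. The types and the stop_watch() helper are hypothetical stand-ins; the real code operates on the kz handle and the KVS watch machinery.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*ready_f) (void *arg);   /* stand-in for kz_ready_f */

struct stream {
    ready_f ready_cb;
    void *ready_arg;
    bool watching;                     /* is a KVS watch currently installed? */
};

/* Hypothetical helper standing in for the real "unwatch the kz directory" call. */
static int stop_watch (struct stream *s) { (void)s; return 0; }

/* Registering a NULL handler also tears down the watch, so the
 * now-absent handler can never be invoked again.
 */
int stream_set_ready_cb (struct stream *s, ready_f cb, void *arg)
{
    s->ready_cb = cb;
    s->ready_arg = arg;
    if (cb == NULL && s->watching) {
        if (stop_watch (s) < 0)
            return -1;
        s->watching = false;
    }
    return 0;
}
```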

Problem: internal KVS watch callback makes synchronous RPC calls.

Have the watch callback start a KVS lookup.  Call the
user's kz_ready_f handler in the KVS lookup continuation.
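
The shape of that change, as a hedged sketch: the watch callback only starts a lookup, and the user's handler runs in the lookup continuation once the value is in hand. The struct and its members are invented for illustration, and the KVS calls shown are the current public flux-core API (flux_kvs_lookup(), flux_future_then(), flux_kvs_lookup_get()), which may differ in detail from what kz.c used at the time.

```c
#include <stdlib.h>
#include <string.h>
#include <flux/core.h>

struct kz_like {
    flux_t *h;
    void (*ready_cb) (struct kz_like *kz, void *arg);  /* like kz_ready_f */
    void *ready_arg;
    char *key;      /* key of the next data block */
    char *cached;   /* last value fetched, so a later "get" needs no RPC */
};

static void lookup_continuation (flux_future_t *f, void *arg)
{
    struct kz_like *kz = arg;
    const char *value;

    if (flux_kvs_lookup_get (f, &value) == 0) {
        free (kz->cached);
        kz->cached = strdup (value);           /* stash the data */
        if (kz->ready_cb)
            kz->ready_cb (kz, kz->ready_arg);  /* no blocking RPC needed here */
    }
    flux_future_destroy (f);
}

/* Invoked by the KVS watch machinery when the kz directory changes:
 * kick off an async lookup instead of a synchronous get.
 */
static void watch_cb (struct kz_like *kz)
{
    flux_future_t *f;

    if (!(f = flux_kvs_lookup (kz->h, NULL, 0, kz->key)))
        return;
    if (flux_future_then (f, -1., lookup_continuation, kz) < 0)
        flux_future_destroy (f);
}
```

With the data cached on the handle, the first kz_get() from the user callback can be satisfied without another round trip, which is why looping on it no longer makes sense.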
@coveralls

Coverage Status

Coverage increased (+0.009%) to 78.911% when pulling 0371415 on garlick:libkz_async into 7f04a25 on flux-framework:master.

@codecov-io

Codecov Report

Merging #1411 into master will increase coverage by <.01%.
The diff coverage is 78.64%.

@@            Coverage Diff            @@
##           master   #1411      +/-   ##
=========================================
+ Coverage   78.59%   78.6%   +<.01%     
=========================================
  Files         163     163              
  Lines       29995   30003       +8     
=========================================
+ Hits        23575   23583       +8     
  Misses       6420    6420
Impacted Files                  Coverage Δ
src/bindings/lua/flux-lua.c     81.27% <100%>    (-0.77%) ⬇️
src/modules/wreck/wrexecd.c     75.68% <66.66%>  (-0.09%) ⬇️
src/common/libkz/kz.c           70.53% <76.66%>  (-2.29%) ⬇️
src/common/libflux/message.c    81.36% <0%>      (-0.36%) ⬇️
src/common/libflux/reactor.c    93.73% <0%>      (+0.28%) ⬆️
src/broker/overlay.c            74.45% <0%>      (+0.31%) ⬆️
src/common/libflux/handle.c     84.15% <0%>      (+0.49%) ⬆️
src/common/libutil/base64.c     95.77% <0%>      (+0.7%)  ⬆️
... and 6 more

@grondo (Contributor) commented Apr 2, 2018

This is impressive work, @garlick! -- and a lot less change than I was expecting!

I tested this with a 128-broker instance (across 4 real nodes) on ipa and it looks stable to me, running up to 1024 tasks. Unfortunately, anything higher than that hits #1407. Unsurprisingly, the I/O on this branch seems much faster (though I didn't do any actual quantitative tests).

> @grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on kz_get() in the user callback since only the first call will be non-blocking.

Yeah, even without your changes looping on kz_get() until EAGAIN/EWOULDBLOCK was probably really bad behavior in an event loop, and should have been removed no matter the solution.

@grondo (Contributor) commented Apr 2, 2018

Update: maybe we have an fd leak somewhere; setting ulimit -n 8192 seems to reduce or eliminate the incidence of #1407 (at least on this branch). Now I'm able to run up to 4096 tasks (above that I get fork() errors -- I also have to increase the max user processes limit).

It still takes ~3m to read all the I/O through the reactor loop via the kz/lua code, but the job itself completed in 10s (meaning all tasks exited with their exit status saved in the KVS, and all I/O was complete and committed to the KVS):

$ time flux wreckrun -n4096 hostname
<snip 4096 lines of output>
real	3m12.563s
user	1m31.552s
sys	1m25.873s
$ flux wreck ls
    ID NTASKS STATE                    START      RUNTIME    RANKS COMMAND
     1   4096 exited     2018-04-02T11:27:06      10.086s  [0-127] hostname

However, this is so much better than the extreme runtimes reported by @trws.

@grondo (Contributor) commented Apr 2, 2018

@garlick, if this is ready to go in, it seems like a no-brainer at this point!

@garlick (Member, Author) commented Apr 2, 2018

Sure, thanks for the extra testing!

@trws (Member) commented Apr 2, 2018

Yeah, that sounds like a great improvement for sure! I'll test this with the splash branch on a few hundred nodes shortly.

@grondo merged commit 0d32cd3 into flux-framework:master on Apr 2, 2018
@garlick deleted the libkz_async branch on April 2, 2018 at 20:04
@trws (Member) commented Apr 2, 2018

This certainly helps, but I'm sorry to say it doesn't seem to have fixed the problem completely. Without this patch, a 180-node, single-task-per-node job would take minutes; now a 200-node job completes in less than two seconds. Unfortunately, a 400-node job still takes 5 minutes to start, and more than 10 to run with output redirected; without redirection it takes about 3 seconds. There's some kind of degenerate case we haven't managed to squash -- maybe setting up all of those watches?
