
libkz: avoid blocking RPCs in reactor callbacks #1411

Merged
merged 10 commits into flux-framework:master from garlick:libkz_async on Apr 2, 2018

Conversation

@garlick (Member) commented Apr 2, 2018

Here are some changes to libkz that defer the kz_ready_f callback until data is actually available, so it doesn't have to be fetched with a synchronous RPC from the callback. I hope this will solve the excessive re-queue calls noted in #1406.

There is some other unrelated cleanup too - just what I needed to fix up to keep my sanity in this old code.

@grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on kz_get() in the user callback since only the first call will be non-blocking.
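
To make the intended callback pattern concrete, here is a minimal sketch of a readiness handler that reads exactly once per notification instead of looping until EAGAIN. It is an illustration only, not the actual wrexecd or lua binding code: the output handling is just stdio, the header path is a guess at the in-tree location, and the assumption that kz_get() returns the byte count (0 at EOF, -1 with errno set otherwise) is taken from this discussion.

```c
/* Sketch only -- not the actual wrexecd/lua binding code.
 * Assumes kz_get (kz, &datap) returns the number of bytes read,
 * 0 at EOF, or -1 with errno set (EAGAIN when no data is ready yet).
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include "src/common/libkz/kz.h"   /* kz_t, kz_get(); path is an assumption */

static void output_ready_cb (kz_t *kz, void *arg)
{
    char *data = NULL;
    int len = kz_get (kz, &data);  /* data was already fetched by libkz */

    (void)arg;
    if (len < 0) {
        if (errno != EAGAIN)       /* a real error, not "nothing yet" */
            perror ("kz_get");
        return;                    /* do not loop; wait for the next callback */
    }
    if (len == 0) {                /* EOF on this stream */
        fprintf (stderr, "stream closed\n");
        return;
    }
    fwrite (data, 1, len, stdout); /* forward exactly one chunk */
    free (data);
}
```

The key point is the early return: after this PR the library invokes the handler again once the next block is actually available, so draining kz_get() in a loop is unnecessary.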

garlick added 10 commits April 1, 2018 12:53
Problem: kz_destroy() can alter the errno value.

Save/restore errno at the top/bottom of this function.

Also avoid redundant checks for non-NULL in calls
to destructors that do that internally.
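
In sketch form, the save/restore pattern this commit describes looks like the following; the struct and field names are invented for illustration, and only the errno handling (plus the free(NULL)-is-a-no-op point) is what matters.

```c
#include <errno.h>
#include <stdlib.h>

struct thing {
    char *buffer;
};

/* Free resources without clobbering the errno value the caller may be
 * inspecting; free() and other cleanup calls are allowed to change errno.
 */
void thing_destroy (struct thing *t)
{
    if (t) {
        int saved_errno = errno;   /* save on entry */
        free (t->buffer);          /* no NULL check needed: free(NULL) is a no-op */
        free (t);
        errno = saved_errno;       /* restore on exit */
    }
}
```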
Problem: keys are computed using dynamically allocated
memory.

Use a buffer in the kz handle to build keys, which are
of a predictable size for any given kz handle.
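
A rough sketch of that change, with invented field names and an assumed key format; the point is only that the key buffer lives in the handle and is sized once, so no per-key allocation is needed.

```c
#include <stdio.h>

#define NAME_MAX_LEN 64
#define KEY_BUF_SIZE (NAME_MAX_LEN + 16)   /* name + "." + sequence digits */

struct handle {
    char name[NAME_MAX_LEN];   /* stream name, e.g. "stdout" */
    char key[KEY_BUF_SIZE];    /* reused for every key built on this handle */
};

/* Build the key for block 'seq' in the handle's buffer; no malloc per call. */
static const char *handle_key (struct handle *h, int seq)
{
    (void)snprintf (h->key, sizeof (h->key), "%s.%06d", h->name, seq);
    return h->key;
}
```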
Problem: logic for KZ_FLAGS_TRUNC is a bit dubious, and
it appears that truncation should be the default behavior
when opening an existing kz stream for writing anyway.

Drop this flag and associated logic.

Update sharness test and 'kzutil', used only for testing.

Problem: the KZ_FLAGS_NOEXIST flag is always used, so the code for handling open when that flag is not supplied is never exercised.

Drop the code and the flag and update users.

Problem: if the user calls kz_set_ready_cb() with a NULL handler, the internal KVS watch callback is not disabled.

Add logic to unwatch the kz directory if the user registers
a NULL handler.
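
Roughly, the added logic looks like the following sketch. The types and the stop_watch() helper are hypothetical stand-ins; the real code operates on the kz handle and the KVS watch machinery.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*ready_f) (void *arg);   /* stand-in for kz_ready_f */

struct stream {
    ready_f ready_cb;
    void *ready_arg;
    bool watching;                     /* is a KVS watch currently installed? */
};

/* Hypothetical helper standing in for the real "unwatch the kz directory" call. */
static int stop_watch (struct stream *s) { (void)s; return 0; }

/* Registering a NULL handler also tears down the watch, so the
 * now-absent handler can never be invoked again.
 */
int stream_set_ready_cb (struct stream *s, ready_f cb, void *arg)
{
    s->ready_cb = cb;
    s->ready_arg = arg;
    if (cb == NULL && s->watching) {
        if (stop_watch (s) < 0)
            return -1;
        s->watching = false;
    }
    return 0;
}
```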

Problem: internal KVS watch callback makes synchronous RPC calls.

Have the watch callback start a KVS lookup.  Call the
user's kz_ready_f handler in the KVS lookup continuation.
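
The shape of that change, as a hedged sketch: the watch callback only starts a lookup, and the user's handler runs in the lookup continuation once the value is in hand. The struct and its members are invented for illustration, and the KVS calls shown are the current public flux-core API (flux_kvs_lookup(), flux_future_then(), flux_kvs_lookup_get()), which may differ in detail from what kz.c used at the time.

```c
#include <stdlib.h>
#include <string.h>
#include <flux/core.h>

struct kz_like {
    flux_t *h;
    void (*ready_cb) (struct kz_like *kz, void *arg);  /* like kz_ready_f */
    void *ready_arg;
    char *key;      /* key of the next data block */
    char *cached;   /* last value fetched, so a later "get" needs no RPC */
};

static void lookup_continuation (flux_future_t *f, void *arg)
{
    struct kz_like *kz = arg;
    const char *value;

    if (flux_kvs_lookup_get (f, &value) == 0) {
        free (kz->cached);
        kz->cached = strdup (value);           /* stash the data */
        if (kz->ready_cb)
            kz->ready_cb (kz, kz->ready_arg);  /* no blocking RPC needed here */
    }
    flux_future_destroy (f);
}

/* Invoked by the KVS watch machinery when the kz directory changes:
 * kick off an async lookup instead of a synchronous get.
 */
static void watch_cb (struct kz_like *kz)
{
    flux_future_t *f;

    if (!(f = flux_kvs_lookup (kz->h, NULL, 0, kz->key)))
        return;
    if (flux_future_then (f, -1., lookup_continuation, kz) < 0)
        flux_future_destroy (f);
}
```

With the data cached on the handle, the first kz_get() from the user callback can be satisfied without another round trip, which is why looping on it no longer makes sense.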
@coveralls

Coverage Status

Coverage increased (+0.009%) to 78.911% when pulling 0371415 on garlick:libkz_async into 7f04a25 on flux-framework:master.

@codecov-io

Codecov Report

Merging #1411 into master will increase coverage by <.01%.
The diff coverage is 78.64%.

@@            Coverage Diff            @@
##           master   #1411      +/-   ##
=========================================
+ Coverage   78.59%   78.6%   +<.01%     
=========================================
  Files         163     163              
  Lines       29995   30003       +8     
=========================================
+ Hits        23575   23583       +8     
  Misses       6420    6420
Impacted Files                  Coverage Δ
src/bindings/lua/flux-lua.c     81.27% <100%>    (-0.77%) ⬇️
src/modules/wreck/wrexecd.c     75.68% <66.66%>  (-0.09%) ⬇️
src/common/libkz/kz.c           70.53% <76.66%>  (-2.29%) ⬇️
src/common/libflux/message.c    81.36% <0%>      (-0.36%) ⬇️
src/common/libflux/reactor.c    93.73% <0%>      (+0.28%) ⬆️
src/broker/overlay.c            74.45% <0%>      (+0.31%) ⬆️
src/common/libflux/handle.c     84.15% <0%>      (+0.49%) ⬆️
src/common/libutil/base64.c     95.77% <0%>      (+0.7%)  ⬆️
... and 6 more

@grondo (Contributor) commented Apr 2, 2018

This is impressive work, @garlick! -- and a lot less change than I was expecting!

I tested this with a 128-broker instance (across 4 real nodes) on ipa and it looks stable to me, running up to 1024 tasks. Unfortunately, anything higher than that hits #1407. Unsurprisingly, the I/O on this branch seems much faster (though I didn't do any actual quantitative tests).

> @grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on kz_get() in the user callback since only the first call will be non-blocking.

Yeah, even without your changes looping on kz_get() until EAGAIN/EWOULDBLOCK was probably really bad behavior in an event loop, and should have been removed no matter the solution.

@grondo (Contributor) commented Apr 2, 2018

Update: maybe we have an fd leak somewhere; setting ulimit -n 8192 seems to reduce or eliminate the incidence of #1407 (at least on this branch). Now I'm able to run up to 4096 tasks (above that I get fork() errors -- I also have to increase the max user processes limit).

It still takes ~3m to read all the I/O through the reactor loop via the kz/lua code, but the job itself completed in 10s (meaning all tasks exited with their exit status saved in the KVS, and all I/O was complete and committed to the KVS):

$ time flux wreckrun -n4096 hostname
<snip 4096 lines of output>
real	3m12.563s
user	1m31.552s
sys	1m25.873s
$ flux wreck ls
    ID NTASKS STATE                    START      RUNTIME    RANKS COMMAND
     1   4096 exited     2018-04-02T11:27:06      10.086s  [0-127] hostname

However, this is so much better than the extreme runtimes reported by @trws.

@grondo (Contributor) commented Apr 2, 2018

@garlick, if this is ready to go in, it seems like a no-brainer at this point!

@garlick (Member, Author) commented Apr 2, 2018

Sure, thanks for the extra testing!

@trws (Member) commented Apr 2, 2018

Yeah, that sounds like a great improvement for sure! I'll test this with the splash branch on a few hundred nodes shortly.

@grondo merged commit 0d32cd3 into flux-framework:master on Apr 2, 2018
@garlick deleted the libkz_async branch on April 2, 2018 at 20:04
@trws (Member) commented Apr 2, 2018

This certainly helps, but I'm sorry to say it doesn't seem to have fixed the problem completely. Without this patch, a 180-node, single-task-per-node job would take minutes; now a 200-node job completes in less than two seconds. Unfortunately, a 400-node job still takes 5 minutes to start, and more than 10 to run with output redirected; without redirection it takes about 3 seconds. There's some kind of degenerate case we haven't managed to squash -- maybe setting up all of those watches?
