libkz: avoid blocking RPCs in reactor callbacks #1411
Conversation
Problem: kz_destroy() can alter the errno value. Save/restore errno at the top/bottom of this function. Also avoid redundant checks for non-NULL in calls to destructors that do that internally.
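A minimal sketch of the save/restore pattern described here; the handle fields are invented for illustration and are not the real libkz internals:

```c
#include <errno.h>
#include <stdlib.h>

/* Illustrative handle layout -- not the actual libkz struct. */
typedef struct {
    char *name;     /* stream name */
    char *key;      /* reusable key buffer */
} kz_t;

void kz_destroy (kz_t *kz)
{
    if (kz) {
        int saved_errno = errno;    /* save caller's errno on entry */
        free (kz->name);            /* free(NULL) is a no-op, so no   */
        free (kz->key);             /* redundant NULL checks needed   */
        free (kz);
        errno = saved_errno;        /* restore errno before returning */
    }
}
```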
Problem: keys are computed using dynamically allocated memory. Use a buffer in the kz handle to build keys, which are of a predictable size for any given kz handle.
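Roughly the idea, with invented field names; it assumes keys of the form "<name>.<seq>", which is why their maximum length is predictable for a given handle:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative handle -- assumes keys look like "<name>.<seq>". */
struct kz_handle {
    char *name;     /* stream name, fixed at open time */
    int seq;        /* next block sequence number */
    char *key;      /* key buffer, allocated once per handle */
    size_t keysz;
};

/* Called once at open time: size the key buffer for name + "." + 6 digits. */
static int alloc_key_buf (struct kz_handle *kz)
{
    kz->keysz = strlen (kz->name) + 8;  /* ".", six digits, NUL */
    return (kz->key = malloc (kz->keysz)) ? 0 : -1;
}

/* Build the key for the current block without further allocation. */
static const char *kz_key (struct kz_handle *kz)
{
    (void)snprintf (kz->key, kz->keysz, "%s.%06d", kz->name, kz->seq);
    return kz->key;
}
```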
Problem: logic for KZ_FLAGS_TRUNC is a bit dubious, and it appears that truncation should be the default behavior when opening an existing kz stream for writing anyway. Drop this flag and associated logic. Update sharness test and 'kzutil', used only for testing.
Problem: the KZ_FLAGS_NOEXIST flag is always used, so the code that handles open when that flag is not supplied is never exercised. Drop the code and the flag, and update users.
Problem: if user calls kz_set_ready_cb() with a NULL handler, the internal KVS watch callback is not disabled. Add logic to unwatch the kz directory if the user registers a NULL handler.
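A sketch of the intended behavior, assuming libkz tracks a watching flag and uses a watch/unwatch pair internally; `kvs_watch_dir()`/`kvs_unwatch()` and the struct fields below are placeholders, not the real API:

```c
#include <stdbool.h>
#include <flux/core.h>

/* Illustrative declarations -- not the real libkz internals. */
typedef struct kz_struct kz_t;
typedef void (*kz_ready_f)(kz_t *kz, void *arg);

struct kz_struct {
    flux_t *h;           /* broker handle */
    char *name;          /* KVS directory backing this stream */
    bool watching;       /* true while a KVS watch is registered */
    kz_ready_f ready_cb;
    void *ready_arg;
};

/* Placeholder wrappers for whatever watch API libkz uses internally. */
extern int kvs_watch_dir (flux_t *h, const char *dir, kz_t *kz);
extern int kvs_unwatch (flux_t *h, const char *dir);

int kz_set_ready_cb (kz_t *kz, kz_ready_f ready_cb, void *arg)
{
    kz->ready_cb = ready_cb;
    kz->ready_arg = arg;

    if (!ready_cb) {            /* NULL handler: also drop the watch */
        if (kz->watching && kvs_unwatch (kz->h, kz->name) < 0)
            return -1;
        kz->watching = false;
        return 0;
    }
    if (!kz->watching) {        /* non-NULL handler: ensure a watch exists */
        if (kvs_watch_dir (kz->h, kz->name, kz) < 0)
            return -1;
        kz->watching = true;
    }
    return 0;
}
```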
Problem: internal KVS watch callback makes synchronous RPC calls. Have the watch callback start a KVS lookup. Call the user's kz_ready_f handler in the KVS lookup continuation.
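And a sketch of the reworked watch path using the flux_future_t-based KVS lookup API. The signatures follow the flux-core API as I understand it today and may not match the tree at the time of this PR; the watch callback signature and the kz_t fields are the illustrative ones from the sketch above:

```c
#include <flux/core.h>

/* Continuation: runs when the lookup started by the watch callback
 * completes.  The user's kz_ready_f handler now fires only when data is
 * already available locally, so it never blocks inside the reactor. */
static void lookup_continuation (flux_future_t *f, void *arg)
{
    kz_t *kz = arg;
    const char *json_str;

    if (flux_kvs_lookup_get (f, &json_str) == 0) {
        /* a real implementation would stash json_str in the handle so a
         * following kz_get() can return it without another RPC */
        if (kz->ready_cb)
            kz->ready_cb (kz, kz->ready_arg);
    }
    flux_future_destroy (f);
}

/* Internal KVS watch callback: instead of a synchronous RPC, start an
 * asynchronous lookup and defer the user callback to its continuation. */
static void watch_cb (const char *key, void *arg)
{
    kz_t *kz = arg;
    flux_future_t *f;

    if (!(f = flux_kvs_lookup (kz->h, NULL, 0, key))
            || flux_future_then (f, -1., lookup_continuation, kz) < 0)
        flux_future_destroy (f);
}
```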
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1411     +/-   ##
=========================================
+ Coverage   78.59%    78.6%    +<.01%
=========================================
  Files         163      163
  Lines       29995    30003        +8
=========================================
+ Hits        23575    23583        +8
  Misses       6420     6420
```
This is impressive work, @garlick! -- and a lot less change than I was expecting! I tested this with 128 broker instances (over 4 real nodes) on ipa and it looks stable to me, running up to 1024 tasks. Unfortunately, anything higher than that hits #1407. Unsurprisingly, the I/O on this branch seems much faster (though I didn't do any actual quantitative tests).
Yeah, even without your changes looping on
Update: Maybe we have an fd leak somewhere. It still takes ~3m to read all the I/O via the kz/lua code in the reactor loop, but the job completed in 10s (meaning all tasks exited with their exit status saved in the KVS, and all I/O was complete and committed to the KVS).
However, this is so much better than the extreme runtimes reported by @trws
@garlick, if this is ready to go in, it seems like a no-brainer at this point!
Sure, thanks for the extra testing!
Yeah, that sounds like a great improvement for sure! I'll test this with the splash branch on a few hundred nodes shortly.
This certainly helps, but I'm sorry to say that it doesn't seem as though this fixed it completely. Without this patch, a 180-node, single-task-per-node job would take minutes; now a 200-node job completes in less than two seconds. Unfortunately, the 400-node job still takes 5 minutes to start, and more than 10 to run with output redirected. Without it redirected, about 3 seconds. There's some kind of degenerate case we haven't managed to squash, maybe setting up all of those watches?
Here are some changes to libkz to defer the `kz_ready_f` callback until data is actually available, so it doesn't have to be fetched with a synchronous RPC from the callback. I hope this will solve the excessive re-queue calls noted in #1406. There is some other unrelated cleanup too - just what I needed to fix up to keep my sanity in this old code.
@grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on `kz_get()` in the user callback, since only the first call will be non-blocking.
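For reference, a user-side ready callback under this contract might look roughly like the following. The `kz_get()`/`kz_close()` return conventions assumed here (bytes read, 0 on EOF, -1 with errno on error) and `process_output()` are my own glosses for illustration, not taken from kz.h:

```c
#include <errno.h>
#include <stdlib.h>
#include "kz.h"   /* libkz API: kz_t, kz_get(), kz_close() */

/* Hypothetical consumer of one chunk of task output. */
extern void process_output (const char *data, int len);

/* Ready callback under the new contract: it fires only when data is already
 * available, so a single kz_get() is enough -- no draining loop. */
static void output_ready_cb (kz_t *kz, void *arg)
{
    char *data = NULL;
    int len = kz_get (kz, &data);   /* assumed: >0 bytes, 0 EOF, -1 error */

    if (len < 0)
        return;                     /* error (or EAGAIN on spurious wakeup) */
    if (len == 0) {                 /* EOF on this stream */
        (void)kz_close (kz);
        return;
    }
    process_output (data, len);
    free (data);
}
```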