KVS refactor of lookup() / walk() functions #1066
Codecov Report

    @@            Coverage Diff             @@
    ##           master    #1066     +/-   ##
    ==========================================
    + Coverage   77.81%   77.91%   +0.09%
    ==========================================
      Files         148      150       +2
      Lines       25764    25965     +201
    ==========================================
    + Hits        20049    20231     +182
    - Misses       5715     5734      +19
Great job @chu11 - this was complicated, messy, and critical code, and now it is definitely heading in the right direction. As we discussed privately, I think clarity (and my ability to review these changes) would improve significantly if […]

Note the work on "futures" in #1053. Possibly walk, or some future, more reusable incarnation of it, could return a […]
@garlick It was a little hard to splice out because of the dependencies (such as the […]). Thinking about it, I think I see a way to splice things out now that might be pretty good. As you mentioned privately, […]
If it can be done in the near term, it would probably be good to do it in this PR. If it's a really big project, though, then maybe we should consider putting this in as is.
My idea to splice things out didn't work out. A lot of it centers on the fact that the […] I sort of thought this was a good "checkpoint" for a PR, but perhaps it's not. My general idea was: […]
The order was predominantly based on the fact that with "state", it would be easy to pass back information to […]
It turned out that doing step 4 above wasn't as difficult as I thought it'd be. Perhaps the code isn't the prettiest, but it's not bad. It'll allow us to splice off the […] I'll issue a new PR for the […]
@chu11 Sorry for not providing more review; I wasn't sure if I was waiting for more or not. Were you able to verify that memory isn't leaking? Another thing to check, just to be sure we're not causing problems for others, would be the soak test. We could get a baseline that could serve all the KVS rework activity and then run it for successive PRs. (Some past results are posted to various closed issues, e.g. #614.) I'm running an errand, and when I return around 2 I'll have one more spin through the code.
@garlick Yeah, I ran a bunch of valgrind runs using src/test/valgrind.sh. I also wrote an alternate cheapo workload consisting of a collection of simple KVS operations. The leak summary at the very end didn't change, which I think is the most important thing to verify. I'm presently rebuilding this tree so that it is rebased onto the patches from PR #1069. With PR #1069, I can splice lookup() and walk() into a new file, so hopefully this PR will be easier to understand, as all the changes will be more isolated. (Al Chu, May 18, 2017, 12:20 PM)

That sounds perfect, thanks.
Pushed an update based on the branch from #1069. With that branch, I can now split out the […] The bad thing is that with […]
I think it'd be OK to continue on with the refactoring in this PR. After all, it's a pretty focused topic. Wow, it's really nice having that split off. I wonder if we can do anything about the appalling number of in/out parameters in this function. That might bear some pondering. Maybe we could turn it into more of a class, for example:

    struct lookup *lookup_create (struct cache *cache);
    int lookup_set_epoch (struct lookup *l, int epoch); // call from heartbeat_cb
    ...

It seems like root, rootdir, and root_dirent should be packaged up in a struct? Maybe the result (alp, load_ref, stall, ep) could also be packaged up? (I'm probably not seeing all the ramifications of what I'm suggesting, but it would be nice if that parameter list were a little easier to look at :-) Great work.
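A minimal sketch of the "class" idea above. Only lookup_create() and lookup_set_epoch() come from the comment; struct cache is a toy stand-in, and struct lookup_root, struct lookup_result, lookup_get_stall(), lookup_get_load_ref(), and lookup_destroy() are hypothetical names chosen for illustration. The point is that the root inputs and the result outputs each live in one struct inside the handle, instead of being spread across a long list of in/out parameters.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the KVS cache; the real struct differs. */
struct cache {
    int nentries;
};

/* Root-related inputs that were separate parameters (hypothetical types). */
struct lookup_root {
    const char *root_ref;     /* blobref of the root directory */
    const char *rootdir;      /* root directory object, serialized */
    const char *root_dirent;  /* root dirent, serialized */
};

/* Outputs that were separate in/out parameters (hypothetical types). */
struct lookup_result {
    const char *val;          /* value found, or NULL */
    const char *load_ref;     /* reference the caller must load on a stall */
    bool stall;               /* true if the lookup could not complete */
    int errnum;               /* 0 on success, else an errno value */
};

struct lookup {
    struct cache *cache;
    int epoch;                /* set from heartbeat_cb in the proposal */
    struct lookup_root root;
    struct lookup_result result;
};

struct lookup *lookup_create (struct cache *cache)
{
    struct lookup *l = calloc (1, sizeof (*l));   /* zeroed: no stall, no error */
    if (!l)
        return NULL;
    l->cache = cache;
    return l;
}

int lookup_set_epoch (struct lookup *l, int epoch)
{
    if (!l)
        return -1;
    l->epoch = epoch;
    return 0;
}

/* Hypothetical accessors: callers query results instead of passing pointers. */
bool lookup_get_stall (const struct lookup *l)
{
    return l->result.stall;
}

const char *lookup_get_load_ref (const struct lookup *l)
{
    return l->result.load_ref;
}

void lookup_destroy (struct lookup *l)
{
    free (l);
}
```

With this shape, a caller would create one handle per request, set the epoch, run the lookup, and then read the outcome through accessors rather than threading pointers through every call.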
I meant to add: unit tests?
@garlick Yeah, the appalling number of parameters is something I'm working on. In a prior play-branch I had a […] Unit tests were something I was going to do after the refactor of […] If we're going to go full steam ahead with a mega PR here, should we just drop PR #1069? I think it's an OK "mini-checkpoint", but it's only 4 commits. Maybe it doesn't matter at this point.
I'm OK with the mega PR approach, especially if we end up taking a few wrong turns and can squash that down in the history. If this PR is a superset of that one, and you agree with the approach, then yeah, I guess that PR could be abandoned.
Finished up something that looks nearly complete yesterday evening. I still need to valgrind and soak it and add unit tests. The key milestone is that I think I can now splice a few cleanup and "setup convenience functions" PRs out of this to reduce the number of commits.
Add extra tests for invalid cache entries. Test that checks for invalid entries work and that expire fails for non-valid entries.
Add convenience function cache_lookup_and_get_json(), which returns a json value of a cache entry if it is found and the cache entry is deemed valid.
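A sketch of what such a convenience wrapper does, assuming a toy linear cache in which a string stands in for the JSON value. The real cache is keyed by blobref and hands back a JSON object; cache_lookup() and the struct layouts below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy cache entry; the string stands in for a JSON value. */
struct cache_entry {
    const char *ref;     /* blobref key */
    const char *json;    /* hypothetical stand-in for the JSON value */
    bool valid;          /* false while the entry's data is not yet loaded */
};

struct cache {
    struct cache_entry *entries;
    size_t count;
};

/* Plain lookup: returns the entry whether or not it is valid. */
struct cache_entry *cache_lookup (struct cache *c, const char *ref)
{
    for (size_t i = 0; i < c->count; i++) {
        if (strcmp (c->entries[i].ref, ref) == 0)
            return &c->entries[i];
    }
    return NULL;
}

/* Convenience wrapper: the value only if the entry exists AND is valid. */
const char *cache_lookup_and_get_json (struct cache *c, const char *ref)
{
    struct cache_entry *e = cache_lookup (c, ref);
    if (!e || !e->valid)
        return NULL;
    return e->json;
}
```

Folding the "found and valid" check into one call keeps callers from forgetting the validity test.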
Create new json_util files and place the copydir() and compare_json() functions in there. Rename functions to json_object_copydir() and json_compare() respectively.
The load() function may load data from the KVS cache, or it may issue an RPC to retrieve data from the broker content cache if the data is not yet locally cached. The load() function's dependency on both the cache and RPC APIs makes it difficult to splice functions that call load() out into separate APIs. This patch removes the call to load() from the lookup() and walk() functions. In lookup() and walk(), data is instead looked up only in the local KVS cache. If it is not stored there locally, it is the responsibility of the original caller to then call load() to load the missing data. An in/out parameter passes the missing reference all the way back to the original caller.
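The caller-driven load pattern can be sketched as follows. This is a simplified, synchronous model under stated assumptions: objects form a toy chain of references, content_store stands in for the broker content cache, and load() copies synchronously where the real code would send an RPC and stall the request. The lookup(), load(), and get() signatures here are hypothetical; only the division of labor (lookup reports the missing reference, the caller loads it and retries) follows the commit message.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 8

/* Toy object: either refers to another object or holds the final value. */
struct object {
    const char *ref;
    const char *next_ref;   /* NULL if this object holds the value */
    const char *value;
};

/* Hypothetical backing store standing in for the broker content cache. */
static const struct object content_store[] = {
    { "root", "dir", NULL },
    { "dir", "val", NULL },
    { "val", NULL, "42" },
};

/* Local KVS cache: only objects that load() has brought in. */
static const struct object *local_cache[CACHE_MAX];
static size_t local_count;
static int load_calls;

static const struct object *cache_find (const char *ref)
{
    for (size_t i = 0; i < local_count; i++)
        if (strcmp (local_cache[i]->ref, ref) == 0)
            return local_cache[i];
    return NULL;
}

/* Consults ONLY the local cache. Returns 0 with *value set on success,
 * or 1 with *missing_ref set when an object is not cached yet. */
int lookup (const char *root_ref, const char **value, const char **missing_ref)
{
    const char *ref = root_ref;
    for (;;) {
        const struct object *o = cache_find (ref);
        if (!o) {
            *missing_ref = ref;       /* hand the miss back to the caller */
            return 1;
        }
        if (!o->next_ref) {
            *value = o->value;
            return 0;
        }
        ref = o->next_ref;
    }
}

/* Stand-in for load(): copies from the store synchronously, where the
 * real code would send an RPC and stall the request. */
int load (const char *ref)
{
    size_t n = sizeof (content_store) / sizeof (content_store[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp (content_store[i].ref, ref) == 0 && local_count < CACHE_MAX) {
            local_cache[local_count++] = &content_store[i];
            load_calls++;
            return 0;
        }
    }
    return -1;
}

/* The original caller owns the load-and-retry loop. */
const char *get (const char *root_ref)
{
    const char *value = NULL, *missing = NULL;
    while (lookup (root_ref, &value, &missing) == 1) {
        if (load (missing) < 0)
            return NULL;
    }
    return value;
}
```

Because lookup() never touches the content store, it depends only on the cache, which is what makes splicing it into its own file practical.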
Splice out lookup() and walk() into new file lookup.[ch].
Refactor walk() function logic in preparation for further refactoring.

- Move parsing of the terminal path component from outside the loop into the loop, adjusting the loop logic accordingly.
- Increment the depth counter on the recursive step instead of at the beginning of the function.
Remove recursion from the walk() function and implement its logic non-recursively. Use a stack to manage "recursive" calls through links.
Refactor walk() logic to be simpler by passing in the initial root dirent and not the root directory. The adjusted logic is effectively changed from:

    load & set dir = root dir
    while (path components left)
        dirent = directory entry of path component
        if dirent is a link
            recursively resolve dirent
        if not last path component
            load & set dir = dir of dirent
        next path component

to:

    set dirent = directory entry of root
    while (path components left)
        load & set dir = dir of dirent
        dirent = directory entry of component
        if dirent is a link
            recursively resolve dirent
        next path component

This makes the logic simpler and removes some duplicate code.
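The new loop order, combined with the stack-based link resolution from the previous commit, might look roughly like this. It is a sketch over a hypothetical in-memory tree: there is no cache or load() here, link targets are stored pre-split and resolved from the root, and the depth limit and the error codes (ENOTDIR, ENOENT, ELOOP) are assumptions rather than what the KVS module actually returns.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINK_DEPTH 4   /* hypothetical limit on nested links */

enum dirent_type { DIRENT_DIR, DIRENT_VAL, DIRENT_LINK };

struct kvs_dir;

/* Toy dirent: a directory, a value, or a link to a pre-split path. */
struct kvs_dirent {
    enum dirent_type type;
    const struct kvs_dir *dir;          /* DIRENT_DIR */
    const char *val;                    /* DIRENT_VAL */
    const char *const *link_path;       /* DIRENT_LINK: target from root */
    size_t link_len;
};

struct dir_entry {
    const char *name;
    struct kvs_dirent dirent;
};

struct kvs_dir {
    const struct dir_entry *entries;
    size_t count;
};

/* One "call frame": the path being walked and how far we got. */
struct frame {
    const char *const *comps;
    size_t len;
    size_t idx;
};

static const struct kvs_dirent *dir_find (const struct kvs_dir *d,
                                          const char *name)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp (d->entries[i].name, name) == 0)
            return &d->entries[i].dirent;
    return NULL;
}

/* Start from the root dirent; for each component, take the directory of
 * the current dirent and look up the component. A link pushes a frame for
 * its target instead of recursing. Returns NULL with *errp set on error. */
const struct kvs_dirent *walk (const struct kvs_dirent *root,
                               const char *const *comps, size_t len,
                               int *errp)
{
    struct frame stack[MAX_LINK_DEPTH + 1];
    int depth = 0;
    const struct kvs_dirent *dirent = root;

    stack[0] = (struct frame){ comps, len, 0 };
    while (depth >= 0) {
        struct frame *f = &stack[depth];
        if (f->idx == f->len) {      /* this path is resolved: "return" */
            depth--;
            continue;
        }
        if (dirent->type != DIRENT_DIR) {
            *errp = ENOTDIR;
            return NULL;
        }
        dirent = dir_find (dirent->dir, f->comps[f->idx++]);
        if (!dirent) {
            *errp = ENOENT;
            return NULL;
        }
        if (dirent->type == DIRENT_LINK) {
            if (depth == MAX_LINK_DEPTH) {
                *errp = ELOOP;
                return NULL;
            }
            stack[++depth] = (struct frame){ dirent->link_path,
                                             dirent->link_len, 0 };
            dirent = root;           /* link targets resolve from the root */
        }
    }
    return dirent;
}
```

Popping a finished frame leaves the link's resolved dirent in place, so the outer path simply continues from it; the explicit depth counter replaces the recursion depth.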
Remove the call to load() before calling lookup(), as it is no longer necessary now that the refactored walk() uses root_dirent. Adjust the parameters passed into lookup() accordingly. Adjust the logic of callers, as wait_create_msg_handler() can be called later.
Place lookup() parameters in a struct for passing variables in and out. This is the beginning of refactoring toward a more thorough lookup() API.
Now that lookup_handle exists, the user is no longer required to pass root_dirent to the lookup() call. It can be calculated and maintained internally.
Clean up code now that the lookup handle exists:

- Move levels & other data structures into the lookup handle.
- Only pass the lookup handle as a parameter when appropriate.
- Clean up error/stall paths.
In […]
The refactoring and isolation of […] I think we should go ahead and get this merged. What do you think, @chu11? Ready?
@garlick Close. I'm trying to eke out a few more performance tweaks (I removed an uncommon branch check) and used a switch statement on lookup "states", which I think is cleaner and might be faster too.
Add a state variable to lookup handle so that prior actions completed by a lookup are not repeated.
Make the lookup_t struct private and create a set of API functions to get value results, plus convenience get/set functions. Also add a convenience function for testing.
With lookup now API-ized, do not exit on unexpected errors; instead, return appropriate errors. Do not exit when walking a path and reaching a DIRVAL key type or an unexpected type; instead, return EPERM. When reading a linkval with READLINK, if the user also sets READDIR, return ENOTDIR.
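A minimal sketch of returning errors instead of exiting, using hypothetical flag bits and type tags named after the ones in the commit message. Only two mappings come from the commit (DIRVAL or an unexpected type while walking gives EPERM; READLINK plus READDIR on a linkval gives ENOTDIR); treating FILEVAL in the middle of a path the same way is an assumption of this sketch.

```c
#include <errno.h>

/* Hypothetical flag bits standing in for the real KVS lookup flags. */
#define FLAG_READDIR  0x1
#define FLAG_READLINK 0x2

/* Hypothetical tags mirroring the dirent type names used in the commit. */
enum dirent_type { DIRREF, DIRVAL, FILEVAL, LINKVAL, UNKNOWN_TYPE };

/* Check a non-final path component. Instead of exiting, a DIRVAL or an
 * unexpected type is reported to the caller as EPERM. */
int walk_check_component (enum dirent_type type)
{
    switch (type) {
    case DIRREF:
    case LINKVAL:          /* links are resolved by the walk itself */
        return 0;
    case DIRVAL:
    default:
        errno = EPERM;
        return -1;
    }
}

/* Reading a LINKVAL with READLINK while also asking for READDIR
 * is reported as ENOTDIR rather than treated as fatal. */
int readlink_check (enum dirent_type type, int flags)
{
    if (type == LINKVAL && (flags & FLAG_READLINK) && (flags & FLAG_READDIR)) {
        errno = ENOTDIR;
        return -1;
    }
    return 0;
}
```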
Do not recreate the lookup data structure when RPCs are replayed due to stalls. State maintained in the lookup data structure now allows RPCs to continue from the point where they last stopped instead of replaying from the beginning.
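The resumable-state idea, with the switch statement on lookup states mentioned earlier in the thread, can be sketched like this. The state names, the two toy "cached" flags, and the counters are hypothetical, and the real lookup can also stall partway through a walk, which this sketch does not model. Each case falls through to the next, so a replayed call jumps straight to the saved state.

```c
#include <stdbool.h>

/* Hypothetical state names; the real lookup states may differ. */
enum lookup_state {
    LOOKUP_STATE_INIT,
    LOOKUP_STATE_CHECK_ROOT,
    LOOKUP_STATE_WALK,
    LOOKUP_STATE_VALUE,
    LOOKUP_STATE_FINISHED,
};

struct lookup {
    enum lookup_state state;
    bool root_cached;   /* toy stand-in for "root object is in the cache" */
    bool value_cached;  /* toy stand-in for "value object is in the cache" */
    int init_count;     /* how many times the one-time setup ran */
    int walk_count;     /* how many times the path walk ran */
};

/* Returns true when finished, false on a stall. Because 'state' is saved
 * in the handle, a replayed call resumes where it left off instead of
 * repeating completed steps. */
bool lookup (struct lookup *l)
{
    switch (l->state) {
    case LOOKUP_STATE_INIT:
        l->init_count++;            /* e.g. parse path, set up stack */
        l->state = LOOKUP_STATE_CHECK_ROOT;
        /* fall through */
    case LOOKUP_STATE_CHECK_ROOT:
        if (!l->root_cached)
            return false;           /* stall: caller loads root, replays */
        l->state = LOOKUP_STATE_WALK;
        /* fall through */
    case LOOKUP_STATE_WALK:
        l->walk_count++;            /* walk path components */
        l->state = LOOKUP_STATE_VALUE;
        /* fall through */
    case LOOKUP_STATE_VALUE:
        if (!l->value_cached)
            return false;           /* stall: caller loads value, replays */
        l->state = LOOKUP_STATE_FINISHED;
        /* fall through */
    case LOOKUP_STATE_FINISHED:
        return true;
    }
    return false;
}
```

Compared with a chain of "if state == X" checks, the switch makes the resume point a single jump and keeps each step's guard next to its work.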
@garlick Since your last comment, I converted the if-state-branch logic in […]
OK, I think it's good to go once Travis passes.
In this PR, we convert the KVS walk function from a recursive implementation into a non-recursive one and simplify the logic as well. The predominant logic changes are:
A) In preparation for maintaining "state" so that RPCs aren't fully replayed, the entire path of the KVS get request is pre-parsed and stored in a list.
B) The walk() function has been made non-recursive and "depth" is maintained in a stack. This is also in preparation for maintaining "state" for removal of RPC replay.
C) A variety of code logic changes to make the walk() function simpler.
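Point A can be sketched as a one-time split of the key into components, assuming '.'-separated KVS keys and that empty components are simply skipped (a simplification; real key normalization may differ). The PR stores the components in a list; this sketch uses a plain array, and struct path, path_parse(), and path_destroy() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pre-parsed path: components stored once, up front. */
struct path {
    char **comps;
    size_t count;
};

void path_destroy (struct path *p)
{
    if (!p)
        return;
    for (size_t i = 0; i < p->count; i++)
        free (p->comps[i]);
    free (p->comps);
    free (p);
}

/* Split a '.'-separated key such as "a.b.c" into components, skipping
 * empty ones (so "a..b." yields "a", "b"). Returns NULL on allocation
 * failure. */
struct path *path_parse (const char *key)
{
    struct path *p = calloc (1, sizeof (*p));
    size_t max = 1;
    if (!p)
        return NULL;
    for (const char *s = key; *s; s++)      /* upper bound on components */
        if (*s == '.')
            max++;
    if (!(p->comps = calloc (max, sizeof (char *)))) {
        free (p);
        return NULL;
    }
    const char *start = key;
    while (*start) {
        const char *end = strchr (start, '.');
        size_t len = end ? (size_t)(end - start) : strlen (start);
        if (len > 0) {
            char *c = malloc (len + 1);
            if (!c) {
                path_destroy (p);
                return NULL;
            }
            memcpy (c, start, len);
            c[len] = '\0';
            p->comps[p->count++] = c;
        }
        if (!end)
            break;
        start = end + 1;
    }
    return p;
}
```

As noted in the PR, parsing up front only pays off once replays stop re-running it; until then the path is re-split on every continuation.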
The predominant logic changes that may look strange in this PR are:
A) The pre-parsing of the path makes things less efficient for the time being, because RPCs are still completely replayed (i.e. the path is fully re-parsed on every RPC-then continuation).
B) There are some paths (such as through get_request_cb()) in which load() will be called twice on the same root reference. This will be resolved when lookup() is refactored; this PR sticks to walk() only.
I still need to run it through valgrind to make sure I didn't leak memory, but I wanted to throw the PR up here for a first skim.