KVS commit refactor #1105
Code between relayfence_request_cb and fence_request_cb is nearly identical, but uses different code style, formatting, and logging patterns. Make them the same.
Remove 'ref' from call to content_store_request_send, in which ref isn't used.
Remove flux_msg_destroy() call in fence_finalize(), as the messages will be deleted shortly thereafter when the fence_t data structure is destroyed. Adjust surrounding code appropriately.
Move variable into the only block it is used.
Looks like clang is sad here:
while the gcc build is failing the valgrind test
Rename fence_append_request to fence_add_request_copy for clarity.
Rename to fence_add_request_data() for clarity.
Include count increment in fence_append_ops(), which tracks the number of fence requests that have occurred so far.
Create new function fence_process_fence_request(), which wraps up counting checks to see if a fence is ready for committing.
Splice out portion of code into new function fence_merge().
Split fence_t into fence_t and commit_t structs. Add new commit_create() and commit_destroy() functions. Adjust fence_create() function.
Rename commit_apply_fence() to commit_apply(), as former no longer makes sense given fence_t struct split.
Move fence_t into a new file. Move fence_create(), fence_destroy(), fence_add_request_data(), fence_add_request_copy(), fence_merge() into new file too.
Add fence_count_reached(), fence_get_flags(), fence_set_flags(), fence_get_json_ops(), and fence_get_json_names() to fence files to abstract away fence_t.
Add new function fence_iter_request_copies().
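To illustrate the accessor approach described in the commits above, here is a minimal sketch of an opaque fence type. The function names come from the commit messages, but every signature and field here is an assumption made for illustration, and the real fence also carries JSON ops, names, and request copies, which are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sketch: in a real header, callers would only see this
 * forward declaration, never the struct fields. */
typedef struct fence fence_t;

/* Assumed layout, private to the fence implementation file. */
struct fence {
    int nprocs;   /* number of participants expected */
    int count;    /* number of requests received so far */
    int flags;
};

fence_t *fence_create (int nprocs, int flags)
{
    fence_t *f = calloc (1, sizeof (*f));
    if (!f)
        return NULL;
    f->nprocs = nprocs;
    f->flags = flags;
    return f;
}

void fence_destroy (fence_t *f)
{
    free (f);
}

/* Simplified: the op payload is ignored, only the count is tracked. */
int fence_add_request_data (fence_t *f, const char *op)
{
    (void)op;
    f->count++;
    return 0;
}

/* True once every expected participant has sent its request. */
bool fence_count_reached (fence_t *f)
{
    return f->count >= f->nprocs;
}

int fence_get_flags (fence_t *f)
{
    return f->flags;
}

void fence_set_flags (fence_t *f, int flags)
{
    f->flags = flags;
}
```

With accessors like these, callers never touch the struct fields directly, so the layout can change without touching the callers.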
6d374f9 to e5b5583
Codecov Report
@@ Coverage Diff @@
## master #1105 +/- ##
==========================================
+ Coverage 78.13% 78.22% +0.08%
==========================================
Files 155 157 +2
Lines 25959 26132 +173
==========================================
+ Hits 20284 20442 +158
- Misses 5675 5690 +15
Re-pushed with a variety of mini-fixes.
Ran some soak tests out of /tmp on catalyst:
master mean .12294 0.5% slower
That is probably within the range one would expect given a new data structure allocation and minor API function call overhead, or simply within the randomness of runs. I'll see if I can get the number down a bit, but at a minimum it's well within range of what it was before.
Using new fence API function, refactor fence_finalize_bynames and rename to finalize_fences_bynames for clarity.
Hide fence_t internally within fence api.
Now that fence is abstracted, use the zlist free function instead of manually destroying each element on a list in fence_destroy().
Move href_t definition into new types.h file.
Instead of calling store(), which stores to cache and sends RPCs to the content store, only store to cache and return a list of entries that need to be sent to the content store. The caller of commit_unroll() is responsible for sending data to the content store and waiting on dirty entries. Remove store(), as it is no longer used.
Splice large chunks of code into commit_process(), which handles most processing and returns to commit_apply() when data must be loaded, stored, or waited on.
Clean up code by adding a state variable to commit_t, which explains the progression of the code more logically than checks for flags and NULL/non-NULL variables.
Add enum of potential commit_process return types, to make the function clearer.
Move commit_t, commit_create(), and commit_destroy() into their own files. As a slight adjustment, use an aux variable to handle ctx.
Move several kvs_ctx_t fields into a commit_mgr_t type. Create new commit_mgr_create() and commit_mgr_destroy() functions.
Move fence_add, fence_lookup, fence_process_fence_request, and commit_merge_all into commit mgr API. Adjust API and functions accordingly. As a result, some functions can be made static as they are no longer called except inside commit_mgr API.
Place variables into commit_mgr_t and commit_t to begin abstracting away kvs_ctx_t from various functions.
With recent changes to commit_t and commit_mgr_t, no longer pass around kvs_ctx_t to various functions such as store_cache(), commit_unroll(), commit_link_dirent(), and commit_process(). As noop_stores is now in commit_mgr_t, eliminate the stats_t struct and adjust appropriately in stat get/clear callbacks.
Add various functions to get/set/deal with commit_t and commit_mgr_t internals, to abstract away commit internal details from callers.
Add iteration callback functions to commit API to access internal lists. As a consequence, eliminate store_content_store().
Move commit_process(), store_cache(), commit_unroll(), and commit_link_dirent() into commit API. The latter three can be made static and hidden from callers.
Hide commit_t and commit_mgr_t from callers.
Add state checks to some commit API functions.
Now that the commit functions have been API-ized, we can hide some internals for optimization. Collapse the missing_refs and dirty_cache_entries lists into one, to remove an extra zlist create and destroy.
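The state variable, return-type enum, iteration callbacks, and single collapsed list described in the commits above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the code in this PR: the enum values, struct fields, commit_iter_items(), and drive_commit() are all hypothetical, and the "loads" and "stores" are in-memory stand-ins for content store RPCs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes telling the caller what to do next. */
typedef enum {
    COMMIT_PROCESS_ERROR,
    COMMIT_PROCESS_LOAD_MISSING_REFS,
    COMMIT_PROCESS_DIRTY_CACHE_ENTRIES,
    COMMIT_PROCESS_FINISHED
} commit_process_t;

/* Hypothetical states recording how far the commit has progressed. */
typedef enum {
    COMMIT_STATE_LOAD_ROOT,
    COMMIT_STATE_APPLY_OPS,
    COMMIT_STATE_STORE,
    COMMIT_STATE_FINISHED
} commit_state_t;

typedef struct {
    commit_state_t state;
    bool root_loaded;      /* stand-in for a cache hit on the root */
    bool stored;           /* stand-in for dirty entries flushed */
    const char *items[4];  /* one list, reused for refs or dirty entries */
    int nitems;
} commit_t;

typedef int (*commit_item_cb) (commit_t *c, const char *item, void *data);

/* Advance as far as possible without doing any I/O. */
commit_process_t commit_process (commit_t *c)
{
    switch (c->state) {
    case COMMIT_STATE_LOAD_ROOT:
        if (!c->root_loaded) {
            c->nitems = 0;
            c->items[c->nitems++] = "rootref";
            return COMMIT_PROCESS_LOAD_MISSING_REFS;
        }
        c->state = COMMIT_STATE_APPLY_OPS;
        /* fallthrough */
    case COMMIT_STATE_APPLY_OPS:
        c->nitems = 0;
        c->items[c->nitems++] = "newroot";
        c->state = COMMIT_STATE_STORE;
        /* fallthrough */
    case COMMIT_STATE_STORE:
        if (!c->stored)
            return COMMIT_PROCESS_DIRTY_CACHE_ENTRIES;
        c->state = COMMIT_STATE_FINISHED;
        /* fallthrough */
    case COMMIT_STATE_FINISHED:
        return COMMIT_PROCESS_FINISHED;
    }
    return COMMIT_PROCESS_ERROR;
}

/* State check: the list is only meaningful while loading or storing. */
int commit_iter_items (commit_t *c, commit_item_cb cb, void *data)
{
    if (c->state != COMMIT_STATE_LOAD_ROOT && c->state != COMMIT_STATE_STORE)
        return -1;
    for (int i = 0; i < c->nitems; i++)
        if (cb (c, c->items[i], data) < 0)
            return -1;
    return 0;
}

/* Caller-side callbacks: a real caller would send RPCs here. */
static int load_cb (commit_t *c, const char *ref, void *data)
{
    (void)ref;
    c->root_loaded = true;
    (*(int *)data)++;
    return 0;
}

static int store_cb (commit_t *c, const char *entry, void *data)
{
    (void)entry;
    c->stored = true;
    (*(int *)data)++;
    return 0;
}

/* Synchronous driver loop; returns 0 when the commit finishes. */
int drive_commit (commit_t *c, int *loads, int *stores)
{
    for (;;) {
        switch (commit_process (c)) {
        case COMMIT_PROCESS_LOAD_MISSING_REFS:
            if (commit_iter_items (c, load_cb, loads) < 0)
                return -1;
            break;
        case COMMIT_PROCESS_DIRTY_CACHE_ENTRIES:
            if (commit_iter_items (c, store_cb, stores) < 0)
                return -1;
            break;
        case COMMIT_PROCESS_FINISHED:
            return 0;
        default:
            return -1;
        }
    }
}
```

Because commit_process() never performs I/O itself, a unit test can drive it with callbacks like these in place of a running content store.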
Wow, nice cleanup. I've been through this at the 10,000 foot level and it seems good. Thanks for the extra paranoia of running the soak test and analyzing the results. I wouldn't worry about the small performance hit - it's close enough that I doubt you've introduced any new stalls or other significant problems that will balloon up at scale. The new unit tests are a great addition! The main thing here IMHO is that you are feeling like the code is getting structured in such a way that you can work on it with confidence. The next steps of converting to jansson and the treeobj metadata will be much easier with your help and with the new unit tests. Anyway, awesome, and when you're done fine tuning it, let me know and I think this could go in.
Now that the commit functions have been API-ized, replace some asserts with some logical checks instead and return appropriate values to the user.
e5b5583 to 0e420ff
A few tweaks: two lists in commit_t collapsed into one, an unnecessary branch check removed, a few minor cleanups, a few more API-niceties (sp?), and a few more tests for coverage. It's closer to even now, with some normal randomness. I've gotten anywhere from 0.19% to 0.6% slower on this kvs refactor branch. I just saw this when running soak with 1000 jobs on hype2 (which is empty of users and has less randomness).
master branch
kvs refactor branch
The kvs refactor branch was actually a tad faster for this particular run. That's probably not going to be normal, but it's clearly very close.
Great! Ready to go in?
Yup, it's good to go.
This is a refactor of the "commit" side of the kvs.
There is only really one "epiphany" in this refactor. The original fence_t data structure was split into a fence_t and a commit_t data structure. By doing this, there is now a clear delineation between an actual "fence" that occurs when callers send RPCs and the data structures used behind the scenes to actually commit data to the content store. IMO, splitting that up eliminated a lot of the monolithic-ness. Took me a while to realize this.
After that, there isn't really anything magical per se, just the typical lineage of refactoring one tiny thing at a time: code cleanup, API change, move functions into a new file, create an API, etc. There is now a "fence" API and a "commit" API in new files that each have their own unit tests. The "commit" API has a "commit manager" that actually wraps around commit_t.
One interesting refactor of note is the function commit_process(), which processes a commit and does all the commit mods and unrolling. It may return missing references to the user to load, or pointers to cache entries that are dirty and need to be backed up. Removing RPCs from the commit logic allows the commit API to be unit tested separately.
Still need to valgrind test & soak test. The introduction of the commit_t data structure adds some additional malloc() calls per fence. Hopefully this doesn't have a noticeable impact on performance. There are a few error cases that are more optimized now, which will help balance some things out.
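As a rough picture of the fence_t / commit_t split and the commit manager described above, consider the following sketch. It is a simplified assumption rather than the code in this PR: the struct fields, the linked-list queue, and the exact signatures of the commit_mgr functions are hypothetical. It also shows where the extra per-fence allocation comes from.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: what callers' fence RPCs accumulate. */
typedef struct {
    int nprocs;   /* participants expected */
    int count;    /* requests received so far */
} fence_t;

/* Hypothetical: bookkeeping for committing one completed fence. */
typedef struct commit {
    fence_t *fence;        /* fence being committed, not owned here */
    int state;             /* progress of the commit */
    struct commit *next;
} commit_t;

/* Hypothetical manager: a FIFO of commits that are ready to run. */
typedef struct {
    commit_t *head;
    commit_t *tail;
} commit_mgr_t;

/* Queue a commit once the fence is complete.
 * Returns 1 if queued, 0 if the fence is not ready, -1 on error. */
int commit_mgr_process_fence_request (commit_mgr_t *mgr, fence_t *f)
{
    if (f->count < f->nprocs)
        return 0;
    commit_t *c = calloc (1, sizeof (*c));  /* the extra malloc per fence */
    if (!c)
        return -1;
    c->fence = f;
    if (mgr->tail)
        mgr->tail->next = c;
    else
        mgr->head = c;
    mgr->tail = c;
    return 1;
}

/* Peek at the next commit to process, or NULL if none is ready. */
commit_t *commit_mgr_get_ready_commit (commit_mgr_t *mgr)
{
    return mgr->head;
}

/* Drop the commit at the head of the queue once it has finished. */
void commit_mgr_remove_commit (commit_mgr_t *mgr)
{
    commit_t *c = mgr->head;
    if (!c)
        return;
    mgr->head = c->next;
    if (!mgr->head)
        mgr->tail = NULL;
    free (c);
}
```

Keeping the fence separate means the RPC-facing side only counts requests, while everything needed to write to the content store lives in the commit.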