Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVS commit refactor #1105

Merged
merged 49 commits into from
Jul 12, 2017
Merged

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Jul 11, 2017

This is a refactor of the "commit" side of the kvs.

There is only really one "epiphany" in this refactor. The original fence_t data structure was split into a fence_t and a commit_t data structure. By doing this, there is now a clear delineation between an actual "fence" that occurs when callers send RPCs and the data structures used behind the scene to actually commit data to the content store. IMO, splitting that up eliminated a lot of the monolithic-ness. Took me awhile to realize this.

After that, there isn't really anything magical per se, just the typical lineage of refactoring 1 tiny thing at a time: code cleanup, api change, move functions into a new file, create an API, etc. There is now a "fence" API and a "commit" API in new files that each have their own unit tests. The "commit" API has a "commit manager" that actually wraps around commit_t.

One interesting refactor of note is the function commit_process() which processes a commit and does all the commit mods and unrolling. It may return missing references to the user to load, or pointers to cache entries that are dirty and need to be backed up. By removing RPCs out of the commit logic, this allows the commit API to be separately unit-testable.

Still need to valgrind test & soak test. The introduction of the commit_t data structure adds some additional malloc() calls per fence. Hopefully this doesn't have a noticeable impact on performance. There are few error cases that are more optimized now which will help balance some things out.

chu11 added 5 commits July 11, 2017 11:42
Code between relayfence_request_cb and fence_request_cb is nearly
identical, but uses different code style, formatting, and logging
patterns.  Make them the same.
Remove 'ref' from call to content_store_request_send,
in which ref isn't used.
Remove flux_msg_destroy() call in fence_finalize(), as the
messages will be deleted shortly thereafter when the fence_t
data structure is destroyed.  Adjust surrounding code appropriately.
Move variable into the only block it is used.
@chu11 chu11 requested a review from garlick July 11, 2017 18:59
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 78.495% when pulling 6d374f9 on chu11:kvs_fence_refactor_try25 into 062ade5 on flux-framework:master.

@garlick
Copy link
Member

garlick commented Jul 11, 2017

Looks like clang is sad here:

In file included from ../../../../../src/modules/kvs/kvs.c:55:
../../../../../src/modules/kvs/types.h:3:14: error: redefinition of typedef 'href_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef char href_t[BLOBREF_MAX_STRING_SIZE];
             ^
../../../../../src/modules/kvs/types.h:3:14: note: previous definition is here
typedef char href_t[BLOBREF_MAX_STRING_SIZE];
             ^

while the gcc build is failing the valgrind test

chu11 added 12 commits July 11, 2017 14:11
Rename fence_append_request to fence_add_request_copy for clarity.
Rename to fence_add_request_data() for clarity.
Include count increment in fence_append_ops(), which indicates
number of fence counts occuring so far.
Create new function fence_process_fence_request(), which wraps
up counting checks to see if a fence is ready for committing.
Splice out portion of code into new function fence_merge().
Split fence_t into fence_t and commit_t structs.  Add new
commit_create() and commit_destroy() functions.  Adjust
fence_create() function.
Rename commit_apply_fence() to commit_apply(), as former no longer
makes sense given fence_t struct split.
Move fence_t into a new file.  Move fence_create(), fence_destroy(),
fence_add_request_data(), fence_add_request_copy(), fence_merge() into
new file too.
Add fence_count_reached(), fence_get_flags(), fence_set_flags(),
fence_get_json_ops(), and fence_get_json_names() to fence files
to abstract away fence_t.
Add new function fence_iter_request_copies().
@chu11 chu11 force-pushed the kvs_fence_refactor_try25 branch from 6d374f9 to e5b5583 Compare July 11, 2017 23:49
@codecov-io
Copy link

codecov-io commented Jul 11, 2017

Codecov Report

Merging #1105 into master will increase coverage by 0.08%.
The diff coverage is 85.88%.

@@            Coverage Diff             @@
##           master    #1105      +/-   ##
==========================================
+ Coverage   78.13%   78.22%   +0.08%     
==========================================
  Files         155      157       +2     
  Lines       25959    26132     +173     
==========================================
+ Hits        20284    20442     +158     
- Misses       5675     5690      +15
Impacted Files Coverage Δ
src/modules/kvs/cache.c 93.89% <100%> (+0.24%) ⬆️
src/modules/kvs/json_util.c 93.33% <100%> (+2.42%) ⬆️
src/common/libkvs/json_dirent.c 97.72% <100%> (ø) ⬆️
src/modules/kvs/kvs.c 79.11% <80.9%> (-0.32%) ⬇️
src/modules/kvs/fence.c 85.48% <85.48%> (ø)
src/modules/kvs/commit.c 87.65% <87.65%> (ø)
src/cmd/flux-event.c 65.55% <0%> (-1.12%) ⬇️
src/common/libutil/dirwalk.c 93.33% <0%> (-0.75%) ⬇️
src/broker/overlay.c 71.32% <0%> (-0.7%) ⬇️
src/common/libutil/veb.c 98.31% <0%> (-0.57%) ⬇️
... and 12 more

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 78.548% when pulling e5b5583 on chu11:kvs_fence_refactor_try25 into 062ade5 on flux-framework:master.

@chu11
Copy link
Member Author

chu11 commented Jul 12, 2017

Re-pushed with a variety of mini-fixes. .h file header guards fixed the clang issue. Found the mem-leak and fixed that. Coverage looks pretty good. I think I can add 1-2 tiny things to add a little coverage. Will run soak tests tomorrow and hopefully will be awesome.

@chu11
Copy link
Member Author

chu11 commented Jul 12, 2017

ran some soak tests out of /tmp on catalyst

master

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1200  0.1200  0.1200  0.1236  0.1300  0.1400 
 0.1100  0.1200  0.1200  0.1218  0.1200  0.1400 
 0.1200  0.1200  0.1200  0.1228  0.1300  0.1300 
 0.1100  0.1200  0.1200  0.1243  0.1300  0.1700 
 0.1100  0.1200  0.1200  0.1222  0.1300  0.1300 
kvs refactor branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1200  0.1200  0.1200  0.1239  0.1300  0.1300 
 0.1200  0.1200  0.1200  0.1243  0.1300  0.1300 
 0.1200  0.1200  0.1200  0.1238  0.1300  0.1400 
 0.1200  0.1200  0.1200  0.1223  0.1200  0.1300 
 0.1200  0.1200  0.1200  0.1235  0.1300  0.1300 

master mean .12294
kvs refactor branch mean .12356

0.5% slower

Probably within range of what one would have expected given a new data structure allocation and minor API function calling overhead. Or just within range of randomness of runs. I'll see if I can get the number down a bit, but minimally it's well within range of what it was before.

chu11 added 5 commits July 12, 2017 08:36
Using new fence API function, refactor fence_finalize_bynames
and rename to finalize_fences_bynames for clarity.
Hide fence_t internally within fence api.
Now that fence is now abstracted, use zlist free function instead of
destroying each element on a list manually in fence_destroy().
Move href_t definition into new types.h file.
chu11 added 16 commits July 12, 2017 08:42
Instead of calling store(), which will store to cache and
send rpcs to the content store, instead only store to cache
and return a list of entries which need to be sent to content
store.  Caller to commit_unroll() is responsible for sending
data to content store and waiting on dirty entries.

Remove store(), as it is no longer used.
Splice large chunks of code into commit_process(), which
handles most processing, return to commit_apply() when
data must be loaded/stored/waited.
Cleanup code by adding a state variable to commit_t, which
more logically explains progression of code instead of checks
for flags and variables being NULL/non-NULL.
Add enum of potential commit_process return types, to
make function more clear.
Move commit_t, commit_create(), and commit_destroy() into its own
files.  Slight adjustment, use aux variable to handle ctx.
Move several kvs_ctx_t fields into a commit_mgr_t type.  Create
new commit_mgr_create() and commit_mgr_destroy() functions.
Move fence_add, fence_lookup, fence_process_fence_request, and
commit_merge_all into commit mgr API.  Adjust API and functions
accordingly.  As a result, some functions can be made static as
they are no longer called except inside commit_mgr API.
Place variables into commit_mgr_t and commit_t to begin abstracting
away kvs_ctx_t from various functions.
With recent changes to commit_t and commit_mgr_t, no longer pass
around kvs_ctx_t to various functions such as store_cache(),
commit_unroll(), commit_link_dirent(), and commit_process().

As noop_stores is now in commit_mgr_t, eliminate the stats_t struct
and adjust appropriately in stat get/clear callbacks.
Add various functions to get/set/deal with commit_t and commit_mgr_t
internals, to abstract away commit internal details from callers.
Add iteration callback functions to commit API to access internal
lists.  As consequence, eliminate store_content_store().
Move commit_process(), store_cache(), commit_unroll(), and
commit_link_dirent() into commit API.  Latter 3 can be made
static and hidden from caller.
Hide commit_t and commit_mgr_t from callers.
Add state checks to some commit API functions.
Now that the commit functions have been API-ized, we can hide
some internals for optimization.  Collapse the missing_refs and
dirty_cache_entries lists into one, to remove an extra zlist create
and destroy.
@garlick
Copy link
Member

garlick commented Jul 12, 2017

Wow, nice cleanup. I've been through this at the 10,000 foot level and it seems good.

Thanks for the extra paranoia of running the soak test and analyzing the results. I wouldn't worry about the small performance hit - it's close enough that I doubt you've introduced any new stalls or other significant problems that will balloon up at scale.

The new unit tests are great addition!

The main thing here IMHO is that you are feeling like the code is getting structured in such a way that you can work on it with confidence. The next steps of converting to jansson and the treeobj metadata will be much easier with your help and with the new unit tests.

Anyway, awesome, and when you're done fine tuning it, let me know and I think this could go in.

chu11 added 3 commits July 12, 2017 13:32
Now that the commit functions have been API-ized, replace some
asserts with some logical checks instead and return appropriate
values to the user.
@chu11 chu11 force-pushed the kvs_fence_refactor_try25 branch from e5b5583 to 0e420ff Compare July 12, 2017 22:18
@chu11
Copy link
Member Author

chu11 commented Jul 12, 2017

A few tweaks, two lists in commit_t collapsed into one, removing an unnecessary branch check, a few minor cleanups/tweaks, a few more API-niceties (sp?), a few more tests for coverage.

It's closer to even now, with some normal randomness. I've gotten anywhere from 0.19% to 0.6% slower on this kvs refactor branch.

I just saw this when running soak with 1000 jobs on hype2 (which is empty of users and has less randomness).

master branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.100   0.110   0.110   0.111   0.110   0.160

kvs refactor branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1100  0.1100  0.1106  0.1100  0.1500

the kvs refactor branch was actually a tad faster for this particular run. That's probably not going to be normal, but it's clearly very close.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 78.486% when pulling 0e420ff on chu11:kvs_fence_refactor_try25 into 062ade5 on flux-framework:master.

@garlick
Copy link
Member

garlick commented Jul 12, 2017

Great! Ready to go in?

@chu11
Copy link
Member Author

chu11 commented Jul 12, 2017

Yup, it's good to go.

@garlick garlick merged commit afcbd94 into flux-framework:master Jul 12, 2017
@chu11 chu11 mentioned this pull request Jul 26, 2017
@grondo grondo mentioned this pull request Aug 23, 2017
@chu11 chu11 deleted the kvs_fence_refactor_try25 branch June 5, 2021 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants