kvs: refactor kvs cache to handle raw data as "primary" data #1246
Remove cache entry "type" parameter / concept from external facing API. It is still used internally for several purposes. Adjust / remove many cache unit tests as a result.
This is a high-value cleanup! I will try to provide some review comments before Monday.
Codecov Report
@@ Coverage Diff @@
## master #1246 +/- ##
==========================================
- Coverage 77.99% 77.93% -0.06%
==========================================
Files 154 154
Lines 28964 28902 -62
==========================================
- Hits 22590 22525 -65
- Misses 6374 6377 +3
Just missed having coverage on the diff. Looking through the coverage diff, I don't think I can eke out a few more lines. It's all bad error-path scenarios.
I made a few comments inline.
src/modules/kvs/cache.c
Outdated
@@ -59,8 +59,9 @@ struct cache_entry {
waitqueue_t *waitlist_valid;
void *data; /* value object/data */
int len;
bool data_valid; /* flag indicating if data set, don't use
* data == NULL as test */
bool entry_valid; /* flag indicating if data set, don't use
Suggestion: rename to valid and use the same declaration style as dirty (either bool or bitfield is fine with me).
Agreed that this should be consistent; however, I'll make a new issue and clean up in another PR. Some functions (such as cache_entry_clear_dirty()) return the current setting of the dirty bit as a 1 or 0. This should be a bool, not a 1 or 0. It'll be a bit more cleanup than I'd like in this PR.
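A minimal sketch of the cleanup being deferred here, assuming a simplified entry that holds only the two flags. The names `entry_flags` and `entry_clear_dirty` are hypothetical, and the real cache_entry_clear_dirty() may apply extra conditions before clearing the bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cache_entry. */
struct entry_flags {
    bool valid;   /* data has been set */
    bool dirty;   /* data has not been flushed yet */
};

/* Clear the dirty flag and report the resulting setting as a bool
 * instead of an int 1 or 0. */
bool entry_clear_dirty (struct entry_flags *e)
{
    assert (e != NULL);
    e->dirty = false;
    return e->dirty;
}
```

Declaring both flags with the same type also addresses the consistency point raised in the review.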
src/modules/kvs/cache.c
Outdated
else if (hp->type == CACHE_DATA_TYPE_RAW)
free (hp->data);
}
if (hp->o)
The test for NULL is redundant. Both free() and json_decref() are no-ops if the argument is NULL.
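To illustrate the point, here is a small sketch of a destroy function with no guard around the member pointer. The struct and function names (`raw_entry`, `raw_entry_destroy`) are hypothetical, and the sketch uses only free() so that it does not depend on Jansson; json_decref() could be called the same way on a json_t member.

```c
#include <stdlib.h>

/* Hypothetical, simplified entry that holds only raw data. */
struct raw_entry {
    void *data;
    int len;
};

/* Destroy an entry. No "if (e->data)" guard is needed because the C
 * standard defines free (NULL) as a no-op. The check on e itself stays,
 * since e is dereferenced to reach e->data. */
void raw_entry_destroy (struct raw_entry *e)
{
    if (e) {
        free (e->data);
        free (e);
    }
}
```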
src/modules/kvs/cache.c
Outdated
@@ -84,42 +84,42 @@ struct cache_entry *cache_entry_create (void)
return hp;
}

struct cache_entry *cache_entry_create_json (json_t *o)
struct cache_entry *cache_entry_create_raw (void *data, int len)
It would appear this function has no users?
src/modules/kvs/cache.c
Outdated
return hp;
}

struct cache_entry *cache_entry_create_raw (void *data, int len)
struct cache_entry *cache_entry_create_json (json_t *o)
It would appear this function has only one user, in setroot_event_cb(). Suggest changing that user to cache_entry_create() + cache_entry_set_json().
This sounds like a good idea. Was honestly surprised the cache_entry_create_raw() and cache_entry_create_json() functions had almost no use.
src/modules/kvs/cache.c
Outdated
else if (!hp->data) {
/* attempt to change already valid cache entry,
* cannot, must call cache_entry_clear_data() */
errno = EBADE;
This function doesn't set errno in all the -1 return cases (e.g. if args are invalid).
The second test for !hp->data is redundant with the first leg of the conditional.
Maybe it would be better to change the hp->entry_valid leg of the outer conditional to simply:
if (hp->entry_valid) {
if (hp->o)
json_decref (o); // already stored
else
hp->o = o;
assert (hp->len > 0 && hp->data != NULL); // o/w would not be valid json
} else ...
Ugh, why did I add the redundant check :-)
Let me think about how to handle the logic here, as I seem to have copied some logic from the set_raw() function, but perhaps it shouldn't have been copied over in that way.
src/modules/kvs/cache.c
Outdated
json_decref (o);
else if (!hp->data) {
/* attempt to change already valid cache entry,
* cannot, must call cache_entry_clear_data() */
cache_entry_clear_data() doesn't appear to have any users.
Yeah, I created this function as a "just in case", knowing there were no real use cases at the time. Perhaps it would be wise to remove it. If anything, users can call cache_remove_entry and then recreate the entry if they are in dire straits to put new data in it.
Just re-pushed with some fixes. Removed Changed the logic in Removed unnecessary checks for NULL in Will do the cleanup regarding consistent dirty bit and valid bit in a separate PR (#1247).
Coverage diff fell from about 77% to about 71%, I believe mostly due to high-coverage functions that were removed.
If you're feeling OK with this, I'd say squash it down and we can merge it.
Internally refactor KVS cache code to store json object and raw data memory in separate variables instead of one.
Make 'raw' KVS cache functions the primary function over 'json' functions by moving them to be the first function listed over the 'json' equivalent. There is no functional change in this commit, only the movement and re-ordering of code.
Refactor cache API to make all cache entries store raw data instead of raw or json data (but not both). All json functions are now convenience functions operating under the assumption of raw data underneath. For example, cache_entry_create_json() and cache_entry_set_json() are convenience functions that take a json object and extract the raw string out of it. The raw string is now the primary data of the cache entry. The cache_entry_get_json() function is a convenience function that returns the json object equivalent of the raw data string stored within a cache entry. To avoid regularly encoding/decoding raw data into/from json objects, a json object is cached in the cache entry. Internally, the cache_data_type_t is no longer needed and has been removed. Update and add unit tests appropriately.
Update internal lookup API for KVS cache changes. Adjust callback lookup_ref_f to not pass back raw_data flag, as it is no longer relevant. Adjust several log error messages to make more sense given changes. Update tests appropriately. Most notably, valrefs that previously had a (at the time invalid) dirref reference within them will now pass. Valref blobrefs can now point to anything. If a user wishes to point to a random treeobj, they can.
Remove cache type usage in internal commit API. Take advantage of fact all cache entries have raw data. Adjust unit tests appropriately for loss of cache types.
Handle the fact that KVS cache now automatically converts raw data to json and json to raw data, so that the conversion no longer needs to be done at this level. Also adjust to removal of cache entry types. Fixes flux-framework#1239
Most notably, valrefs that previously had a (at the time invalid) dirref reference within them will now pass. Valref blobrefs can now point to anything.
Function was unused except in tests, so remove it. Remove / adjust unit tests appropriately.
Remove cache_entry_create_raw(), which was unused except in tests. Remove cache_entry_create_json(), which was used in only one location. Replace calls to these functions with cache_entry_create() and a call to either cache_entry_set_raw() or cache_entry_set_json(). Adjust unit tests appropriately.
In cache_entry_set_raw() and cache_entry_set_json(), set errno = EINVAL when input is invalid. Update unit tests appropriately.
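A hedged sketch of the errno convention described in the last commit message. The struct and function names (`sketch_entry`, `sketch_entry_set_raw`) are hypothetical, and what counts as "invalid input" is an assumption here: a NULL entry, or a data pointer and length that disagree. The real cache_entry_set_raw() may check more or different conditions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cache_entry. */
struct sketch_entry {
    void *data;
    int len;
    bool valid;
};

/* Store a raw data pointer (not copied) in the entry.
 * Returns 0 on success, or -1 with errno = EINVAL on invalid input,
 * so that every -1 return path sets errno. */
int sketch_entry_set_raw (struct sketch_entry *e, void *data, int len)
{
    /* Assumed validity rule: data and len must agree. */
    if (!e || (data && len <= 0) || (!data && len != 0)) {
        errno = EINVAL;
        return -1;
    }
    e->data = data;
    e->len = len;
    e->valid = true;
    return 0;
}
```

Setting errno on every failure path lets a caller report the failure with strerror (errno) without having to guess which check was hit.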
Squashed and re-pushed.
This PR refactors the KVS cache to favor the storage of raw data over json objects. This accomplishes several goals.
Before this PR, each KVS cache entry had a data "type" associated with it. This could lead to a "race" in which the way the data was first loaded into the KVS cache determined whether later accesses worked, leading to errors/bugs. For example, if data was incorrectly loaded into the KVS cache as type json, a later attempt to read the data out of the KVS cache as raw data would fail, leading to an error.
This should clean up the code and hopefully make it less confusing. Many unit tests were removed in this PR, so I feel the less confusing part was accomplished.
The primary idea behind this refactor is to remove the "type" system from KVS cache entries and make the KVS cache primarily for storing raw data.
Users can get/set json objects in the KVS cache, but the API is simply a set of convenience functions converting those json objects to/from their raw string form.
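The idea can be sketched roughly as follows. This is not the code from the PR: all names are hypothetical, `struct decoded` is a dependency-free stand-in for a decoded json_t, and rejecting a second set with EINVAL is a simplification chosen for this sketch. It only shows raw bytes as the primary data, with a decoded view built on first request and then cached.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a decoded json_t object. */
struct decoded {
    char *text;          /* NUL-terminated copy of the raw bytes */
};

struct entry {
    void *data;          /* primary storage: raw bytes */
    int len;
    bool valid;
    struct decoded *o;   /* cached decoded view, built on demand */
    int decode_count;    /* illustration only: number of decodes */
};

/* Store a private copy of the raw bytes in an entry that is not valid yet. */
int entry_set_raw (struct entry *e, const void *data, int len)
{
    void *cpy;

    if (!e || e->valid || !data || len <= 0) {
        errno = EINVAL;
        return -1;
    }
    if (!(cpy = malloc (len))) {
        errno = ENOMEM;
        return -1;
    }
    memcpy (cpy, data, len);
    e->data = cpy;
    e->len = len;
    e->valid = true;
    return 0;
}

/* Convenience accessor: decode the raw bytes once, then reuse the result. */
const struct decoded *entry_get_decoded (struct entry *e)
{
    if (!e || !e->valid)
        return NULL;
    if (!e->o) {
        struct decoded *o = malloc (sizeof (*o));
        if (!o)
            return NULL;
        if (!(o->text = malloc (e->len + 1))) {
            free (o);
            return NULL;
        }
        memcpy (o->text, e->data, e->len);
        o->text[e->len] = '\0';
        e->o = o;
        e->decode_count++;
    }
    return e->o;
}

/* Release the raw data and the cached view, and reset the entry. */
void entry_clear (struct entry *e)
{
    if (e->o)
        free (e->o->text);
    free (e->o);
    free (e->data);
    memset (e, 0, sizeof (*e));
}
```

Because there is no per-entry type, a reader asking for the decoded view and a reader asking for the raw bytes both succeed regardless of how the entry was populated.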
As an aside, I am sometimes anal when it comes to code "ordering". In commit chu11@368f858 I literally just moved "json" code below "raw data" code, b/c I want "raw data" code to be listed first as it is now the "primary" way the KVS cache works. I know it's more code change than may be necessary.