-
Notifications
You must be signed in to change notification settings - Fork 122
Markus/internal cache #4619
Markus/internal cache #4619
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two more options here:
- Change the API and remove
KeySet *
fromkdbGet
andkdbSet
(also option 4 in [decisions] sequences of kdbGet and kdbSet operations #4574). If the keyset is owned by theKDB
handle, it should not be as big surprise, if there is extra data in there. I certainly wouldn't try to asset anything on the contents of aKeySet *
that I don't own directly, unless the condition is explicitly documented somwhere. - Make all the keys returned by
kdbGet
completely read-only. To change the data you need to append an entirely new key to replace the existing one. Then we just need to keep a shallow copy internally.
### MMAP Cache with parent key | ||
|
||
We make the mmap cache non-optional so that we always have a keyset of configuration data internally. | ||
From this keyset, we use `ksBelow` to return the correct keyset. | ||
|
||
**Cons:** | ||
|
||
- invalidation of OPMPHM | ||
|
||
### MMAP Cache without parent key | ||
|
||
We make the mmap cache non-optional and only use a single cache, caching everything. | ||
We remove the parent key of `kdbGet` and `kdbSet` and always return the keyset of the whole KDB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For using mmap
this much, I think we use pointers too much. I'm sure @mpranj has done benchmarks for the pointer correction code in mmapstorage
, but maybe not for huge KeySets
. If we cache everything and use it a lot, the pointer correction code might become a bottleneck.
In any case, if we go down this route, we should compare doing the pointer correction with something similar to what Flatbuffers do. AFAIK they don't use pointers and store everything as offsets. The offset is resolved into a pointer when data is accessed. So a KeySet
would always be in one large memory buffer and all keys only know the relative offset to their name. When you call keyName
that offset is resolved and you get a char *
. Would be a huge internal change, may be needed, if mmap
the entire KDB (which could be very big).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cache everything and use it a lot, the pointer correction code might become a bottleneck.
Might, but we don't know that for a fact. I assumed the same as you did, but in my benchmarks the pointer correction was never a bottleneck.
AFAIK they don't use pointers and store everything as offsets.
We also store everything as offsets, it's just that we resolve the offsets eagerly.
Definitely a bigger task, but at least it's quite clear what to do here ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the current solution works, it's probably best to leave it. It would be interesting, if resolving the offsets only on access would do be an improvement. But it's probably too hard to benchmark. It will depend on how frequently the data is actually accessed after the mmap cache is loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will depend on how frequently the data is actually accessed after the mmap cache is loaded.
Indeed.
doc/decisions/internal_cache.md
Outdated
assert (keyName(key) == keyName(key_dup)); // stays always valid | ||
``` | ||
|
||
This is already implemented for the MMAP cache, so the implementation should be straightforward (do the same COW duplications as done for MMAP). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, the mmap data is already COW. So actually not "do the same", but more rename the MMAP flags to COW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had a different flag in mind so that it doesn't interfere with the mmap cache. See 728fec1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I got that. What I wanted to say is that mmap
already does COW, so we can reuse the code and probably the flag. If there is some code that is only needed for mmap and not COW, we could make mmap set two flags one for mmap and one for the general COW code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this write-up! It now makes the problem much clearer for me. I spend a lot of time today to look over the mmapstorage
plugin, and some things in there would really make a lot of sense for general usage of in Elektra.
I think the In-Memory COW cache sounds brilliant, and with some modifications (storing of the original value) we can make this work for change tracking.
We also need to reach an agreement of what we do with the KeySets, as of now it seems that only Keys are COW which leads to some problems with metadata that @kodebach already pointed out.
|
||
### In-Memory COW Cache | ||
|
||
We keep a duplicated keyset in-memory and tag the keys as copy-on-write (COW). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear: As far as I understand, this whole COW only concerns Keys, and not KeySets, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unspecified, depending on the solution with meta-data.
AFAICT all the mmap options and the in-memory COW option could be used directly without modifications for change tracking. All these options result in a separate internal copy of the data originally loaded. Because of all them are COW (mmap is just a slightly different approach) when the caller changes the values, the copy inside |
Co-authored-by: Klemens Böswirth <23529132+kodebach@users.noreply.github.com> Co-authored-by: Maximilian Irlinger <maxi6594@gmail.com>
Thank you all for your comments! If there are no further questions about the problem, I would like to merge as draft and let @atmaxinger take over this decision. |
@atmaxinger please press "Merge pull request" if you think this is ready for you to take over. |
Clarifications for @atmaxinger
Basics
(added as entry in
doc/news/_preparation_next_release.md
whichcontains
_(my name)_
)Please always add something to the release notes.
(first line should have
module: short statement
syntax)close #X
, are in the commit messages.doc/news/_preparation_next_release.md
scripts/dev/reformat-all
Checklist
(not in the PR description)
Review
Labels