Markus/internal cache #4619

markus2330 · 2022-10-27T06:31:16Z

Clarifications for @atmaxinger

Basics

Short descriptions of your changes are in the release notes
(added as entry in doc/news/_preparation_next_release.md which
contains _(my name)_)
Please always add something to the release notes.
Details of what you changed are in commit messages
(first line should have module: short statement syntax)
References to issues, e.g. close #X, are in the commit messages.
The buildservers are happy. If not, fix in this order:
- add a line in doc/news/_preparation_next_release.md
- reformat the code with scripts/dev/reformat-all
- make all unit tests pass
- fix all memleaks
The PR is rebased with current master.

Checklist

I added unit tests for my code
I fully described what my PR does in the documentation
(not in the PR description)
I fixed all affected documentation (see Documentation Guidelines)
I added code comments, logging, and assertions as appropriate (see Coding Guidelines)
I updated all meta data (e.g. README.md of plugins and METADATA.ini)
I mentioned every code not directly written by me in reuse syntax

Review

Documentation is introductory, concise, good to read and describes everything the PR does
Examples are well chosen and understandable
Code is conforming to our Coding Guidelines
APIs are conforming to our Design Guidelines
Code is consistent with our Design Decisions

Labels

Add the "work in progress" label if you do not want the PR to be reviewed yet.
Add the "ready to merge" label if the basics are fulfilled and no further pushes are planned by you.

kodebach

There are two more options here:

Change the API and remove KeySet * from kdbGet and kdbSet (also option 4 in [decisions] sequences of kdbGet and kdbSet operations #4574). If the keyset is owned by the KDB handle, it should not be as big a surprise if there is extra data in there. I certainly wouldn't try to assert anything about the contents of a KeySet * that I don't own directly, unless the condition is explicitly documented somewhere.
Make all the keys returned by kdbGet completely read-only. To change the data you need to append an entirely new key to replace the existing one. Then we just need to keep a shallow copy internally.

doc/decisions/internal_cache.md

kodebach · 2022-10-27T11:33:32Z

+### MMAP Cache with parent key
+
+We make the mmap cache non-optional so that we always have a keyset of configuration data internally.
+From this keyset, we use `ksBelow` to return the correct keyset.
+
+**Cons:**
+
+- invalidation of OPMPHM
+
+### MMAP Cache without parent key
+
+We make the mmap cache non-optional and only use a single cache, caching everything.
+We remove the parent key of `kdbGet` and `kdbSet` and always return the keyset of the whole KDB.


For using mmap this much, I think we use pointers too much. I'm sure @mpranj has done benchmarks for the pointer correction code in mmapstorage, but maybe not for huge KeySets. If we cache everything and use it a lot, the pointer correction code might become a bottleneck.

In any case, if we go down this route, we should compare doing the pointer correction with something similar to what FlatBuffers does. AFAIK they don't use pointers and store everything as offsets. The offset is resolved into a pointer when data is accessed. So a KeySet would always be in one large memory buffer and all keys would only know the relative offset to their name. When you call keyName, that offset is resolved and you get a char *. It would be a huge internal change, but it may be needed if we mmap the entire KDB (which could be very big).
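
The offset-based layout described here can be sketched as follows. This is only a minimal model under stated assumptions, not Elektra or FlatBuffers code: `Buffer`, `OffsetKey`, `offsetKeyName`, and `offsetDemo` are hypothetical names invented for illustration. The key stores a relative offset, and the pointer is computed only when the name is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one contiguous buffer holds all key names. */
typedef struct
{
	const char * base; /* start of the (possibly mmap'ed) region */
	size_t size;	   /* total size of the region in bytes */
} Buffer;

/* A key stores only an offset into the buffer, never a raw pointer. */
typedef struct
{
	size_t nameOffset;
} OffsetKey;

/* Resolve the offset on access; returns NULL if the offset is out of range. */
const char * offsetKeyName (const Buffer * buf, const OffsetKey * key)
{
	if (key->nameOffset >= buf->size) return NULL;
	return buf->base + key->nameOffset;
}

/* Usage: the same key stays valid when the data sits at a different address. */
int offsetDemo (void)
{
	char first[] = "user:/a\0user:/b"; /* second name starts at offset 8 */
	char second[sizeof first];
	memcpy (second, first, sizeof first); /* simulates mapping the data elsewhere */

	OffsetKey key = { .nameOffset = 8 };
	Buffer a = { first, sizeof first };
	Buffer b = { second, sizeof second };

	return strcmp (offsetKeyName (&a, &key), "user:/b") == 0 && strcmp (offsetKeyName (&b, &key), "user:/b") == 0;
}
```

In this model nothing has to be rewritten when the buffer is loaded. The cost moves to every access instead, which is the trade-off discussed in this thread.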

> If we cache everything and use it a lot, the pointer correction code might become a bottleneck.

Might, but we don't know that for a fact. I assumed the same as you did, but in my benchmarks the pointer correction was never a bottleneck.

> AFAIK they don't use pointers and store everything as offsets.

We also store everything as offsets, it's just that we resolve the offsets eagerly.
Definitely a bigger task, but at least it's quite clear what to do here ...
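
For comparison, eager resolution of stored offsets could look roughly like the sketch below. It is an illustrative model only and not the mmapstorage implementation: `StoredKey`, `fixupPointers`, and `fixupDemo` are hypothetical names, and a real key has more fields than a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stored record: `name` holds an offset until it is fixed up. */
typedef struct
{
	union
	{
		uintptr_t offset;  /* as stored: relative to the start of the data */
		const char * ptr;  /* after fix-up: a directly usable pointer */
	} name;
} StoredKey;

/* Eager pass, run once after loading: turn every offset into a pointer. */
void fixupPointers (StoredKey * keys, size_t count, const char * base)
{
	for (size_t i = 0; i < count; ++i)
	{
		keys[i].name.ptr = base + keys[i].name.offset;
	}
}

/* Usage: after the single pass, names are read without further arithmetic. */
int fixupDemo (void)
{
	static const char names[] = "user:/a\0user:/b";
	StoredKey keys[2];
	keys[0].name.offset = 0;
	keys[1].name.offset = 8;

	fixupPointers (keys, 2, names);

	return strcmp (keys[0].name.ptr, "user:/a") == 0 && strcmp (keys[1].name.ptr, "user:/b") == 0;
}
```

The pass is linear in the number of stored pointers and is paid once per load, regardless of how many keys are accessed afterwards.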

Since the current solution works, it's probably best to leave it. It would be interesting to know whether resolving the offsets only on access would be an improvement. But it's probably too hard to benchmark. It will depend on how frequently the data is actually accessed after the mmap cache is loaded.

> It will depend on how frequently the data is actually accessed after the mmap cache is loaded.

Indeed.

doc/decisions/internal_cache.md

kodebach · 2022-10-27T11:36:29Z

+assert (keyName(key) == keyName(key_dup)); // stays always valid
+```
+
+This is already implemented for the MMAP cache, so the implementation should be straightforward (do the same COW duplications as done for MMAP).


Exactly, the mmap data is already COW. So it is not so much "do the same" as "rename the MMAP flags to COW".

I actually had a different flag in mind so that it doesn't interfere with the mmap cache. See 728fec1

Yes, I got that. What I wanted to say is that mmap already does COW, so we can reuse the code and probably the flag. If there is some code that is only needed for mmap and not for COW, we could make mmap set two flags: one for mmap and one for the general COW code.
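
The two-flag idea could be sketched as below. The flag names and values are hypothetical and chosen only for illustration; Elektra's real flags may be named and used differently. The sketch also assumes that the mmap-specific behavior is "never free this memory", which is an assumption made here, not something stated in this thread.

```c
#include <assert.h>

/* Hypothetical flag values; the real flag names in Elektra may differ. */
typedef enum
{
	DEMO_FLAG_COW = 1 << 0,	 /* data is shared and must be copied before a write */
	DEMO_FLAG_MMAP = 1 << 1, /* data lives in an mmap'ed region (assumed: must not be freed) */
} DemoFlags;

/* General COW check: applies to in-memory copies and mmap'ed data alike. */
int needsCopyBeforeWrite (unsigned flags)
{
	return (flags & DEMO_FLAG_COW) != 0;
}

/* mmap-only check: memory that belongs to the mapping is not passed to free(). */
int mayFree (unsigned flags)
{
	return (flags & DEMO_FLAG_MMAP) == 0;
}
```

With this split, an in-memory COW key would set only `DEMO_FLAG_COW`, while a key loaded from the mmap cache would set `DEMO_FLAG_COW | DEMO_FLAG_MMAP`, so the general COW code path does not need to know about mmap.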

doc/decisions/internal_cache.md

atmaxinger

Thank you for this write-up! It makes the problem much clearer for me. I spent a lot of time today looking over the mmapstorage plugin, and some things in there would make a lot of sense for general usage in Elektra.

I think the In-Memory COW cache sounds brilliant, and with some modifications (storing of the original value) we can make this work for change tracking.

We also need to reach an agreement on what we do with KeySets. As of now, it seems that only Keys are COW, which leads to some problems with metadata that @kodebach already pointed out.

doc/decisions/internal_cache.md

atmaxinger · 2022-10-27T12:55:03Z

+
+### In-Memory COW Cache
+
+We keep a duplicated keyset in-memory and tag the keys as copy-on-write (COW).


Just to be clear: As far as I understand, this whole COW only concerns Keys, and not KeySets, right?

This is unspecified, depending on the solution with meta-data.
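
To make the key-level COW concrete, here is a minimal sketch that assumes a reference-counted name buffer shared between a key and its duplicate. It is not Elektra's `Key` struct: `SharedName`, `CowKey`, and all `cowKey*` functions are hypothetical, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a COW key; not Elektra's actual Key struct. */
typedef struct
{
	char * text;
	unsigned refs; /* how many keys share this buffer */
} SharedName;

typedef struct
{
	SharedName * name;
} CowKey;

static SharedName * sharedNameNew (const char * text)
{
	SharedName * n = malloc (sizeof *n);
	if (!n) return NULL;
	n->text = malloc (strlen (text) + 1);
	if (!n->text)
	{
		free (n);
		return NULL;
	}
	strcpy (n->text, text);
	n->refs = 1;
	return n;
}

static void sharedNameUnref (SharedName * n)
{
	if (--n->refs == 0)
	{
		free (n->text);
		free (n);
	}
}

/* Allocation failure is not handled here to keep the sketch short. */
CowKey cowKeyNew (const char * name)
{
	CowKey k = { sharedNameNew (name) };
	return k;
}

/* Duplicate cheaply: both keys point at the same buffer. */
CowKey cowKeyDup (CowKey * src)
{
	src->name->refs++;
	CowKey k = { src->name };
	return k;
}

/* Write: switch to a private buffer, so other sharers are unaffected. */
int cowKeySetName (CowKey * key, const char * name)
{
	SharedName * fresh = sharedNameNew (name);
	if (!fresh) return -1;
	sharedNameUnref (key->name);
	key->name = fresh;
	return 0;
}

const char * cowKeyName (const CowKey * key)
{
	return key->name->text;
}

void cowKeyDel (CowKey * key)
{
	sharedNameUnref (key->name);
	key->name = NULL;
}

/* Usage: the duplicate shares the name until one side writes. */
int cowDemo (void)
{
	CowKey key = cowKeyNew ("user:/a");
	CowKey dup = cowKeyDup (&key);
	int shared = cowKeyName (&key) == cowKeyName (&dup); /* same buffer */

	cowKeySetName (&dup, "user:/b");
	int isolated = strcmp (cowKeyName (&key), "user:/a") == 0 && strcmp (cowKeyName (&dup), "user:/b") == 0;

	cowKeyDel (&key);
	cowKeyDel (&dup);
	return shared && isolated;
}
```

The sketch covers only a key's name. Whether the same sharing applies to KeySets and metadata is exactly the open question raised in this thread.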

doc/decisions/internal_cache.md

kodebach · 2022-10-27T13:30:02Z

> I think the In-Memory COW cache sounds brilliant, and with some modifications (storing of the original value) we can make this work for change tracking

AFAICT all the mmap options and the in-memory COW option could be used directly, without modifications, for change tracking. All these options result in a separate internal copy of the data originally loaded. Because all of them are COW (mmap is just a slightly different approach), the copy inside KDB remains untouched when the caller changes the values.
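
As a rough illustration of why an untouched internal copy is enough for change tracking, the sketch below compares the caller's data against the internal snapshot. `Entry`, `countChanged`, and `trackingDemo` are hypothetical names; real KeySets are not plain arrays, and the sketch assumes both sides hold the same names in the same order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat model of a key: a name and a string value. */
typedef struct
{
	const char * name;
	const char * value;
} Entry;

/* Count entries whose value differs from the untouched internal copy.
 * Assumes both arrays hold the same names in the same order. */
size_t countChanged (const Entry * original, const Entry * current, size_t count)
{
	size_t changed = 0;
	for (size_t i = 0; i < count; ++i)
	{
		if (strcmp (original[i].value, current[i].value) != 0) ++changed;
	}
	return changed;
}

/* Usage: the caller modifies its copy, the internal copy stays as loaded. */
size_t trackingDemo (void)
{
	Entry internal[] = { { "user:/a", "1" }, { "user:/b", "2" } };
	Entry returned[2];
	memcpy (returned, internal, sizeof internal); /* what the caller receives */

	returned[1].value = "3"; /* the caller changes one value */

	return countChanged (internal, returned, 2);
}
```

Added and removed keys would need a comparison by name as well, which this sketch leaves out.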

Co-authored-by: Klemens Böswirth <23529132+kodebach@users.noreply.github.com> Co-authored-by: Maximilian Irlinger <maxi6594@gmail.com>

markus2330 · 2022-10-27T18:47:43Z

Thank you all for your comments! If there are no further questions about the problem, I would like to merge as draft and let @atmaxinger take over this decision.

markus2330 · 2022-10-28T13:10:29Z

@atmaxinger please press "Merge pull request" if you think this is ready for you to take over.

Markus Raab added 3 commits October 27, 2022 07:02

doc: update goals

4707197

decisions: update internal cache

23bbb88

decisions: fix link

584688b

markus2330 mentioned this pull request Oct 27, 2022

[decision] Change Tracking #4554

Merged

22 tasks

markus2330 requested a review from atmaxinger October 27, 2022 10:11

kodebach suggested changes Oct 27, 2022

atmaxinger reviewed Oct 27, 2022

markus2330 and others added 6 commits October 27, 2022 20:24

Apply suggestions from code review

7f088e9

Co-authored-by: Klemens Böswirth <23529132+kodebach@users.noreply.github.com> Co-authored-by: Maximilian Irlinger <maxi6594@gmail.com>

decisions: added from discussions

6dd685d

decisions: clarification different flag

728fec1

decisions: clarify text as suggested

20ac16b

decisions: small clarification

fb27293

decisions: reformat

1dc7b5e

kodebach approved these changes Oct 27, 2022

decisions: some last comments integrated

8bde2a0

mpranj approved these changes Oct 28, 2022

atmaxinger merged commit 514319e into ElektraInitiative:master Oct 28, 2022

mpranj modified the milestones: 0.9.*, 0.9.12 Jan 19, 2023
