As stated before, I don't think `libelektra-core` should do the change tracking directly by storing the old value. It is simply not part of `libelektra-core`'s goal. Also, while it is not relevant for the use case in `libelektra-kdb` (*), if we do do change tracking in `libelektra-core`, it should also include the key name. That would make the whole size overhead a lot worse. I say this because from `libelektra-core`'s POV there is no reason why only values should be tracked, but not names.

IMO if we do this in `libelektra-core` it would be far better if both `KeySet` and `Key` had a field `changeCallback` (see the sketch below). This would be a function pointer that is called when the `Key`/`KeySet` is changed. Then `libelektra-kdb` could set this callback and keep an internal record of the changes. However, I don't think this really is a good solution either, because it is still beyond what `libelektra-core` should do.

(*) For `libelektra-core` tracking names is not needed, because we only deal with `KeySet`s and when a `Key` is in a `KeySet` its name cannot change.
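For illustration, a minimal sketch of what such a `changeCallback` field could look like. The struct layout, type names and signature below are assumptions invented for this sketch; they are not the actual `libelektra-core` internals:

```c
#include <stddef.h>

/* Hypothetical callback type: invoked whenever a Key (or KeySet) is modified.
 * Everything in this snippet is illustrative and not the real struct _Key. */
typedef void (*ElektraChangeCallback) (void * object, void * context);

struct HypotheticalKey
{
	void * value;			      /* value, name, metadata, ... (details omitted) */
	ElektraChangeCallback changeCallback; /* set by libelektra-kdb when tracking is wanted */
	void * callbackContext;		      /* e.g. libelektra-kdb's internal change record */
};

/* Every setter would then notify the registered observer: */
static void hypotheticalKeySetValue (struct HypotheticalKey * key, void * newValue)
{
	key->value = newValue;
	if (key->changeCallback != NULL)
	{
		key->changeCallback (key, key->callbackContext);
	}
}
```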
I think you discredit option 1 far too quickly. It is not only the easiest to implement, but it is also not nearly as bad as you make it out to be. There are already constraints on the `kdbGet` and `kdbSet` relation. The constraint right now is specifically:

For `kdbSet (handle, setKs, setParent)` to succeed, `kdbGet (handle, getKs, getParent)` must have been called previously and the relation between `getParent` and `setParent` must be (at least) one of:

- `setParent` is the same as or below `getParent`
- both `getParent` and `setParent` are stored in the same backend

AFAICT the only required change would be to say "the last `kdbGet` call using `handle` must have been `kdbGet (handle, getKs, getParent)`" instead of "`kdbGet (handle, getKs, getParent)` must have been called previously". This could easily be checked and we could return an error if this is not the case.
Yes, it has an impact on the developer using Elektra. But it is not a huge impact. You just need to use a separate `KDB` handle. That too has some memory (and computation) overhead, but in general it should be less than the overhead of e.g. option 3.

In addition to that, I believe it would be very rare that one of the problematic sequences was created intentionally. It is far more likely that it happened through sharing of a `KDB` handle between unrelated components or even threads (which is explicitly unsupported).

Furthermore, I think this might not even have any additional memory overhead. In the `KDB` handle's `KeySet * backends` we already have a `struct _BackendData` for every backend that was used. These contain a `KeySet * keys` with the keys for the backend. Importantly, those `keys` are already `keyDup`'d. I think that data could be used to detect changes between `kdbGet` and `kdbSet`.
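A sketch of how that could be exploited: compare the keyset the application hands to `kdbSet` against the copy kept in `backendData->keys`. The function below only compares string values and ignores added/removed keys and metadata; it is meant to illustrate the idea, not the final implementation:

```c
#include <kdb.h>
#include <string.h>

// Collects keys from newKeys whose value differs from the copy in oldKeys
// (e.g. the already keyDup'd backendData->keys from kdbGet).
static void collectModifiedKeys (KeySet * oldKeys, KeySet * newKeys, KeySet * modified)
{
	for (elektraCursor i = 0; i < ksGetSize (newKeys); i++)
	{
		Key * newKey = ksAtCursor (newKeys, i);
		Key * oldKey = ksLookup (oldKeys, newKey, KDB_O_NONE);

		if (oldKey == NULL)
		{
			continue; // key was added; a real implementation would track this separately
		}

		if (strcmp (keyString (oldKey), keyString (newKey)) != 0)
		{
			ksAppendKey (modified, newKey);
		}
	}
}
```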
Finally, when talking about memory overhead you also have to keep in mind that Elektra is global. If you start a change tracking session, this should affect all applications using Elektra. Therefore, each of these applications will have the overhead of change tracking.
doc/decisions/change_tracking.md
Outdated
it may be noticeable for `Key`. On a 64-bit system we'd add 8+8=16 bytes to it. To put this in perspective, the current size of the `Key` struct is 64 bytes, so we'd add 25% overhead to an empty key. However, this percentage will be much lower in a real-world application, as the usefulness of an empty key is very low.
If you take all the allocated data into account and not just `struct _Key`, then even for an "empty" key it is technically just ~23% overhead. That's because even an "empty" key needs a name, and the smallest name, `/`, needs 2 bytes for the escaped and 3 bytes for the unescaped form (16 / (64 + 2 + 3) ≈ 23%).
@atmaxinger can you please elaborate which 16 bytes you mean? Isn't a single pointer to "change-tracking data" enough? And maybe it can even be in metadata, not needing any extra bytes in _Key.
8 bytes for the pointer, 8 bytes for the size.
I won't comment on the rest, because this clearly goes into too much detail. If the problem is now clear, this PR should be merged and this should be addressed in the next PR.
Great job, it is much clearer to me now!
## Constraints

Change tracking must:
Also add links to the relevant use cases here.
@markus2330 @kodebach I have updated the decision and (hopefully) addressed all your remarks. In particular, I have removed the first two constraint-violating alternatives and added more detailed info about the hooks-based approach. It would be greatly appreciated if we come to a consensus on which approach we'll decide on soon, so that I can start implementing it as quickly as possible 🚀.
Of course, but I think we need to decide #4574 first. If the sequences are limited, this decision becomes much simpler.
There is duplication with #4574. Please only request reviews when all the work is done. Or is this wrongly shown by GitHub?
doc/decisions/change_tracking.md
Outdated
1. on `kdbGet` duplicate the returned keyset,
2. on `kdbSet` calculate what changed in between.

This basic approach will fail when there is another `kdbGet` operation for a different part of the KDB in between.
I stopped reading here, as this duplicates #4574.
I removed it and referenced #4574 instead.
@markus2330 of course it duplicates to some extent with #4574, as the "illegal" sequences mostly only affect plugins that do change tracking. Creating another decision for it was your idea. The thing is, and this is where I strongly disagree with @kodebach, we can actually decide this independently from #4574, as all of the alternatives presented here allow us to work around the "illegal" sequences issue.

In the case of alternatives 1-3, we can just store all keysets that were returned by `kdbGet`. In the case of alternative 4, the change tracking is completely done on `KeySet` level, so the sequences of `kdbGet` and `kdbSet` calls do not matter.

Essentially, all this decision is about is how we should do change tracking.
Actually cleared or, rather, replaced and updated? Unless I'm mistaken, […]
Having deeply duplicated keys of everything is out of the question. We need to go down with memory consumption and not double it! So it goes against the constraint to not use more memory.

What I can totally imagine (and is probably a brilliant idea) is that we always use COW semantics from the keys that are passed out from our API (also in the non-mmap case; for mmap we actually already do it that way) and keep references to everything inside. Then we could: […]

There is already a draft decision on that topic: doc/decisions/internal_cache.md. @atmaxinger can you take over this decision, too?
You misunderstood my idea... I suspect we already have a deepDup'ed copy of the keys that we can use. We could probably get rid of this, but AFAIK it was already present in the split of the old backend system.
General copy-on-write semantics are very complicated and I don't believe we can do that without API changes. Since the […], IMO the only option to do COW properly is extending the […].

Also @markus2330, since you seem to have a vague idea about how this should work, could you maybe write that up? To me it is currently entirely unclear what exactly you mean by "always use COW semantics from the keys that are passed out from our API".
@kodebach actually cleared, on multiple occasions.

Yes, @markus2330, I don't see a way to do change tracking without using more memory. After all, we need to store the original values of changed keys. However, as mentioned, this will only be done if a plugin explicitly requests it. Having central code that does the change tracking can actually lower the memory consumption compared to the current state. Currently, if you use both the […]

I read it, and IMHO it really isn't adding anything to this. This decision is very vague and I don't understand its purpose. Actually it very much reminds me of stuff we already do in new-backend, especially the "keep duplicated keyset internally". Maybe @kodebach can correct me on that. Also, the considered alternatives are only bullet points without further description. As with @kodebach, I have no idea what you mean by "COW semantics". Could you be so kind as to write it up? Does it differ much from what I have written in this decision?

I'd also like to mention that for me this decision is slowly drifting into bikeshedding. As I have already mentioned, the whole change tracking stuff should be behind an API. How it is implemented should be an implementation detail. I'd rather build a trivial but non-optimised solution first to validate our use cases (this also includes RecordElektra). Optimizations can then always be done later on. If we start doing full COW for the whole of Elektra before implementing this, it will take a long time before we end up with anything useful.
@markus2330 @kodebach if we don't reach a decision here in a short time, I will implement a prototype based on option 3 - doing it in a separate plugin while reusing the already duplicated data.
How would you access the already duplicated data in a plugin? I think using […]

You could pass a "current KDB state" […]

One advantage of this would be that all hook plugins are implemented as a single (main) plugin and not a list of plugins. Where multiple plugins could be active (e.g. notifications), the main hook plugin would call extra plugins.
As for other plugins, they will just use the change-tracking API that I will provide. This API will be general enough that we can replace the implementation of the change tracking with whatever we like later. For example:

```c
bool elektraChangeTrackingIsEnabled (KDB * kdb);

KeySet * elektraChangeTrackingGetAddedKeys (KDB * kdb, Key * parentKey);
KeySet * elektraChangeTrackingGetRemovedKeys (KDB * kdb, Key * parentKey);
KeySet * elektraChangeTrackingGetModifiedKeys (KDB * kdb, Key * parentKey);

bool elektraChangeTrackingValueChanged (KDB * kdb, Key * key, Key * parentKey);
bool elektraChangeTrackingMetaChanged (KDB * kdb, Key * key, Key * parentKey);

KeySet * elektraChangeTrackingGetAddedMetaKeys (KDB * kdb, Key * key, Key * parentKey);
KeySet * elektraChangeTrackingGetRemovedMetaKeys (KDB * kdb, Key * key, Key * parentKey);
KeySet * elektraChangeTrackingGetModifiedMetaKeys (KDB * kdb, Key * key, Key * parentKey);

Key * elektraChangeTrackingGetOriginalKey (KDB * kdb, Key * key, Key * parentKey);
Key * elektraChangeTrackingGetOriginalMetaKey (KDB * kdb, Key * key, const char * metaName, Key * parentKey);
```

Alternatively, I could just put the computed changeset into a `ChangeTrackingContext`:

```c
bool elektraChangeTrackingIsEnabled (KDB * kdb);
ChangeTrackingContext * elektraChangeTrackingGetContext (KDB * kdb, Key * parentKey);

KeySet * elektraChangeTrackingGetAddedKeys (ChangeTrackingContext * context);
KeySet * elektraChangeTrackingGetRemovedKeys (ChangeTrackingContext * context);
KeySet * elektraChangeTrackingGetModifiedKeys (ChangeTrackingContext * context);

bool elektraChangeTrackingValueChanged (ChangeTrackingContext * context, Key * key);
bool elektraChangeTrackingMetaChanged (ChangeTrackingContext * context, Key * key);

KeySet * elektraChangeTrackingGetAddedMetaKeys (ChangeTrackingContext * context, Key * key);
KeySet * elektraChangeTrackingGetRemovedMetaKeys (ChangeTrackingContext * context, Key * key);
KeySet * elektraChangeTrackingGetModifiedMetaKeys (ChangeTrackingContext * context, Key * key);

Key * elektraChangeTrackingGetOriginalKey (ChangeTrackingContext * context, Key * key);
Key * elektraChangeTrackingGetOriginalMetaKey (ChangeTrackingContext * context, Key * key, const char * metaName);
```
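To make the intended usage concrete, a hook plugin could consume the context-based variant roughly like this (the `elektraChangeTracking*` functions do not exist yet, and the ownership of the returned keysets is an assumption):

```c
#include <kdb.h>

// Sketch only: assumes the declarations proposed above are in scope and that
// the caller owns the returned KeySets.
static void hookOnKdbSet (KDB * kdb, Key * parentKey)
{
	if (!elektraChangeTrackingIsEnabled (kdb))
	{
		return;
	}

	ChangeTrackingContext * context = elektraChangeTrackingGetContext (kdb, parentKey);

	KeySet * added = elektraChangeTrackingGetAddedKeys (context);
	KeySet * removed = elektraChangeTrackingGetRemovedKeys (context);
	KeySet * modified = elektraChangeTrackingGetModifiedKeys (context);

	// ... emit notifications, append to a session recording, etc. ...

	ksDel (added);
	ksDel (removed);
	ksDel (modified);
}
```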
It's one `KeySet` per backend; they can be merged into a single `KeySet` to get the one returned from `kdbGet`. I think the merging should just be done in […]
Exactly
Can't we just add it to […]?
From […]
@kodebach turns out while yes we do deep duplication in […]
Ah okay, that actually makes more sense. I was never really sure why we keep this internal deep-duped copy, but if we actually return those keys it makes a lot more sense.

However, maybe we could still use this. I think we can just add an extra parameter `deepDup` to `backendsMerge`:

```c
void backendsMerge (KeySet * backends, KeySet * ks, bool deepDup)
{
	for (elektraCursor i = 0; i < ksGetSize (backends); i++)
	{
		const Key * backendKey = ksAtCursor (backends, i);
		BackendData * backendData = (BackendData *) keyValue (backendKey);

		if (keyGetNamespace (backendKey) != KEY_NS_DEFAULT)
		{
			ssize_t size = ksGetSize (backendData->keys);
			backendData->getSize = size;

			if (deepDup)
			{
				// hand out an independent deep copy, so backendData->keys
				// keeps the state from kdbGet for later comparison
				KeySet * dup = ksDeepDup (backendData->keys);
				ksAppend (ks, dup);
				ksDel (dup);
			}
			else
			{
				// cheap path: share the keys as before
				ksAppend (ks, backendData->keys);
			}
		}
	}
}
```
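Presumably the call site in `kdbGet` would then only pay for the deep duplication when change tracking was actually requested, along these lines (hypothetical glue code; `elektraChangeTrackingIsEnabled` is only the API proposed earlier in this thread):

```c
#include <kdb.h>
#include <stdbool.h>

// Hypothetical wrapper: deep-duplicate only if some plugin asked for change tracking.
static void mergeBackendsForGet (KDB * handle, KeySet * backends, KeySet * ks)
{
	bool trackingRequested = elektraChangeTrackingIsEnabled (handle);
	backendsMerge (backends, ks, trackingRequested);
}
```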
Please note: all relevant discussions must be accounted for within the decision. I will not read your discussion to have a fair view on the decision, as someone else will have when later reading the decision (e.g. in the second round).

In particular it is important that […]

The progress will be faster if your PRs are in a better state. Please finish #4515 before creating new PRs. If you are really stuck with everything, please write in #4463.
Yes, of course the document needs to be updated. But until now the discussion wasn't completed, so there was no point in updating the document. Now @atmaxinger should update the document and add using […]
Any additional duplicating that happens will of course only happen when change tracking is active. However, when it is active, the duplication must happen during `kdbGet`.
You don't need to. But those discussions are relevant for reaching a consensus and updating the document. How else would you or @kodebach be able to share your thoughts if not with these discussions? There is no point in updating anything in the file if we don't even agree on what to update.
Yes, while the function […]

The only way we don't deep-dup is when using option 4 - the COW approach. But you didn't even comment on that, nor did you provide an answer to my question whether this was what you meant with your "COW semantics". That said, this approach also has downsides, as already mentioned in the decision file.
This is a completely different PR which can be worked on in parallel. There is no need for it to be merged while doing this. In fact, change tracking is more important for the general record use case. #4515 only deals with usability improvements. Still very important for the finished product, but change tracking is the base of it all.
Absolutely, thus my insistence on a clear problem, constraints, assumptions etc. This was not the case in my last review, so obviously any discussion on such grounds is questionable. Please inform me when I should reread.
No, this statement is plain wrong. E.g. read #4619. We needed to discard the whole implementation because it is a dead end. At that time we obviously didn't put enough effort into finding the right decision. So please let us not repeat the same mistake.
A few small things, otherwise I think the problem is described clearly enough to merge the PR as draft.
The problem is clear, IMO this can be merged. @markus2330 What do you say?
@markus2330 I have rebased to current master and addressed all your remarks.
Thank you, great job! I agree the problem is clearly presented now! ❤️