This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

[New Backend] Discussion: Record Elektra tooling #4371

Closed

Conversation

atmaxinger
Contributor

DO NOT MERGE

This is the first draft of what I envision the session recording tooling to be. The proposed solution should work with all the use cases I could think of, but someone with more experience using Elektra will probably find some features lacking.

As for the tooling, we could probably also write a C library for it as proposed in #4208. That would also require a rewrite of the existing import and export tooling in C.

@markus2330 Please take a look at it and offer your valuable insights. We could also discuss this next week if there are any big topics.

@kodebach I'd also really appreciate your feedback.

Member

@kodebach kodebach left a comment

Overall this looks very promising to me, but there are a few problems I can see.

The recording on a per-user basis could cause problems with sudo, but I don't really have a good solution for this. Maybe you could store the whole recording data outside the KDB, e.g. in $HOME. The env-var should be preserved for sudo commands, but that obviously complicates your plugin.

The kdb record export feature is also a bit problematic. I think you should start with a simpler kdb record replay. Maybe also consider creating a kdb record diff/kdb record apply to create and apply proper patch files (in a somewhat custom format) instead of relying on export/import.
The kdb export/kdb import tools were designed to export/import state snapshots. But I think you want a proper diff/patch system, where the patch file can only be applied if the KDB is in a compatible state (like with git).


The `kdb record` tooling enables the user to record configuration sessions. It also allows exporting and importing the configuration changes.

Recording sessions are stored on a per-user basis. This implies that a recording session can only track changes made by a single user.
Member

What about cases where a sudo kdb is required, either because you need to change system:/ or spec:/ keys, or because of other permission issues?

Contributor Author

Maybe we could mount user:/elektra/record with a (custom) resolver that only uses the $HOME to detect the user directory?

Member

Such a resolver would fail, if $HOME isn't set. It also kinda defeats the point of user:/.

I'm not sure what the correct solution is, but storing information about system:/ and/or spec:/ keys in the user:/ namespace was weird from the beginning. You could store recordings in system:/elektra/record, but then you get into permission issues. With a single mountpoint you either always need sudo, or anybody can write into any session. So for proper permissions, you'd need to create a new mountpoint for every session; then the file could be owned by the user that started the session. But creating mountpoints again requires sudo.

Maybe we should store the recordings outside the KDB entirely. I'm thinking of something like this:

  1. Every session is stored in a single file. The path of the file serves as the unique ID (no session_name needed).
  2. kdb record start [session_file] takes the path to the session file; if none is given, one is created in the current directory. If the given file exists, the session will be resumed, otherwise a new file is created.
  3. The session file uses a line-based format (something custom or e.g. JSON Lines). That way adding a new entry to a session is easy: just append a line (see the sketch after this list). It also avoids the need for sequence numbers or unique timestamps; the ordering is given by the order of the lines.
  4. The active session is defined by /elektra/record/session (normal cascading lookup). kdb record start writes to user:/elektra/record/session by default, but to cover the sudo case you can use sudo kdb record start --system, which writes to system:/elektra/record/session and starts a system-wide session. Alternatively, you could call kdb record start and sudo kdb record start with the same file to start the session for both the current and the root user; as long as both users have access to the file this should work. System-wide (or even root) sessions could of course be problematic on multi-user systems, because a different user could modify the KDB (possibly with sudo) simultaneously, which would interfere with the recording. So as a final option, I would suggest that the ELEKTRA_RECORD_SESSION environment variable is also read (*) and takes precedence. Then you can simply set the env-var (has to be done manually) and as long as you stay in the same shell, sudo changes will be recorded correctly.
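
To make item 3 above concrete, here is a minimal sketch in C of how appending to such a line-based session file could look. The helper name and the entry fields (parent/old/new) are made up for illustration; proper JSON escaping and error handling are omitted:

```
#include <stdio.h>

/* Hypothetical helper: append one recording entry as a single JSON line.
 * Appending is all that is needed; the ordering is given by the line order. */
int appendSessionEntry (const char * sessionFile, const char * parent, const char * oldValue, const char * newValue)
{
	FILE * f = fopen (sessionFile, "a");
	if (f == NULL) return -1;

	// one JSON object per line ("JSON Lines"); no sequence numbers or timestamps needed
	fprintf (f, "{ \"parent\": \"%s\", \"old\": \"%s\", \"new\": \"%s\" }\n", parent, oldValue, newValue);

	fclose (f);
	return 0;
}
```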

By storing recordings outside the KDB you also don't need to make sure you exclude the user:/elektra/record keys from the recording.

Item 4 from above could be applied to your current setup as well. But then you still need to solve the problem of where to store the recording.


(*) We could use gopts for that, but IMO that would be overkill here. Directly reading ELEKTRA_RECORD_SESSION and putting it into proc:/elektra/record/session (ensures precedence) either during kdbOpen or during kdbGet is much simpler and also does the job.
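
A minimal sketch of that idea, assuming it runs inside the recording plugin's get function (the function name and the exact place where this happens are placeholders, not the actual plugin API of this PR):

```
#include <kdbplugin.h>
#include <stdlib.h>

/* Sketch: mirror ELEKTRA_RECORD_SESSION into proc:/elektra/record/session,
 * so that a cascading lookup of /elektra/record/session prefers the env-var. */
int elektraRecordGet (Plugin * handle, KeySet * returned, Key * parentKey)
{
	(void) handle;
	(void) parentKey;

	const char * session = getenv ("ELEKTRA_RECORD_SESSION");
	if (session != NULL)
	{
		// proc:/ keys take precedence over user:/ and system:/ in a cascading lookup
		ksAppendKey (returned, keyNew ("proc:/elektra/record/session", KEY_VALUE, session, KEY_END));
	}
	return ELEKTRA_PLUGIN_STATUS_SUCCESS;
}
```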


If you really want to store everything in the KDB, there is another option. It's not pretty, but it would work. Let the env-var ELEKTRA_RECORD_USER define the user for which the changes should be recorded, if the root user causes a change. In other words, if you set ELEKTRA_RECORD_USER=kodebach before calling sudo kdb set system:/foo 123, then the change will be recorded in the active session of user kodebach instead of the root user's session. I say this is not pretty, because it requires special-case handling for when the root user changes something (not so hard), and writing to another user's user:/ namespace (quite a bit harder).

Contributor Author

Yes, it looks like storing the session externally is the best option.

Not sure though that JSON is the best format for this, as storing binary data is pretty cumbersome. I should probably evaluate some other formats like BSON or protobuf as well.

The question is whether @markus2330 approves of adding a new dependency to the kdb utility. We could make it optional at build time though, and only include the kdb record commands when the dependency is available.

Member

Not sure though that JSON is the best format for this, as storing binary data is pretty cumbersome.

That's true.

I should probably evaluate some other formats like BSON or protobuf as well.

To make the whole "just append to a file" idea work, you need a format that has streaming support. If you need to wrap the whole file into something like a JSON array, then "just append to a file" doesn't work.

If you want to look at protobuf, maybe also check out flatbuffers. It's similar, but designed to require minimal parsing/unpacking. The format can even be mmapped.

adding a new dependency to the kdb utility.

I see a few options:

  1. Use a sufficiently simple format that we can implement the parsing ourselves.
  2. Use a library that can be linked statically (e.g. I think for flatbuffers this can be done)
  3. Make kdb record optional as you suggest.

Contributor

@atmaxinger Can you please summarize pros/cons of external serialization in a decision?

I think it is very elegant to have everything within KDB:

  1. we can use all our plugins as storage format
  2. users can read/manipulate the recordings with their own tooling easily

But if there are compelling reasons against it (sudo alone imho is not one; this could be fixed by having "system" sessions that are also accessible by a user, e.g. because they are mounted to /tmp), we'll look for alternatives. Currently the only feature in Elektra that has external data is the cache.

Member

by having "system" sessions that are also accessible by a user

That doesn't work in a true multi-user environment. The system session would be active for all users and that would be a nightmare. The environment variable trick was the only solution I could come up with for this problem.

I think it is very elegant to have everything within KDB [...]

I guarantee you we will again run into the limitations of the KeySet structure. For one, you are limited to those formats that can store a KeySet 1:1 with all metadata, non-leaf values, binary data, arbitrary names etc.

Some other reasons:

  • It's kind of an abuse of the KDB. The KDB is meant to store configuration data. This is quite clearly not configuration data.
  • Users can't choose the storage plugin anyway, since we store data below /elektra where adding mountpoints is prohibited.
  • I don't think reading/manipulating the recordings with existing tooling would really be easy, if we force the recording into a KeySet format. The format with the key names as basenames needs an extra parsing step to actually know what's going on. You need to first parse the storage file as normal, but then you need to handle the key names as basenames before you have actually parsed the data fully.
  • Using something like "JSON lines" would actually be incredibly easy to handle with existing tools. Just using the jq tool already gets you very far. The problem is of course binary data, but Base64 is always an option, or we could support a text-based and a binary format.

One thing we could do is a mixed solution. We use a non-KDB wrapper format for the log entries, but each entry contains an old and a new field which are encoded by the plugin specified in format. This would also take care of the issue of changes to multiple keys, since we are recording a full KeySet for old and new.

Binary values will still be a problem, if the wrapper format is text-based. But again we can use Base64, or we use a binary wrapper format. The wrapper format could even be implemented in plugins. Although those would have to be separate from the normal storage plugins.

Here are some examples with different wrapper formats:

JSON Lines

{ "parent": "user:/foo", "format": "dump", "old": "kdbOpen 2\n$end\n", "new": "kdbOpen 2\n$key string 0 3\n\n123\n$end\n"}
{ "parent": "user:/foo", "format": "dump", "old": "kdbOpen 2\n$key string 0 3\n\n123\n$end\n", "new": "kdbOpen 2\n$key string 0 4\n\n1234\n$end\n"}

YAML

parent: "user:/foo"
format: dump
old: |
  kdbOpen 2
  $end
new: |
  kdbOpen 2
  $key string 0 3

  123
  $end
---
parent: "user:/foo"
format: dump
old: |
  kdbOpen 2
  $key string 0 3
  
  123
  $end
new: |
  kdbOpen 2
  $key string 0 4
  
  1234
  $end

Contributor

To get good results in a reasonable time we must take one step after the other:

  1. Requirements+Description of user-facing parts
  2. Major decisions (structure of log format, internal/external, global plugins, tooling, etc) and architecture
  3. Details like which extra libraries, how to implement etc.

Currently we are at step 1, slowly progressing to step 2. Now is not yet the time to discuss syntax, formats, etc. (step 3). They easily lead us astray, as happened in this discussion. Let us focus again on requirements.

The question is which of the following statements need to be captured by the same session:

  1. kdb set dir:/some/key
  2. kdb set user:/some/key
  3. sudo -u other_user kdb set user:/some/key
  4. sudo kdb set system:/some/key
  5. kdb set user:/some/key executed by another user

also answering the question:

Do we want the record tooling to work differently from Elektra, which knows system, user and dir scopes?

I would answer the second question with no and would not expect 3-5 to end up in a user-session. I would expect all of the calls to end up in a system-session.

From your earlier statements, related to these requirements:

But creating mountpoints again requires sudo.

Not if we finally implement #1074. Maybe it is related to what recording should look like, as also for recording there might be use cases to have recordings for a system, for a user and for a specific directory.

Users can't choose the storage plugin anyway, since we store data below /elektra where adding mountpoints is prohibited.

I don't see a fundamental problem, we could easily allow well-defined mountpoints within /elektra, specifically for recording.

Member

I see this is quickly becoming one of those discussions with 1000 questions and no answers, where all arguments go straight past each other. I will address some of your concerns, but my main response is at the bottom and in a separate comment outside this thread.


You did not address my concerns about system sessions in multi-user environments. Let's say Alice needs to record a sequence that modifies both user:/ and system:/ keys. To achieve her goal, Alice starts a session alice1 in both user and system mode.

kdb record start alice1       # creates and activates the session for the current user
sudo kdb record start alice1  # creates and activates the session system-wide

Now all changes that Alice makes are recorded in the user/system session. But what happens if, at the same time, Bob also uses the machine and makes some changes to system:/ keys? These would be recorded in the alice1 session, since the session is active system-wide. Even worse, what if Bob also wants to start a system session:

sudo kdb record start bob1

Now some of Alice's changes will be in alice1 (maybe with some stuff from Bob), but the rest will go into bob1.

Unless I misunderstood what you mean by "system sessions", this cannot work like this. There is only one system:/elektra/record/config/session/current/name key for the whole system and we can't use a key in user:/ to define the active system session, because that once again won't be accessible to the root user.

This problem totally negates your question "which of the following statements need to be captured by the same session". If the whole concept doesn't work, there is no need to decide what is recorded in which session.

I think this "system session" concept could only work, if we use an environment variable to define the active session instead. That wouldn't be shared between users. But you didn't even react to this suggestion at all...


To address some other things:

But creating mountpoints again requires sudo.

Not if we finally implement #1074.

IIRC the decision there was that user:/elektra/mountpoints can only define user:/ or dir:/ mountpoints, but not system:/ ones. So that's not gonna help.

I don't see a fundamental problem, we could easily allow well-defined mountpoints within /elektra

True, that issue might not actually be a problem.


With all of that out of the way, I agree with you that we need to go step-by-step here. However, IMO we should at least partially consider everything from the beginning. It is useless if we come up with a perfect description of requirements, only to discover that those requirements cannot be put into practice.

* `kdb set user:/my/app/hello max`
* `kdb set user:/my/app/hello susi`

Only the value `susi` will be exported. Additionally, if you execute `kdb set user:/my/app/hello world` the key will not be exported at all, as there are no changes compared to the beginning of the recording.
Member

I think there should also be a kdb record replay that simply applies one change after another exactly as recorded. Firstly, this is much simpler to implement, so you'll have something to test much sooner. And secondly, there could be cases where because of complex plugin interactions you need some intermediate state to get the correct final result.

Contributor

I agree, a "replay" and "undo" (go back to what was before the session started) sound much more intuitive. "replay" is probably export and then import with merge.

Member

"replay" is probably export and then import with merge.

I think that's unnecessarily complex. Directly applying all the recorded changes one-by-one and then doing kdbSet at the end (to make the abort in case of conflicts easy), seems much simpler.

Contributor Author

@atmaxinger atmaxinger Jun 8, 2022

Just to be clear: should replay export something so that it can be used on another machine/kdb instance, or is it only a "redo" if you "undo" the changes of this session?

If we want to export something I can envision that it just outputs a shell script with the exact kdb commands that have been entered in this session.

Member

I was thinking about replay applying the changes immediately, but writing to a file could be even better. But my main point was that kdb record export seems to produce the diff between start and end. So something like this

kdb set user:/foo/bar abc

kdb record start
kdb set user:/foo/bar 123
kdb set user:/foo/bar abc
kdb record export

produces an empty export file, since user:/foo/bar had the same value at the start and at the end.

With kdb record replay the intermediate change should still happen, because it could have side effects (depending on what plugins do).

Contributor

Just to be clear: should replay export something so that it can be used on another machine/kdb instance, or is it only a "redo" if you "undo" the changes of this session?

Yes, we need to distinguish these use cases clearly. The minimum local functionality is undo (redo/replay is probably not needed so often). The more interesting part is the export+import functionality, which must continue to work after modifications, ergo a 3-way merge yields better results.

Member

IMO this really depends on what export actually does. If it ignores intermediate changes, some other command that doesn't is also essential. It could also be an option for the kdb record export command. But if we cannot export the full log of changes, the tool is just a diff tool and not a proper recording tool.

Added system:/hosts/ipv4/www.microsoft.com with value 4.4.4.4
Removed system:/hosts/ipv4/www.apple.com
$ kdb record export --format ansible
<output corresponding to ansible-libelektra plugin (https://galaxy.ansible.com/elektra_initiative/libelektra)>
Member

Please provide the full output

Comment on lines 39 to 41
$ kdb set system:/hosts/ipv4/www.google.com 8.8.8.8
$ kdb set system:/hosts/ipv4/www.microsoft.com 4.4.4.4
$ kdb rm system:/hosts/ipv4/www.apple.com
Member

Depending on your installation these may not work. It is quite likely that sudo is required to modify system:/ keys, since /etc/hosts normally is owned by root:root.

Every modification will include its UTC timestamp and the modified key in its path.
For example, `user:/record-elektra/sessions/MySession/12345678/system:\/hello`. The value of this key will be the modification action, i.e. `created`, `modified`, `deleted`.

Based on which modification took place (creation, modification or deletion), the old and new value for the key will also be recorded as `.../old` and `.../new`.
Member

What about metadata?

Contributor Author

Whoops, seems like I forgot about meta-data. However, it should not be too hard to fit that in using the same mechanism. I will take a look.

Contributor

Yes, we need to clarify what exactly gets recorded.


## Importing configuration changes

The output of the `kdb record export` command can be imported with the existing `kdb import` functionality. Depending on the specified storage plugin, other steps may be necessary, e.g. the Ansible plugin creates an Ansible playbook.
Member

I don't think this is as easy as you make it out to be. I'm pretty sure you need at least a special kdb import --strategy, if not a separate kdb record import.

See below (Tooling) for more.

Contributor

Yes, there should also be an explicit statement about which --strategy must be used. For the user, however, it is better to provide a replay, even if it is only a 2-line shell script.

The tooling will be implemented as part of the `kdb` command line utility.

The most complex part of it will be the `kdb record export` command.
When invoked, it must compute the minimal changeset of the session.
Member

You don't record the full initial state of a session, which is fine, since making a copy of the full KDB for every recording would be pretty expensive. But this means that you only know about keys that were modified or deleted, unmodified keys will simply be missing from the changeset (which is totally normal).

But this means you need a special format for the changeset. The standard storage formats (like dump) have no way to record deleted keys. A deleted key would simply not be present. But as stated above, unmodified keys would also be missing. So there is no way to distinguish them.

With the standard kdb import you then only have two options:

  1. Keep the KDB as is and only overwrite what is contained in the imported KeySet
  2. Delete the part of the KDB where data will be imported and write the imported KeySet there

Therefore, with option 1 you might keep keys that should be deleted and with option 2 you might delete keys that should be kept unmodified.

So, you need a somewhat special export format, that can either be imported with a certain plugin, or with a separate kdb record import plugin. Your export format needs to record deleted keys and that is not something the normal storage formats are designed to do.

However, it might be enough to use a format that supports metadata (like dump or quickdump) and add a special metakey (e.g. meta:/elektra/deleted) to keys that should be deleted. You'll still need a separate kdb record import or a special kdb import --strategy to correctly apply these changes though.
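
To illustrate, here is a rough sketch of what a separate kdb record import (or a special --strategy) might do with such a changeset. It assumes the meta:/elektra/deleted convention from above and that the changeset has already been parsed into a KeySet; names and error handling are simplified:

```
#include <kdb.h>

/* Sketch: apply a changeset where keys tagged meta:/elektra/deleted are removed
 * and all other keys are created or overwritten; everything else stays untouched. */
int applyChangeset (KDB * kdb, KeySet * changeset, Key * parent)
{
	KeySet * current = ksNew (0, KS_END);
	if (kdbGet (kdb, current, parent) == -1) return -1;

	for (elektraCursor i = 0; i < ksGetSize (changeset); ++i)
	{
		Key * cur = ksAtCursor (changeset, i);
		if (keyGetMeta (cur, "meta:/elektra/deleted") != NULL)
		{
			// deleted in the recording: remove it from the current state, if present
			Key * found = ksLookup (current, cur, KDB_O_POP);
			if (found != NULL) keyDel (found);
		}
		else
		{
			// created or modified in the recording: overwrite the current state
			ksAppendKey (current, cur);
		}
	}

	int ret = kdbSet (kdb, current, parent); // a conflict aborts here
	ksDel (current);
	return ret;
}
```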

Contributor Author

Good point.

I'm thinking we should define two new functions that plugins can implement - diffImport and diffExport (names are up for discussion).

This would allow us to implement various formats in the future.

kdb record export and kdb record import would then just call the respective function of the plugin.

Member

I'm not sure this is a good approach. I think you should either use a KeySet based format (hard to do, but may be possible with heavy metadata use), which could then be stored with the existing plugins, OR you should just hard code a custom format somewhere for now.

The diffImport and diffExport functions would have entirely different requirements than the normal get/set functions of storage plugins, so it wouldn't really make sense to add them to the existing plugins.

Personally, I don't really see the point of having different patch formats, so I would just create one hardcoded format.


## Exporting configuration changes

`kdb record export [session_name] [--format <format>]`: Will dump the changed keys of the specified session onto `stdout`. If no session name is provided, the current session will be used.
Member

You should probably provide an example before this, since the current order makes it sound like kdb record doesn't actually record all changes. Until you get to the example, it sounds like you just get a diff between start and stop states.

Contributor

I disagree, it is better to explain first and then give examples. The explanation probably needs an update. Simply saying "dump the changed keys" is vague. What about removed keys?

"onto stdout. If no session name is provided, the current session will be used." -> again details for the man page

Comment on lines 43 to 46
$ kdb record changes
Changed system:/hosts/ipv4/www.google.com from 1.2.3.4 to 8.8.8.8
Added system:/hosts/ipv4/www.microsoft.com with value 4.4.4.4
Removed system:/hosts/ipv4/www.apple.com
Member

Maybe there should also be a way to mark the initial state of a key as important.

$ kdb record mark system:/hosts/ipv4/www.libelektra.org
$ kdb record changes
Unchanged system:/hosts/ipv4/www.libelektra.org must have value 7.7.7.7

This would ensure that system:/hosts/ipv4/www.libelektra.org is recorded and exported with the current value.

In the diff/patch system I mentioned, this could be used to force compatible KDB states. In the export/import system it is less important since you could just change and then change back the key to get it into the changeset. But sometimes there may be keys that can't be changed without side-effects that you don't want during the session recording. For example, you might need to record the port a server uses, to make sure the rest of your session is compatible. But if this server is currently running, changing the port could break things.

Contributor

I am not convinced if such a feature is needed. Wouldn't someone simply edit the exported file in such cases?

Member

Wouldn't someone simply edit the exported file in such cases?

Depends how easy it is to edit the file. But it's also a matter of discoverability. Users need to know that they can add such things to the exported file.

Contributor Author

I think this feature could come in really handy and it doesn't look like it's going to be too much of a hassle to implement. I'll consider it.

Contributor

Editing the files obviously has a big drawback if you are in a round-trip workflow. So yes, some explicit way to state what you need, in addition to the default as it is now, makes sense.

It would bring quite a number of new cases, though: marked keys could also be changed, added or removed.

Member

It would bring quite a number of new cases, though: marked keys could also be changed, added or removed.

I don't think so. IMO "marking" a key could internally just be implemented as recording a "Changed user:/foo from A to A", i.e. with the /old and /new keys being equal.
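
In the JSON Lines wrapper sketched earlier, such a mark entry would just be a change record whose old and new payloads are identical, e.g. (hypothetical values, shortened):

{ "parent": "user:/foo", "format": "dump", "old": "kdbOpen 2\n$key string 0 1\n\nA\n$end\n", "new": "kdbOpen 2\n$key string 0 1\n\nA\n$end\n" }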

Contributor

@kodebach I don't understand your sentence.

I mean cases like a marked key being removed and reappearing at a later time; it should then still be marked.

Contributor Author

We could restrict marking keys to keys that have not been modified yet in the current session. Before an import we then check that those marked keys have the exact value.

Alternatively, we could just "mark" all changed keys implicitly. We already record the old values of modified keys. We could just check that the "old" values of all changed keys that will be imported equal the values in the target database. This way we don't need an explicit mark command. Additionally, we can provide a command line parameter like --dont-check that skips this.

Member

We could just check that the "old" values of all changed keys that will be imported equal the values in the target database

I would've done this anyway. If you don't check for the old value, the whole thing isn't really different from just making your modifications and doing a normal kdb export and a normal kdb import on the other end.

Additionally, we can provide a command line parameter like --dont-check that skips this.

Yes, either that, or there should be a way to do this on a key-by-key level. But the default should be that the old value is checked. Otherwise, you might accidentally replace something important.

This way we don't need an explicit mark command.

We still do, because maybe you want to mark an unchanged key. You cannot mark all of those, because then you're just doing an export of the whole KDB.


I think before we can discuss this properly, we need to distinguish between a diff and a log.

kdb set user:/foo 123
kdb record start

kdb record mark user:/foo
kdb rm user:/foo
kdb set user:/foo 123

After this sequence there will be a log of everything that happened somewhere in user:/elektra/record. This log will contain something like:

Started session
Marked key user:/foo with value 123
Removed key user:/foo with value 123
Added key user:/foo with value 123

In this log there is no problem at all. My proposal here was that instead of introducing a new "Marked key ... with value ..." type of log entry, we just let the log look like this (even if it looks a bit silly):

Started session
Changed key user:/foo from value 123 to value 123
Removed key user:/foo with value 123
Added key user:/foo with value 123

Of course this is just the internal representation. On the user-facing side, we should use a more sensible text.

Now all the problems you are thinking of only arise when you try to collapse this log into a single diff. I don't know whether user:/foo should still be marked in this case. IMO it is a matter of definition whether removing a key removes its marked status. Things become even more complicated when you also add a final change:

Started session
Marked key user:/foo with value 123   |   Changed key user:/foo from value 123 to value 123
Removed key user:/foo with value 123
Added key user:/foo with value 123
Changed key user:/foo from value 123 to value 456

Now what? Is the key now marked with value 456? Has it now lost the marking?

This is actually why I think we really need a way to export the log and not just the diff. That's also why I wanted kdb record replay (which could also take an exported log as input). The replay command would replay the log one step at a time, instead of just applying a diff. So in our example a replay would:

  1. Check that user:/foo has the value 123
  2. Check that user:/foo has the value 123 and remove it
  3. Add the key user:/foo with value 123
  4. (Check that user:/foo has value 123 and change its value to 456)

That's where the "Changed user:/foo from A to A" idea came from. We need to check that the current value is A anyway (otherwise there is a conflict), so the fact that we then set it to A doesn't change anything, but it achieves the goal of having a check.

Final note: So far this discussion is only about changing one key at a time. But it must of course be possible to have "changed/added/removed/marked" for multiple keys in a single log entry.
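
A very rough sketch of the per-entry replay step, assuming the log has already been parsed into entries holding a key name plus old and new values (all names are placeholders, and a real implementation would also handle metadata and multi-key entries):

```
#include <kdb.h>
#include <string.h>

typedef struct
{
	const char * keyName;  // e.g. "user:/foo"
	const char * oldValue; // NULL = key did not exist before this step
	const char * newValue; // NULL = key is removed in this step
} LogEntry;

/* Replay one log entry: check the recorded old value, then apply the new one. */
int replayEntry (KeySet * ks, const LogEntry * entry)
{
	Key * found = ksLookupByName (ks, entry->keyName, 0);
	const char * current = (found != NULL) ? keyString (found) : NULL;

	// conflict check: the KDB must be in exactly the state the log expects
	if ((current == NULL) != (entry->oldValue == NULL)) return -1;
	if (current != NULL && strcmp (current, entry->oldValue) != 0) return -1;

	if (entry->newValue == NULL)
	{
		// "Removed key ..." entries
		if (found != NULL) keyDel (ksLookup (ks, found, KDB_O_POP));
	}
	else
	{
		// "Added/Changed key ..." entries; also covers old == new, i.e. marked keys
		ksAppendKey (ks, keyNew (entry->keyName, KEY_VALUE, entry->newValue, KEY_END));
	}
	return 0;
}
```

The caller would do one kdbGet at the start, replay all entries in order, and a single kdbSet at the end, so any conflict aborts the whole replay.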

Contributor

I agree we must clarify logs/recordings and diffs/merges. @atmaxinger please also extend doc/help/elektra-glossary.md

Contributor

@markus2330 markus2330 left a comment

There are several features not yet described clearly and the overall structure should be changed (tutorial, man pages and architecture split in different files).
Overall: Looks amazing, I am very much looking forward to this functionality. 🚀

* `kdb record stop`: Stops the current recording session.
* `kdb record delete [session_name]`: Deletes the specified recording session. If no session name is provided, the current session will be deleted. Deleting a session will not stop the recording.
* `kdb record list-sessions`: Lists all sessions by their name
* `kdb record changes [session_name]`: Shows all changes for the specified session. If no session name is provided, the current session will be used.
Contributor

Isn't this some kind of export, too? I don't think this command is needed.

Member

Depending on what the actual format of record export looks like, it might be hard for a human to read. I think a simple listing of all changes like this could be quite helpful. We should, however, make it very clear that the output is not intended to be machine readable and might change at any time.

Contributor Author

@atmaxinger atmaxinger Jun 8, 2022

Yes, as @kodebach mentioned the data displayed by this command is intended to be human-readable, as opposed to the machine-readable output of record export.

This feature is intended as a helpful reminder when debugging the currently changed configuration. This should be documented more clearly.

Contributor

I think this is kdb record status then? Changes, also in human-readable or statistical formats, could be done by plugins; then it is also usable outside of kdb record.



<output corresponding to ansible-libelektra plugin (https://galaxy.ansible.com/elektra_initiative/libelektra)>

## Architecture
Contributor

This is in the place where it should be! The file can be renamed to doc/dev/record.md as it describes the whole recording, not only the user-facing tools.


All recording sessions are located under `user:/record-elektra/sessions/<SESSION_NAME>`.

Every modification will include its UTC timestamp and the modified key in its path.
Contributor

What about actions occurring within a second? Sequence numbers?

Contributor Author

Good question! This is only a problem if the same key were to be changed multiple times in the same time interval.
We could use a timestamp with nanosecond resolution or ticks; that should be good enough.
Sequence numbers would also work, if we can generate them reliably.

@kodebach
Member

I think we need to take a step back here. We need to answer the fundamental question:

What is the intended use case for kdb record?

Not just how can I use it or what it does, but actually: why would I use it?

@atmaxinger @markus2330 Please both answer this question. Let's postpone all further discussion until then, so that we can compare whether we are even trying to achieve the same goal here. When we have our answers, we can find some common ground and add a use-case definition to the document.


For me the intended use case of kdb record is similar to our shell-recorder, but specific to Elektra. So I have a sequence of modifications to the KDB (*) that I want to record and then replay. This replaying could be for multiple reasons: maybe I want to use it as a test case (similar to shell-recorder), or maybe I want to replay it on another machine. Maybe I just want a log so I can undo things, or it could even be that I just want to record the changes for monitoring/auditing purposes and not replay them at all.

The core of all these is that I want to record all changes that are made to the whole KDB over a certain period of time and get a detailed log of every change. In that sense, kdb record could also be compared to git. Every time the postcommit phase of a kdbSet is executed, something similar to a new git commit will be created.


(*) could be a sequence of kdb commands, but could also be caused by something else

Note: The creation of a single diff between start and end of recording is a bonus to me. IMO it would be easier to export at the start and end of the modifications and do a diff between those exports, than to record everything just to collapse it into a single diff.

@atmaxinger
Contributor Author

@kodebach I would use it for iteratively prototyping configuration, similar to how we build Dockerfiles.

In this case I am not yet sure which changes to the configuration are necessary. During those iterations it is quite possible that multiple keys will be changed only to then be restored to their original value. When I'm finished I want to export those changes and apply them to other systems.

For me the log of how exactly the new state was reached is a byproduct, although a useful one. As you mentioned, it would enable undo functionality and audit logs. But from my point of view this is the bonus.

The thing is - even though both use cases are different, a huge part of the code and user interface (the actual recording and the session management) is exactly the same. We just have to provide two separate "export"/"import" formats/commands:

  • one for the replay functionality that contains all the changes in the order they were made,
  • one for the diff/merge functionality that just contains the actual state changes.

I really don't see a big problem with implementing both.

- When using an "append-only" format there is no need to create sequence numbers

Cons:
- Need to implement reliable saving ourselves
Member

The resolver plugin could still be reused, if one of the directories associated with an Elektra namespace can be used. Otherwise, we could just extract the parts that are concerned with atomic writes into a separate library that can be reused.

Contributor Author

Yes. On the other hand we just need to copy to a temporary file, append something to it, and then use rename to replace the old file in the postcommit stage. Shouldn't be too hard to implement.
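
A naive sketch of that copy + append + rename idea (file names and the helper are made up; as the replies below note, real code would also need the locking and fsync handling that makes the resolver hard):

```
#include <stdio.h>

/* Sketch: copy the session file to a temporary file, append one entry,
 * then atomically replace the original via rename(). No locking/fsync yet. */
int appendWithRename (const char * sessionFile, const char * line)
{
	char tmpPath[4096];
	snprintf (tmpPath, sizeof (tmpPath), "%s.tmp", sessionFile);

	FILE * out = fopen (tmpPath, "w");
	if (out == NULL) return -1;

	FILE * in = fopen (sessionFile, "r");
	if (in != NULL)
	{
		// copy the existing entries (if the session file already exists)
		char buf[4096];
		size_t n;
		while ((n = fread (buf, 1, sizeof (buf), in)) > 0)
		{
			fwrite (buf, 1, n, out);
		}
		fclose (in);
	}

	// append the new entry, then swap the files in one step
	fprintf (out, "%s\n", line);
	fclose (out);
	return rename (tmpPath, sessionFile);
}
```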

Member

That's basically what the resolver does (when we ignore the actual resolving part). In the resolver phase at the beginning of set it resolves the real path, but actually returns the path for a temporary file. Then in commit it does a rename to actually apply the changes, assuming no other changes were applied in the meantime.

Contributor

Shouldn't be too hard to implement.

Looking forward to seeing fixes in the resolver plugin then #1470 😉 It is actually quite hard.

Member

The resolver is quite generic; that is a part of what makes it so hard. If the use case for recording is more limited or uses a different API (e.g. an append that atomically reads and updates the file), it might be easier. But I agree, it is probably not as simple as just using rename.

@kodebach
Member

I would use it for iteratively prototyping configuration, similar to how we build Dockerfiles.

In this case I am not yet sure which changes to the configuration are necessary. During those iterations it is quite possible that multiple keys will be changed only to then be restored to their original value. When I'm finished I want to export those changes and apply them to other systems.

Iteratively building a config is already possible (as far as applications support it) with kdb set. So AFAICT the only piece missing here is a kind of "tag & diff" solution that lets you tag the initial state and then create a diff at the end.

But, if you just want to start with an empty config and then try different options by setting and unsetting values and in the end you want to have a config file that can be copied/applied to other machines, then you don't need any new features. You can already create a new mountpoint for your config, play around with kdb set/kdb rm/etc. and in the end, the file for your mountpoint is the config file you need. (*) You don't even need the tag/diff feature, because in this scenario you start with an empty config, so the diff is trivial.

For me the log of how exactly the new state was reached is a byproduct, although a useful one. As you mentioned, it would enable undo functionality and audit logs. But from my point of view this is the bonus.

If the diff is the only goal, I'm not sure recording everything in between is the best approach.

The thing is - even though both use cases are different, a huge part of the code and user interface (the actual recording and the session management) is exactly the same. We just have to provide two separate "export"/"import" formats/commands

That is true to some extent. But IMO as soon as you actually implement a record/replay feature, this becomes the main feature. I say this because record/replay requires recording all changes, maintaining the correct order, specialized export formats, etc. And then you need to squash the separate change entries into a single diff, which is kinda complicated with lots of edge cases.

So yes, you can implement both features with shared code, but it is still more complicated than just doing one of them.

I really don't see a big problem with implementing both.

Yes, having both features would be nice and seems quite possible. But if you indeed implement both, I would suggest you focus on the record/replay feature first. It is the basis for everything and you can always replay the whole recording to achieve the final config. It's not as efficient as a single direct diff, but it should still work.


(*) More precisely you may still need to use the existing kdb export, if multiple mountpoints are involved.

Contributor

@markus2330 markus2330 left a comment

Partial review.


We need a way to store session recordings.
Sometimes it is required to run the `kdb` application with `sudo`, especially when changing system-wide configuration.
Session recordings should be on a per-user basis. Storing them in the `user:/` namespace is not compatible with `sudo`.
Contributor

The first sentence is a requirement, the second sentence contradicts the first (see man page of sudo: "execute a command as another user").

- Can query recordings with existing tooling

Cons:
- Still needs workaround to start session (aka mount a session in `system:/`)
Contributor

Do you mean here that kdb record start would need to be run as root? If it needs to mount something by design, I do not see a "workaround".


Cons:
- Still needs workaround to start session (aka mount a session in `system:/`)
- Semantic mismatch: Elektra is built for configuration data; session recordings are not configuration data
Contributor

Here we need details what "session recordings" are (link to description).

Cons:
- Still needs workaround to start session (aka mount a session in `system:/`)
- Semantic mismatch: Elektra is built for configuration data; session recordings are not configuration data
- Need to model session recordings to fit into Elektra data model (KeySet)
Contributor

This is the same as the one before?



The same principle applies to the creation and deletion of keys within a session. If you create a key and delete it in the same session, it will not be exported.

By default, the elektra dump format is used. The optional --format parameter can be used to specify which storage plugin should be used to format the output.
Contributor

Please also look at how merging works 😉

The base doesn't contain user:/foo (assuming kdb record defines the base), ours doesn't contain user:/foo, so theirs wins without conflict: user:/foo will stay present with 456.

$ kdb set system:/hosts/ipv4/www.microsoft.com 4.4.4.4
$ kdb rm system:/hosts/ipv4/www.apple.com
$ kdb record stop
$ kdb record changes
Contributor

Please do fixes or create separate PRs, it is getting confusing here. I will stop the review here.

@kodebach
Member

Regarding the "might need a different plugin API" issue:

I couldn't find an issue for it (if one exists), but we did want to move from the current Plugin * elektraPluginSymbol(void) API to a KeySet * elektraPluginContract(void) API. So we would export a function that returns the contract directly instead of one that returns a struct containing some predefined function pointers. That would allow directly exporting arbitrary functions.

This could be helpful here. Moving everything over to the new API would be quite a bit of effort, but as long as the new API uses a different name for the exported function, the fallback should be seamless (just try the new one first, then the old one).

We could even use the hierarchy of the KeySet so e.g.

# these are for the current plugins
exports/kdb/open
exports/kdb/get
exports/kdb/set
exports/kdb/commit
exports/kdb/error
exports/kdb/close

# these could be for record plugins
exports/record/start
exports/record/append
exports/record/stop
exports/record/replay
exports/record/export
exports/record/import

Details should probably be discussed elsewhere, but I just wanted to mention that this is probably the way to go, if you need a different plugin API.
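
For illustration only, a contract for such a hypothetical API could look roughly like this; neither elektraPluginContract nor the exports/record/* functions exist yet, so all names and signatures below are made up:

```
#include <kdbplugin.h>

// placeholder signatures for the hypothetical record functions
int elektraRecordStart (Plugin * handle, Key * parentKey);
int elektraRecordAppend (Plugin * handle, KeySet * changes, Key * parentKey);
int elektraRecordReplay (Plugin * handle, KeySet * log, Key * parentKey);

/* Hypothetical replacement for elektraPluginSymbol: the plugin returns its
 * contract directly, so arbitrary functions can be exported via KEY_FUNC. */
KeySet * elektraPluginContract (void)
{
	return ksNew (10,
		keyNew ("system:/elektra/modules/record", KEY_VALUE, "record plugin", KEY_END),
		keyNew ("system:/elektra/modules/record/exports", KEY_END),
		keyNew ("system:/elektra/modules/record/exports/record/start", KEY_FUNC, elektraRecordStart, KEY_END),
		keyNew ("system:/elektra/modules/record/exports/record/append", KEY_FUNC, elektraRecordAppend, KEY_END),
		keyNew ("system:/elektra/modules/record/exports/record/replay", KEY_FUNC, elektraRecordReplay, KEY_END),
		KS_END);
}
```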
