Skip to content
This repository was archived by the owner on Feb 16, 2025. It is now read-only.

[new-backend] Various open questions #4407

Closed
kodebach opened this issue Jul 13, 2022 · 10 comments
Closed

[new-backend] Various open questions #4407

kodebach opened this issue Jul 13, 2022 · 10 comments

Comments

@kodebach
Copy link
Member

kodebach commented Jul 13, 2022

This is a summary of open questions from the new-backend branch:

modules vs global

Should we merge the KeySet * modules and KeySet * global within struct _KDB?

They already use separate hierarchies, so there should be no overlap.

elektraModulesInit error

Wouldn't it be better, if elektraModulesInit set an error directly, instead of letting the calling function define the error?

static KDB * kdbNew (Key * errorKey)
{
KDB * handle = elektraCalloc (sizeof (struct _KDB));
handle->modules = ksNew (0, KS_END);
if (elektraModulesInit (handle->modules, errorKey) == -1)
{
// TODO (kodebach) [Q]: shouldn't we let elektraModulesInit set this error?
ELEKTRA_SET_INSTALLATION_ERROR (
errorKey, "Method 'elektraModulesInit' returned with -1. See other warning or error messages for concrete details");
ksDel (handle->modules);
elektraFree (handle);
return NULL;
}
handle->global =
ksNew (1, keyNew ("system:/elektra/kdb", KEY_BINARY, KEY_SIZE, sizeof (handle), KEY_VALUE, &handle, KEY_END), KS_END);
handle->backends = ksNew (0, KS_END);
return handle;
}

bootstrap resolver

The bootstrap backend which contains system:/elektra is currently loaded with default resolver and storage plugin. These could be overridden after installation by changing symlinks.

IMO we should at least create separate CMake based #defines for KDB_BOOTSTRAP_STORAGE and KDB_BOOTSTRAP_RESOLVER. We could also go one step further and hard-code the absolute path of the bootstrap storage file (via a CMake variable).

The goal here is to reduce the "moving parts" of the bootstrap sequence, because if the bootstrap works Elektra is essentially unusable and you need to manually correct/reset files.

Once that is done, we should probably go a step further. Instead of just defining KDB_BOOTSTRAP_STORAGE and KDB_BOOTSTRAP_RESOLVER, we should define the actual backend plugin used as KDB_BOOTSTRAP_BACKEND. We we'd either have to specify the config for the plugin as well, or IMO even better would be, if the plugin doesn't have a config. All it's dependencies are defined at compile-time and it just does its thing.

The CMake defines for KDB_DEFAULT_* are always the same AFAIK and we use symlinks to change the defaults. We can keep that and get rid of the CMake defines, or we can use a fully compile-time defined setup and prohibit symlinks. Either way,I think bootstrap and default backend should be separate.

What do you think?

split resolver

Should we split the resolver into the actual resolving parts and the parts needed for atomic/safe read-write?

The atomic read-write parts would form a separate library that could also be used in other parts of Elektra, or by other resolver implementations.

errnosave

Some of the kdb* functions attempt to save and restore errno. Is this really needed?

initialParent in kdbGet/kdbSet

Do we need the initialParent copy of parentKey in kdbGet/kdbSet?

kdbGet merge/split (step 12/15)

In the current kdbGet implementation, the data from all backends is merged in step 12 and then split again in step 15. That is after the main storage phase, and the poststorage phase of spec:/ backends were executed, we merge all backends so that we can execute gopts and spec. Then we need to split everything up again for the poststorage phase of non-spec:/ backends.

Do we really need to do this?

The answer here is probably that we really need these merge/split steps. But it is a kinda annoying thing and probably expensive (haven't tested it, but doing lots of iteration, comparing of key names and ksAppendKey, is always kinda expensive).

KeySet * global as argument to elektraPluginOpen

Is there a good reason, why elektraPluginOpen doesn't take the global keyset as an argument?

Currently after every elektraPluginOpen call in libelektra-kdb we need to do a plugin->global = kdb->global.

backend absolute path read-write

The backend plugin currently supports read-only access to an absolute path without any resolver being defined. Should we also support writing to such files?

This would require extracting the atomic/safe read-write code from resolver into a library.

kdb file /foo

Should kdb file work with cascading keys?

IMO there is no reason why this should be supported, but there is currently no check against it.

The only use case I can think of is, if you want to know what file a certain key will come from when an application reads it. But for that a separate kdb lookup /foo that returns e.g. user:/foo would make more sense.

specific functions for hooks/global plugins

Every "global plugins" should fulfil certain functionality specific to is hook (which is why I'll call them hook plugins here).

IMO instead of reusing the get/set/etc. functions, these hook plugins should export extra functions for the hooks points at which they are called. For spec this could be:

spec/exports/hook/spec/copymeta       # called in kdbGet after spec:/ keys have been processed and at the start of kdbSet to process newly added keys
spec/exports/hook/spec/removemeta   # called in kdbSet before actual storage happens 

Then it would be clear, when a plugin can only be used as a hook plugin and when it is dual purpose. For example, since dbus can run globally or just for a single mountpoint, it would export get/set functions and the specific hooks for notification plugins.

How will spec work?

For spec to work properly, this conditions must be fulfilled:

  1. During kdbGet
    1. Copy default values from meta:/default/ on spec:/[name] to default:/[name]
    2. Copy all metadata from spec:/[name] to all other namespaces (in particular proc:/, dir:/, user:/, system:/ and default:/)
  2. During kdbSet before prestorage (i.e. before validation/conversion happens):
    1. The same as kdbGet
  3. During kdbSet after prestorage, but before storage (i.e. after validation/conversion happens, but before data is serialised):
    1. Remove all unmodified meta:/ keys from all namespace other than spec:/ (strictly speaking only dir:/, user:/, system:/, must be handled, as nothing else is persisted)

1.i. and 1.ii. are pretty simple and are already done (apart from proper handling of wildcards, but that's another story #3954). Therefore 2.i. is also pretty simple, as long as spec is called properly (still open, but not an issue).

3.i. however is pretty complex. We need to reliably detect, whether or not a metakey has changed. One approach to this would be relying on pointer/address equality. We already share metakeys between spec:/ and non-spec:/ keys to save memory. We could simply remove all meta:/ keys whose address is the same as the one on spec:/. First of all this requires a Copy-On-Write enforcement for meta:/ keys, but that's something we want anyways (#3610). However, this leaves one open edge case. What if meta:/foo on spec:/[name] was removed? Then we have no way of knowing, whether meta:/foo on e.g. user:/[name] was added by the spec plugin or by the user. A way around that would be to keep an additional copy of all meta:/ keys inside the spec plugin. Then we always know which Key * to delete. But that is a bit of a memory waste, because there is a better solution.

We could simply introduce a metadefault:/ namespace (Note: the default:/ namespace might be reused, if we implement #3598). Whenever you ksLookup a meta:/ key, we not only check for meta:/ but also for metadefault:/ keys. The spec plugin would only add metadefault:/ keys and wouldn't have to do anything, since even libelektra-kdb itself could remove those before the storage phase in kdbSet (similar to default:/ and proc:/).

Performance questions

backendsFindParent

backendsFindParent does a lookup and removes the basename repeatedly until a backend is found. Could this be a performance bottleneck? Is there a better way to do this?

My purely theoretical analysis is that it should be fine: With m = number of parts in key & n = size of backends the new runtime should be around O(m), if the KeySet with the backends uses the hashmap and O(m*log(n)) otherwise. The old trie solution was O(k) where k is the length of the keyname we are looking for. The expectation is that k is bigger than m*log(n) in most cases, since log(n) will be generally small and m is necessarily smaller than k (generally m is much smaller than k, since even an average name part size of 2 would mean k = 2*m).

KDB phase passed as integer

Currently, we pass the phase a backend should execute from libelektra-kdb to a backend plugin as a string ("prestorage", "storage", etc.). Should we switch this to an integer?

Using an integer would allow using an enum to define the constants and also using a switch statement (instead of repeated if(strcmp() == 0) in the plugin).

Note: the integer must be passed as a binary key, otherwise we'd just pay for int->string->int conversion instead of strcmp.

@markus2330
Copy link
Contributor

Thank you for summarizing everything! ❤️

Should we merge the KeySet * modules and KeySet * global within struct _KDB?

I think keeping them separated makes perfect sense. What would be the advantage of merging them?

Wouldn't it be better, if elektraModulesInit set an error directly, instead of letting the calling function define the error?

The approach that other warning actually contains the reason for the error is ugly. Would be glad if we can get rid of this.

IMO we should at least create separate CMake based

I am strongly against putting even more complexities into CMake. Defining this stuff in CMake failed totally. Nobody except devs&build server is actually using this, so far everyone wanted to use upstream packages as-is.

A huge improvement in this area would be to have double-bootstrapping. This would also fix @Kochise's problems #4349.

Should we split the resolver into the actual resolving parts and the parts needed for atomic/safe read-write?

If you see a way to do so protocols in between these parts? The resolver definitely could need some love, e.g., #3692 #1470 are long out-standing issues. Having smaller&simpler pieces would be a big help.

Some of the kdb* functions attempt to save and restore errno. Is this really needed?

If it is inconsistent it is useless. I think at some point we correctly did it (always restoring every errno we changed).

I am in favor of keeping it: it doesn't have disadvantages and the less side-effects Elektra has, the better.

The new-backend branch is now a good time to recheck if it is done correctly.

Do we need the initialParent copy of parentKey in kdbGet/kdbSet?

IIRC this was an attempt to not destroy/leak meta-data.

merge/split Do we really need to do this?

In general split is needed to correctly determine which backends have something to do. It is possible that Thomas W. added some additional complications for global hooks. These additional complications most likely can be removed now with the simplifications in the hooks. (I do not think we need any for-each hook at all).

Is there a good reason, why elektraPluginOpen doesn't take the global keyset as an argument?

Yes: It already has two KeySets as arguments.

The backend plugin currently supports read-only access to an absolute path without any resolver being defined. Should we also support writing to such files?

Could be very useful for import/export. @atmaxinger what do you think?

Should kdb file work with cascading keys?
The only use case I can think of is, if you want to know what file a certain key will come from when an application reads it.

Is imho a good reason.

IMO instead of reusing the get/set/etc. functions, these hook plugins should export extra functions for the hooks points at which they are called.

Probably it makes sense for a few specialized plugins. I would prefer, however, if all can keep the same interface and the contract simply states how they are allowed to be mounted. It makes the tooling much easier.

Performance questions

Discussing performance without benchmarks/profiling usually doesn't make much sense. These questions are no exception.

But I can add a question for someone who actually makes benchmarks/profiling: newbackend introduced, iirc, some information which states in which position the backend currently is in. I wonder if this is really so cheap that it is worth having this feature.

@kodebach
Copy link
Member Author

What would be the advantage of merging [KeySet * global and KeySet * modules]?

An immediate benefit is, reducing the size of struct _KDB and thereby memory usage. It's not much but it's something.

The other thing is that we don't need to reload the exported symbols, when a (non-backend) plugin wants to call another plugin. The symbols would be in the global keyset, which is accessible to plugins, unlike modules.

I am strongly against putting even more complexities into CMake

It doesn't even really have to be in CMake. The current #defines don't really do anything anyway, since use symlinks. But the main point was about splitting the default backend (for system:/, user:/, etc.) from bootstrap one(s). As a secondary point, I'd like to make bootstrap more locked down and less configurable at runtime. Symlinks are fine, but the bootstrap backend(s) should be as failure-proof as possible. A failure during bootstrap is very bad, and very hard to fix for normal users.

If you see a way to do so protocols in between these parts [of resolver]?

The easiest way would be to create some extra libs. These could then just be called by resolver and whatever else needs them.

The new-backend branch is now a good time to recheck if [save/restore errno] is done correctly.

I think this would only make sense, if we do and guarantee it across the entire public API of Elektra. That would be quite a big task, because we'd have to check everything that calls external functions.

It is also a bit weird (but less so), if we don't enforce it on a plugin level.

IIRC [initialParent] was an attempt to not destroy/leak meta-data.

I don't think so AFAICT metadata is the only thing that never gets restored from initialParent to parentKey (because of meta:/error et al.). I rather think it's so that parentKey remains unchanged (except for meta:/error et al.) in an error case. If that's the case, there is probably a better solution.

In general split is needed to correctly determine which backends have something to do [...]

I think there was a misunderstanding.... The split data structure doesn't exist in the new-backend branch anymore. It was replaced by a KeySet * and the backendsFindParent function (together with some others) for correct backend lookup.

The question was about a technical implementation detail in kdbGet. I don't really expect anyone to have answer here, unless they study the code really hard. In any case, at the it doesn't matter much. The current solution works. The question was just, if there might be a better way.

Yes: [elektraPluginOpen] already has two KeySets as arguments.

And? What's the problem with that?

I would prefer, however, if all can keep the same interface and the contract simply states how they are allowed to be mounted. It makes the tooling much easier.

What tooling do you have in mind?

I can't think of anything that would be involved here, other than the kdb command for enabling/disabling a hook plugin. But that wouldn't be affected, because it would just set a key somewhere in system:/elektra (to the name of the plugin, 1/0, etc.). The tooling shouldn't care how a hook plugin needs to be called, since it will only every be called by kdbGet/kdbSet directly. That's the whole point of these hooks, they are called in very specific ways.

Discussing performance without benchmarks/profiling usually doesn't make much sense. These questions are no exception.

I know, that's why I put them into a separate section. It's basically a "we don't need to talk about this right now, but we shouldn't forget about it" section.

newbackend introduced, iirc, some information which states in which position the backend currently is in. I wonder if this is really so cheap that it is worth having this feature.

That's exactly what "KDB phase passed as integer" is about. Right now we set a string in the global keyset (every time before a backend plugin is called) and there is a new function in the plugin API for retrieving this string. Additionally there are some #defines that define the string values.

Passing the values is pretty cheap, the global keyset already exists and if setting one key makes such a big difference we'd have much bigger problems. However, replacing the string with an integer could in theory speed up the plugin side (switch vs if strcmp) and IMO it makes the plugin code easier to read/write (switch is clearer).

Note: phase is the new term for position; a phase is defined by libelektra-kdb, positions are defined by the backend plugin, other backend plugins may use different terms and setups

@markus2330
Copy link
Contributor

merging [KeySet * global and KeySet * modules

The first question needs to be if plugins should have access to modules. If they should, then the proposal makes sense. Let us see what @atmaxinger does in his plugins.

I'd like to make bootstrap more locked down and less configurable at runtime.

I agree.

resolver: The easiest way would be to create some extra libs.

Very good idea. This would also save disc space, as currently every resolver includes this code separately. This is, however, actually totally unrelated to new-backend, isn't it?

I think this would only make sense, if we do and guarantee it across the entire public API of Elektra. That would be quite a big task, because we'd have to check everything that calls external functions.

This task was already done a few times. The question is how we can ensure that it stays correct in future, i.e., add some regression tests. For kdb* we could have a plugin that modifies errno and test cases that check if errno is unchanged.

It is also a bit weird (but less so), if we don't enforce it on a plugin level.

This would be even harder to test? The current approach (to restore at user-interface) seems simpler to me.

And? What's the problem with that?

It is API smell. Users can easily confuse which argument is what.

What tooling do you have in mind?

Tools like kdb plugin-check.

I know, that's why I put them into a separate section. It's basically a "we don't need to talk about this right now, but we shouldn't forget about it" section.

I think issues with - [ ] scale quite good. And as discussed we should collect all issues in the "new-backend" project (former lcdproc project).

@kodebach
Copy link
Member Author

The first question needs to be if plugins should have access to modules.

Yeah, I'm still not quite sure about that one. Probably, we should hide it somewhere and change the APIs for opening/calling other plugins, so that it internally goes through libelektra-kdb to avoid duplication, but still limit access.

This is, however, actually totally unrelated to new-backend, isn't it?

Yes, AFAICT.

The question is how we can ensure that it stays correct in future, i.e., add some regression tests. For kdb* we could have a plugin that modifies errno and test cases that check if errno is unchanged.

It's really hard to test this in general. Things like elektraGetOpts could call any number of functions that may change errno.

It is API smell. Users can easily confuse which argument is what.

And the current API isn't? There is a field in struct _Plugin that can be accessed by the public API, but you can never give it a value. When you only use the public API the value will always be NULL.

Actually I think this could be combined with the modules thing. We use one function in libelektra-kdb that receives all the arguments (including global), but which isn't public. The public API only has a function elektraPluginOpen (Plugin * thisPlugin, const char * otherName, KeySet * config) which allows one plugin to open another. The modules and global data would be passed internally via thisPlugin.

Tools like kdb plugin-check.

AFAICT this only does a open, get, close sequence. Which works because get must be implemented for the contract. Once we have a separate function for the contract, get wouldn't be mandatory. The tool would still work. It would still check the minimal functionality, by opening the plugin, loading the contract, maybe doing some extra checks on the contract content and then closing.

I think issues with - [ ] scale quite good. And as discussed we should collect all issues in the "new-backend" project (former lcdproc project).

I agree. I'll create a separate one once we decide which of the open questions are actionable items that need something done. This is more of a discussion issue (maybe we should move to the Discussions section of GitHub?).

@kodebach

This comment was marked as resolved.

@markus2330
Copy link
Contributor

Things like elektraGetOpts could call any number of functions that may change errno.

You only need to restore errno on the exit points. Not being able to fulfill such small requirements is an indication that we should not export such symbols. E.g. in this specific case, I think we should only support gopts and have elektraGetOpts as private (static) lib. So many different ways to do the same things are also a nightmare to test and maintain. They also make the docu harder to write and decrease discoverability.

And the current API isn't?

Of course. If you see a way to simplify it, I am very open to suggestions.

which allows one plugin to open another

If we do it in a way so that there is no overlap with src/libs/invoke, I am open to it. Maybe it can be done by improving src/libs/invoke?

I would prefer, however, if all can keep the same interface and the contract simply states how they are allowed to be mounted. It makes the tooling much easier.

AFAICT [kdb plugin-check] only does a open, get, close sequence.

How much it currently does or not does not change the fact that a generic tool that can check any plugin is much easier to write compared to a tool that needs to know about X different types of plugins, which all have different interfaces.

I agree. I'll create a separate one once we decide which of the open questions are actionable items that need something done.

I think I already created all the needed issues. The other points seem to me to be implementation details. If at all, we might discuss it in a PR with a specific solution. Do you agree?

@kodebach
Copy link
Member Author

kodebach commented Aug 28, 2022

You only need to restore errno on the exit points.

I guess storing errno at the start of a function and restoring it at the end isn't that much work. But we need to 1) document clearly that we want to do this in Elektra (not just a decision, but in all relevant developer guides) 2) check for this in code reviews.

We also need to clarify, where we do want the errno preserving to happen. For code reviews, it would clearly be easier, if the errno saving must happen in all functions that call other functions that set errno. The last thing we want to happen is having save/restore errno could all over the place even in the most trivial functions that could never affect errno, just because somewhere we decided that all public APIs must preserve errno.

If you see a way to simplify it, I am very open to suggestions.

Adding the global keyset argument to the elektraPluginOpen was my suggestion to improve the API...

Maybe it can be done by improving src/libs/invoke?

Yes, this change would mostly be a replacement of the invoke lib. The idea is basically, that originally libelektra-kdb expected most plugins to be standalone, so the "load another plugin and call it" feature was a separate lib. But now with backend plugins, this is a more core feature and providing it directly through libelektra-kdb would make sense IMO.

a generic tool that can check any plugin is much easier to write

To me this again seems like the almost obsessive need to make everything in Elektra generic and fit every use case.

  1. It may be easier to write in theory, but the fact that it didn't happen yet, even though plugins right now are all the same (in this way), suggests it may not be that easy.
  2. A check tool that has (some) specific knowledge about the thing it's checking can do much more in-depth tests. For example, if I know a backend plugin is file-based I can do some checks based on that (is the file written to, what happens if the file is deleted between read/write, etc). But if I just know it's a backend plugin but it could be file-based, or database-based or maybe even some sort of web-protocol based remote thing, I can do basically nothing.
  3. There would still be fairly general groups of plugins that could undergo the same kinds of checks. But in this specific case of hook plugins, I think they are so different from everything else that reusing the same interface just causes more problems limitations then it solves.

@markus2330
Copy link
Contributor

I guess storing errno at the start of a function and restoring it at the end isn't that much work. But we need to 1) document clearly that we want to do this in Elektra (not just a decision, but in all relevant developer guides) 2) check for this in code reviews.

Exactly. Actually the goal is even more general: we do not want to have any side-effect. Only our data structures are allowed to be modified; no thread-local or global variable. I created #4453

this is a more core feature and providing it directly through libelektra-kdb would make sense IMO.

This is very debatable. Actually, most of the plugins don't need to load other plugins. But I agree that with backend=plugin it shifts a bit. It must be very clear, however, what problem the new API actually solves which the old ones (libinvoke and the modules) did not. To do something in this area would be a separate decision, though. I would prefer to do only minimal changes (or no changes), as there were no troubles with these parts at all.

To me this again seems like the almost obsessive need to make everything in Elektra generic and fit every use case.

I don't know what you are talking about. I was talking about making "implementing the tools" easier for us. E.g. imho kdb plugin-check is complicated enough (half of the code in src/libs/tools/src/plugin.cpp is about checking). Adding more plugin types would make it (much) worse.

but the fact that it didn't happen yet,

Also here I do not know what you are talking about. AFAIK kdb plugin-check is complete: it checks for all clauses in the contract and all symbols plugins provide. If there is something missing, please open a bug report.

@kodebach
Copy link
Member Author

I would prefer to do only minimal changes (or no changes), as there were no troubles with these parts at all.

The trouble IMO is that we're duplicating the helper data (like the modules keyset) without any need for it. This will also probably get worse, if we switch from the exported plugin struct with function pointers to directly using the contract keyset (this was an idea at some point).

half of the code in src/libs/tools/src/plugin.cpp is about checking

Seems like I missed that part of the code. The code in src/tools/kdb was pretty short, so I didn't think much about it. However, the code for void Plugin::check (vector<string> & warnings) is pretty repetitive. It's basically checking that 1) a certain set of keys is present 2) some key's values have a certain form and 3) the standard plugin functions are in the contract and the struct. I think this could be shortened quite a bit.

If we switch from the plugin struct to just the contract, the checks for the exported functions are not needed anymore, except for checking that the value can be cast to a function pointer. At that point using special functions for hook plugins would just mean adding a few entries to the list of "must contain function pointer" keys.

AFAIK kdb plugin-check is complete: it checks for all clauses in the contract and all symbols plugins provide.

As stated above, it seems I missed part of the code. However, there are definitely more checks that could be done. For example, you could declare the same plugin in needs and recommends or you could define some non-sensical placements like just commit (thereby breaking the resolver functionality).

@kodebach
Copy link
Member Author

kodebach commented Oct 8, 2022

Continued in #4521

@kodebach kodebach closed this as not planned Won't fix, can't repro, duplicate, stale Oct 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants