This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

General validation/normalization/conversion plugin #3497

Closed
kodebach opened this issue Sep 22, 2020 · 11 comments

@kodebach
Member

This was just an idea I had, and it's not very concrete, but I'll post it anyway (@markus2330 maybe this is a topic for a thesis).

The idea of Elektra's plugins is very much the Unix philosophy of KISS. A plugin should just do a single job. For example, the ipaddr plugin checks that a key is a valid IP address, the macaddr plugin does the same for MAC addresses and rgbcolor deals with colors. All of these plugins share some basic code.

Some plugins also normalize values into a standard representation, so that the config file can be more flexible without application code (e.g. convert HTML colors like #aabbcc into unsigned integers). This is generally a lot more complicated than simple validation. And again, all of these plugins share some code.

My proposal now is to create a generic validation and normalization plugin that deals with all the general logic and delegates to other specialized plugins for the actual validation and normalization of key values.

The get and set functions of specialized plugins would be simple NOPs, since the generic plugin will call some other functions the plugin exports.

We could also extend the generic plugin to allow more general conversion plugins (which don't convert to a normalized form directly). For example, a dns plugin could resolve DNS names into IP addresses, which are then validated by the ipaddr plugin. The dns plugin would simply tell the generic plugin that it outputs IP addresses, and the ipaddr plugin would indicate that it outputs a normalized value. Of course that makes the generic plugin very complex, but it also makes it very powerful.
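A minimal sketch of the delegation idea, not Elektra's actual API: the `Checker` struct, `genericCheck` and the type names used as lookup keys are all assumptions made for illustration. Each specialized plugin exports a validate function and, optionally, a normalize function; the generic part only looks up the right entry and handles the shared logic.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface a specialized plugin exports to the generic plugin. */
typedef struct
{
    const char * type;                     /* name the generic plugin dispatches on */
    bool (*validate) (const char * value); /* true if the value is acceptable */
    bool (*normalize) (const char * value, char * out, size_t outSize); /* may be NULL */
} Checker;

typedef enum { CHECK_OK, CHECK_UNKNOWN_TYPE, CHECK_INVALID } CheckResult;

/* rgbcolor: accepts "#rrggbb" only (a simplification for this sketch). */
static bool rgbValidate (const char * value)
{
    if (strlen (value) != 7 || value[0] != '#') return false;
    for (size_t i = 1; i < 7; ++i)
    {
        if (!isxdigit ((unsigned char) value[i])) return false;
    }
    return true;
}

/* rgbcolor: "#aabbcc" becomes the decimal text of 0xaabbcc. */
static bool rgbNormalize (const char * value, char * out, size_t outSize)
{
    unsigned long n = strtoul (value + 1, NULL, 16);
    int written = snprintf (out, outSize, "%lu", n);
    return written > 0 && (size_t) written < outSize;
}

/* ipaddr: strict dotted-quad IPv4 check, validation only. */
static bool ipaddrValidate (const char * value)
{
    int parts = 0;
    const char * p = value;
    while (*p != '\0')
    {
        if (!isdigit ((unsigned char) *p)) return false;
        unsigned int octet = 0;
        int digits = 0;
        while (isdigit ((unsigned char) *p))
        {
            octet = octet * 10 + (unsigned int) (*p - '0');
            if (++digits > 3) return false;
            ++p;
        }
        if (octet > 255) return false;
        ++parts;
        if (*p == '.')
        {
            ++p;
            if (*p == '\0') return false; /* trailing dot */
        }
        else if (*p != '\0')
        {
            return false;
        }
    }
    return parts == 4;
}

static const Checker CHECKERS[] = {
    { "rgbcolor", rgbValidate, rgbNormalize },
    { "ipaddr", ipaddrValidate, NULL },
};
#define CHECKER_COUNT (sizeof (CHECKERS) / sizeof (CHECKERS[0]))

/* Generic part: find the checker, validate, then normalize or copy through. */
static CheckResult genericCheck (const Checker * checkers, size_t count, const char * type,
                                 const char * value, char * out, size_t outSize)
{
    assert (checkers != NULL && type != NULL && value != NULL && out != NULL && outSize > 0);
    for (size_t i = 0; i < count; ++i)
    {
        if (strcmp (checkers[i].type, type) != 0) continue;
        if (!checkers[i].validate (value)) return CHECK_INVALID;
        if (checkers[i].normalize != NULL)
        {
            return checkers[i].normalize (value, out, outSize) ? CHECK_OK : CHECK_INVALID;
        }
        int written = snprintf (out, outSize, "%s", value);
        return (written >= 0 && (size_t) written < outSize) ? CHECK_OK : CHECK_INVALID;
    }
    return CHECK_UNKNOWN_TYPE;
}
```

In this shape, adding a new value type means adding one table entry; the lookup, error handling and copy-through logic stay in one place.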


NOTE: this issue should probably be done post-1.0. Feel free to close it and mark the issue appropriately.

@mpranj
Member

mpranj commented Sep 22, 2020

The get and set functions of specialized plugins would be simple NOPs, since the generic plugin will call some other functions the plugin exports.

What is the benefit of implementing this as a (classical) Elektra plugin at all? If the plugin does nothing in get or set, maybe it can be a library or a different kind of plugin interface?

EDIT: What I am thinking is maybe we do not need to limit it with the current plugin API.

@markus2330
Contributor

It is a good idea and also good grounds for a Master thesis. So if you are sure this should be your topic, go ahead.

From Elektra's perspective, however, I think there are more urgent topics. Until now, nobody has complained that the validation or transformation capabilities of Elektra are not good enough. Of the 1.0 goals (https://github.com/ElektraInitiative/libelektra/milestone/12), the default behavior of Elektra #690 is the single biggest untackled problem (together with the cleanup, which basically goes hand in hand). This is also exactly what you found annoying again in #2724, #2738 and #2760. So I encourage you to tackle this bigger problem. In terms of the written thesis there is probably not much difference: both are investigations of programming languages/database interfaces with questions like plugins vs. libraries. But #690 is more courageous and practical. And you are so talented that it would be a waste if you produced something which is a nice concept but never actually gets realized.

this issue should probably be done post-1.0

Whatever you do, it should definitely be within 1.0; any improvement done now will be very valuable to make Elektra 1.0 a success.

@kodebach
Member Author

What is the benefit of implementing this as a (classical) Elektra plugin at all?

We can use the existing mounting and module loading system, including the tools library and kdb.

Also, all plugins would still use the same API. If we introduce a new API for this, we need to either rewrite the core so that it can deal with multiple different plugin APIs, or duplicate some of the plugin loading in the generic plugin.

The global keyset could (maybe) be used to indicate to the generic plugin, which plugins should be called, i.e. a validation plugin would put some key into the global keyset to tell the generic plugin when it should be called.

If the plugin does nothing in get or set, maybe it can be a library or a different kind of plugin interface?

EDIT: What I am thinking is maybe we do not need to limit it with the current plugin API.

The interface between the generic and the other plugins would essentially be a completely new and separate API. elektra*Open, elektra*Get and elektra*Set would just exist so that the existing elektraPluginOpen can be used.

So if you are sure this should be your topic, go ahead.

That was just a general remark. I'm not planning on starting my thesis for at least another semester or two (not enough time right now).

Whatever you do, it should definitely be within 1.0

This is very surprising to me; normally you are in favour of releasing 1.0 sooner rather than later. However, this proposal is kind of irrelevant to the 1.0 release, since it does not touch the core and would not introduce any breaking changes.


If we do indeed want to change the Plugin API, I would suggest extracting the system/elektra/modules contract part from the elektra*Get function into e.g. elektra*Contract. Having this weird if at the top of every plugin's get function can be quite confusing to newcomers.
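To illustrate the suggested split, here is a self-contained mock that does not use the real Elektra API. The plugin name `example`, the `Served` enum and `loaderDispatch` are hypothetical; a real plugin works on keys and key sets rather than plain strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "example" is a placeholder plugin name, not a real plugin. */
#define MODULE_ROOT "system/elektra/modules/example"

typedef enum { SERVED_CONTRACT, SERVED_CONFIG } Served;

/* Current shape: one get function answers both kinds of request,
 * so it starts with a check on the parent key name. */
static Served exampleGetCombined (const char * parentKeyName)
{
    assert (parentKeyName != NULL);
    if (strcmp (parentKeyName, MODULE_ROOT) == 0)
    {
        return SERVED_CONTRACT; /* a real plugin would return its contract keys here */
    }
    return SERVED_CONFIG; /* a real plugin would read the configuration here */
}

/* Suggested shape: two entry points, each with a single job. */
static Served exampleContract (void)
{
    return SERVED_CONTRACT;
}

static Served exampleGet (const char * parentKeyName)
{
    assert (parentKeyName != NULL);
    (void) parentKeyName; /* unused in this mock */
    return SERVED_CONFIG;
}

/* Hypothetical loader: the name check moves here, out of every plugin. */
static Served loaderDispatch (const char * parentKeyName)
{
    assert (parentKeyName != NULL);
    bool wantsContract = strcmp (parentKeyName, MODULE_ROOT) == 0;
    return wantsContract ? exampleContract () : exampleGet (parentKeyName);
}
```

Both shapes answer the same requests identically; the difference is only where the branch lives.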

@markus2330
Contributor

That was just a general remark. I'm not planning on starting my thesis for at least another semester or two (not enough time right now).

Ok, then it is hopefully post-1.0 😉

This is very surprising to me; normally you are in favour of releasing 1.0 sooner rather than later.

I did not know when you wanted to start.

However, this proposal is kind of irrelevant to the 1.0 release, since it does not touch the core and would not introduce any breaking changes.

It introduces a big breaking change for the overall picture: some plugins would become outdated, get replaced and so on. And you will probably find out that some plugin API changes actually make sense for every plugin, not only for generic plugins.

@kodebach
Member Author

kodebach commented Oct 6, 2020

The issues between toml and type discussed in #3491 led me to the conclusion that it would probably be best to further subdivide the postgetstorage position. Whether we should do it through a generic postgetstorage plugin as suggested here or explicitly in the core/backend plugin, I'm not sure.

My basic conclusion is that the full processes for kdbGet and kdbSet should look like this:

  • kdbGet:
    • Pre-processing on a file level (e.g. decryption of whole files, line-ending conversion etc.)
    • Load storage file
    • Decoding of values: e.g. base64 decoding, key-level decryption, etc.
    • Generation of new data: Any plugin that creates new keys, metakeys or assigns values to previously empty keys e.g. spec
    • Normalization: Plugins that convert values into a standardized form. No new keys are generated, only values are modified. Metadata may also be generated or modified, but those metakeys are not meant as input to Generation or other Normalization plugins.
    • Validation: Plugins that only do validation, but do not modify the KeySet in any way.
  • kdbSet:
    • Generation of new data
    • Normalization
    • Validation
    • Encoding (the kdbSet step of Decoding plugins)
    • Store storage file
    • Post-processing on a file level

As you can see, the kdbSet procedure is not the exact reverse of kdbGet, but it is not the same sequence either. This is the main problem. Generation, Normalization and Validation must always happen in this order, but Decoding must happen immediately after Loading and Encoding just before Storing.
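The two sequences and their ordering constraints can be written down as data. This is only a sketch of the proposal in this comment: the `Phase` enum and the helper names are invented here and do not correspond to existing Elektra plugin positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Phase names follow the list above; the identifiers are hypothetical. */
typedef enum
{
    PHASE_PRE_FILE,  /* pre-processing on a file level */
    PHASE_LOAD,      /* load storage file */
    PHASE_DECODE,    /* decoding of values */
    PHASE_GENERATE,  /* generation of new data */
    PHASE_NORMALIZE, /* normalization */
    PHASE_VALIDATE,  /* validation */
    PHASE_ENCODE,    /* encoding */
    PHASE_STORE,     /* store storage file */
    PHASE_POST_FILE  /* post-processing on a file level */
} Phase;

static const Phase GET_PHASES[] = { PHASE_PRE_FILE, PHASE_LOAD,      PHASE_DECODE,
                                    PHASE_GENERATE, PHASE_NORMALIZE, PHASE_VALIDATE };
static const Phase SET_PHASES[] = { PHASE_GENERATE, PHASE_NORMALIZE, PHASE_VALIDATE,
                                    PHASE_ENCODE,   PHASE_STORE,     PHASE_POST_FILE };

#define GET_COUNT (sizeof (GET_PHASES) / sizeof (GET_PHASES[0]))
#define SET_COUNT (sizeof (SET_PHASES) / sizeof (SET_PHASES[0]))

/* Returns the position of `phase` in `phases`, or -1 if it does not occur. */
static int phaseIndex (const Phase * phases, size_t count, Phase phase)
{
    assert (phases != NULL);
    for (size_t i = 0; i < count; ++i)
    {
        if (phases[i] == phase) return (int) i;
    }
    return -1;
}

/* True if `first` runs directly before `second` with nothing in between. */
static bool immediatelyBefore (const Phase * phases, size_t count, Phase first, Phase second)
{
    int a = phaseIndex (phases, count, first);
    int b = phaseIndex (phases, count, second);
    return a >= 0 && b >= 0 && b == a + 1;
}

/* True if Generation, Normalization and Validation occur in exactly this order. */
static bool coreOrderHolds (const Phase * phases, size_t count)
{
    return immediatelyBefore (phases, count, PHASE_GENERATE, PHASE_NORMALIZE) &&
           immediatelyBefore (phases, count, PHASE_NORMALIZE, PHASE_VALIDATE);
}
```

With the tables above, `coreOrderHolds` is true for both sequences, Decoding directly follows Loading in `GET_PHASES`, and Encoding directly precedes Storing in `SET_PHASES`. Simply reversing `GET_PHASES` would break the first property, which is why kdbSet needs its own table.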

Generation must be a separate step, so that at the start of Normalization, we know about all the keys. This is needed for plugins like reference that consider the relation between multiple keys.
Normalization must also be separate from Validation. (Yes, that means splitting up type into two plugins or a plugin with two phases.) The separation enforces that e.g. range always runs after e.g. hexnumber and therefore only has to deal with standardized decimal integers.
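A small sketch of why the fixed order helps, using simplified stand-ins rather than the real hexnumber and range plugins. The function names and the accepted syntax (`0x` prefix or plain decimal digits) are assumptions for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Normalization step: "0x1F" becomes "31"; plain decimal text passes through. */
static bool normalizeHex (const char * value, char * out, size_t outSize)
{
    assert (value != NULL && out != NULL && outSize > 0);
    bool isHex = value[0] == '0' && (value[1] == 'x' || value[1] == 'X');
    const char * digits = isHex ? value + 2 : value;
    if (*digits == '\0') return false;
    for (const char * p = digits; *p != '\0'; ++p)
    {
        int ok = isHex ? isxdigit ((unsigned char) *p) : isdigit ((unsigned char) *p);
        if (!ok) return false;
    }
    errno = 0;
    unsigned long long n = strtoull (digits, NULL, isHex ? 16 : 10);
    if (errno == ERANGE) return false;
    int written = snprintf (out, outSize, "%llu", n);
    return written > 0 && (size_t) written < outSize;
}

/* Validation step: only understands decimal text and never modifies anything. */
static bool validateRange (const char * decimal, unsigned long long min, unsigned long long max)
{
    assert (decimal != NULL);
    if (!isdigit ((unsigned char) decimal[0])) return false;
    char * end = NULL;
    errno = 0;
    unsigned long long n = strtoull (decimal, &end, 10);
    if (errno == ERANGE || *end != '\0') return false;
    return n >= min && n <= max;
}

/* Fixed order: normalize first, then validate the normalized value. */
static bool normalizeThenValidate (const char * value, unsigned long long min, unsigned long long max,
                                   char * out, size_t outSize)
{
    return normalizeHex (value, out, outSize) && validateRange (out, min, max);
}
```

Because `validateRange` always sees the output of `normalizeHex`, it needs no knowledge of hexadecimal notation. Called directly on `"0x1F"` it would reject the value, which is the ordering problem the separation avoids.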

This also assumes that Normalization plugins always convert directly into the standardized form. It would not be possible to use e.g. rgbcolor to convert red into 0xFFFF0000 and then hexnumber to convert that into a decimal integer. This wouldn't work, because we would need proper dependencies and topological sorting between the plugins to ensure the correct order. Such chaining is what was suggested in the original issue for ipaddr and dns.
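For reference, the dependency handling that chained conversions would require can be sketched as a topological sort (Kahn's algorithm). The `Converter` struct and its `input`/`output` type labels are hypothetical; nothing like this exists in the current plugin contracts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONVERTERS 16

/* Hypothetical declaration: each plugin states what it consumes and produces. */
typedef struct
{
    const char * name;
    const char * input;
    const char * output;
} Converter;

/* True if `from` must run before `to`, i.e. `to` consumes what `from` produces. */
static bool feeds (const Converter * from, const Converter * to)
{
    return from != to && strcmp (from->output, to->input) == 0;
}

/* Kahn's algorithm: writes an execution order (indices into `c`) to `order`.
 * Returns false if the dependencies contain a cycle. */
static bool sortConverters (const Converter * c, size_t n, size_t * order)
{
    assert (c != NULL && order != NULL && n <= MAX_CONVERTERS);
    size_t indegree[MAX_CONVERTERS] = { 0 };
    bool done[MAX_CONVERTERS] = { false };

    for (size_t i = 0; i < n; ++i)
    {
        for (size_t j = 0; j < n; ++j)
        {
            if (feeds (&c[i], &c[j])) ++indegree[j];
        }
    }

    size_t placed = 0;
    while (placed < n)
    {
        bool progress = false;
        for (size_t j = 0; j < n; ++j)
        {
            if (done[j] || indegree[j] != 0) continue;
            done[j] = true;
            order[placed++] = j;
            progress = true;
            /* Everything that consumes j's output has one dependency fewer. */
            for (size_t k = 0; k < n; ++k)
            {
                if (feeds (&c[j], &c[k])) --indegree[k];
            }
        }
        if (!progress) return false; /* remaining converters depend on each other */
    }
    return true;
}

/* Position of converter `index` within `order`; n if it is absent. */
static size_t positionOf (const size_t * order, size_t n, size_t index)
{
    for (size_t i = 0; i < n; ++i)
    {
        if (order[i] == index) return i;
    }
    return n;
}
```

Given converters for dns (hostname to ipaddr), ipaddr (ipaddr to normalized), rgbcolor (color to hexnumber) and hexnumber (hexnumber to normalized), the sort places dns before ipaddr and rgbcolor before hexnumber regardless of the order in which they were listed. The proposal above avoids this machinery by requiring every Normalization plugin to produce the final form in one step.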

Global plugins should probably only be allowed between Decoding and Generation (for generating new data) and after the Validation step (to do more validation). Anywhere else probably doesn't make a lot of sense (apart from cache and resolver).

While looking at the current plugins, I also found a surprising amount of logging/tracing plugins. These are kind of separate from the rest and should probably be called before and after every position, but not be allowed to make any modifications.


PS. Like the rest of this issue, this is a long-term proposal and not meant to be implemented immediately, although this part might be more urgent than the generic plugin.

@markus2330
Contributor

be best to further subdivide the postgetstorage position.

Good observation.

Normalization must also be separate from Validation.

Yes. The separation in the type plugin might be a short-term goal.

While looking at the current plugins, I also found a surprising amount of logging/tracing plugins. These are kind of separate from the rest and should probably be called before and after every position

They were already written before global plugins existed. I do not know if anyone still uses them. I would not put effort into this now.

@kodebach
Member Author

I do not know if anyone still uses them.

Then we should maybe remove them.

I would not put effort into this now.

If we do change the plugin positions, we should at least take into account the possibility of logging plugins.

@markus2330 markus2330 mentioned this issue Oct 19, 2020
16 tasks
@markus2330
Contributor

Then we should maybe remove them.

Which ones? Maybe counter and logchange will not be missed. But e.g. timeofday or tracer seem quite harmless to me, and they are used as good examples in a few places.

If we do change the plugin positions, we should at least take into account the possibility of logging plugins.

The positioning accounted for logging plugins. E.g. src/plugins/timeofday/README.md uses positions suitable to benchmark standalone storage plugins.

@kodebach
Member Author

But e.g. timeofday or tracer seem to be quite harmless to me and they are used as good examples at a few places.

Then they are not unused. You said "I do not know if anyone still uses them", so I thought there was something we could get rid of.

The positioning accounted for logging plugins.

I meant, in a redesign of positions, we should think about logging plugins. A generic validation plugin for example would have to know about logging plugins, so it can call them when appropriate.

@stale

stale bot commented Oct 20, 2021

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@stale stale bot added the stale label Oct 20, 2021
@stale

stale bot commented Nov 4, 2021

I closed this issue now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue.
Thank you for your contributions 💖
