Add property for documentation of data transformation to sources #134
Labels
other: help wanted 🙋 (Extra attention is needed)
part: backend 🧱 (Backend component)
priority: low 🦥 (Low priority)
status: active 🚧 (Work in progress)
type: question ❓ (Further information is requested)
Description of the issue
We currently have no reasonable way of documenting the transformation steps applied to the referenced sources. So far I have been adding my transformation scripts and software as further sources. I don't think this is very useful, as there is no way of associating the added scripts with their respective source.
Some data sources are in formats that are well documented and standardized, for example tabular data and RDF graphs. These can be transformed using query languages such as SQL and SPARQL, and the transformations can be documented using the languages themselves! For unstructured data such as Excel files, the documentation can be done by adding URLs to the repositories that transform them; such a repository could contain a Python script, for example.
Ideas of solution
I propose adding a new property to the source items, namely `transformations` or something similar, that refers to the operations done to convert the original resource. `transformations` should be a list of items with the properties `path`, `name`, `title`, `description`, `query`/`code` and `resource`, where each item should have either a `path` or a `query`/`code` property. When more than one item is provided, it is understood that the output of the first item is passed to the second, and so on, and that the last item produces a resource in the current metadata; that resource should be referred to by its name.
I do not know how risky it is to add SQL to an instance of the OEMetadata. That can be discussed; if it is really a problem, we can take the necessary precautions.
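To make the proposal more concrete, here is a rough sketch of what a source item with `transformations` could look like, together with a small check of the rules described above. Everything in it is an assumption for discussion: the URLs, the names `extract_table`, `filter_rows` and `example_resource`, and the helper `check_transformations` are all made up, and I read "either a `path` or a `query`/`code`" as "exactly one of the two".

```python
# Hypothetical source item; all names, URLs and values are illustrative only.
source = {
    "title": "Example source",
    "path": "https://example.org/raw_data.xlsx",
    "transformations": [
        {
            # Step 1: documented by pointing at a script in a repository.
            "name": "extract_table",
            "title": "Extract table from spreadsheet",
            "description": "Python script that converts the Excel file to a table.",
            "path": "https://example.org/repo/extract_table.py",
        },
        {
            # Step 2: documented by the query itself; consumes the output of step 1.
            "name": "filter_rows",
            "title": "Filter rows",
            "description": "Keeps only rows from 2020 onwards.",
            "query": "SELECT * FROM extract_table WHERE year >= 2020",
            # The last step names the resource it produces in this metadata.
            "resource": "example_resource",
        },
    ],
}


def check_transformations(items, resource_names):
    """Return a list of problems found in a proposed `transformations` list.

    Assumed rules: each item has exactly one of `path` or `query`/`code`,
    and the last item refers by name to a resource in the metadata.
    """
    problems = []
    for i, item in enumerate(items):
        has_path = "path" in item
        has_inline = "query" in item or "code" in item
        if has_path == has_inline:
            problems.append(f"item {i}: needs exactly one of path or query/code")
    if items:
        target = items[-1].get("resource")
        if target not in resource_names:
            problems.append("last item: resource must name a resource in the metadata")
    return problems


if __name__ == "__main__":
    print(check_transformations(source["transformations"], {"example_resource"}))
```

Note that the check never executes the query or fetches the path; it only looks at the structure, which might be one way to limit the risk of carrying SQL inside the metadata.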