Data transformer update #2182
At the Stockholm meeting, we discussed adding more manipulation functionality to the data transformer. The following issues are linked:
First, we should enable all the different use cases for the data transformer:
Then there is the manipulation chain:
The set selection should create sets of "entities" with their parameters, which can then be used in the math operations of the next step. The most common selection simply takes all the entities of a class together with selected parameters. It should be possible to filter entities (using regex). There may also be a need to filter with simple set operations: intersection, union, difference and symmetric difference (typically applied to entities of the same class, but with a different regex filter for each selection).
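A minimal sketch of regex selection plus set operations, assuming a hypothetical in-memory layout (class → entity name → parameter → value) rather than an actual Spine database; `select`, `combine` and `DATA` are illustrative names, not an existing data transformer API:

```python
import re
from typing import Dict, Optional, Sequence

# Hypothetical data: entity class -> entity name -> parameter name -> value.
Entities = Dict[str, Dict[str, float]]
DATA: Dict[str, Entities] = {
    "unit": {
        "coal_plant_1": {"capacity": 500.0, "efficiency": 0.38},
        "coal_plant_2": {"capacity": 300.0, "efficiency": 0.35},
        "gas_plant_1": {"capacity": 200.0, "efficiency": 0.55},
    },
}


def select(entity_class: str, pattern: str = ".*", parameters: Optional[Sequence[str]] = None) -> Entities:
    """Take the entities of a class whose names fully match `pattern`, keeping only `parameters` (all if None)."""
    regex = re.compile(pattern)
    return {
        name: {p: v for p, v in params.items() if parameters is None or p in parameters}
        for name, params in DATA[entity_class].items()
        if regex.fullmatch(name)
    }


def combine(a: Entities, b: Entities, operation: str) -> Entities:
    """Combine two selections with a set operation on entity names; parameters of shared entities are merged."""
    names = {
        "intersection": a.keys() & b.keys(),
        "union": a.keys() | b.keys(),
        "difference": a.keys() - b.keys(),
        "symmetric_difference": a.keys() ^ b.keys(),
    }[operation]
    return {name: {**a.get(name, {}), **b.get(name, {})} for name in sorted(names)}
```

For example, `combine(select("unit", r"coal_.*"), select("unit", r".*_1"), "intersection")` keeps only `coal_plant_1`. Working on entity names keeps the set operations independent of which parameters each selection carries.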
A more complicated selection links entities from two classes through a third, multidimensional class (entities from A and B based on C, which is actually an A__B relationship class; the resulting set is named A__B, and the parameters of A, B and C are all available for the math operations). This could be extended to more than two source classes. It could also be done without C by assuming that the operation is a Cartesian product of A and B, resulting in A__B, where A1__B1 gets parameters from both A1 and B1 (to be used in math operations).
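The linking step could look roughly like the following sketch. It assumes multidimensional entities are keyed by tuples (`("u1", "n1")` standing for `u1__n1`) and that parameter names are prefixed with their class so A, B and C parameters cannot collide; both conventions and the function names `link`, `cartesian` and `qualify` are hypothetical:

```python
from itertools import product
from typing import Dict, Tuple

Params = Dict[str, float]


def qualify(class_name: str, params: Params) -> Params:
    """Prefix parameter names with their class, e.g. capacity -> unit.capacity."""
    return {f"{class_name}.{name}": value for name, value in params.items()}


def link(
    a_class: str, a: Dict[str, Params],
    b_class: str, b: Dict[str, Params],
    c: Dict[Tuple[str, str], Params],
) -> Tuple[str, Dict[Tuple[str, str], Params]]:
    """Build the A__B set from the relationships in C; each entity gets the parameters of a, b and the relationship."""
    name = f"{a_class}__{b_class}"
    linked = {}
    for (a_name, b_name), c_params in c.items():
        if a_name in a and b_name in b:  # skip relationships whose members were filtered out
            linked[(a_name, b_name)] = {
                **qualify(a_class, a[a_name]),
                **qualify(b_class, b[b_name]),
                **qualify(name, c_params),
            }
    return name, linked


def cartesian(a_class: str, a: Dict[str, Params], b_class: str, b: Dict[str, Params]):
    """Link without C: every a is paired with every b, and the pairs carry no parameters of their own."""
    return link(a_class, a, b_class, b, {pair: {} for pair in product(a, b)})
```

Treating the Cartesian product as a link through an implicit, parameterless C keeps both cases in one code path.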
Similarly, there could be splitting operations: C, which is a subset of A, is used to pick elements from A__B. (Much of this can be done with filters, but a formalized splitting operation would allow hand-made selections.)
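A minimal sketch of splitting, assuming tuple-keyed multidimensional entities; `split` and its `position` argument (which dimension C restricts) are hypothetical:

```python
from typing import AbstractSet, Dict, Tuple

Params = Dict[str, float]
Key = Tuple[str, ...]


def split(entities: Dict[Key, Params], subset: AbstractSet[str], position: int = 0) -> Dict[Key, Params]:
    """Keep the entities of A__B whose element at `position` is in `subset` (the hand-made set C within A)."""
    return {key: dict(params) for key, params in entities.items() if key[position] in subset}
```

With `subset={"u1", "u3"}`, only the `unit__node` entities whose unit is `u1` or `u3` remain. Unlike a regex filter, `subset` can be any hand-picked list of names.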
A third category of selection involves aggregation: A__B is used to create a set of Bs over which all the associated As are aggregated. Each parameter of A then needs an aggregation function: sum, average, count, max, min, median, first, last and possibly more.
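The aggregation could be sketched as below, assuming tuple-keyed A__B entities and one named aggregation function per parameter. The function table is an illustrative choice, and "first"/"last" here simply follow the input order:

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Params = Dict[str, float]

AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "average": statistics.mean,
    "count": len,
    "max": max,
    "min": min,
    "median": statistics.median,
    "first": lambda values: values[0],  # relies on the order of the input entities
    "last": lambda values: values[-1],
}


def aggregate(
    entities: Mapping[Tuple[str, ...], Params],
    keep: int,
    functions: Mapping[str, str],
) -> Dict[str, Params]:
    """Collapse A__B onto the dimension at index `keep` (e.g. B), aggregating each parameter over the other one."""
    groups: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for key, params in entities.items():
        for name, value in params.items():
            groups[key[keep]][name].append(value)
    # A parameter without an entry in `functions` raises KeyError instead of being silently summed.
    return {
        group: {name: AGGREGATORS[functions[name]](values) for name, values in by_param.items()}
        for group, by_param in groups.items()
    }
```

For example, summing unit capacities per node would be `aggregate(unit_node, keep=1, functions={"capacity": "sum"})`.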
A fourth category is not really selection but reordering, adding and removing the dimensions of entities, and it might be better treated as a preliminary step in set selection. A Cartesian product is one way of adding dimensions. A new dimension could also be pulled out of multidimensional parameter values. Conversely, a dimension could be pushed inside a parameter by adding that dimension to the parameter value.
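The dimension operations might be sketched as follows, assuming tuple-keyed entities and indexed parameter values represented as plain dicts (e.g. `{"t1": 5.0, "t2": 6.0}` for a time index). All function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple, Union

Key = Tuple[str, ...]
Value = Union[float, Dict[str, float]]  # scalar, or a value indexed by one dimension (e.g. time)
Entities = Dict[Key, Dict[str, Value]]


def reorder(entities: Entities, order: Sequence[int]) -> Entities:
    """Permute the dimensions of every entity key, e.g. order (1, 0) turns A__B into B__A."""
    return {tuple(key[i] for i in order): params for key, params in entities.items()}


def drop_dimension(entities: Entities, position: int) -> Entities:
    """Remove one dimension; refuses when two entities would merge (that case calls for aggregation)."""
    result: Entities = {}
    for key, params in entities.items():
        parts = list(key)
        del parts[position]
        new_key = tuple(parts)
        if new_key in result:
            raise ValueError(f"dropping dimension {position} merges entities into {new_key}")
        result[new_key] = params
    return result


def pull_dimension(entities: Entities, parameter: str) -> Entities:
    """Move the index of `parameter` into the key: (u1,) with {t1: 5, t2: 6} becomes (u1, t1) and (u1, t2)."""
    result: Entities = {}
    for key, params in entities.items():
        others = {name: value for name, value in params.items() if name != parameter}
        for index, value in params[parameter].items():
            result[key + (index,)] = {**others, parameter: value}  # other parameters are copied to each new entity
    return result


def push_dimension(entities: Entities, parameter: str, position: int = -1) -> Entities:
    """Inverse of pull_dimension: move the key element at `position` into the index of `parameter` (only it is kept)."""
    result: Entities = defaultdict(dict)
    for key, params in entities.items():
        parts = list(key)
        index = parts.pop(position)
        result[tuple(parts)].setdefault(parameter, {})[index] = params[parameter]
    return dict(result)
```

Pulling and then pushing the same parameter returns the original indexed value, which makes the pair a convenient way to switch between "time as a dimension" and "time inside the value".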
Possibly helpful clarifications:
Math operations
As a result of the set selections, there should be several sets of entities with their associated parameter data. Math operations would not take place between entity sets, because the sets can contain different entities and the operations would then be undefined. So each set should carry all the parameters needed in the math operation stage.
Math operations should be expressed as formulas over parameters. In addition to regular arithmetic (addition, multiplication, etc.), there could be functions such as sin, cos, exp, log, ln and mod.
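One way to evaluate such formulas without running arbitrary code is to walk Python's `ast` and allow only arithmetic, numbers, parameter names and a whitelist of functions. This is a sketch under assumptions: the formula syntax is Python-like, parameters are referenced by bare name, and `log` is taken as base 10 with `ln` as the natural logarithm:

```python
import ast
import math
import operator
from typing import Dict

BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
              ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "exp": math.exp,
             "ln": math.log, "log": math.log10, "mod": operator.mod}


def evaluate(formula: str, parameters: Dict[str, float]) -> float:
    """Evaluate `formula` for one entity, looking up bare names in its `parameters`."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return parameters[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*(walk(arg) for arg in node.args))
        # Attribute access, unknown functions, strings, etc. are rejected.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval"))


def apply_formula(formula: str, entities: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Apply one formula to every entity of a set; each entity must have all the parameters the formula uses."""
    return {name: evaluate(formula, params) for name, params in entities.items()}
```

Because the formula is evaluated per entity, it never mixes entities from different sets, in line with the rule above that each set carries its own parameters.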
There is a further complication when the parameter data is multidimensional, and the math formula should be able to handle that. If the data always has a particular dimensionality, a single formula should suffice (e.g. [annual_demand] x [hourly_demand(time)], where time is a dimension name).
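A minimal sketch of how a scalar like `annual_demand` could be combined with a time-indexed value like `hourly_demand(time)`, assuming indexed values are plain dicts keyed by the time index; `broadcast` is a hypothetical helper, not part of the data transformer:

```python
import operator
from typing import Callable, Dict, Union

Value = Union[float, Dict[str, float]]  # scalar, or a value indexed by one dimension such as time


def broadcast(op: Callable[[float, float], float], a: Value, b: Value) -> Value:
    """Apply a binary operation, repeating a scalar over every index of an indexed value."""
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():
            raise ValueError("indexed values must share the same index")
        return {index: op(a[index], b[index]) for index in a}
    if isinstance(a, dict):
        return {index: op(value, b) for index, value in a.items()}
    if isinstance(b, dict):
        return {index: op(a, value) for index, value in b.items()}
    return op(a, b)
```

`broadcast(operator.mul, 1000.0, {"t1": 0.25, "t2": 0.75})` gives `{"t1": 250.0, "t2": 750.0}`, so the same formula works whether an operand is a scalar or a time series, as long as all indexed operands share the same index.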
Naming and renaming
This would be for naming new dimensions and parameters, and for renaming existing dimensions and parameters. Perhaps it should already be part of the set selection step ("SELECT * FROM UNIT AS PLANT").
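A sketch of renaming as a small, non-destructive step, assuming a hypothetical `EntitySet` record that holds a set name, its dimension names and tuple-keyed entities. `rename(units, name="plant")` plays the role of `SELECT * FROM UNIT AS PLANT`:

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional, Tuple

Key = Tuple[str, ...]


@dataclass(frozen=True)
class EntitySet:
    name: str
    dimensions: Tuple[str, ...]
    entities: Dict[Key, Dict[str, float]] = field(default_factory=dict)


def rename(
    entity_set: EntitySet,
    name: Optional[str] = None,
    dimensions: Optional[Mapping[str, str]] = None,
    parameters: Optional[Mapping[str, str]] = None,
) -> EntitySet:
    """Return a copy with a new set name and/or renamed dimensions and parameters; unmapped names are kept."""
    dimensions = dimensions or {}
    parameters = parameters or {}
    return EntitySet(
        name=name or entity_set.name,
        dimensions=tuple(dimensions.get(d, d) for d in entity_set.dimensions),
        entities={
            key: {parameters.get(p, p): value for p, value in params.items()}
            for key, params in entity_set.entities.items()
        },
    )
```

Returning a new set instead of mutating the input keeps the original selection usable by other steps of the chain.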