Added a generic convert function strategy #163
…pi-dlite into convert-function-strategy
Codecov Report

@@             Coverage Diff             @@
##           master     #163       +/-   ##
===========================================
+ Coverage   68.13%   87.35%   +19.21%
===========================================
  Files          15       15
  Lines         408      419      +11
===========================================
+ Hits          278      366      +88
+ Misses        130       53      -77
I have a hard time understanding the purpose of this converter strategy and how to use it. What are the input instances, and what are the output instances?
Can you be more explicit in the documentation? I have not properly reviewed the code yet because I am confused about what is supposed to do what here.
@@ -119,6 +126,8 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
        # There seems to be an issue with module search in Python 3.11
        # python-version: ["3.9", "3.10", "3.11"]
        python-version: ["3.9", "3.10"]
Is there an issue for addressing the 3.11 problems?
Added issue #168
@@ -135,7 +144,8 @@ jobs:
        run: |
          python -m pip install -U pip
          pip install -U setuptools wheel
          pip install -e .[dev]
Why are 'requirements.*' preferred over setup.py?
Copy of my answer to Francesca:
I split it into two lines because I wanted to install the requirements with the update (-U) option, which doesn't make sense for a development (-e .) installation.
But it might not be necessary. I tried changing a lot of things before the CI on GitHub finally went through. It could very well be the change in lines 91-92 that did the work...
        Returns:
            SessionUpdate instance.
        """
        config = self.convert_config.configuration
Remember to update the config with the relevant fields from the session
Yes, good point. That is needed if we want a later filter to have an effect.
I was thinking about encouraging not specifying labels in the configuration. But fetching the label from the session could be useful. However, a single label in the session will not do, since each input and output has an optional label, so we need to be more specific when specifying labels in the session.
Would it make sense to allow variable substitutions in the configuration of a partial pipeline, like

    https://www.ntnu.edu/physmet/data#image_analyser:
      function:
        functionType: application/vnd.dlite-convert
        configuration:
          module_name: temdata.image_analyser
          function_name: image_analyser
          inputs:
            - datamodel: http://onto-ns.com/meta/0.1/TEMImage
              label: ${temimage}
          outputs:
            - datamodel: http://onto-ns.com/meta/0.1/PrecipitateStatistics
              label: ${precipitate_statistics}
      mapping:
        mappingType: mappings
        prefixes:
          ...
        triples:
          ...

where ${temimage} and ${precipitate_statistics} are substituted from the session.
On the pro side, this would allow templated partial pipelines with improved reusability.
On the con side, it would add an extra layer of complexity.
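To make the proposal concrete, here is a minimal sketch of how such a substitution could work, using `string.Template` from the Python standard library, which understands the `${name}` syntax. This is not how oteapi-dlite does it; the `substitute` helper, the plain-dict configuration, and the session values `image1` and `stats1` are assumptions made purely for illustration.

```python
from string import Template
from typing import Any, Mapping


def substitute(config: Any, session: Mapping[str, str]) -> Any:
    """Recursively replace ${name} placeholders in strings with session values.

    Hypothetical helper, not part of oteapi or oteapi-dlite.
    """
    if isinstance(config, str):
        # Template.substitute raises KeyError if a placeholder has no
        # matching key in the session.
        return Template(config).substitute(session)
    if isinstance(config, dict):
        return {key: substitute(value, session) for key, value in config.items()}
    if isinstance(config, list):
        return [substitute(item, session) for item in config]
    return config


# Configuration taken from the example above, written as a plain dict.
configuration = {
    "module_name": "temdata.image_analyser",
    "function_name": "image_analyser",
    "inputs": [
        {
            "datamodel": "http://onto-ns.com/meta/0.1/TEMImage",
            "label": "${temimage}",
        },
    ],
    "outputs": [
        {
            "datamodel": "http://onto-ns.com/meta/0.1/PrecipitateStatistics",
            "label": "${precipitate_statistics}",
        },
    ],
}

# Assumed session content; the label values are made up.
session = {"temimage": "image1", "precipitate_statistics": "stats1"}

resolved = substitute(configuration, session)
print(resolved["inputs"][0]["label"])   # image1
print(resolved["outputs"][0]["label"])  # stats1
```

Failing loudly on a missing variable (the `KeyError`) is a deliberate choice in this sketch: a partial pipeline with an unresolved label would otherwise only fail later, when an instance is looked up in the collection.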
I am not completely sold on the idea of labels. Also, if we start creating declarative partial pipelines with variables, they become hard to exchange/share between instances. Why not refer to a resource directly?
You have a good point. I agree that a partial pipeline should document a single data source or sink and in general not contain variables. However, there are cases where fetching parameters from the configuration is useful. One case is a partial pipeline documenting a SQL database. The documentation of the database should be fixed, while the query may vary each time we execute the pipeline. Another possible use case is to specify the label a parser should use when storing a newly created instance into the collection and, correspondingly, the label a generator should use when fetching an instance from the collection. In this case the labels may be variables when documenting the partial pipelines, but must be assigned and internally consistent before executing the full pipeline.
While variables are an easy and flexible way to assign consistent labels across partial pipelines, they may also open a can of worms of potential misuse.
Furthermore, in the common case where we only have one instance of a given entity in the collection, we don't need labels, since we can refer to the instance by specifying the entity in the configuration.
So if we solve the issue of assigning the labels in strategies that may refer to multiple instances (like the convert strategy) without variables, it might be a good idea to avoid variables in the partial pipelines stored in the knowledge base.
However, for populating the knowledge base with partial pipelines for a set of similar data sources, I think that templates with variables would be very useful. In this case, all substitutions should be done before storing into the knowledge base. Such a template utility may in this case live outside oteapi.
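A rough sketch of that lookup rule, with the collection modelled as a plain dict from label to instance. The `Instance` class and the `resolve` function are hypothetical stand-ins for illustration and do not use the DLite API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a DLite instance."""
    datamodel: str
    payload: str


def resolve(
    collection: Dict[str, Instance],
    datamodel: str,
    label: Optional[str] = None,
) -> Instance:
    """Find an instance by label if given, otherwise by its datamodel alone."""
    if label is not None:
        instance = collection[label]
        if instance.datamodel != datamodel:
            raise TypeError(f"instance {label!r} is not a {datamodel}")
        return instance
    # No label: this only works if exactly one instance has the datamodel.
    matches = [i for i in collection.values() if i.datamodel == datamodel]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one instance of {datamodel}, found {len(matches)}"
        )
    return matches[0]


TEMIMAGE = "http://onto-ns.com/meta/0.1/TEMImage"

# One TEMImage in the collection: the datamodel is enough.
collection = {"image1": Instance(TEMIMAGE, "first image")}
print(resolve(collection, TEMIMAGE).payload)  # first image

# Two TEMImages: the lookup is ambiguous unless a label is given.
collection["image2"] = Instance(TEMIMAGE, "second image")
print(resolve(collection, TEMIMAGE, label="image2").payload)  # second image
```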
I am not convinced we need labels or variables in the declarative pipelines, but this might be a discussion for the strategy meeting. Otherwise, merge as you wish
# Description
Example with OTEAPI and OTELib using TEM data.

This example currently depends on a set of other PRs:
* #633 (already merged into this branch)
* EMMC-ASBL/oteapi-core#318
* EMMC-ASBL/tripper#129
* EMMC-ASBL/oteapi-dlite#163

## Type of change
- [ ] Bug fix & code cleanup
- [ ] New feature
- [x] Documentation update
- [ ] Test update

## Checklist for the reviewer
This checklist should be used as a help for the reviewer.
- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new features have an accompanying new test?
- [ ] Has the documentation been updated as necessary?
Description:
Added a generic convert function strategy.
It calls a Python function to convert zero or more input instances to zero or more output instances.
The module containing the Python function must exist in your PYTHONPATH.
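As a rough illustration of the mechanism described here, the sketch below looks up a function by module and function name and calls it with the input instances. It is an assumption about how such a dispatch could be written, not the strategy's actual code; `call_convert_function` is a hypothetical name, and a standard-library function stands in for a real converter.

```python
import importlib
from typing import Any, Sequence, Tuple


def call_convert_function(
    module_name: str,
    function_name: str,
    inputs: Sequence[Any],
) -> Tuple[Any, ...]:
    """Import `module_name`, call `function_name` on the inputs, return a tuple.

    Hypothetical helper: the module must be importable, e.g. by being on
    PYTHONPATH, otherwise import_module raises ModuleNotFoundError.
    """
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    result = function(*inputs)
    # Normalise "zero or more outputs" to a tuple.
    if result is None:
        return ()
    if isinstance(result, (tuple, list)):
        return tuple(result)
    return (result,)


# Stand-in for a real converter: math.hypot takes two inputs, returns one output.
print(call_convert_function("math", "hypot", [3.0, 4.0]))  # (5.0,)
```

Normalising the return value to a tuple lets the caller treat functions with no output, a single output, or several outputs uniformly.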
Type of change:
Checklist for the reviewer:
This checklist should be used as a help for the reviewer.