Added a generic convert function strategy #163

jesper-friis · 2023-08-29T14:09:21Z

Description:

Added a generic convert function strategy.

It calls a Python function to convert zero or more input instances to zero or more output instances.

The module containing the Python function must exist in your PYTHONPATH.
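As a rough illustration of the idea (not the code in `oteapi_dlite/strategies/convert.py`), the sketch below looks up a function by module and function name and calls it on a list of inputs. The helper names `load_function` and `convert`, the positional calling convention, and the way the return value is normalised to a list are all assumptions made for this example; plain Python values stand in for DLite instances.

```python
import importlib
from typing import Any, Callable, Sequence


def load_function(module_name: str, function_name: str) -> Callable[..., Any]:
    """Import `module_name` (it must be importable, e.g. via PYTHONPATH)
    and return its attribute `function_name`."""
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    if not callable(function):
        raise TypeError(f"{module_name}.{function_name} is not callable")
    return function


def convert(module_name: str, function_name: str, inputs: Sequence[Any]) -> list:
    """Call the function with the inputs as positional arguments and
    normalise the result to a list of zero or more outputs.

    Hypothetical convention: None means no outputs, a list/tuple means
    several outputs, anything else is a single output.
    """
    function = load_function(module_name, function_name)
    result = function(*inputs)
    if result is None:
        return []
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]


# A standard-library function stands in for a real converter here.
outputs = convert("math", "hypot", [3.0, 4.0])
print(outputs)  # [5.0]
```

If the module is not importable, `importlib.import_module` raises `ModuleNotFoundError`, which is why the module has to be on the Python path.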

Type of change:

Bug fix.
New feature.
Documentation update.

Checklist for the reviewer:

This checklist should be used as a help for the reviewer.

Is the change limited to one issue?
Does this PR close the issue?
Is the code easy to read and understand, including clearly named variables?
Do all new features have an accompanying new test?
Has the documentation been updated as necessary?

codecov-commenter · 2023-08-29T16:18:18Z

Codecov Report

Patch coverage: 87.30% and project coverage change: +19.21% 🎉

Comparison is base (e10eaf2) 68.13% compared to head (58bfc73) 87.35%.

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #163       +/-   ##
===========================================
+ Coverage   68.13%   87.35%   +19.21%     
===========================================
  Files          15       15               
  Lines         408      419       +11     
===========================================
+ Hits          278      366       +88     
+ Misses        130       53       -77

Flag	Coverage Δ
linux	`87.35% <87.30%> (+19.21%)`	⬆️
windows	`87.25% <87.30%> (+19.35%)`	⬆️

Files Changed	Coverage Δ
oteapi_dlite/utils/utils.py	`69.81% <0.00%> (ø)`
oteapi_dlite/strategies/generate.py	`78.43% <50.00%> (ø)`
oteapi_dlite/strategies/parse.py	`89.13% <71.42%> (+89.13%)`	⬆️
oteapi_dlite/strategies/convert.py	`92.15% <92.15%> (ø)`
oteapi_dlite/strategies/mapping.py	`100.00% <100.00%> (ø)`
oteapi_dlite/strategies/parse_excel.py	`89.85% <100.00%> (ø)`

.github/workflows/ci_tests.yml

requirements.txt

tests/strategies/paths.py

francescalb

I have a hard time understanding the purpose of this converter strategy and how to use it. What are the input instances, and what are the output instances?

Can you be more explicit in the documentation? I have not properly reviewed the code yet because I am unsure what each part is supposed to do.

quaat · 2023-09-04T06:50:23Z

.github/workflows/ci_tests.yml

@@ -119,6 +126,8 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
+        # There seems to be an issue with module search in Python 3.11
+        # python-version: ["3.9", "3.10", "3.11"]
        python-version: ["3.9", "3.10"]


Is there an issue for addressing the 3.11 problems?

Added issue #168

quaat · 2023-09-04T06:51:53Z

.github/workflows/ci_tests.yml

@@ -135,7 +144,8 @@ jobs:
      run: |
        python -m pip install -U pip
        pip install -U setuptools wheel
-        pip install -e .[dev]


Why are 'requirements.*' preferred over setup.py?

Copy of my answer to Francesca:

I split it into two lines because I wanted to install the requirements with the update (-U) option, which doesn't make sense for a development (-e .) installation.

But it might not be necessary. I tried to change a lot of things before the CI on GitHub finally went through. It could very well be the change in lines 91-92 that did the work...

quaat · 2023-09-04T06:54:54Z

oteapi_dlite/strategies/convert.py

+        Returns:
+            SessionUpdate instance.
+        """
+        config = self.convert_config.configuration


Remember to update the config with the relevant fields from the session

Yes, good point. That is needed if we want a later filter to have an effect.

I was thinking about encouraging not specifying labels in the configuration. But fetching the label from the session could be useful. However, a single label in the session will not do, since each input and output has an optional label, so we need to be more specific when specifying labels in the session.

Would it make sense to allow variable substitutions in the configuration of a partial pipeline, like

    https://www.ntnu.edu/physmet/data#image_analyser:
      function:
        functionType: application/vnd.dlite-convert
        configuration:
          module_name: temdata.image_analyser
          function_name: image_analyser
          inputs:
            - datamodel: http://onto-ns.com/meta/0.1/TEMImage
              label: ${temimage}
          outputs:
            - datamodel: http://onto-ns.com/meta/0.1/PrecipitateStatistics
              label: ${precipitate_statistics}
      mapping:
        mappingType: mappings
        prefixes: ...
        triples: ...

where ${temimage} and ${precipitate_statistics} are substituted from the session.

On the pros side, this would allow templated partial pipelines with improved re-usability.

On the cons side, it will be an extra layer of complexity.
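To make the proposal concrete, here is a minimal sketch of how `${...}` placeholders in a configuration could be filled in from session values. It uses `string.Template` from the standard library; the `substitute` helper and the session keys and values are hypothetical and not part of oteapi or oteapi-dlite.

```python
from string import Template
from typing import Any, Mapping


def substitute(value: Any, session: Mapping[str, str]) -> Any:
    """Recursively replace ${name} placeholders in strings with values
    from `session`. Raises KeyError if a placeholder has no value."""
    if isinstance(value, str):
        return Template(value).substitute(session)
    if isinstance(value, dict):
        return {key: substitute(item, session) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, session) for item in value]
    return value


# Configuration of the partial pipeline from the comment above.
configuration = {
    "module_name": "temdata.image_analyser",
    "function_name": "image_analyser",
    "inputs": [
        {"datamodel": "http://onto-ns.com/meta/0.1/TEMImage",
         "label": "${temimage}"},
    ],
    "outputs": [
        {"datamodel": "http://onto-ns.com/meta/0.1/PrecipitateStatistics",
         "label": "${precipitate_statistics}"},
    ],
}

# Hypothetical session content; the label values are made up.
session = {"temimage": "image1", "precipitate_statistics": "stats1"}

resolved = substitute(configuration, session)
print(resolved["inputs"][0]["label"])   # image1
print(resolved["outputs"][0]["label"])  # stats1
```

Using `substitute` rather than `safe_substitute` makes an unassigned variable fail loudly, which fits the requirement that labels are assigned before the full pipeline is executed.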

I am not completely sold on the idea of labels. Also, if we start creating declarative partial pipelines with variables, they become hard to exchange/share between instances. Why not refer to a resource directly?

You have a good point. I agree that a partial pipeline should document a single data source or sink and in general not contain variables. However, there are cases where fetching parameters from the configuration is useful. One case is a partial pipeline documenting a SQL database. The documentation of the database should be fixed, while the query may vary each time we execute the pipeline. Another possible use case is to specify the label a parser should use when storing a newly created instance into the collection and, correspondingly, the label a generator should use when fetching an instance from the collection. In this case the labels may be variables when documenting the partial pipelines, but must be assigned and internally consistent before executing the full pipeline.

While variables are an easy and flexible way to assign consistent labels across partial pipelines, they may also open a can of worms of potential misuse.

Furthermore, in the common case that we only have one instance of a given entity in the collection, we don't need labels, since we can refer to the instance by specifying the entity in the configuration.
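The lookup rule described here could be sketched as follows. A plain dictionary mapping labels to `(datamodel URI, instance)` pairs stands in for the DLite collection, and `find_instance` is a hypothetical helper written for this illustration, not an existing function.

```python
from typing import Any, Dict, Optional, Tuple

# Stand-in for a collection: label -> (datamodel URI, instance).
Collection = Dict[str, Tuple[str, Any]]


def find_instance(
    collection: Collection, datamodel: str, label: Optional[str] = None
) -> Any:
    """Return the instance of `datamodel`.

    With a label, look it up directly. Without a label, the datamodel
    must identify exactly one instance in the collection.
    """
    if label is not None:
        uri, instance = collection[label]
        if uri != datamodel:
            raise ValueError(f"label {label!r} refers to {uri}, not {datamodel}")
        return instance
    matches = [inst for uri, inst in collection.values() if uri == datamodel]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one instance of {datamodel}, "
            f"found {len(matches)}; specify a label"
        )
    return matches[0]


collection = {
    "image1": ("http://onto-ns.com/meta/0.1/TEMImage", {"pixels": 1024}),
    "stats1": ("http://onto-ns.com/meta/0.1/PrecipitateStatistics", {"count": 7}),
}

# No label needed: there is only one PrecipitateStatistics instance.
stats = find_instance(collection, "http://onto-ns.com/meta/0.1/PrecipitateStatistics")
print(stats)  # {'count': 7}
```

A label only becomes necessary once the collection holds two or more instances of the same datamodel, which is the case the convert strategy has to handle.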

So if we solve the issue of assigning the labels in strategies that may refer to multiple instances (like the convert strategy) without variables, it might be a good idea to avoid variables in the partial pipelines stored in the knowledge base.

However, for populating the knowledge base with partial pipelines for a set of similar data sources, I think that templates with variables would be very useful. In this case, all substitutions should be done before storing into the knowledge base. Such a template utility may in this case live outside oteapi.

quaat

I am not convinced we need labels or variables in the declarative pipelines, but this might be a discussion for the strategy meeting. Otherwise, merge as you wish

quaat · 2023-09-19T07:08:26Z

oteapi_dlite/strategies/convert.py

+        Returns:
+            SessionUpdate instance.
+        """
+        config = self.convert_config.configuration


I am not completely sold on the idea of labels. Also, if we start creating declarative partial pipelines with variables, they become hard to exchange/share between instances. Why not refer to a resource directly?

# Description

Example with OTEAPI and OTELib using TEM data.

This example currently depends on a set of other PRs:

* #633 (already merged into this branch)
* EMMC-ASBL/oteapi-core#318
* EMMC-ASBL/tripper#129
* EMMC-ASBL/oteapi-dlite#163

## Type of change

- [ ] Bug fix & code cleanup
- [ ] New feature
- [x] Documentation update
- [ ] Test update

## Checklist for the reviewer

This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new features have an accompanying new test?
- [ ] Has the documentation been updated as necessary?

jesper-friis and others added 16 commits August 29, 2023 16:03

Added a generic convert function strategy

9bcb26c

Added missing file

15d4b3e

Merge branch 'master' into convert-function-strategy

dd82900

Added otelib and pyyaml to requirements_dev.txt

8cd02b6

Merge branch 'convert-function-strategy' of github.com:EMMC-ASBL/oteapi-dlite into convert-function-strategy

68870b9

Moved requirements to pyproject.toml

9176ef6

Make sure that CI tests uses requirements_dev.txt

5e8aa9e

Merge branch 'convert-function-strategy' of github.com:EMMC-ASBL/oteapi-dlite into convert-function-strategy

721da44

Reverted back to use requirements*.txt

2c9c3ce

Made sure that requirements.txt is installed in ci_tests

e768132

Try to require dlite-python<0.4

0021739

Added Python 3.11 to tests

01f3bf1

Do not test Python 3.11 for now...

521b87c

Support Windows paths in test_convert.py

6e2d71e

List installed packages in ci_test for easier debugging...

66e7de5

Try do require dlite-python<0.4

3441483

jesper-friis added 2 commits August 29, 2023 18:38

Removed unnessesary imports

fcab3fa

Ensure that files fetched from the datacache retain their suffix

2a589ac

jesper-friis requested a review from francescalb August 31, 2023 11:32

Fixed some docstrings.

fc5f24c