Added a generic convert function strategy #163

Merged Sep 21, 2023

Commits (23)
9bcb26c
Added a generic convert function strategy
jesper-friis Aug 29, 2023
15d4b3e
Added missing file
jesper-friis Aug 29, 2023
dd82900
Merge branch 'master' into convert-function-strategy
jesper-friis Aug 29, 2023
8cd02b6
Added otelib and pyyaml to requirements_dev.txt
jesper-friis Aug 29, 2023
68870b9
Merge branch 'convert-function-strategy' of github.com:EMMC-ASBL/otea…
jesper-friis Aug 29, 2023
9176ef6
Moved requirements to pyproject.toml
jesper-friis Aug 29, 2023
5e8aa9e
Make sure that CI tests uses requirements_dev.txt
jesper-friis Aug 29, 2023
721da44
Merge branch 'convert-function-strategy' of github.com:EMMC-ASBL/otea…
jesper-friis Aug 29, 2023
2c9c3ce
Reverted back to use requirements*.txt
jesper-friis Aug 29, 2023
e768132
Made sure that requirements.txt is installed in ci_tests
jesper-friis Aug 29, 2023
0021739
Try to require dlite-python<0.4
jesper-friis Aug 29, 2023
01f3bf1
Added Python 3.11 to tests
jesper-friis Aug 29, 2023
521b87c
Do not test Python 3.11 for now...
jesper-friis Aug 29, 2023
6e2d71e
Support Windows paths in test_convert.py
jesper-friis Aug 29, 2023
66e7de5
List installed packages in ci_test for easier debugging...
jesper-friis Aug 29, 2023
3441483
Try do require dlite-python<0.4
jesper-friis Aug 29, 2023
fcab3fa
Removed unnessesary imports
jesper-friis Aug 29, 2023
2a589ac
Ensure that files fetched from the datacache retain their suffix
jesper-friis Aug 29, 2023
fc5f24c
Fixed some docstrings.
jesper-friis Aug 31, 2023
e27e824
Allowed latest dlite-python
jesper-friis Sep 1, 2023
3ed520c
Merge branch 'convert-function-strategy' of github.com:EMMC-ASBL/otea…
jesper-friis Sep 1, 2023
62939d7
Exclude latest DLite-python for now because it leads to failure in co…
jesper-friis Sep 1, 2023
58bfc73
Added pypi_package configuration field to convert strategy.
jesper-friis Sep 3, 2023
17 changes: 14 additions & 3 deletions .github/workflows/ci_tests.yml
@@ -64,6 +64,10 @@ jobs:
pip install -e .
pip install safety

- name: List installed packages
run: |
pip list

- name: Run pylint
run: pylint --rcfile=pyproject.toml --ignore-paths=tests/ --extension-pkg-whitelist='pydantic' *.py oteapi_dlite

@@ -84,6 +88,8 @@ jobs:
strategy:
fail-fast: false
matrix:
# There seems to be an issue with module search in Python 3.11
# python-version: ["3.9", "3.10", "3.11"]
python-version: ["3.9", "3.10"]

steps:
@@ -100,7 +106,8 @@ jobs:
run: |
python -m pip install -U pip
pip install -U setuptools wheel
pip install -e .[dev]
pip install -U -r requirements.txt -r requirements_dev.txt
pip install -e .

- name: Test with pytest
run: pytest -vvv --cov-report=xml
@@ -119,6 +126,8 @@ jobs:
strategy:
fail-fast: false
matrix:
# There seems to be an issue with module search in Python 3.11
# python-version: ["3.9", "3.10", "3.11"]
python-version: ["3.9", "3.10"]
Contributor:

Is there an issue for addressing the 3.11 problems?

Contributor Author:

Added issue #168


steps:
@@ -135,7 +144,8 @@ jobs:
run: |
python -m pip install -U pip
pip install -U setuptools wheel
pip install -e .[dev]
Contributor:

Why are 'requirements.*' preferred over setup.py?

Contributor Author:

Copy of my answer to Francesca:

I split it into two lines because I wanted to install the requirements with the update (-U) option, which doesn't make sense for a development (-e .) installation.

But it might not be necessary. I tried changing a lot of things before the CI on GitHub finally went through. It could very well be the change in lines 91-92 that did the work...

pip install -U -r requirements.txt -r requirements_dev.txt
pip install -e .

- name: Test with pytest
run: pytest -vvv --cov-report=xml
@@ -184,7 +194,8 @@ jobs:
run: |
python -m pip install -U pip
pip install -U setuptools wheel
pip install -e .[docs]
pip install -U -r requirements_docs.txt
pip install -e .

- name: Build
run: |
3 changes: 3 additions & 0 deletions docs/api_reference/strategies/convert.md
@@ -0,0 +1,3 @@
# convert

::: oteapi_dlite.strategies.convert
3 changes: 0 additions & 3 deletions docs/api_reference/strategies/function.md

This file was deleted.

168 changes: 168 additions & 0 deletions oteapi_dlite/strategies/convert.py
@@ -0,0 +1,168 @@
"""Generic function strategy that converts zero or more input instances
to zero or more new output instances.

"""
# pylint: disable=unused-argument
import importlib
from typing import TYPE_CHECKING, Optional, Sequence

import dlite
from oteapi.models import AttrDict, FunctionConfig
from pydantic import Field
from pydantic.dataclasses import dataclass

from oteapi_dlite.models import DLiteSessionUpdate
from oteapi_dlite.utils import get_collection, update_collection

if TYPE_CHECKING:
    from typing import Any, Dict


class DLiteConvertInputConfig(AttrDict):
    """Configuration for input instance to generic DLite converter.

    At least one of `label` or `datamodel` should be given.
    """

    label: Optional[str] = Field(
        None,
        description="Label of the instance.",
    )
    datamodel: Optional[str] = Field(
        None,
        description="URI of data model.",
    )
    property_mappings: bool = Field(
        False,
        description="Whether to infer instance from property mappings.",
    )


class DLiteConvertOutputConfig(AttrDict):
    """Configuration for output instance to generic DLite converter."""

    label: str = Field(
        None,
        description="Label to use when storing the instance.",
    )
    datamodel: Optional[str] = Field(
        None,
        description="URI of data model. Used for documentation.",
    )


class DLiteConvertStrategyConfig(AttrDict):
    """Configuration for generic DLite converter."""

    function_name: str = Field(
        None,
        description="Name of convert function. It will be passed the input "
        "instances as arguments and should return a sequence of output "
        "instances.",
    )
    module_name: str = Field(
        None,
        description="Name of Python module containing the conversion function.",
    )
    package: Optional[str] = Field(
        None,
        description="Used when performing a relative import of the converter "
        "function. It specifies the package to use as the anchor point from "
        "which to resolve the relative import to an absolute import.",
    )
    pypi_package: Optional[str] = Field(
        None,
        description="Package name on PyPI. This field is currently only "
        "informative, but might be used in the future for automatic package "
        "installation.",
    )
    inputs: Sequence[DLiteConvertInputConfig] = Field(
        None,
        description="Input instances.",
    )
    outputs: Sequence[DLiteConvertOutputConfig] = Field(
        None,
        description="Output instances.",
    )


class DLiteConvertConfig(FunctionConfig):
    """DLite convert strategy resource config."""

    configuration: DLiteConvertStrategyConfig = Field(
        ..., description="DLite convert strategy-specific configuration."
    )


@dataclass
class DLiteConvertStrategy:
    """Generic DLite convert strategy for converting zero or more input
    instances to zero or more output instances.

    **Registers strategies**:

    - `("functionType", "application/vnd.dlite-convert")`

    """

    convert_config: DLiteConvertConfig

    def initialize(
        self,
        session: "Optional[Dict[str, Any]]" = None,
    ) -> DLiteSessionUpdate:
        """Initialize."""
        return DLiteSessionUpdate(collection_id=get_collection(session).uuid)

    def get(
        self, session: "Optional[Dict[str, Any]]" = None
    ) -> DLiteSessionUpdate:
        """Execute the strategy.

        This method will be called through the strategy-specific endpoint
        of the OTE-API Services.

        Parameters:
            session: A session-specific dictionary context.

        Returns:
            SessionUpdate instance.
        """
        config = self.convert_config.configuration
Contributor:

Remember to update the config with the relevant fields from the session

Contributor Author:

Yes, good point. That is needed if we want a later filter to have an effect.

I was thinking about encouraging not specifying labels in the configuration. But fetching the label from the session could be useful. However, a single label in the session will not do, since each input and output has an optional label, so we need to be more specific when specifying labels in the session.

Would it make sense to allow variable substitutions in the configuration of a partial pipeline, like

  https://www.ntnu.edu/physmet/data#image_analyser:
    function:
      functionType: application/vnd.dlite-convert
      configuration:
        module_name: temdata.image_analyser
        function_name: image_analyser
        inputs:
          - datamodel: http://onto-ns.com/meta/0.1/TEMImage
            label: ${temimage}
        outputs:
          - datamodel: http://onto-ns.com/meta/0.1/PrecipitateStatistics
            label: ${precipitate_statistics}
    mapping:
      mappingType: mappings
      prefixes:
       ...
      triples:
        ...

where ${temimage} and ${precipitate_statistics} are substituted from the session.

On the pros side, this would allow templated partial pipelines with improved re-usability.

On the con side, it would add an extra layer of complexity.

Contributor:

I am not completely sold on the idea of labels. Also, if we start creating declarative partial pipelines with variables, they become hard to exchange/share between instances. Why not refer to a resource directly?

Contributor Author (jesper-friis, Sep 21, 2023):

You have a good point. I agree that a partial pipeline should document a single data source or sink and in general not contain variables. However, there are cases where fetching parameters from the configuration is useful. One case is a partial pipeline documenting a SQL database: the documentation of the database should be fixed, while the query may vary each time we execute the pipeline. Another possible use case is to specify the label a parser should use when storing a newly created instance in the collection and, correspondingly, the label a generator should use when fetching an instance from the collection. In this case the labels may be variables when documenting the partial pipelines, but they must be assigned and internally consistent before executing the full pipeline.

While variables are an easy and flexible way to assign consistent labels across partial pipelines, they may also open a can of worms of potential misuse.

Furthermore, in the common case that we only have one instance of a given entity in the collection, we don't need labels, since we can refer to the instance by specifying the entity in the configuration.

So if we solve the issue of assigning the labels in strategies that may refer to multiple instances (like the convert strategy) without variables, it might be a good idea to avoid variables in the partial pipelines stored in the knowledge base.

However, for populating the knowledge base with partial pipelines for a set of similar data sources, I think that templates with variables would be very useful. In this case, all substitutions should be done before storing into the knowledge base. Such a template utility could then live outside oteapi.
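
For reference, the `${...}` substitution discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `substitute_labels` is a hypothetical helper that is not part of oteapi or oteapi-dlite, the session is assumed to be a flat dict of strings, and the label values `image1` and `stats1` are made up. It uses `string.Template` from the standard library and leaves unknown placeholders untouched.

```python
from string import Template
from typing import Any, Dict


def substitute_labels(config: Dict[str, Any], session: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `config` with ${name} placeholders filled from `session`.

    Hypothetical helper, shown for illustration only.
    """

    def walk(value: Any) -> Any:
        if isinstance(value, str):
            # safe_substitute() leaves placeholders without a session value as-is
            return Template(value).safe_substitute(session)
        if isinstance(value, dict):
            return {key: walk(item) for key, item in value.items()}
        if isinstance(value, list):
            return [walk(item) for item in value]
        return value

    return walk(config)


# Configuration taken from the example in this thread
configuration = {
    "module_name": "temdata.image_analyser",
    "function_name": "image_analyser",
    "inputs": [
        {
            "datamodel": "http://onto-ns.com/meta/0.1/TEMImage",
            "label": "${temimage}",
        }
    ],
    "outputs": [
        {
            "datamodel": "http://onto-ns.com/meta/0.1/PrecipitateStatistics",
            "label": "${precipitate_statistics}",
        }
    ],
}

# Assumed session content; the label values are invented for this example
session = {"temimage": "image1", "precipitate_statistics": "stats1"}
resolved = substitute_labels(configuration, session)
```

Because the helper returns a new structure, the stored template stays unchanged, which fits the idea that all substitutions happen before a partial pipeline is stored or executed.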


        module = importlib.import_module(config.module_name, config.package)
        function = getattr(module, config.function_name)

        coll = get_collection(session)

        # Collect the input instances, selected by label or by datamodel
        instances = []
        for i, input_config in enumerate(config.inputs):
            if input_config.label:
                instances.append(
                    coll.get(input_config.label, input_config.datamodel)
                )
            elif input_config.datamodel:
                # Add every instance in the collection matching the datamodel
                instances.extend(
                    coll.get_instances(
                        metaid=input_config.datamodel,
                        property_mappings=input_config.property_mappings,
                        # More to do: add more arguments...
                    )
                )
            else:
                raise ValueError(
                    "either `label` or `datamodel` must be specified in "
                    f"inputs[{i}]"
                )

        # A single returned instance is treated as a one-element sequence
        outputs = function(*instances)
        if isinstance(outputs, dlite.Instance):
            outputs = [outputs]

        for inst, output_config in zip(outputs, config.outputs):
            coll.add(output_config.label, inst)

        update_collection(coll)
        return DLiteSessionUpdate(collection_id=coll.uuid)


# DLiteConvertConfig.update_forward_refs()
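
The way the strategy looks up the convert function and pairs its results with the configured outputs can be tried without DLite. The sketch below is a stand-alone illustration, not the strategy itself: `resolve_function` and `pair_outputs` are hypothetical helper names, standard-library modules stand in for a real converter module, and plain Python values stand in for `dlite.Instance` objects. Only `importlib.import_module(name, package)` and `getattr` are taken from the code above.

```python
import importlib
from typing import Any, Callable, List, Optional, Sequence, Tuple


def resolve_function(
    module_name: str, function_name: str, package: Optional[str] = None
) -> Callable[..., Any]:
    """Import `module_name` and return its attribute `function_name`.

    If `module_name` is relative (starts with a dot), `package` is the
    anchor used to resolve it to an absolute module name.
    """
    module = importlib.import_module(module_name, package)
    return getattr(module, function_name)


def pair_outputs(result: Any, labels: Sequence[str]) -> List[Tuple[str, Any]]:
    """Pair returned values with output labels, in order.

    A single non-sequence result is wrapped in a list first. The strategy
    does the same check against `dlite.Instance`; here any value that is
    not a list or tuple counts as a single result.
    """
    outputs = result if isinstance(result, (list, tuple)) else [result]
    return list(zip(labels, outputs))


# Absolute import: corresponds to `module_name` without `package`
sqrt = resolve_function("math", "sqrt")

# Relative import: ".decoder" is resolved against the anchor package "json"
decoder_cls = resolve_function(".decoder", "JSONDecoder", package="json")

# One returned value is stored under the first configured label
pairs = pair_outputs(sqrt(9.0), ["root"])
```

Note that `zip` stops at the shorter sequence, so a function returning more values than there are configured outputs (or fewer) is silently truncated, both in this sketch and in the loop above.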
145 changes: 0 additions & 145 deletions oteapi_dlite/strategies/function.py

This file was deleted.
