Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: descritpion.md are now optional #417

Merged
merged 11 commits into from
Jun 19, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changes/417.changed
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
`description.md` are now `Optional` in `DatasampleSpec`. If no description file is given, `Substra` will generate a template indicating how to add a custom description file during dataset registration.
4 changes: 3 additions & 1 deletion references/sdk_schemas.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,12 @@ the 'paths' field.
Specification for creating a dataset

note : metadata field does not accept strings containing '__' as dict key

note : If no description markdown file is given, create an empty one on the data_opener folder.
```text
- name: <class 'str'>
- data_opener: <class 'pathlib.Path'>
- description: <class 'pathlib.Path'>
- description: typing.Optional[pathlib.Path]
- permissions: <class 'substra.sdk.schemas.Permissions'>
- metadata: typing.Optional[typing.Dict[str, str]]
- logs_permission: <class 'substra.sdk.schemas.Permissions'>
Expand Down
22 changes: 21 additions & 1 deletion substra/sdk/schemas.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,13 @@
"summary_task": "task",
}

GENERATED_DESCRIPTION_CONTENT = """
# No description given

To add a dataset description, create a markdown file and pass it to your
`substra.sdk.schemas.DatasetSpec` on your dataset opener registration.
"""


class BackendType(str, enum.Enum):
REMOTE = "remote"
Expand Down Expand Up @@ -278,17 +285,30 @@ class DatasetSpec(_Spec):
"""Specification for creating a dataset

note : metadata field does not accept strings containing '__' as dict key

note : If no description markdown file is given, create an empty one on the data_opener folder.
"""

name: str
data_opener: pathlib.Path # Path to the data opener
description: pathlib.Path # Path to the description file
description: Optional[pathlib.Path] = None # Path to the description file
permissions: Permissions
metadata: Optional[Dict[str, str]] = None
logs_permission: Permissions

type_: typing.ClassVar[Type] = Type.Dataset

@pydantic.model_validator(mode="before")
@classmethod
def _check_description(cls, values):
if "description" not in values:
parent_path = pathlib.Path(values["data_opener"]).parent
description_path = parent_path / "generated_description.md"
with description_path.open("w", encoding="utf-8") as f:
f.write(GENERATED_DESCRIPTION_CONTENT)
thbcmlowk marked this conversation as resolved.
Show resolved Hide resolved
values["description"] = description_path
return values

class Meta:
file_attributes = (
"data_opener",
Expand Down
20 changes: 20 additions & 0 deletions tests/sdk/test_schemas.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
import pytest

from substra.sdk.schemas import DataSampleSpec
from substra.sdk.schemas import DatasetSpec
from substra.sdk.schemas import Permissions


@pytest.mark.parametrize("path", [pathlib.Path() / "data", "./data", pathlib.Path().cwd() / "data"])
Expand Down Expand Up @@ -38,3 +40,21 @@ def test_datasample_spec_paths_set_to_none():
def test_datasample_spec_path_set_to_none():
with pytest.raises(ValueError):
DataSampleSpec(path=None, data_manager_keys=[str(uuid.uuid4())])


def test_dataset_spec_no_description(tmpdir):

opener_path = tmpdir / "fake_opener.py"
with open(opener_path, "w") as f:
f.write("print('I'm opening your data')")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether we can mock this or not but it is usually considered as a good practice to limit fs interactions in unit tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in the pytest tmpdir ? But your right, I don't need to write the file, just to pass the path, good catch :)


permissions = Permissions(public=True, authorized_ids=[])

DatasetSpec(
name="Fake Dataset",
data_opener=str(opener_path),
permissions=permissions,
logs_permission=permissions,
)

assert (pathlib.Path(opener_path).parent / "generated_description.md").exists
Loading