[DataCatalog2.0]: KedroDataCatalog with dict interface (#4218)
* Added a skeleton for AbstractDataCatalog and KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed from_config method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented _init_datasets method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented get dataset

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Started resolve_patterns implementation

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented resolve_patterns

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed credentials resolving

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated match pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented add from dict method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated io __init__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added list method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented _validate_missing_keys

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added datasets access logic

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __contains__ and comments on lazy loading

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed dataset_name to ds_name

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated some docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed _update_ds_configs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed _init_datasets

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented add_runtime_patterns

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed runtime patterns usage

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved pattern logic out of data catalog, implemented KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* KedroDataCatalog updates

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added property to return config

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added list patterns method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed and moved ConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed ConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Cleaned KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Cleaned up DataCatalogConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Docs build fix attempt

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* KedroDataCatalog draft

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated from_config method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated constructor and add methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated _get_dataset method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated __contains__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated __eq__ and shallow_copy

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __iter__ and __getitem__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed unused imports

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added TODO

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated runner.run()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated session

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added config_resolver property

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog list command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog create command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog rank command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog resolve command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Remove some methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed ds configs from catalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed typo

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added module docstring

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renaming methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed None from Pattern type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs failing to find class reference

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs failing to find class reference

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated Patterns type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fix tests (#4149)

* Fix most tests

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Fix most tests

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Returned constants to avoid breaking changes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated KedroDataCatalog for recent changes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Minor fix

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated test_sorting_order_with_other_dataset_through_extra_pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed odd properties

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed None from _fetch_credentials input

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated specs and context

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated runners

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated default catalog validation

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated default catalog validation

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated contains and added exists methods for KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixing docs and lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed unit tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __eq__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed DataCatalogConfigResolver to CatalogConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed _init_configs to _resolve_config_credentials

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved functions to the class

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Refactored resolve_dataset_pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed refactored part

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Changed the order of arguments for DataCatalog constructor

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced __getitem__ with .get()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog commands

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved warm up block outside of the try block

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed odd copying

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed DataCatalogConfigResolver to CatalogConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed AbstractDataCatalog to BaseDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved validate_dataset_config inside catalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed _init_dataset to _add_from_config

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fix lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned DatasetError

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added _dataset_patterns and _default_pattern to _config_resolver to avoid breaking change

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made resolve_dataset_pattern return just dict

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added CatalogProtocol draft

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented CatalogProtocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated types

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added _ImplementsCatalogProtocolValidator

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Excluded Protocol from coverage

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed catalog source to kedro_data_catalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed data set to dataset in docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated add_from_dict

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Revised comments and TODOs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated error message to point to specific catalog type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Merged with protocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed reference to DataCatalog in docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Reordered methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed add_all from protocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Changed the order of arguments

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __repr__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made __getitem__ return deepcopy

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed bug in get_dataset()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed __eq__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __setitem__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Unit tests for `KedroDataCatalog` (#4171)

* Added KedroDataCatalog tests template

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test save/load unregistered dataset

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test_feed_dict

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added exists tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added tests for list()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test_eq

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test init/add datasets

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated test_adding_datasets_not_allowed

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added shallow copy tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added TestKedroDataCatalogFromConfig

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added missing tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated RELEASE.md

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed deep copies

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed some interface that will be changed in the next version

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed key completions

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixing typos

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed key completions test

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced data set with dataset

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added docstring for get_dataset() method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed pytest fixture

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Addressed review comments

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated _assert_requirements_ok starters test

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Revert "Updated _assert_requirements_ok starters test"

This reverts commit 5208321.

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated error message

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced typo

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced data set with dataset in docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made KedroDataCatalog subclass from CatalogProtocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented iter, getitem, setitem

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated add_data and TODOs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added key completions

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made behavior dict-like

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Merged with main

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed add_data() method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added usage example and updated docstrings with experimental feature note

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added len and get

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented unit tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Update RELEASE.md

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>

* Update kedro/io/kedro_data_catalog.py

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>

* Fixed lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated load_data and save_data to use new interface

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated load_data and save_data to use new interface

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned usage of get_dataset()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated __getitem__ to use old get_dataset() method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed regex_search from values()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed type annotation for __iter__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Revert lint fix

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned short names for save and load

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed regex_search from keys and items

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made regex_search non-optional

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Changed default for regex_flags

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned list() method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed __iter__ return type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: ElenaKhaustova <157851531+ElenaKhaustova@users.noreply.github.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
3 people authored Oct 18, 2024
1 parent 2e950a2 commit 3fe61a0
Showing 3 changed files with 162 additions and 25 deletions.
4 changes: 4 additions & 0 deletions RELEASE.md
@@ -1,6 +1,10 @@
 # Upcoming Release

 ## Major features and improvements
+* Implemented dict-like interface for `KedroDataCatalog`.
+
+**Note:** ``KedroDataCatalog`` is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the ``KedroDataCatalog`` or ideas for new features.
+
 ## Bug fixes and other changes
 ## Breaking changes to the API
 ## Documentation changes
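
To make the release note concrete, here is a minimal sketch of what a dict-like catalog interface looks like. It deliberately does not import Kedro: `MemoryStub` and `DictLikeCatalog` are hypothetical stand-ins, and only the idea of wrapping raw values in an in-memory dataset on assignment is taken from the `__setitem__` added in this commit.

```python
from typing import Any, Iterator


class MemoryStub:
    """Hypothetical stand-in for an in-memory dataset (not a Kedro class)."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class DictLikeCatalog:
    """Toy catalog that shows the dict-like protocol; not the Kedro class."""

    def __init__(self) -> None:
        self._datasets: dict = {}

    def __setitem__(self, key: str, value: Any) -> None:
        # Raw values are wrapped in an in-memory dataset;
        # dataset objects are stored unchanged.
        if not isinstance(value, MemoryStub):
            value = MemoryStub(value)
        self._datasets[key] = value

    def __getitem__(self, key: str) -> MemoryStub:
        return self._datasets[key]

    def __iter__(self) -> Iterator[str]:
        # Iterating over the catalog yields dataset names, like a dict.
        yield from self._datasets

    def __len__(self) -> int:
        return len(self._datasets)

    def keys(self) -> list:
        return list(self)

    def items(self) -> list:
        return [(key, self._datasets[key]) for key in self]


catalog = DictLikeCatalog()
catalog["threshold"] = 0.5                 # raw value, gets wrapped
catalog["cars"] = MemoryStub([1, 2, 3])    # dataset object, stored as is
for name, dataset in catalog.items():
    print(name, dataset.load())
```

In the sketch, as in the committed class, `keys()` and `items()` return lists rather than dict views, so they are snapshots of the catalog at the time of the call.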
148 changes: 124 additions & 24 deletions kedro/io/kedro_data_catalog.py
@@ -14,7 +14,7 @@
 import difflib
 import logging
 import re
-from typing import Any
+from typing import Any, Iterator, List  # noqa: UP035

 from kedro.io.catalog_config_resolver import CatalogConfigResolver, Patterns
 from kedro.io.core import (
@@ -84,10 +84,12 @@ def __init__(

     @property
     def datasets(self) -> dict[str, Any]:
+        # TODO: remove when removing old catalog
         return copy.copy(self._datasets)

     @datasets.setter
     def datasets(self, value: Any) -> None:
+        # TODO: remove when removing old catalog
         raise AttributeError(
             "Operation not allowed. Please use KedroDataCatalog.add() instead."
         )
@@ -112,6 +114,49 @@ def __eq__(self, other) -> bool: # type: ignore[no-untyped-def]
             other.config_resolver.list_patterns(),
         )

+    def keys(self) -> List[str]:  # noqa: UP006
+        return list(self.__iter__())
+
+    def values(self) -> List[AbstractDataset]:  # noqa: UP006
+        return [self._datasets[key] for key in self]
+
+    def items(self) -> List[tuple[str, AbstractDataset]]:  # noqa: UP006
+        return [(key, self._datasets[key]) for key in self]
+
+    def __iter__(self) -> Iterator[str]:
+        yield from self._datasets.keys()
+
+    def __getitem__(self, ds_name: str) -> AbstractDataset:
+        return self.get_dataset(ds_name)
+
+    def __setitem__(self, key: str, value: Any) -> None:
+        if key in self._datasets:
+            self._logger.warning("Replacing dataset '%s'", key)
+        if isinstance(value, AbstractDataset):
+            self._datasets[key] = value
+        else:
+            self._logger.info(f"Adding input data as a MemoryDataset - {key}")
+            self._datasets[key] = MemoryDataset(data=value)  # type: ignore[abstract]
+
+    def __len__(self) -> int:
+        return len(self.keys())
+
+    def get(
+        self, key: str, default: AbstractDataset | None = None
+    ) -> AbstractDataset | None:
+        """Get a dataset by name from an internal collection of datasets."""
+        if key not in self._datasets:
+            ds_config = self._config_resolver.resolve_pattern(key)
+            if ds_config:
+                self._add_from_config(key, ds_config)
+
+        dataset = self._datasets.get(key, None)
+
+        return dataset or default
+
+    def _ipython_key_completions_(self) -> list[str]:
+        return list(self._datasets.keys())
+
     @property
     def _logger(self) -> logging.Logger:
         return logging.getLogger(__name__)
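
The split between the lenient `get()` and the strict `__getitem__` in this hunk can be sketched without Kedro. In the sketch below, `LazyCatalog` and `resolve_csv` are hypothetical names: the resolver callable stands in for `resolve_pattern`, the stored config dict stands in for a dataset built by `_add_from_config`, and `KeyError` stands in for the `DatasetNotFoundError` that `get_dataset()` raises.

```python
from typing import Any, Callable


class LazyCatalog:
    """Toy catalog; `resolve` is a hypothetical stand-in for pattern resolution."""

    def __init__(self, resolve: Callable[[str], dict]) -> None:
        self._datasets: dict = {}
        self._resolve = resolve

    def get(self, key: str, default: Any = None) -> Any:
        # Materialise the entry on first access if the name matches a pattern.
        if key not in self._datasets:
            config = self._resolve(key)
            if config:
                # The committed code builds a dataset from the config here;
                # the sketch simply stores the config.
                self._datasets[key] = config
        return self._datasets.get(key) or default

    def __getitem__(self, key: str) -> Any:
        # Strict access: a missing name is an error rather than a default.
        dataset = self.get(key)
        if dataset is None:
            raise KeyError(key)
        return dataset


def resolve_csv(name: str) -> dict:
    """Hypothetical resolver: any name ending in '_csv' matches a pattern."""
    if name.endswith("_csv"):
        return {"type": "CSV", "filepath": f"{name}.csv"}
    return {}


catalog = LazyCatalog(resolve_csv)
print(catalog.get("cars_csv"))             # resolved and cached on first access
print(catalog.get("unknown", "fallback"))  # no pattern match, default returned
```

Because the return uses `or`, a stored object that evaluates as falsy would also fall through to `default`; that holds for the sketch and appears to hold for the committed `get()` as well.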
@@ -178,6 +223,7 @@ def _add_from_config(self, ds_name: str, ds_config: dict[str, Any]) -> None:
     def get_dataset(
         self, ds_name: str, version: Version | None = None, suggest: bool = True
     ) -> AbstractDataset:
+        # TODO: remove when removing old catalog
         """Get a dataset by name from an internal collection of datasets.
         If a dataset is not in the collection but matches any pattern
@@ -197,12 +243,7 @@ def get_dataset(
             DatasetNotFoundError: When a dataset with the given name
                 is not in the collection and do not match patterns.
         """
-        if ds_name not in self._datasets:
-            ds_config = self._config_resolver.resolve_pattern(ds_name)
-            if ds_config:
-                self._add_from_config(ds_name, ds_config)
-
-        dataset = self._datasets.get(ds_name, None)
+        dataset = self.get(ds_name)

         if dataset is None:
             error_msg = f"Dataset '{ds_name}' not found in the catalog"
@@ -231,40 +272,71 @@ def _get_dataset(
     def add(
         self, ds_name: str, dataset: AbstractDataset, replace: bool = False
     ) -> None:
+        # TODO: remove when removing old catalog
         """Adds a new ``AbstractDataset`` object to the ``KedroDataCatalog``."""
-        if ds_name in self._datasets:
-            if replace:
-                self._logger.warning("Replacing dataset '%s'", ds_name)
-            else:
-                raise DatasetAlreadyExistsError(
-                    f"Dataset '{ds_name}' has already been registered"
-                )
-        self._datasets[ds_name] = dataset
-
-    def list(self, regex_search: str | None = None) -> list[str]:
+        if ds_name in self._datasets and not replace:
+            raise DatasetAlreadyExistsError(
+                f"Dataset '{ds_name}' has already been registered"
+            )
+        self.__setitem__(ds_name, dataset)
+
+    def list(
+        self, regex_search: str | None = None, regex_flags: int | re.RegexFlag = 0
+    ) -> List[str]:  # noqa: UP006
+        # TODO: rename depending on the solution for https://github.com/kedro-org/kedro/issues/3917
         """
         List of all dataset names registered in the catalog.
         This can be filtered by providing an optional regular expression
         which will only return matching keys.
         """

         if regex_search is None:
-            return list(self._datasets.keys())
+            return self.keys()

-        if not regex_search.strip():
+        if regex_search == "":
             self._logger.warning("The empty string will not match any datasets")
             return []

+        if not regex_flags:
+            regex_flags = re.IGNORECASE
+
         try:
-            pattern = re.compile(regex_search, flags=re.IGNORECASE)
+            pattern = re.compile(regex_search, flags=regex_flags)
         except re.error as exc:
             raise SyntaxError(
                 f"Invalid regular expression provided: '{regex_search}'"
             ) from exc
-        return [ds_name for ds_name in self._datasets if pattern.search(ds_name)]
+        return [ds_name for ds_name in self.__iter__() if pattern.search(ds_name)]

     def save(self, name: str, data: Any) -> None:
-        """Save data to a registered dataset."""
+        # TODO: rename input argument when breaking change: name -> ds_name
+        """Save data to a registered dataset.
+        Args:
+            name: A dataset to be saved to.
+            data: A data object to be saved as configured in the registered
+                dataset.
+        Raises:
+            DatasetNotFoundError: When a dataset with the given name
+                has not yet been registered.
+        Example:
+        ::
+            >>> import pandas as pd
+            >>>
+            >>> from kedro_datasets.pandas import CSVDataset
+            >>>
+            >>> cars = CSVDataset(filepath="cars.csv",
+            >>> load_args=None,
+            >>> save_args={"index": False})
+            >>> catalog = DataCatalog(datasets={'cars': cars})
+            >>>
+            >>> df = pd.DataFrame({'col1': [1, 2],
+            >>> 'col2': [4, 5],
+            >>> 'col3': [5, 6]})
+            >>> catalog.save("cars", df)
+        """
         dataset = self.get_dataset(name)

         self._logger.info(
@@ -277,7 +349,35 @@ def save(self, name: str, data: Any) -> None:
         dataset.save(data)

     def load(self, name: str, version: str | None = None) -> Any:
-        """Loads a registered dataset."""
+        # TODO: rename input argument when breaking change: name -> ds_name
+        # TODO: remove version from input arguments when breaking change
+        """Loads a registered dataset.
+        Args:
+            name: A dataset to be loaded.
+            version: Optional argument for concrete data version to be loaded.
+                Works only with versioned datasets.
+        Returns:
+            The loaded data as configured.
+        Raises:
+            DatasetNotFoundError: When a dataset with the given name
+                has not yet been registered.
+        Example:
+        ::
+            >>> from kedro.io import DataCatalog
+            >>> from kedro_datasets.pandas import CSVDataset
+            >>>
+            >>> cars = CSVDataset(filepath="cars.csv",
+            >>> load_args=None,
+            >>> save_args={"index": False})
+            >>> catalog = DataCatalog(datasets={'cars': cars})
+            >>>
+            >>> df = catalog.load("cars")
+        """
         load_version = Version(version, None) if version else None
         dataset = self.get_dataset(name, version=load_version)

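
The filtering rules that `list()` follows after this change can be reproduced in a standalone function. The sketch below is an approximation under stated assumptions: `filter_names` is a hypothetical helper that takes the names as an argument instead of reading them from a catalog, and it omits the warning that the committed code logs for an empty pattern.

```python
import re
from typing import Iterable, Optional


def filter_names(
    names: Iterable[str],
    regex_search: Optional[str] = None,
    regex_flags: int = 0,
) -> list:
    """Filter names the way the updated `list()` appears to (hypothetical helper)."""
    if regex_search is None:
        return list(names)      # no filter: return every name
    if regex_search == "":
        return []               # the empty string matches nothing
    if not regex_flags:
        # No flags supplied: fall back to a case-insensitive search.
        regex_flags = re.IGNORECASE
    try:
        pattern = re.compile(regex_search, flags=regex_flags)
    except re.error as exc:
        raise SyntaxError(
            f"Invalid regular expression provided: '{regex_search}'"
        ) from exc
    return [name for name in names if pattern.search(name)]


names = ["Cars", "boats", "cars_raw"]
print(filter_names(names, "^cars"))                # case-insensitive by default
print(filter_names(names, "^cars", re.MULTILINE))  # explicit flags replace the default
```

One consequence of the fallback, at least as the hunk reads: since `0` is treated as "no flags given", a case-sensitive search requires passing some non-zero flag that does not include `re.IGNORECASE`.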
35 changes: 34 additions & 1 deletion tests/io/test_kedro_data_catalog.py
@@ -379,7 +379,7 @@ def test_config_invalid_dataset_config(self, correct_config):

     def test_empty_config(self):
         """Test empty config"""
-        assert KedroDataCatalog.from_config(None)
+        assert len(KedroDataCatalog.from_config(None)) == 0

     def test_missing_credentials(self, correct_config):
         """Check the error if credentials can't be located"""
@@ -502,6 +502,39 @@ def test_bad_confirm(self, correct_config, dataset_name, pattern):
         with pytest.raises(DatasetError, match=re.escape(pattern)):
             data_catalog.confirm(dataset_name)

+    def test_iteration(self, correct_config):
+        """Test iterate through keys, values and items."""
+        data_catalog = KedroDataCatalog.from_config(**correct_config)
+
+        for ds_name_cat, ds_name_config in zip(
+            data_catalog, correct_config["catalog"]
+        ):
+            assert ds_name_cat == ds_name_config
+
+        for ds_name_cat, ds_name_config in zip(
+            data_catalog.keys(), correct_config["catalog"]
+        ):
+            assert ds_name_cat == ds_name_config
+
+        for ds in data_catalog.values():
+            assert isinstance(ds, CSVDataset)
+
+        for ds_name, ds in data_catalog.items():
+            assert isinstance(ds, CSVDataset)
+            assert ds_name in correct_config["catalog"]
+
+    def test_getitem_setitem(self, correct_config):
+        """Test get and set item."""
+        data_catalog = KedroDataCatalog.from_config(**correct_config)
+        data_catalog["test"] = 123
+        assert isinstance(data_catalog["test"], MemoryDataset)
+
+    def test_ipython_key_completions(self, correct_config):
+        data_catalog = KedroDataCatalog.from_config(**correct_config)
+        assert data_catalog._ipython_key_completions_() == list(
+            correct_config["catalog"].keys()
+        )
+
 class TestDataCatalogVersioned:
     def test_from_correct_config_versioned(self, correct_config, dummy_dataframe):
         """Test load and save of versioned datasets from config"""