Make Kedro instantiate datasets from kedro_datasets with higher priority than kedro.extras.datasets #1734

Merged
merged 16 commits into from
Nov 2, 2022
5 changes: 3 additions & 2 deletions RELEASE.md
@@ -11,8 +11,9 @@
# Upcoming Release 0.18.4

## Major features and improvements
-* The config loader objects now implement `UserDict` and the configuration is accessed through `conf_loader['catalog']`
-* You can configure config file patterns through `settings.py` without creating a custom config loader
+* Make Kedro instantiate datasets from `kedro_datasets` with higher priority than `kedro.extras.datasets`. `kedro_datasets` is the namespace for the new `kedro-datasets` python package.
+* The config loader objects now implement `UserDict` and the configuration is accessed through `conf_loader['catalog']`.
+* You can configure config file patterns through `settings.py` without creating a custom config loader.

## Bug fixes and other changes
* Fixed `kedro micropkg pull` for packages on PyPI.
2 changes: 1 addition & 1 deletion kedro/io/core.py
@@ -347,7 +347,7 @@ class Version(namedtuple("Version", ["load", "save"])):
"intermediate data sets where possible to avoid this warning."
)

-_DEFAULT_PACKAGES = ["kedro.io.", "kedro.extras.datasets.", ""]
+_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", "kedro.extras.datasets.", ""]


def parse_dataset_definition(
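
The effect of the reordered list can be illustrated with a minimal sketch of prefix-ordered lookup. This is not Kedro's implementation: `try_load` and `resolve` are hypothetical names, and the standard-library modules `math` and `cmath` stand in for the dataset packages so the example runs without Kedro installed. It assumes only that each prefix is tried in order and that the first object that loads wins.

```python
import cmath
import importlib
import math
from typing import Any, List, Optional


def try_load(path: str) -> Optional[Any]:
    """Return the object at a dotted path, or None if it cannot be loaded."""
    module_path, _, attr_name = path.rpartition(".")
    if not module_path:
        return None  # a bare name has no module to import
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None  # includes ModuleNotFoundError for packages that are not installed
    return getattr(module, attr_name, None)


def resolve(type_name: str, prefixes: List[str]) -> Any:
    """Try each prefix in order and return the first object that loads."""
    for prefix in prefixes:
        obj = try_load(prefix + type_name)
        if obj is not None:
            return obj
    raise LookupError(f"Cannot resolve {type_name!r} with prefixes {prefixes}")


# Both modules define `sqrt`, so the order of the prefixes decides the result.
assert resolve("sqrt", ["math.", "cmath."]) is math.sqrt
assert resolve("sqrt", ["cmath.", "math."]) is cmath.sqrt
# A package that is not installed is skipped and the next prefix is tried.
assert resolve("sqrt", ["no_such_package_xyz.", "math."]) is math.sqrt
```

By the same reasoning, placing `"kedro_datasets."` ahead of `"kedro.extras.datasets."` should mean that a short type name such as `pandas.CSVDataSet` resolves to the new package whenever it is installed, and falls back to the bundled datasets otherwise.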
22 changes: 21 additions & 1 deletion tests/io/test_data_catalog.py
@@ -19,7 +19,13 @@
     LambdaDataSet,
     MemoryDataSet,
 )
-from kedro.io.core import VERSION_FORMAT, Version, generate_timestamp
+from kedro.io.core import (
+    _DEFAULT_PACKAGES,
+    VERSION_FORMAT,
+    Version,
+    generate_timestamp,
+    parse_dataset_definition,
+)


@pytest.fixture
@@ -373,6 +379,20 @@ def test_config_relative_import(self, sane_config):
         with pytest.raises(DataSetError, match=re.escape(pattern)):
             DataCatalog.from_config(**sane_config)

+    def test_config_import_kedro_datasets(self, sane_config, mocker):
+        """Test kedro_datasets default path to the dataset class"""
+        # Spy on _load_obj because kedro_datasets is not installed and can't be imported.
+
+        import kedro.io.core  # pylint: disable=import-outside-toplevel
+
+        spy = mocker.spy(kedro.io.core, "_load_obj")
+        parse_dataset_definition(sane_config["catalog"]["boats"])
+        for prefix, call_args in zip(_DEFAULT_PACKAGES, spy.call_args_list):
+            # In Python 3.7 call_args.args is not available, so we access the call
+            # arguments by index instead.
+            # The 1st index returns the tuple of positional arguments; the 2nd index
+            # returns the class path passed to _load_obj.
+            assert call_args[0][0] == f"{prefix}pandas.CSVDataSet"

     def test_config_import_extras(self, sane_config):
         """Test kedro.extras.datasets default path to the dataset class"""
         sane_config["catalog"]["boats"]["type"] = "pandas.CSVDataSet"
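
The spying approach in `test_config_import_kedro_datasets` can be sketched without Kedro or pytest-mock. The example below is an illustration under stated assumptions, not the project's test: `Resolver` and its prefixes are hypothetical, and `unittest.mock.patch.object(..., wraps=...)` is used here as a standard-library stand-in for `mocker.spy`. It records every path the loader is asked for, even though none of the candidate packages can be imported.

```python
from unittest import mock


class Resolver:
    """Hypothetical stand-in for the module under test."""

    prefixes = ["first_pkg.", "second_pkg.", ""]

    @staticmethod
    def try_load(path):
        # Pretend that none of the candidate packages is installed.
        return None

    @classmethod
    def resolve(cls, type_name):
        for prefix in cls.prefixes:
            obj = cls.try_load(prefix + type_name)
            if obj is not None:
                return obj
        return None


# wraps= keeps the original behaviour while recording every call.
with mock.patch.object(Resolver, "try_load", wraps=Resolver.try_load) as spy:
    Resolver.resolve("pandas.CSVDataSet")

# call[0] is the tuple of positional arguments; call[0][0] is the first one.
attempted = [call[0][0] for call in spy.call_args_list]
assert attempted == [prefix + "pandas.CSVDataSet" for prefix in Resolver.prefixes]
```

Comparing the full list of attempted paths, rather than zipping it against the prefixes, would also fail if fewer lookups happened than expected, since `zip` stops silently at the shorter input.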