Make core config accessible in dict get way #1870

Merged 16 commits on Oct 11, 2022
2 changes: 1 addition & 1 deletion .circleci/continue_config.yml
@@ -371,7 +371,7 @@ jobs:
   docs_linkcheck:
     executor:
       name: docker
-      python_version: "3.7"
+      python_version: "3.8"
     steps:
       - setup
       - run:
2 changes: 2 additions & 0 deletions RELEASE.md
@@ -11,6 +11,8 @@
 # Upcoming Release 0.18.4

 ## Major features and improvements
+* The config loader objects now implement `UserDict` and the configuration is accessed through `conf_loader['catalog']`
+* You can configure config file patterns through `settings.py` without creating a custom config loader

 ## Bug fixes and other changes
 * Fixed `kedro micropkg pull` for packages on PyPI.
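The release note above describes dict-style access on a `UserDict` subclass. As a minimal, self-contained sketch of that pattern (not Kedro's implementation): `PatternLoader`, its `files` argument, and the use of `fnmatch` on bare filenames are all assumptions made for illustration. The real loader resolves glob patterns against configuration directories on disk.

```python
from collections import UserDict
from fnmatch import fnmatch


class PatternLoader(UserDict):
    """Hypothetical stand-in for a config loader; not part of Kedro."""

    def __init__(self, files, patterns):
        super().__init__()
        self.files = files        # {filename: parsed content}, stands in for files on disk
        self.patterns = patterns  # {key: [glob, ...]}

    def __getitem__(self, key):
        # Look up the globs registered for this key, then merge every matching file.
        globs = self.patterns[key]
        merged = {}
        for name, content in self.files.items():
            if any(fnmatch(name, glob) for glob in globs):
                merged.update(content)
        return merged


loader = PatternLoader(
    files={
        "catalog.yml": {"cars": "csv"},
        "catalog_extra.yml": {"boats": "memory"},
        "parameters.yml": {"param1": 1},
    },
    patterns={"catalog": ["catalog*"], "parameters": ["parameters*"]},
)
catalog = loader["catalog"]
parameters = loader["parameters"]
```

In this sketch the result is computed on every access and nothing is stored in the underlying `data` dict, so `len(loader)` stays at zero and `"catalog" in loader` is false even though `loader["catalog"]` returns a value.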
11 changes: 11 additions & 0 deletions docs/conf.py
@@ -143,6 +143,17 @@
         "pluggy._manager.PluginManager",
         "_DI",
         "_DO",
+        # The statements below were added after subclassing UserDict in AbstractConfigLoader.
+        "None. Remove all items from D.",
+        "a shallow copy of D",
+        "a set-like object providing a view on D's items",
+        "a set-like object providing a view on D's keys",
+        "v, remove specified key and return the corresponding value.",
+        "None. Update D from dict/iterable E and F.",
+        "an object providing a view on D's values",
+        "(k, v), remove and return some (key, value) pair",
+        "D.get(k,d), also set D[k]=d if k not in D",
+        "None. Update D from mapping/iterable E and F.",
     ),
     "py:data": (
         "typing.Any",
14 changes: 7 additions & 7 deletions docs/source/kedro_project_setup/configuration.md
@@ -19,7 +19,7 @@ from kedro.framework.project import settings

 conf_path = str(project_path / settings.CONF_SOURCE)
 conf_loader = ConfigLoader(conf_source=conf_path, env="local")
-conf_catalog = conf_loader.get("catalog*", "catalog*/**")
+conf_catalog = conf_loader["catalog"]
 ```

 This recursively scans for configuration files firstly in the `conf/base/` (`base` being the default environment) and then in the `conf/local/` (`local` being the designated overriding environment) directory according to the following rules:
@@ -180,7 +180,7 @@ from kedro.framework.project import settings

 conf_path = str(project_path / settings.CONF_SOURCE)
 conf_loader = ConfigLoader(conf_source=conf_path, env="local")
-parameters = conf_loader.get("parameters*", "parameters*/**")
+parameters = conf_loader["parameters"]
 ```

 This will load configuration files from any subdirectories in `conf` that have a filename starting with `parameters`, or are located inside a folder with name starting with `parameters`.
@@ -189,7 +189,7 @@ This will load configuration files from any subdirectories in `conf` that have a
 Since `local` is set as the environment, the configuration path `conf/local` takes precedence in the example above. Hence any overlapping top-level keys from `conf/base` will be overwritten by the ones from `conf/local`.
 ```

-Calling `conf_loader.get()` in the example above will throw a `MissingConfigException` error if no configuration files match the given patterns in any of the specified paths. If this is a valid workflow for your application, you can handle it as follows:
+Calling `conf_loader[key]` in the example above will throw a `MissingConfigException` error if no configuration files match the given key. If this is a valid workflow for your application, you can handle it as follows:

 ```python
 from kedro.config import ConfigLoader, MissingConfigException
@@ -199,7 +199,7 @@ conf_path = str(project_path / settings.CONF_SOURCE)
 conf_loader = ConfigLoader(conf_source=conf_path, env="local")

 try:
-    parameters = conf_loader.get("parameters*", "parameters*/**", "**/parameters*")
+    parameters = conf_loader["parameters"]
 except MissingConfigException:
     parameters = {}
 ```
@@ -315,7 +315,7 @@ from kedro.framework.project import settings

 conf_path = str(project_path / settings.CONF_SOURCE)
 conf_loader = ConfigLoader(conf_source=conf_path, env="local")
-credentials = conf_loader.get("credentials*", "credentials*/**")
+credentials = conf_loader["credentials"]
 ```

 This will load configuration files from `conf/base` and `conf/local` whose filenames start with `credentials`, or that are located inside a folder with a name that starts with `credentials`.
@@ -324,7 +324,7 @@ This will load configuration files from `conf/base` and `conf/local` whose filen
 Since `local` is set as the environment, the configuration path `conf/local` takes precedence in the example above. Hence, any overlapping top-level keys from `conf/base` will be overwritten by the ones from `conf/local`.
 ```

-Calling `conf_loader.get()` in the example above throws a `MissingConfigException` error if no configuration files match the given patterns in any of the specified paths. If this is a valid workflow for your application, you can handle it as follows:
+Calling `conf_loader[key]` in the example above throws a `MissingConfigException` error if no configuration files match the given key. If this is a valid workflow for your application, you can handle it as follows:

 ```python
 from kedro.config import ConfigLoader, MissingConfigException
@@ -334,7 +334,7 @@ conf_path = str(project_path / settings.CONF_SOURCE)
 conf_loader = ConfigLoader(conf_source=conf_path, env="local")

 try:
-    credentials = conf_loader.get("credentials*", "credentials*/**")
+    credentials = conf_loader["credentials"]
 except MissingConfigException:
     credentials = {}
 ```
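The two documentation examples above repeat the same try/except fallback. One way to factor it out is sketched below, under stated assumptions: `MissingConfigError` and `FakeLoader` are hypothetical stand-ins so the snippet runs without Kedro installed, and `load_optional` is an illustrative helper, not a Kedro API.

```python
class MissingConfigError(Exception):
    """Hypothetical stand-in for kedro.config.MissingConfigException."""


class FakeLoader:
    """Hypothetical loader that raises when a key has no matching files."""

    def __init__(self, found):
        self._found = found

    def __getitem__(self, key):
        if key not in self._found:
            raise MissingConfigError(f"No files found for {key!r}")
        return self._found[key]


def load_optional(loader, key, default=None):
    """Return loader[key], or a default when no configuration files match."""
    try:
        return loader[key]
    except MissingConfigError:
        return {} if default is None else default


loader = FakeLoader({"parameters": {"param1": 1}})
parameters = load_optional(loader, "parameters")
credentials = load_optional(loader, "credentials")
```

With a real loader, the `except` clause would name `MissingConfigException` as in the documentation examples.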
10 changes: 3 additions & 7 deletions kedro/config/abstract_config.py
@@ -1,11 +1,11 @@
 """This module provides ``kedro.abstract_config`` with the baseline
 class model for a `ConfigLoader` implementation.
 """
-from abc import ABC, abstractmethod
+from collections import UserDict
 from typing import Any, Dict


-class AbstractConfigLoader(ABC):
+class AbstractConfigLoader(UserDict):
     """``AbstractConfigLoader`` is the abstract base class
     for all `ConfigLoader` implementations.
     All user-defined `ConfigLoader` implementations should inherit
@@ -19,15 +19,11 @@ def __init__(
         runtime_params: Dict[str, Any] = None,
         **kwargs  # pylint: disable=unused-argument
     ):
+        super().__init__()
         self.conf_source = conf_source
         self.env = env
         self.runtime_params = runtime_params

-    @abstractmethod  # pragma: no cover
-    def get(self) -> Dict[str, Any]:
-        """Required method to get all configurations."""
-        pass
-

 class BadConfigException(Exception):
     """Raised when a configuration file cannot be loaded, for instance
31 changes: 23 additions & 8 deletions kedro/config/config.py
@@ -2,7 +2,7 @@
 or more configuration files from specified paths.
 """
 from pathlib import Path
-from typing import Any, Dict, Iterable
+from typing import Any, Dict, Iterable, List

 from kedro.config import AbstractConfigLoader
 from kedro.config.common import _get_config_from_patterns, _remove_duplicates
@@ -56,11 +56,11 @@ class ConfigLoader(AbstractConfigLoader):
         >>> conf_path = str(project_path / settings.CONF_SOURCE)
         >>> conf_loader = ConfigLoader(conf_source=conf_path, env="local")
         >>>
-        >>> conf_logging = conf_loader.get('logging*')
+        >>> conf_logging = conf_loader["logging"]
         >>> logging.config.dictConfig(conf_logging)  # set logging conf
         >>>
-        >>> conf_catalog = conf_loader.get('catalog*', 'catalog*/**')
-        >>> conf_params = conf_loader.get('**/parameters.yml')
+        >>> conf_catalog = conf_loader["catalog"]
+        >>> conf_params = conf_loader["parameters"]

     """

@@ -69,6 +69,7 @@ def __init__(
         conf_source: str,
         env: str = None,
         runtime_params: Dict[str, Any] = None,
+        config_patterns: Dict[str, List[str]] = None,
         *,
         base_env: str = "base",
         default_run_env: str = "local",
@@ -86,18 +87,32 @@ def __init__(
                 This is used in the `conf_paths` property method to construct
                 the configuration paths. Can be overriden by supplying the `env` argument.
         """
-        super().__init__(
-            conf_source=conf_source, env=env, runtime_params=runtime_params
-        )
         self.base_env = base_env
         self.default_run_env = default_run_env

+        self.config_patterns = {
+            "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
+            "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
+            "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
+            "logging": ["logging*", "logging*/**", "**/logging*"],
+        }
+        self.config_patterns.update(config_patterns or {})
+
+        super().__init__(
+            conf_source=conf_source,
+            env=env,
+            runtime_params=runtime_params,
+        )
+
+    def __getitem__(self, key):
+        return self.get(*self.config_patterns[key])
+
     @property
     def conf_paths(self):
         """Property method to return deduplicated configuration paths."""
         return _remove_duplicates(self._build_conf_paths())

-    def get(self, *patterns: str) -> Dict[str, Any]:
+    def get(self, *patterns: str) -> Dict[str, Any]:  # type: ignore
         return _get_config_from_patterns(
             conf_paths=self.conf_paths, patterns=list(patterns)
         )
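The `__init__` change above builds the default patterns and then applies `config_patterns` with `dict.update`. A minimal sketch of that merge in isolation follows; `resolve_patterns` is a hypothetical helper name, and the `"spark"` key and `"params*"` glob are made-up inputs for illustration.

```python
from typing import Dict, List, Optional

# Defaults copied from the diff above.
DEFAULT_PATTERNS: Dict[str, List[str]] = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
    "logging": ["logging*", "logging*/**", "**/logging*"],
}


def resolve_patterns(
    overrides: Optional[Dict[str, List[str]]] = None
) -> Dict[str, List[str]]:
    """Hypothetical helper mirroring `defaults.update(config_patterns or {})`."""
    patterns = {key: list(globs) for key, globs in DEFAULT_PATTERNS.items()}
    patterns.update(overrides or {})
    return patterns


resolved = resolve_patterns({"parameters": ["params*"], "spark": ["spark*"]})
```

Because `update` works per key, an override replaces the whole list of globs for that key rather than extending it, and a key that is not among the defaults simply becomes a new entry.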
16 changes: 14 additions & 2 deletions kedro/config/templated_config.py
@@ -5,7 +5,7 @@
 import re
 from copy import deepcopy
 from pathlib import Path
-from typing import Any, Dict, Iterable, Optional
+from typing import Any, Dict, Iterable, List, Optional

 import jmespath

@@ -92,6 +92,7 @@ def __init__(
         conf_source: str,
         env: str = None,
         runtime_params: Dict[str, Any] = None,
+        config_patterns: Dict[str, List[str]] = None,
         *,
         base_env: str = "base",
         default_run_env: str = "local",
@@ -114,6 +115,14 @@ def __init__(
                 obtained from the globals_pattern. In case of duplicate keys, the
                 ``globals_dict`` keys take precedence.
         """
+        self.config_patterns = {
+            "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
+            "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
+            "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
+            "logging": ["logging*", "logging*/**", "**/logging*"],
+        }
+        self.config_patterns.update(config_patterns or {})
+
         super().__init__(
             conf_source=conf_source, env=env, runtime_params=runtime_params
         )
@@ -132,12 +141,15 @@ def __init__(
         globals_dict = deepcopy(globals_dict) or {}
         self._config_mapping = {**self._config_mapping, **globals_dict}

+    def __getitem__(self, key):
+        return self.get(*self.config_patterns[key])
+
     @property
     def conf_paths(self):
         """Property method to return deduplicated configuration paths."""
         return _remove_duplicates(self._build_conf_paths())

-    def get(self, *patterns: str) -> Dict[str, Any]:
+    def get(self, *patterns: str) -> Dict[str, Any]:  # type: ignore
         """Tries to resolve the template variables in the config dictionary
         provided by the ``ConfigLoader`` (super class) ``get`` method using the
         dictionary of replacement values obtained in the ``__init__`` method.
11 changes: 3 additions & 8 deletions kedro/framework/context/context.py
@@ -240,10 +240,7 @@ def params(self) -> Dict[str, Any]:
         extra parameters passed at initialization.
         """
         try:
-            # '**/parameters*' reads modular pipeline configs
-            params = self.config_loader.get(
-                "parameters*", "parameters*/**", "**/parameters*"
-            )
+            params = self.config_loader["parameters"]
         except MissingConfigException as exc:
             warn(f"Parameters not found in your Kedro project config.\n{str(exc)}")
             params = {}
@@ -275,7 +272,7 @@ def _get_catalog(

         """
         # '**/catalog*' reads modular pipeline configs
-        conf_catalog = self.config_loader.get("catalog*", "catalog*/**", "**/catalog*")
+        conf_catalog = self.config_loader["catalog"]
         # turn relative paths in conf_catalog into absolute paths
         # before initializing the catalog
         conf_catalog = _convert_paths_to_absolute_posix(
@@ -337,9 +334,7 @@ def _add_param_to_feed_dict(param_name, param_value):
     def _get_config_credentials(self) -> Dict[str, Any]:
         """Getter for credentials specified in credentials directory."""
         try:
-            conf_creds = self.config_loader.get(
-                "credentials*", "credentials*/**", "**/credentials*"
-            )
+            conf_creds = self.config_loader["credentials"]
         except MissingConfigException as exc:
             warn(f"Credentials not found in your Kedro project config.\n{str(exc)}")
             conf_creds = {}
4 changes: 1 addition & 3 deletions kedro/framework/session/session.py
@@ -182,9 +182,7 @@ def create( # pylint: disable=too-many-arguments
         return session

     def _get_logging_config(self) -> Dict[str, Any]:
-        logging_config = self._get_config_loader().get(
-            "logging*", "logging*/**", "**/logging*"
-        )
+        logging_config = self._get_config_loader()["logging"]
         # turn relative paths in logging config into absolute path
         # before initialising loggers
         logging_config = _convert_paths_to_absolute_posix(
34 changes: 34 additions & 0 deletions tests/config/test_config.py
@@ -92,6 +92,19 @@ def proj_catalog_nested(tmp_path):


 class TestConfigLoader:
+    @use_config_dir
+    def test_load_core_config_dict_get(self, tmp_path):
+        """Make sure core config can be fetched with a dict [] access."""
+        conf = ConfigLoader(str(tmp_path), _DEFAULT_RUN_ENV)
+        params = conf["parameters"]
+        catalog = conf["catalog"]
+
+        assert params["param1"] == 1
+        assert catalog["trains"]["type"] == "MemoryDataSet"
+        assert catalog["cars"]["type"] == "pandas.CSVDataSet"
+        assert catalog["boats"]["type"] == "MemoryDataSet"
+        assert not catalog["cars"]["save_args"]["index"]
+
     @use_config_dir
     def test_load_local_config(self, tmp_path):
         """Make sure that configs from `local/` override the ones
@@ -239,6 +252,27 @@ def test_no_files_found(self, tmp_path):
         with pytest.raises(MissingConfigException, match=pattern):
             ConfigLoader(str(tmp_path), _DEFAULT_RUN_ENV).get("non-existent-pattern")

+    @use_config_dir
+    def test_key_not_found_dict_get(self, tmp_path):
+        """Check the error if no config files satisfy a given pattern"""
+        with pytest.raises(KeyError):
+            # pylint: disable=expression-not-assigned
+            ConfigLoader(str(tmp_path), _DEFAULT_RUN_ENV)["non-existent-pattern"]
+
+    @use_config_dir
+    def test_no_files_found_dict_get(self, tmp_path):
+        """Check the error if no config files satisfy a given pattern"""
+        pattern = (
+            r"No files found in "
+            r"\[\'.*base\', "
+            r"\'.*local\'\] "
+            r"matching the glob pattern\(s\): "
+            r"\[\'credentials\*\', \'credentials\*/\**\', \'\**/credentials\*\'\]"
+        )
+        with pytest.raises(MissingConfigException, match=pattern):
+            # pylint: disable=expression-not-assigned
+            ConfigLoader(str(tmp_path), _DEFAULT_RUN_ENV)["credentials"]
+
     def test_duplicate_paths(self, tmp_path, caplog):
         """Check that trying to load the same environment config multiple times logs a
         warning and skips the reload"""
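The two new tests above exercise distinct failure modes: a key with no registered patterns raises `KeyError`, while a registered key whose patterns match no files raises `MissingConfigException`. The sketch below reproduces that distinction without Kedro; `SketchLoader`, `MissingConfigError`, and `classify` are hypothetical names, and matching bare filenames with `fnmatch` is a simplification of the real glob lookup.

```python
from fnmatch import fnmatch


class MissingConfigError(Exception):
    """Hypothetical stand-in for kedro.config.MissingConfigException."""


class SketchLoader:
    """Hypothetical loader: maps a key to globs, then globs to file names."""

    def __init__(self, patterns, files):
        self.patterns = patterns  # {key: [glob, ...]}
        self.files = files        # file names standing in for a conf directory

    def __getitem__(self, key):
        globs = self.patterns[key]  # plain dict lookup: unknown key -> KeyError
        matches = [
            name for name in self.files if any(fnmatch(name, g) for g in globs)
        ]
        if not matches:
            raise MissingConfigError(f"No files found matching {globs}")
        return matches


def classify(loader, key):
    """Report which of the two failure modes (if any) a lookup hits."""
    try:
        loader[key]
    except KeyError:
        return "unknown key"
    except MissingConfigError:
        return "no matching files"
    return "loaded"


loader = SketchLoader(
    patterns={"catalog": ["catalog*"], "credentials": ["credentials*"]},
    files=["catalog.yml", "parameters.yml"],
)
```

Callers that only catch the missing-config exception, as the documentation examples in this PR do, would still see a `KeyError` for a misspelled or unregistered key.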
9 changes: 9 additions & 0 deletions tests/config/test_templated_config.py
@@ -207,6 +207,15 @@ def proj_catalog_param_with_default(tmp_path, param_config_with_default):


 class TestTemplatedConfigLoader:
+    @pytest.mark.usefixtures("proj_catalog_param")
+    def test_get_catalog_config_with_dict_get(self, tmp_path, template_config):
+        config_loader = TemplatedConfigLoader(
+            str(tmp_path), globals_dict=template_config
+        )
+        config_loader.default_run_env = ""
+        catalog = config_loader["catalog"]
+        assert catalog["boats"]["type"] == "SparkDataSet"
+
     @pytest.mark.usefixtures("proj_catalog_param")
     def test_catalog_parameterized_w_dict(self, tmp_path, template_config):
         """Test parameterized config with input from dictionary with values"""