diff --git a/RELEASE.md b/RELEASE.md
index 09416a5b36..f6bfa3b60c 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -8,6 +8,17 @@
 
 ## Migration guide from Kedro 0.18.* to 0.19.*
 
+# Upcoming Release 0.18.10
+
+## Major features and improvements
+* Added support for variable interpolation in the catalog with the `OmegaConfigLoader`.
+
+## Bug fixes and other changes
+
+## Breaking changes to the API
+
+## Upcoming deprecations for Kedro 0.19.0
+
 # Release 0.18.9
 
 ## Major features and improvements
@@ -35,7 +46,6 @@ Many thanks to the following Kedroids for contributing PRs to this release:
 
 ## Upcoming deprecations for Kedro 0.19.0
 
-
 # Release 0.18.8
 
 ## Major features and improvements
diff --git a/docs/source/configuration/advanced_configuration.md b/docs/source/configuration/advanced_configuration.md
index 4e5a8b9b3a..efd71a8564 100644
--- a/docs/source/configuration/advanced_configuration.md
+++ b/docs/source/configuration/advanced_configuration.md
@@ -218,6 +218,7 @@ Although Jinja2 is a very powerful and extremely flexible template engine, which
 
 
 ### How to do templating with the `OmegaConfigLoader`
+#### Parameters
 Templating or [variable interpolation](https://omegaconf.readthedocs.io/en/2.3_branch/usage.html#variable-interpolation), as it's called in `OmegaConf`, for parameters works out of the box if the template values are within the parameter files or the name of the file that contains the template values follows the same config pattern specified for parameters.
 By default, the config pattern for parameters is: `["parameters*", "parameters*/**", "**/parameters*"]`.
 Suppose you have one parameters file called `parameters.yml` containing parameters with `omegaconf` placeholders like this:
@@ -236,10 +237,31 @@ data:
 
 Since both of the file names (`parameters.yml` and `parameters_globals.yml`) match the config pattern for parameters, the `OmegaConfigLoader` will load the files and resolve the placeholders correctly.
 
-```{note}
-Templating currently only works for parameter files, but not for catalog files.
+#### Catalog
+From Kedro `0.18.10` templating also works for catalog files. To enable templating in the catalog you need to ensure that the template values are within the catalog files or the name of the file that contains the template values follows the same config pattern specified for catalogs.
+By default, the config pattern for catalogs is: `["catalog*", "catalog*/**", "**/catalog*"]`.
+
+Additionally, any template values in the catalog need to start with an underscore `_`. This is because of how catalog entries are validated. Templated values will neither trigger a key duplication error nor appear in the resulting configuration dictionary.
+
+Suppose you have one catalog file called `catalog.yml` containing entries with `omegaconf` placeholders like this:
+
+```yaml
+companies:
+  type: ${_pandas.type}
+  filepath: data/01_raw/companies.csv
 ```
 
+and a file containing the template values called `catalog_globals.yml`:
+```yaml
+_pandas:
+  type: pandas.CSVDataSet
+```
+
+Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the config pattern for catalogs, the `OmegaConfigLoader` will load the files and resolve the placeholders correctly.
+
+#### Other configuration files
+It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.
+
 ### How to use custom resolvers in the `OmegaConfigLoader`
 `Omegaconf` provides functionality to [register custom resolvers](https://omegaconf.readthedocs.io/en/2.3_branch/usage.html#resolvers) for templated values.
 You can use these custom resolves within Kedro by extending the [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader) class. The example below illustrates this:
diff --git a/docs/source/configuration/configuration_basics.md b/docs/source/configuration/configuration_basics.md
index 54a1396ddf..197d8b2478 100644
--- a/docs/source/configuration/configuration_basics.md
+++ b/docs/source/configuration/configuration_basics.md
@@ -45,7 +45,8 @@ Kedro merges configuration information and returns a configuration dictionary ac
 * If any two configuration files located inside the **same** environment path (such as `conf/base/`) contain the same top-level key, the configuration loader raises a `ValueError` indicating that duplicates are not allowed.
 * If two configuration files contain the same top-level key but are in **different** environment paths (for example, one in `conf/base/`, another in `conf/local/`) then the last loaded path (`conf/local/`) takes precedence as the key value. `ConfigLoader.get` does not raise any errors but a `DEBUG` level log message is emitted with information on the overridden keys.
 
-When using the default `ConfigLoader` or the `TemplatedConfigLoader`, any top-level keys that start with `_` are considered hidden (or reserved) and are ignored. Those keys will neither trigger a key duplication error nor appear in the resulting configuration dictionary. However, you can still use such keys, for example, as [YAML anchors and aliases](https://www.educative.io/blog/advanced-yaml-syntax-cheatsheet#anchors).
+When using any of the configuration loaders, any top-level keys that start with `_` are considered hidden (or reserved) and are ignored. Those keys will neither trigger a key duplication error nor appear in the resulting configuration dictionary. However, you can still use such keys, for example, as [YAML anchors and aliases](https://www.educative.io/blog/advanced-yaml-syntax-cheatsheet#anchors)
+or [to enable templating in the catalog when using the `OmegaConfigLoader`](advanced_configuration.md#how-to-do-templating-with-the-omegaconfigloader).
 
 ### Configuration file names
 Configuration files will be matched according to file name and type rules. Suppose the config loader needs to fetch the catalog configuration, it will search according to the following rules:
diff --git a/kedro/config/omegaconf_config.py b/kedro/config/omegaconf_config.py
index ac4e2fc56d..75303e2902 100644
--- a/kedro/config/omegaconf_config.py
+++ b/kedro/config/omegaconf_config.py
@@ -286,7 +286,13 @@ def load_and_merge_dir_config( # pylint: disable=too-many-arguments
             return OmegaConf.to_container(
                 OmegaConf.merge(*aggregate_config, self.runtime_params), resolve=True
             )
-        return OmegaConf.to_container(OmegaConf.merge(*aggregate_config), resolve=True)
+        return {
+            k: v
+            for k, v in OmegaConf.to_container(
+                OmegaConf.merge(*aggregate_config), resolve=True
+            ).items()
+            if not k.startswith("_")
+        }
 
     def _is_valid_config_path(self, path):
         """Check if given path is a file path and file type is yaml or json."""
@@ -307,7 +313,10 @@ def _check_duplicates(seen_files_to_keys: dict[Path, set[Any]]):
             for filepath2 in filepaths[i:]:
                 config2 = seen_files_to_keys[filepath2]
 
-                overlapping_keys = config1 & config2
+                combined_keys = config1 & config2
+                overlapping_keys = {
+                    key for key in combined_keys if not key.startswith("_")
+                }
 
                 if overlapping_keys:
                     sorted_keys = ", ".join(sorted(overlapping_keys))
diff --git a/tests/config/test_omegaconf_config.py b/tests/config/test_omegaconf_config.py
index a0b152039a..dd49292019 100644
--- a/tests/config/test_omegaconf_config.py
+++ b/tests/config/test_omegaconf_config.py
@@ -73,7 +73,6 @@ def create_config_dir(tmp_path, base_config, local_config):
     base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
     base_logging = tmp_path / _BASE_ENV / "logging.yml"
     base_spark = tmp_path / _BASE_ENV / "spark.yml"
-    base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
 
     local_catalog = tmp_path / _DEFAULT_RUN_ENV / "catalog.yml"
 
@@ -596,3 +595,57 @@ def test_runtime_params_not_propogate_non_parameters_config(self, tmp_path):
         assert key not in credentials
         assert key not in logging
         assert key not in spark
+
+    def test_ignore_hidden_keys(self, tmp_path):
+        """Check that the config key starting with `_` are ignored and also
+        don't cause a config merge error"""
+        _write_yaml(tmp_path / _BASE_ENV / "catalog1.yml", {"k1": "v1", "_k2": "v2"})
+        _write_yaml(tmp_path / _BASE_ENV / "catalog2.yml", {"k3": "v3", "_k2": "v4"})
+
+        conf = OmegaConfigLoader(str(tmp_path))
+        conf.default_run_env = ""
+        catalog = conf["catalog"]
+        assert catalog.keys() == {"k1", "k3"}
+
+        _write_yaml(tmp_path / _BASE_ENV / "catalog3.yml", {"k1": "dup", "_k2": "v5"})
+        pattern = (
+            r"Duplicate keys found in "
+            r"(.*catalog1\.yml and .*catalog3\.yml|.*catalog3\.yml and .*catalog1\.yml)"
+            r"\: k1"
+        )
+        with pytest.raises(ValueError, match=pattern):
+            conf["catalog"]
+
+    def test_variable_interpolation_in_catalog_with_templates(self, tmp_path):
+        base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
+        catalog_config = {
+            "companies": {
+                "type": "${_pandas.type}",
+                "filepath": "data/01_raw/companies.csv",
+            },
+            "_pandas": {"type": "pandas.CSVDataSet"},
+        }
+        _write_yaml(base_catalog, catalog_config)
+
+        conf = OmegaConfigLoader(str(tmp_path))
+        conf.default_run_env = ""
+        assert conf["catalog"]["companies"]["type"] == "pandas.CSVDataSet"
+
+    def test_variable_interpolation_in_catalog_with_separate_templates_file(
+        self, tmp_path
+    ):
+        base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
+        catalog_config = {
+            "companies": {
+                "type": "${_pandas.type}",
+                "filepath": "data/01_raw/companies.csv",
+            }
+        }
+        tmp_catalog = tmp_path / _BASE_ENV / "catalog_temp.yml"
+        template = {"_pandas": {"type": "pandas.CSVDataSet"}}
+        _write_yaml(base_catalog, catalog_config)
+        _write_yaml(tmp_catalog, template)
+
+        conf = OmegaConfigLoader(str(tmp_path))
+        conf.default_run_env = ""
+        assert conf["catalog"]["companies"]["type"] == "pandas.CSVDataSet"
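
For readers of this change, the hidden-key behaviour exercised by `test_ignore_hidden_keys` can be sketched in plain Python without Kedro or OmegaConf. This is only an illustration of the rule the diff introduces (keys starting with `_` are skipped by the duplicate check and dropped from the result), not the loader's actual code. The helper names `find_duplicate_keys` and `drop_hidden_keys` are hypothetical, and the naive dict merge stands in for `OmegaConf.merge`.

```python
from itertools import combinations


def find_duplicate_keys(seen_files_to_keys):
    """Hypothetical helper: report non-hidden top-level keys shared by two files.

    Keys starting with "_" are treated as hidden and never count as duplicates.
    """
    duplicates = []
    for (path1, keys1), (path2, keys2) in combinations(seen_files_to_keys.items(), 2):
        overlapping = {key for key in keys1 & keys2 if not key.startswith("_")}
        if overlapping:
            duplicates.append((path1, path2, sorted(overlapping)))
    return duplicates


def drop_hidden_keys(config):
    """Hypothetical helper: remove top-level keys that start with "_"."""
    return {key: value for key, value in config.items() if not key.startswith("_")}


# Same data as the test: both files define the hidden key "_k2".
catalog1 = {"k1": "v1", "_k2": "v2"}
catalog2 = {"k3": "v3", "_k2": "v4"}
files = {"catalog1.yml": set(catalog1), "catalog2.yml": set(catalog2)}

no_clash = find_duplicate_keys(files)
# Naive merge used only for illustration; the real loader merges with OmegaConf.
merged = drop_hidden_keys({**catalog1, **catalog2})

# A third file that repeats the visible key "k1" is reported as a duplicate.
files["catalog3.yml"] = {"k1", "_k2"}
clash = find_duplicate_keys(files)
```

Here the shared `_k2` key is ignored, while the repeated `k1` key is still flagged, which mirrors the two halves of the test.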