Add globals feature for OmegaConfigLoader using a globals resolver (#2921)

* Refactor load_and_merge_dir()

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Try adding globals resolver

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Minor change

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add globals resolver

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Revert refactoring

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add test + remove self.globals

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Allow for nested variables in globals

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add documentation

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Typo

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add error message + test

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Apply suggestions from code review

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Split test into multiple tests

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Restrict the globals config_patterns

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Release notes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update docs/source/configuration/advanced_configuration.md

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Add helpful error message for keys starting with _

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Enable setting default value for globals resolver

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Typo

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Move test for keys starting with _ to the top

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Fix cross ref link in docs

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
4 people authored Aug 21, 2023
1 parent 74b2a88 commit c9fc80a
Showing 6 changed files with 201 additions and 5 deletions.
2 changes: 2 additions & 0 deletions RELEASE.md
@@ -14,6 +14,8 @@
* Allowed registering of custom resolvers to `OmegaConfigLoader` through `CONFIG_LOADER_ARGS`.
* Added support for Python 3.11. This includes tackling challenges like dependency pinning and test adjustments to ensure a smooth experience. Detailed migration tips are provided below for further context.
* Added `kedro catalog resolve` CLI command that resolves dataset factories in the catalog with any explicit entries in the project pipeline.
* Added support for global variables to `OmegaConfigLoader`.


## Bug fixes and other changes
* Updated `kedro pipeline create` and `kedro catalog create` to use new `/conf` file structure.
35 changes: 34 additions & 1 deletion docs/source/configuration/advanced_configuration.md
@@ -34,7 +34,7 @@ folders:
    fea: "04_feature"
```
-To point your `TemplatedConfigLoader` to the globals file, add it to the the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):
+To point your `TemplatedConfigLoader` to the globals file, add it to the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):

```python
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
@@ -124,6 +124,7 @@ This section contains a set of guidance for advanced configuration requirements
* [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
* [How to use Jinja2 syntax in configuration](#how-to-use-jinja2-syntax-in-configuration)
* [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
* [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
* [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
* [How to load credentials through environment variables](#how-to-load-credentials-through-environment-variables)

@@ -262,6 +263,38 @@ Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the
#### Other configuration files
It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.

### How to use global variables with the `OmegaConfigLoader`
From Kedro `0.18.13`, you can use variable interpolation in your configurations using "globals" with `OmegaConfigLoader`.
Unlike regular variable interpolation, globals are shared across configuration types, such as catalog and parameters, so a value defined once can be referenced from any of them.
By default, global variables are read from files named `globals.yml` in any of your environments. To use a different naming pattern for these files,
[overwrite the `globals` key in `config_patterns`](#how-to-change-which-configuration-files-are-loaded). You can also [bypass the configuration loading](#how-to-bypass-the-configuration-loading-rules)
to set the global variables directly in `OmegaConfigLoader`.
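
As an illustration of overwriting the `globals` key, a `settings.py` entry along the following lines would widen the match from the single name `globals.yml` to any file whose name starts with `globals`. This is a sketch only: the pattern values are example choices, and it assumes `CONFIG_LOADER_CLASS` is already set to `OmegaConfigLoader` elsewhere in the same file.

```python
# src/<package_name>/settings.py (sketch)
# Assumption: CONFIG_LOADER_CLASS = OmegaConfigLoader is set elsewhere in this file.
# The pattern values below are illustrative, not a recommendation.
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        # Replaces the loader's default entry for the "globals" key only;
        # the other keys (catalog, parameters, ...) keep their defaults.
        "globals": ["globals*", "globals*/**", "**/globals*"],
    }
}
```

Because the loader applies user patterns with `self.config_patterns.update(config_patterns or {})`, only the keys you list are replaced.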

Suppose you have global variables located in the file `conf/base/globals.yml`:
```yaml
my_global_value: 45
dataset_type:
  csv: pandas.CSVDataSet
```
You can access these global variables in your catalog or parameters config files with a `globals` resolver like this:
`conf/base/parameters.yml`:
```yaml
my_param: "${globals:my_global_value}"
```
`conf/base/catalog.yml`:
```yaml
companies:
  filepath: data/01_raw/companies.csv
  type: "${globals:dataset_type.csv}"
```
You can also provide a default value to be used in case the global variable does not exist:
```yaml
my_param: "${globals: nonexistent_global, 23}"
```
If there are duplicate keys in the globals files in your base and run time environments, the values in the run time environment
will overwrite the values in your base environment.
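
The override rule can be pictured with plain dictionaries. The sketch below is not Kedro's code: it only mirrors the `config.update(env_config)` step visible in `OmegaConfigLoader.__getitem__`, and the helper name `merge_globals` is made up for illustration. The sample values come from `test_globals_across_env`.

```python
def merge_globals(base: dict, env: dict) -> dict:
    """Overlay run-environment globals on top of base globals (shallow update)."""
    merged = dict(base)   # copy so the base mapping is left untouched
    merged.update(env)    # duplicate top-level keys take the env value
    return merged


base_globals = {"x": 34, "y": 25}
local_globals = {"y": 99}

merged = merge_globals(base_globals, local_globals)
print(merged)  # {'x': 34, 'y': 99}
```

If the loader does merge environments with a plain `dict.update`, as that line suggests, the overlay is shallow: a nested mapping in the run time environment would replace the whole base mapping under the same top-level key.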


### How to use resolvers in the `OmegaConfigLoader`
Instead of hard-coding values in your configuration files, you can also dynamically compute them using [`OmegaConf`'s
resolvers functionality](https://omegaconf.readthedocs.io/en/2.3_branch/custom_resolvers.html#resolvers). You use resolvers to define custom
5 changes: 3 additions & 2 deletions docs/source/configuration/configuration_basics.md
@@ -61,18 +61,19 @@ Configuration files will be matched according to file name and type rules. Suppo
### Configuration patterns
Under the hood, the Kedro configuration loader loads files based on regex patterns that specify the naming convention for configuration files. These patterns are specified by `config_patterns` in the configuration loader classes.

-By default those patterns are set as follows for the configuration of catalog, parameters, logging and credentials:
+By default those patterns are set as follows for the configuration of catalog, parameters, logging, credentials, and globals:

```python
config_patterns = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
    "logging": ["logging*", "logging*/**", "**/logging*"],
    "globals": ["globals.yml"],
}
```

-If you want to change change the way configuration is loaded, you can either [customise the config patterns](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) or [bypass the configuration loading](advanced_configuration.md#how-to-bypass-the-configuration-loading-rules) as described in the advanced configuration chapter.
+If you want to change the way configuration is loaded, you can either [customise the config patterns](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) or [bypass the configuration loading](advanced_configuration.md#how-to-bypass-the-configuration-loading-rules) as described in the advanced configuration chapter.
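
To see why the file name matters, the sketch below checks a few example names against glob patterns using Python's `fnmatch`. This only approximates the loader's matching, which globs paths under each environment directory, and the helper `matches_any` is hypothetical. The `globals` pattern used here is the one registered in `OmegaConfigLoader.__init__` in this commit.

```python
from __future__ import annotations

from fnmatch import fnmatch


def matches_any(filename: str, patterns: list[str]) -> bool:
    """Return True if the file name matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


catalog_patterns = ["catalog*", "catalog*/**", "**/catalog*"]
globals_patterns = ["globals.yml"]

print(matches_any("catalog.yml", catalog_patterns))          # True
print(matches_any("catalog_globals.yml", catalog_patterns))  # True
print(matches_any("globals.yml", globals_patterns))          # True
print(matches_any("catalog_globals.yml", globals_patterns))  # False
```

Under this approximation, a file such as `catalog_globals.yml` is treated as catalog configuration and is not picked up as a globals file.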

## How to use Kedro configuration

1 change: 1 addition & 0 deletions docs/source/faq/faq.md
@@ -36,6 +36,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]
* [How do I bypass the configuration loading rules](../configuration/advanced_configuration.md#how-to-bypass-the-configuration-loading-rules)?
* [How do I use Jinja2 syntax in configuration](../configuration/advanced_configuration.md#how-to-use-jinja2-syntax-in-configuration)?
* [How do I do templating with the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-do-templating-with-the-omegaconfigloader)?
* [How do I use global variables with the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-use-global-variables-with-the-omegaconfigloader)?
* [How do I use resolvers in the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-use-resolvers-in-the-omegaconfigloader)?
* [How do I load credentials through environment variables](../configuration/advanced_configuration.md#how-to-load-credentials-through-environment-variables)?

37 changes: 35 additions & 2 deletions kedro/config/omegaconf_config.py
@@ -11,6 +11,7 @@

import fsspec
from omegaconf import OmegaConf
from omegaconf.errors import InterpolationResolutionError
from omegaconf.resolvers import oc
from yaml.parser import ParserError
from yaml.scanner import ScannerError
@@ -109,6 +110,7 @@ def __init__( # noqa: too-many-arguments
            "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
            "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
            "logging": ["logging*", "logging*/**", "**/logging*"],
            "globals": ["globals.yml"],
        }
        self.config_patterns.update(config_patterns or {})

@@ -117,7 +119,8 @@ def __init__( # noqa: too-many-arguments
        # Register user provided custom resolvers
        if custom_resolvers:
            self._register_new_resolvers(custom_resolvers)

        # Register globals resolver
        self._register_globals_resolver()
        file_mimetype, _ = mimetypes.guess_type(conf_source)
        if file_mimetype == "application/x-tar":
            self._protocol = "tar"
@@ -199,7 +202,7 @@ def __getitem__(self, key) -> dict[str, Any]:

        config.update(env_config)

-        if not processed_files:
+        if not processed_files and key != "globals":
            raise MissingConfigException(
                f"No files of YAML or JSON format found in {base_path} or {env_path} matching"
                f" the glob pattern(s): {[*self.config_patterns[key]]}"
@@ -308,6 +311,36 @@ def _is_valid_config_path(self, path):
            ".json",
        ]

    def _register_globals_resolver(self):
        """Register the globals resolver"""
        OmegaConf.register_new_resolver(
            "globals",
            lambda variable, default_value=None: self._get_globals_value(
                variable, default_value
            ),
            replace=True,
        )

    def _get_globals_value(self, variable, default_value):
        """Return the globals values to the resolver"""
        if variable.startswith("_"):
            raise InterpolationResolutionError(
                "Keys starting with '_' are not supported for globals."
            )
        keys = variable.split(".")
        value = self["globals"]
        for k in keys:
            value = value.get(k)
            if not value:
                if default_value:
                    _config_logger.debug(
                        f"Using the default value for the global variable {variable}."
                    )
                    return default_value
                msg = f"Globals key '{variable}' not found and no default value provided. "
                raise InterpolationResolutionError(msg)
        return value

    @staticmethod
    def _register_new_resolvers(resolvers: dict[str, Callable]):
        """Register custom resolvers"""
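
Review note on `_get_globals_value`: the lookup walks a dotted key through nested dictionaries, and its checks `if not value` and `if default_value` are truthiness tests, so an entry such as `0`, `False` or an empty string is handled as if it were missing. The standalone sketch below shows one way to separate "missing" from "falsy" with a sentinel. It is not part of this commit or of Kedro: `lookup_global`, `_MISSING`, the `retries` key and the use of `KeyError` are all assumptions made for illustration.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "no value given" from falsy values


def lookup_global(globals_config: dict, variable: str, default: Any = _MISSING) -> Any:
    """Resolve a dotted key such as 'dataset_type.csv' in nested dictionaries."""
    if variable.startswith("_"):
        raise KeyError("Keys starting with '_' are not supported for globals.")
    value: Any = globals_config
    for key in variable.split("."):
        if isinstance(value, dict) and key in value:
            value = value[key]          # descend one level
        elif default is not _MISSING:
            return default              # key absent, caller supplied a fallback
        else:
            raise KeyError(
                f"Globals key '{variable}' not found and no default value provided."
            )
    return value


globals_config = {
    "my_global_value": 45,
    "dataset_type": {"csv": "pandas.CSVDataSet"},
    "retries": 0,  # hypothetical falsy entry
}

print(lookup_global(globals_config, "dataset_type.csv"))        # pandas.CSVDataSet
print(lookup_global(globals_config, "retries"))                 # 0
print(lookup_global(globals_config, "nonexistent_global", 23))  # 23
```

The `isinstance` check also covers the case where an intermediate value is not a mapping, which would otherwise fail on `.get`.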
126 changes: 126 additions & 0 deletions tests/config/test_omegaconf_config.py
@@ -12,6 +12,7 @@
import pytest
import yaml
from omegaconf import OmegaConf, errors
from omegaconf.errors import InterpolationResolutionError
from omegaconf.resolvers import oc
from yaml.parser import ParserError

@@ -671,3 +672,128 @@ def test_custom_resolvers(self, tmp_path):
        assert conf["parameters"]["model_options"]["param1"] == 7
        assert conf["parameters"]["model_options"]["param2"] == 3
        assert conf["parameters"]["model_options"]["param3"] == "my_env_variable"

    def test_globals(self, tmp_path):
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        globals_config = {
            "x": 34,
        }
        _write_yaml(globals_params, globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        # OmegaConfigLoader has globals resolver
        assert OmegaConf.has_resolver("globals")
        # Globals is readable in a dict way
        assert conf["globals"] == globals_config

    def test_globals_resolution(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        param_config = {
            "my_param": "${globals:x}",
            "my_param_default": "${globals:y,34}",  # y does not exist in globals
        }
        catalog_config = {
            "companies": {
                "type": "${globals:dataset_type}",
                "filepath": "data/01_raw/companies.csv",
            },
        }
        globals_config = {"x": 34, "dataset_type": "pandas.CSVDataSet"}
        _write_yaml(base_params, param_config)
        _write_yaml(globals_params, globals_config)
        _write_yaml(base_catalog, catalog_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        assert OmegaConf.has_resolver("globals")
        # Globals are resolved correctly in parameter
        assert conf["parameters"]["my_param"] == globals_config["x"]
        # The default value is used if the key does not exist
        assert conf["parameters"]["my_param_default"] == 34
        # Globals are resolved correctly in catalog
        assert conf["catalog"]["companies"]["type"] == globals_config["dataset_type"]

    def test_globals_nested(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        param_config = {
            "my_param": "${globals:x}",
            "my_nested_param": "${globals:nested.y}",
        }
        globals_config = {
            "x": 34,
            "nested": {
                "y": 42,
            },
        }
        _write_yaml(base_params, param_config)
        _write_yaml(globals_params, globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        assert conf["parameters"]["my_param"] == globals_config["x"]
        # Nested globals are accessible with dot notation
        assert conf["parameters"]["my_nested_param"] == globals_config["nested"]["y"]

    def test_globals_across_env(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        local_params = tmp_path / _DEFAULT_RUN_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        local_globals = tmp_path / _DEFAULT_RUN_ENV / "globals.yml"
        base_param_config = {
            "param1": "${globals:y}",
        }
        local_param_config = {
            "param2": "${globals:x}",
        }
        base_globals_config = {
            "x": 34,
            "y": 25,
        }
        local_globals_config = {
            "y": 99,
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(local_params, local_param_config)
        _write_yaml(base_globals, base_globals_config)
        _write_yaml(local_globals, local_globals_config)
        conf = OmegaConfigLoader(tmp_path)
        # Local global overwrites the base global value
        assert conf["parameters"]["param1"] == local_globals_config["y"]
        # Base global value is accessible to local params
        assert conf["parameters"]["param2"] == base_globals_config["x"]

    def test_bad_globals(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        base_param_config = {
            "param1": "${globals:x.y}",
        }
        base_globals_config = {
            "x": {
                "z": 23,
            },
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(base_globals, base_globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        with pytest.raises(
            InterpolationResolutionError,
            match=r"Globals key 'x.y' not found and no default value provided.",
        ):
            conf["parameters"]["param1"]

    def test_bad_globals_underscore(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        base_param_config = {
            "param2": "${globals:_ignore}",
        }
        base_globals_config = {
            "_ignore": 45,
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(base_globals, base_globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        with pytest.raises(
            InterpolationResolutionError,
            match=r"Keys starting with '_' are not supported for globals.",
        ):
            conf["parameters"]["param2"]
