Add globals feature for OmegaConfigLoader using a globals resolver #2921

Merged
merged 29 commits into from
Aug 21, 2023
Changes from 13 commits
3e2e73e
Refactor load_and_merge_dir()
ankatiyar Aug 9, 2023
9223f85
Try adding globals resolver
ankatiyar Aug 9, 2023
44bc9e7
Minor change
ankatiyar Aug 9, 2023
f4ffa30
Add globals resolver
ankatiyar Aug 11, 2023
5648999
Merge branch 'main' into feat/globals
ankatiyar Aug 11, 2023
ced46c5
Revert refactoring
ankatiyar Aug 14, 2023
ee285f4
Add test + remove self.globals
ankatiyar Aug 15, 2023
7221a16
Allow for nested variables in globals
ankatiyar Aug 15, 2023
6ad693f
Add documentation
ankatiyar Aug 15, 2023
e49f72f
Merge branch 'main' into feat/globals
ankatiyar Aug 15, 2023
4fd5da0
Typo
ankatiyar Aug 15, 2023
84bf3d1
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 15, 2023
bd84d0a
Add error message + test
ankatiyar Aug 16, 2023
b004b87
Apply suggestions from code review
ankatiyar Aug 17, 2023
c099422
Split test into multiple tests
ankatiyar Aug 17, 2023
6cef54b
Restrict the globals config_patterns
ankatiyar Aug 17, 2023
0d5d95d
Release notes
ankatiyar Aug 17, 2023
d159cdf
Update docs/source/configuration/advanced_configuration.md
ankatiyar Aug 17, 2023
78793ef
Add helpful error message for keys starting with _
ankatiyar Aug 17, 2023
17789a7
Enable setting default value for globals resolver
ankatiyar Aug 18, 2023
b8b066d
Merge branch 'main' into feat/globals
ankatiyar Aug 18, 2023
d76c022
Typo
ankatiyar Aug 18, 2023
6e9c8b0
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 18, 2023
bed4106
Merge branch 'main' into feat/globals
astrojuanlu Aug 18, 2023
4b1b6f4
Merge branch 'main' into feat/globals
noklam Aug 21, 2023
01af470
Move test for keys starting with _ to the top
ankatiyar Aug 21, 2023
92ab551
Merge branch 'main' into feat/globals
ankatiyar Aug 21, 2023
ed91395
Fix cross ref link in docs
ankatiyar Aug 21, 2023
ca622b7
Merge branch 'feat/globals' of https://github.com/kedro-org/kedro int…
ankatiyar Aug 21, 2023
31 changes: 30 additions & 1 deletion docs/source/configuration/advanced_configuration.md
@@ -34,7 +34,7 @@ folders:
  fea: "04_feature"
```

To point your `TemplatedConfigLoader` to the globals file, add it to the the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):
To point your `TemplatedConfigLoader` to the globals file, add it to the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):

```python
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
@@ -124,6 +124,7 @@ This section contains a set of guidance for advanced configuration requirements
* [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
* [How to use Jinja2 syntax in configuration](#how-to-use-jinja2-syntax-in-configuration)
* [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
* [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
* [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
* [How to load credentials through environment variables](#how-to-load-credentials-through-environment-variables)

@@ -262,6 +263,34 @@ Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the
#### Other configuration files
It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.

### How to use global variables with the `OmegaConfigLoader`
From Kedro `0.18.13`, you can also use variable interpolation in your configurations using "globals" with `OmegaConfigLoader`.
Review comment (Contributor Author): Note to self: Change version depending on when this is merged.
The benefit of using globals over regular variable interpolation is that the global variables are shared across different configurations.
By default, these global variables are assumed to be in files that follow the naming convention specified by the `globals` key in `OmegaConfigLoader`'s
`config_patterns`: `["globals*", "globals*/**", "**/globals*"]`. To change these patterns, you can either [customise the config patterns](#how-to-change-which-configuration-files-are-loaded)
or [bypass the configuration loading](#how-to-bypass-the-configuration-loading-rules).

Suppose you have global variables located in the file `conf/base/globals.yml`:
```yaml
my_global_value: 45
dataset_type:
  csv: pandas.CSVDataSet
```
You can access these global variables in your catalog or parameters config files with a `globals` resolver like this:
`conf/base/parameters.yml`:
```yaml
my_param : "${globals:my_global_value}"
```
`conf/base/catalog.yml`:
```yaml
companies:
  filepath: data/01_raw/companies.csv
  type: "${globals:dataset_type.csv}"
```
If there are duplicate keys in the globals files in your base and run time environments, the values in the run time environment
will overwrite the values in your base environment.
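
The override rule described here can be pictured with plain dictionaries. This is a minimal sketch, assuming the two environments are combined with a shallow `dict.update` (as the `config.update(env_config)` line in this PR's diff suggests). The helper name `merge_globals` is hypothetical and not part of Kedro.

```python
def merge_globals(base: dict, run_env: dict) -> dict:
    """Return base globals overlaid with run-time environment globals.

    Hypothetical helper for illustration only; not a Kedro API.
    """
    merged = dict(base)      # copy so the inputs stay untouched
    merged.update(run_env)   # run-time environment wins on duplicate keys
    return merged


base_globals = {"x": 34, "y": 25}
local_globals = {"y": 99}

merged = merge_globals(base_globals, local_globals)
print(merged)  # {'x': 34, 'y': 99}
```

Because the update in this sketch is shallow, a nested mapping defined in the run-time environment would replace the whole mapping of the same name from the base environment rather than being merged key by key.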


### How to use resolvers in the `OmegaConfigLoader`
Instead of hard-coding values in your configuration files, you can also dynamically compute them using [`OmegaConf`'s
resolvers functionality](https://omegaconf.readthedocs.io/en/2.3_branch/custom_resolvers.html#resolvers). You use resolvers to define custom
5 changes: 3 additions & 2 deletions docs/source/configuration/configuration_basics.md
@@ -61,18 +61,19 @@ Configuration files will be matched according to file name and type rules. Suppo
### Configuration patterns
Under the hood, the Kedro configuration loader loads files based on regex patterns that specify the naming convention for configuration files. These patterns are specified by `config_patterns` in the configuration loader classes.

By default those patterns are set as follows for the configuration of catalog, parameters, logging and credentials:
By default those patterns are set as follows for the configuration of catalog, parameters, logging, credentials, and globals:

```python
config_patterns = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
    "logging": ["logging*", "logging*/**", "**/logging*"],
    "globals": ["globals*", "globals*/**", "**/globals*"],
}
```

If you want to change change the way configuration is loaded, you can either [customise the config patterns](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) or [bypass the configuration loading](advanced_configuration.md#how-to-bypass-the-configuration-loading-rules) as described in the advanced configuration chapter.
If you want to change the way configuration is loaded, you can either [customise the config patterns](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) or [bypass the configuration loading](advanced_configuration.md#how-to-bypass-the-configuration-loading-rules) as described in the advanced configuration chapter.
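
As a rough illustration of customising the patterns, the sketch below mimics how user-supplied patterns could be laid over the defaults. It assumes the merge is the plain `dict.update` shown in this PR's `__init__` diff. `DEFAULT_PATTERNS` and the `["globals.yml"]` value are illustrative, not Kedro constants.

```python
# Defaults as listed in this PR (an illustrative copy, not imported from Kedro).
DEFAULT_PATTERNS = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
    "logging": ["logging*", "logging*/**", "**/logging*"],
    "globals": ["globals*", "globals*/**", "**/globals*"],
}

# What a project might put in settings.py (the value is an assumption for this sketch).
CONFIG_LOADER_ARGS = {"config_patterns": {"globals": ["globals.yml"]}}

# Mimic the loader: start from the defaults, then overlay the user's patterns.
config_patterns = dict(DEFAULT_PATTERNS)
config_patterns.update(CONFIG_LOADER_ARGS.get("config_patterns") or {})

print(config_patterns["globals"])  # ['globals.yml']
```

Only the keys supplied by the user change. Each supplied key replaces the whole default list for that key instead of extending it.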

## How to use Kedro configuration

1 change: 1 addition & 0 deletions docs/source/faq/faq.md
@@ -36,6 +36,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]
* [How do I bypass the configuration loading rules](../configuration/advanced_configuration.md#how-to-bypass-the-configuration-loading-rules)?
* [How do I use Jinja2 syntax in configuration](../configuration/advanced_configuration.md#how-to-use-jinja2-syntax-in-configuration)?
* [How do I do templating with the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-do-templating-with-the-omegaconfigloader)?
* [How do I use global variables with the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-use-global-variables-with-the-omegaconfigloader)?
* [How do I use resolvers in the `OmegaConfigLoader`](../configuration/advanced_configuration.md#how-to-use-resolvers-in-the-omegaconfigloader)?
* [How do I load credentials through environment variables](../configuration/advanced_configuration.md#how-to-load-credentials-through-environment-variables)?

25 changes: 23 additions & 2 deletions kedro/config/omegaconf_config.py
@@ -11,6 +11,7 @@

import fsspec
from omegaconf import OmegaConf
from omegaconf.errors import InterpolationResolutionError
from omegaconf.resolvers import oc
from yaml.parser import ParserError
from yaml.scanner import ScannerError
@@ -109,6 +110,7 @@ def __init__( # noqa: too-many-arguments
            "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
            "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
            "logging": ["logging*", "logging*/**", "**/logging*"],
            "globals": ["globals*", "globals*/**", "**/globals*"],
Review comment (Contributor):

        >>> # in settings.py
        >>> from kedro.config import TemplatedConfigLoader
        >>>
        >>> CONFIG_LOADER_CLASS = TemplatedConfigLoader
        >>> CONFIG_LOADER_ARGS = {
        >>>     "globals_pattern": "*globals.yml",
        >>> }
Suggested change (replace the first line with the second):
"globals": ["globals*", "globals*/**", "**/globals*"],
"globals": ["globals*"],

I suggest keeping this pattern simple.

  1. It aligns with the current `TemplatedConfigLoader` pattern; the code block above is copied from our template.
  2. Most people don't need nested globals. Putting globals in a folder should be a minority case, and those users can change the settings instead.
  3. `**/globals*` is quite dangerous, as it could match `parameters/globals_parameters.yml`, although we usually suggest people use `parameter_globals` instead.

I may even suggest a more conservative default, `"globals": "globals.yml"`, forcing the default to be just `globals.yml`. If we change this after release, it will be a breaking change.

WDYT? @ankatiyar @merelcht, cc @stichbury because I really think we should separate the terminology for globals; it's being overloaded a lot now and this can lead to weird bugs.

@noklam noklam Aug 21, 2023

Offline conversation - agree with using default with globals.yml. Cc @merelcht @ankatiyar
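
The concern about `**/globals*` raised in this thread can be reproduced on a throwaway directory. This is only a sketch: it uses `pathlib.Path.glob` as a stand-in, and the loader's own globbing may differ in detail. The file names are taken from the review comment.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "parameters").mkdir()
    (root / "globals.yml").write_text("x: 1\n")
    (root / "parameters" / "globals_parameters.yml").write_text("y: 2\n")

    def matches(pattern: str) -> list:
        """Return sorted POSIX-style paths under root that match the glob pattern."""
        return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

    broad = matches("**/globals*")   # recursive pattern
    narrow = matches("globals*")     # top level only

print(broad)   # ['globals.yml', 'parameters/globals_parameters.yml']
print(narrow)  # ['globals.yml']
```

In this sketch the recursive pattern also picks up the file inside `parameters/`, while the top-level pattern does not.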

        }
        self.config_patterns.update(config_patterns or {})

@@ -117,7 +119,8 @@ def __init__( # noqa: too-many-arguments
        # Register user provided custom resolvers
        if custom_resolvers:
            self._register_new_resolvers(custom_resolvers)

        # Register globals resolver
        self._register_globals_resolver()
        file_mimetype, _ = mimetypes.guess_type(conf_source)
        if file_mimetype == "application/x-tar":
            self._protocol = "tar"
@@ -199,7 +202,7 @@ def __getitem__(self, key) -> dict[str, Any]:

        config.update(env_config)

-       if not processed_files:
+       if not processed_files and key != "globals":
            raise MissingConfigException(
                f"No files of YAML or JSON format found in {base_path} or {env_path} matching"
                f" the glob pattern(s): {[*self.config_patterns[key]]}"
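
The effect of the changed condition in this hunk can be sketched in isolation: a missing globals file is tolerated, while any other key with no matching files still raises. The function `check_processed` and the locally defined exception class are stand-ins written for this sketch, not Kedro code.

```python
class MissingConfigException(Exception):
    """Stand-in for Kedro's exception of the same name, defined so the sketch runs alone."""


def check_processed(processed_files: set, key: str) -> None:
    """Raise only when nothing was found for a key other than 'globals'."""
    if not processed_files and key != "globals":
        raise MissingConfigException(f"No files found for key '{key}'")


# No globals file present: no error, because globals are optional.
check_processed(set(), "globals")

# No catalog file present: still an error.
try:
    check_processed(set(), "catalog")
    error = None
except MissingConfigException as exc:
    error = str(exc)

print(error)  # No files found for key 'catalog'
```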
@@ -308,6 +311,24 @@ def _is_valid_config_path(self, path):
            ".json",
        ]

    def _register_globals_resolver(self):
        """Register the globals resolver"""
        OmegaConf.register_new_resolver(
            "globals", lambda x: self._get_globals_value(x), replace=True
        )

    def _get_globals_value(self, variable):
        """Return the globals values to the resolver"""
        keys = variable.split(".")
        value = self["globals"]
        for k in keys:
            value = value.get(k)
            if not value:
                raise InterpolationResolutionError(
                    f"Globals key '{variable}' not found."
                )
        return value
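
The dot-separated lookup in `_get_globals_value` can be tried out with plain dictionaries. Note that the `if not value` check, as written in this diff, would also reject globals whose value is falsy, such as `0`, `""` or `False`. The sketch below is one possible alternative that tests for membership instead. It is not the code from this PR: `get_nested` is a hypothetical name, and `KeyError` stands in for `InterpolationResolutionError` so the example needs no third-party imports.

```python
def get_nested(config: dict, dotted_key: str):
    """Walk config along a dot-separated key, raising KeyError if any part is missing."""
    value = config
    for part in dotted_key.split("."):
        # Check membership rather than truthiness so 0, "" and False survive.
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"Globals key '{dotted_key}' not found.")
        value = value[part]
    return value


globals_config = {"x": 34, "nested": {"y": 42, "flag": False}}

print(get_nested(globals_config, "nested.y"))     # 42
print(get_nested(globals_config, "nested.flag"))  # False
```

The `isinstance` check also covers the case where an intermediate value is a scalar, for example asking for `x.y` when `x` is `34`.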

    @staticmethod
    def _register_new_resolvers(resolvers: dict[str, Callable]):
        """Register custom resolvers"""
87 changes: 87 additions & 0 deletions tests/config/test_omegaconf_config.py
@@ -12,6 +12,7 @@
import pytest
import yaml
from omegaconf import OmegaConf, errors
from omegaconf.errors import InterpolationResolutionError
from omegaconf.resolvers import oc
from yaml.parser import ParserError

@@ -671,3 +672,89 @@ def test_custom_resolvers(self, tmp_path):
        assert conf["parameters"]["model_options"]["param1"] == 7
        assert conf["parameters"]["model_options"]["param2"] == 3
        assert conf["parameters"]["model_options"]["param3"] == "my_env_variable"

    def test_globals(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_catalog = tmp_path / _BASE_ENV / "catalog.yml"
        globals_params = tmp_path / _BASE_ENV / "globals.yml"
        globals_params_folder = tmp_path / _BASE_ENV / "globals" / "my_globals.yml"
        param_config = {
            "my_param": "${globals:x}",
            "my_nested_param": "${globals:nested.y}",
        }
        catalog_config = {
            "companies": {
                "type": "${globals:dataset_type}",
                "filepath": "data/01_raw/companies.csv",
            },
        }
        globals_config_1 = {
            "x": 34,
            "nested": {
                "y": 42,
            },
        }
        globals_config_2 = {"dataset_type": "pandas.CSVDataSet"}
        _write_yaml(base_params, param_config)
        _write_yaml(globals_params, globals_config_1)
        _write_yaml(globals_params_folder, globals_config_2)
        _write_yaml(base_catalog, catalog_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        assert OmegaConf.has_resolver("globals")
        globals_config = {**globals_config_1, **globals_config_2}
        # Globals is readable in a dict way
        assert conf["globals"] == globals_config
        # Globals are resolved correctly in parameters
        assert conf["parameters"]["my_param"] == globals_config_1["x"]
        # Nested globals are accessible with dot notation
        assert conf["parameters"]["my_nested_param"] == globals_config_1["nested"]["y"]
        # Globals are resolved correctly in catalog
        assert conf["catalog"]["companies"]["type"] == globals_config_2["dataset_type"]

    def test_globals_across_env(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        local_params = tmp_path / _DEFAULT_RUN_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        local_globals = tmp_path / _DEFAULT_RUN_ENV / "globals.yml"
        base_param_config = {
            "param1": "${globals:y}",
        }
        local_param_config = {
            "param2": "${globals:x}",
        }
        base_globals_config = {
            "x": 34,
            "y": 25,
        }
        local_globals_config = {
            "y": 99,
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(local_params, local_param_config)
        _write_yaml(base_globals, base_globals_config)
        _write_yaml(local_globals, local_globals_config)
        conf = OmegaConfigLoader(tmp_path)
        # Local global overwrites the base global value
        assert conf["parameters"]["param1"] == local_globals_config["y"]
        # Base global value is accessible to local params
        assert conf["parameters"]["param2"] == base_globals_config["x"]

    def test_bad_globals(self, tmp_path):
        base_params = tmp_path / _BASE_ENV / "parameters.yml"
        base_globals = tmp_path / _BASE_ENV / "globals.yml"
        base_param_config = {
            "param1": "${globals:x.y}",
        }
        base_globals_config = {
            "x": {
                "z": 23,
            }
        }
        _write_yaml(base_params, base_param_config)
        _write_yaml(base_globals, base_globals_config)
        conf = OmegaConfigLoader(tmp_path, default_run_env="")
        # Referencing a globals key that does not exist raises an error
        with pytest.raises(
            InterpolationResolutionError, match=r"Globals key 'x.y' not found."
        ):
            conf["parameters"]["param1"]
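
One detail of the `match=` argument used in `test_bad_globals`: `pytest.raises` tests the pattern against the exception text with `re.search`, so the unescaped dots in `r"Globals key 'x.y' not found."` act as wildcards. The sketch below shows the difference with the standard `re` module only; the variable names and the second message are made up for illustration.

```python
import re

message = "Globals key 'x.y' not found."
other = "Globals key 'xzy' not found!"

loose = r"Globals key 'x.y' not found."   # pattern as written in the test
strict = re.escape(message)               # every character matched literally

loose_hits_message = bool(re.search(loose, message))
loose_hits_other = bool(re.search(loose, other))     # "." matches "z" and "!"
strict_hits_message = bool(re.search(strict, message))
strict_hits_other = bool(re.search(strict, other))

print(loose_hits_message, loose_hits_other)    # True True
print(strict_hits_message, strict_hits_other)  # True False
```

The test passes either way. Escaping the pattern would only make it stricter about the exact wording of the error.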