Add docs for the new OmegaConfLoader #2177

Merged 12 commits on Jan 10, 2023
1 change: 1 addition & 0 deletions docs/conf.py
@@ -176,6 +176,7 @@
"CircularDependencyError",
"OutputNotUniqueError",
"ConfirmNotUniqueError",
"ParserError",
),
}
# https://stackoverflow.com/questions/61770698/sphinx-nit-picky-mode-but-only-for-links-i-explicitly-wrote
1 change: 1 addition & 0 deletions docs/source/api_docs/kedro.config.rst
@@ -13,6 +13,7 @@ kedro.config

kedro.config.ConfigLoader
kedro.config.TemplatedConfigLoader
kedro.config.OmegaConfLoader

.. rubric:: Exceptions

51 changes: 51 additions & 0 deletions docs/source/kedro_project_setup/configuration.md
@@ -52,6 +52,19 @@ CONFIG_LOADER_ARGS = {
}
```

You can also bypass the configuration patterns and set configuration directly on an instance of a config loader class. This works for the default configuration (catalog, parameters, credentials, and logging) as well as for any additional configuration.

```python
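from pathlib import Path

from kedro.config import ConfigLoader
from kedro.framework.project import settings

# Assumption: the snippet runs from the project root; point this at your own project otherwise.
project_path = Path.cwd()
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path, env="local")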

# Bypass configuration patterns by setting the key and values directly on the config loader instance.
conf_loader["catalog"] = {"catalog_config": "something_new"}
```

Configuration information from files stored in `base` or `local` that match these rules is merged at runtime and returned as a config dictionary:

* If any two configuration files located inside the same environment path (`conf/base/` or `conf/local/` in this example) contain the same top-level key, `load_config` will raise a `ValueError` indicating that the duplicates are not allowed.
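
The duplicate-key rule can be pictured with a small sketch. This is not Kedro's implementation: `merge_environment_files` and the file contents are hypothetical, and the files are represented as already-parsed dictionaries so the example needs only the standard library.

```python
def merge_environment_files(files):
    """Merge parsed config files from one environment.

    Raises ValueError if two files define the same top-level key.
    """
    merged = {}
    origin = {}  # top-level key -> name of the file that first defined it
    for file_name, content in files.items():
        for key, value in content.items():
            if key in merged:
                raise ValueError(
                    f"Duplicate key '{key}' found in '{origin[key]}' and '{file_name}'"
                )
            merged[key] = value
            origin[key] = file_name
    return merged


# Hypothetical parsed contents of two files that both live in conf/base/.
base_files = {
    "catalog.yml": {"cars": {"type": "pandas.CSVDataSet"}},
    "catalog_extra.yml": {"cars": {"type": "pandas.ParquetDataSet"}},
}

try:
    merge_environment_files(base_files)
except ValueError as error:
    print(error)
```

Both files define `cars` inside the same environment, so the merge is rejected instead of silently picking one definition.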
@@ -218,6 +231,44 @@ The output Python dictionary will look as follows:
Although Jinja2 is a very powerful and extremely flexible template engine, which comes with a wide range of features, we do not recommend using it to template your configuration unless absolutely necessary. The flexibility of dynamic configuration comes at a cost of significantly reduced readability and much higher maintenance overhead. We believe that, for the majority of analytics projects, dynamically compiled configuration does more harm than good.
```

## Configuration with OmegaConf

[OmegaConf](https://omegaconf.readthedocs.io/) is a Python library for configuration. It is a YAML-based hierarchical configuration system with support for merging configurations from multiple sources.
From Kedro 0.18.5, you can use the [`OmegaConfLoader`](/kedro.config.OmegaConfLoader), which uses `OmegaConf` under the hood to load data.
Review comment (Contributor):

> From Kedro 0.18.5 you can use the OmegaConfLoader which uses OmegaConf under the hood to load data.

Does this mean it's not yet available? So we'd put docs out ahead of the feature going into a release? Does it make sense to flag that you need the development version of Kedro and explain how to get it?


```{note}
`OmegaConfLoader` is under active development and will be available from Kedro 0.18.5. New features will be added in future releases. Let us know if you have any feedback about the `OmegaConfLoader` or ideas for new features.
```

The `OmegaConfLoader` can load `YAML` and `JSON` files. Acceptable file extensions are `.yml`, `.yaml`, and `.json`. By default, any configuration files used by the config loaders in Kedro are `.yml` files.

To use the `OmegaConfLoader` in your project, set the `CONFIG_LOADER_CLASS` constant in your [`src/<package_name>/settings.py`](settings.md):

```python
from kedro.config import OmegaConfLoader # new import

CONFIG_LOADER_CLASS = OmegaConfLoader
```

### Templating for parameters
Templating or [variable interpolation](https://omegaconf.readthedocs.io/en/2.3_branch/usage.html#variable-interpolation), as it's called in `OmegaConf`, for parameters works out of the box if one condition is met: the name of the file that contains the template values must follow the same config pattern specified for parameters.
By default, the config pattern for parameters is: `["parameters*", "parameters*/**", "**/parameters*"]`.
Suppose you have one parameters file called `parameters.yml` containing parameters with `omegaconf` placeholders like this:

```yaml
model_options:
  test_size: ${data.size}
  random_state: 3
```

and a file containing the template values called `parameters_globals.yml`:
```yaml
data:
  size: 0.2
```

Since both of the file names (`parameters.yml` and `parameters_globals.yml`) match the config pattern for parameters, the `OmegaConfLoader` will load the files and resolve the placeholders correctly.
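
To make the mechanism concrete, here is a minimal standard-library sketch of the idea. It is not how `OmegaConf` or the `OmegaConfLoader` are implemented: the helper names (`matches_parameters`, `lookup`, `resolve`) are hypothetical, the two files are represented as already-parsed dictionaries, and only whole-value placeholders of the form `${a.b}` are handled.

```python
import re
from fnmatch import fnmatch

# Default config pattern for parameters, as quoted above.
PARAMETERS_PATTERNS = ["parameters*", "parameters*/**", "**/parameters*"]

# Hypothetical in-memory stand-ins for the two parsed YAML files.
files = {
    "parameters.yml": {
        "model_options": {"test_size": "${data.size}", "random_state": 3}
    },
    "parameters_globals.yml": {"data": {"size": 0.2}},
}

# Matches a value that consists of exactly one placeholder, e.g. "${data.size}".
PLACEHOLDER = re.compile(r"^\$\{([\w.]+)\}$")


def matches_parameters(file_name):
    """Return True if the file name matches any parameters pattern."""
    return any(fnmatch(file_name, pattern) for pattern in PARAMETERS_PATTERNS)


def lookup(config, dotted_key):
    """Follow a dotted key such as 'data.size' through nested dictionaries."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(node, root):
    """Recursively replace placeholders with the values they point to in root."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, str):
        match = PLACEHOLDER.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


# Only files matching the parameters pattern take part in the merge.
merged = {}
for file_name, content in files.items():
    if matches_parameters(file_name):
        merged.update(content)

resolved = resolve(merged, merged)
print(resolved["model_options"])  # {'test_size': 0.2, 'random_state': 3}
```

If `parameters_globals.yml` were renamed to something that does not match the pattern, for example `globals.yml`, it would be left out of the merge and the lookup of `data.size` would fail, which is why the file name condition matters.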


## Parameters

4 changes: 2 additions & 2 deletions kedro/extras/datasets/video/video_dataset.py
@@ -202,13 +202,13 @@ class VideoDataSet(AbstractDataSet[AbstractVideo, AbstractVideo]):
data_catalog.html#use-the-data-catalog-with-the-yaml-api>`_:

.. code-block:: yaml

 >>> cars:
 >>>   type: video.VideoDataSet
 >>>   filepath: data/01_raw/cars.mp4
 >>>
->>> cars:
+>>> motorbikes:
 >>>   type: video.VideoDataSet
->>>   filepath: data/01_raw/cars.mp4
+>>>   filepath: s3://your_bucket/data/02_intermediate/company/motorbikes.mp4
 >>>   credentials: dev_s3
 >>>
2 changes: 1 addition & 1 deletion kedro/runner/runner.py
@@ -299,7 +299,7 @@ def run_node(

Raises:
ValueError: Raised if is_async is set to True for nodes wrapping
generator functions.
generator functions.

Returns:
The node argument.