Add docs on difference between OmegaConf and OmegaConfigLoader (#3352)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
merelcht and stichbury authored Nov 30, 2023
1 parent 118408a commit 5475e74
Showing 2 changed files with 73 additions and 0 deletions.
25 changes: 25 additions & 0 deletions docs/source/configuration/advanced_configuration.md
@@ -10,6 +10,7 @@ This page also contains a set of guidance for advanced configuration requirement
* [How to ensure non default configuration files get loaded](#how-to-ensure-non-default-configuration-files-get-loaded)
* [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
* [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
* [How to load a data catalog with templating in code?](#how-to-load-a-data-catalog-with-templating-in-code)
* [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
* [How to override configuration with runtime parameters with the `OmegaConfigLoader`](#how-to-override-configuration-with-runtime-parameters-with-the-omegaconfigloader)
* [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
@@ -133,6 +134,30 @@ Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the
#### Other configuration files
It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom Spark or MLflow configuration. This works in the same way as variable interpolation in parameter files. You can still prefix templated values with an underscore if you want, but it is not mandatory as it is for catalog files.

### How to load a data catalog with templating in code?
You can use the `OmegaConfigLoader` to load a data catalog that contains templating directly in code. Under the hood, the `OmegaConfigLoader` resolves any templates, so no further steps are required to load catalog entries properly.
```yaml
# Example catalog with templating
companies:
  type: ${_dataset_type}
  filepath: data/01_raw/companies.csv

_dataset_type: pandas.CSVDataset
```

```python
from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings

# Instantiate an `OmegaConfigLoader` with the location of your project configuration.
# `project_path` is assumed to be a `pathlib.Path` pointing at your project root.
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = OmegaConfigLoader(conf_source=conf_path)

conf_catalog = conf_loader["catalog"]
# conf_catalog["companies"]
# Will result in: {'type': 'pandas.CSVDataset', 'filepath': 'data/01_raw/companies.csv'}
```

### How to use global variables with the `OmegaConfigLoader`
From Kedro `0.18.13`, you can use variable interpolation in your configurations using "globals" with `OmegaConfigLoader`.
The benefit of using globals over regular variable interpolation is that the global variables are shared across different configuration types, such as catalog and parameters.
48 changes: 48 additions & 0 deletions docs/source/configuration/configuration_basics.md
@@ -16,6 +16,28 @@ From Kedro 0.18.5 you can use the [`OmegaConfigLoader`](/kedro.config.OmegaConfi

`OmegaConfigLoader` can load `YAML` and `JSON` files. Acceptable file extensions are `.yml`, `.yaml`, and `.json`. By default, any configuration files used by the config loaders in Kedro are `.yml` files.

### `OmegaConf` vs. Kedro's `OmegaConfigLoader`
`OmegaConf` is a configuration management library in Python that allows you to manage hierarchical configurations. Kedro's `OmegaConfigLoader` uses `OmegaConf` for handling configurations.
This means that when you work with `OmegaConfigLoader` in Kedro, you are using the capabilities of `OmegaConf` without directly interacting with it.

`OmegaConfigLoader` in Kedro is designed to handle more complex configuration setups commonly used in Kedro projects. It automates the process of merging configuration files, such as those for catalogs, and accounts for different environments to make it convenient to manage configurations in a structured way.

When you need to load configurations manually, such as for exploration in a notebook, you have two options:
1. Use the `OmegaConfigLoader` class provided by Kedro.
2. Directly use the `OmegaConf` library.

Kedro's `OmegaConfigLoader` is designed to handle complex project environments. If your use case involves loading only one configuration file and is straightforward, it may be simpler to use `OmegaConf` directly.

```python
from omegaconf import OmegaConf

parameters = OmegaConf.load("/path/to/parameters.yml")
```

When your configuration files are complex and contain credentials or templating, Kedro's `OmegaConfigLoader` is more suitable, as described in more detail in [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code) and [How to load a data catalog with templating in code?](advanced_configuration.md#how-to-load-a-data-catalog-with-templating-in-code).

In summary, while both `OmegaConf` and Kedro's `OmegaConfigLoader` provide ways to manage configurations, your choice depends on the complexity of your configuration and whether you are working within the context of the Kedro framework.

## Configuration source
The configuration source folder is [`conf`](../get_started/kedro_concepts.md#conf) by default. We recommend that you keep all configuration files in the default `conf` folder of a Kedro project.

@@ -86,6 +108,7 @@ This section contains a set of guidance for the most common configuration requir
* [How to change the configuration source folder at runtime](#how-to-change-the-configuration-source-folder-at-runtime)
* [How to read configuration from a compressed file](#how-to-read-configuration-from-a-compressed-file)
* [How to access configuration in code](#how-to-access-configuration-in-code)
* [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code)
* [How to specify additional configuration environments](#how-to-specify-additional-configuration-environments)
* [How to change the default overriding environment](#how-to-change-the-default-overriding-environment)
* [How to use only one configuration environment](#how-to-use-only-one-configuration-environment)
@@ -159,6 +182,31 @@ conf_loader = OmegaConfigLoader(conf_source=conf_path)
conf_catalog = conf_loader["catalog"]
```

### How to load a data catalog with credentials in code?
```{note}
We do not recommend that you load and manipulate a data catalog directly in a Kedro node. Nodes are designed to be pure functions and thus should remain agnostic of I/O.
```

Assuming your project contains a catalog file in the `base` environment and a credentials file in the `local` environment, you can use the `OmegaConfigLoader` to load both configurations and pass them to a `DataCatalog` object to access the catalog entries with resolved credentials.
```python
from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from kedro.io import DataCatalog

# Instantiate an `OmegaConfigLoader` with the location of your project configuration.
# `project_path` is assumed to be a `pathlib.Path` pointing at your project root.
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = OmegaConfigLoader(
    conf_source=conf_path, base_env="base", default_run_env="local"
)

# These lines show how to access the catalog and credentials configurations.
conf_catalog = conf_loader["catalog"]
conf_credentials = conf_loader["credentials"]

# Fetch the catalog with resolved credentials from the configuration.
catalog = DataCatalog.from_config(catalog=conf_catalog, credentials=conf_credentials)
```

### How to specify additional configuration environments
In addition to the two built-in `local` and `base` configuration environments, you can create your own. Your project loads `conf/base/` as the bottom-level configuration environment but allows you to overwrite it with any other environments that you create, such as `conf/server/` or `conf/test/`. To use additional configuration environments, run the following command:
