From dc6a667412781cebae1960c89ad92f425f8b9996 Mon Sep 17 00:00:00 2001
From: Merel Theisen
Date: Tue, 28 Nov 2023 10:50:37 +0000
Subject: [PATCH 1/5] Add docs on difference between OmegaConf and OmegaConfigLoader + examples

Signed-off-by: Merel Theisen
---
 .../configuration/advanced_configuration.md   | 25 +++++++++++
 .../configuration/configuration_basics.md     | 44 +++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/docs/source/configuration/advanced_configuration.md b/docs/source/configuration/advanced_configuration.md
index 94427b19b0..03b74dba34 100644
--- a/docs/source/configuration/advanced_configuration.md
+++ b/docs/source/configuration/advanced_configuration.md
@@ -10,6 +10,7 @@ This page also contains a set of guidance for advanced configuration requirement
 * [How to ensure non default configuration files get loaded](#how-to-ensure-non-default-configuration-files-get-loaded)
 * [How to bypass the configuration loading rules](#how-to-bypass-the-configuration-loading-rules)
 * [How to do templating with the `OmegaConfigLoader`](#how-to-do-templating-with-the-omegaconfigloader)
+* [How to load a data catalog with templating in code?](#how-to-load-a-data-catalog-with-templating-in-code)
 * [How to use global variables with the `OmegaConfigLoader`](#how-to-use-global-variables-with-the-omegaconfigloader)
 * [How to override configuration with runtime parameters with the `OmegaConfigLoader`](#how-to-override-configuration-with-runtime-parameters-with-the-omegaconfigloader)
 * [How to use resolvers in the `OmegaConfigLoader`](#how-to-use-resolvers-in-the-omegaconfigloader)
@@ -133,6 +134,30 @@ Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the
 #### Other configuration files
 It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.
 
+### How to load a data catalog with templating in code?
+If you want to directly load a data catalog that contains templating in code you can leverage the `OmegaConfigLoader`. Under the hood the `OmegaConfigLoader` will resolve any templates, so no further steps are required to load catalog entries properly.
+```yaml
+# Example catalog with templating
+companies:
+  type: ${_dataset_type}
+  filepath: data/01_raw/companies.csv
+
+_dataset_type: pandas.CSVDataset
+```
+
+```python
+from kedro.config import OmegaConfigLoader
+from kedro.framework.project import settings
+
+# Instantiate an `OmegaConfigLoader` instance with the location of your project configuration.
+conf_path = str(project_path / settings.CONF_SOURCE)
+conf_loader = OmegaConfigLoader(conf_source=conf_path)
+
+conf_catalog = conf_loader["catalog"]
+# conf_catalog["companies"]
+# Will result in: {'type': 'pandas.CSVDataset', 'filepath': 'data/01_raw/companies.csv'}
+```
+
 ### How to use global variables with the `OmegaConfigLoader`
 From Kedro `0.18.13`, you can use variable interpolation in your configurations using "globals" with `OmegaConfigLoader`.
 The benefit of using globals over regular variable interpolation is that the global variables are shared across different configuration types, such as catalog and parameters.
diff --git a/docs/source/configuration/configuration_basics.md b/docs/source/configuration/configuration_basics.md
index 58ed579444..66de454c45 100644
--- a/docs/source/configuration/configuration_basics.md
+++ b/docs/source/configuration/configuration_basics.md
@@ -16,6 +16,28 @@ From Kedro 0.18.5 you can use the [`OmegaConfigLoader`](/kedro.config.OmegaConfi
 
 `OmegaConfigLoader` can load `YAML` and `JSON` files. Acceptable file extensions are `.yml`, `.yaml`, and `.json`. By default, any configuration files used by the config loaders in Kedro are `.yml` files.
 
+### `OmegaConf` vs. Kedro's `OmegaConfigLoader`
+`OmegaConf` is a configuration management library in Python that allows you to manage hierarchical configurations. On the other hand, Kedro's `OmegaConfigLoader` is a component within the Kedro framework that utilises `OmegaConf` for handling configurations.
+This means that when you work with `OmegaConfigLoader` in Kedro, you are leveraging the capabilities of `OmegaConf` without directly interacting with it.
+
+`OmegaConfigLoader` in Kedro is designed to handle more complex configuration setups commonly used in Kedro projects. It automates the process of merging configuration files, such as those for catalogs, and takes into account different environments, making it convenient for managing configurations in a structured way.
+
+When you need to load configurations manually, such as for exploration in a notebook, you have two options:
+1. Use the `OmegaConfigLoader` class provided by Kedro.
+2. Directly use the `OmegaConf` library.
+
+If your use case involves loading only one configuration file and you don't have the complexity that Kedro's `OmegaConfigLoader` is designed to handle, it may be simpler to use `OmegaConf` directly.
+
+```python
+from omegaconf import OmegaConf
+
+parameters = OmegaConf.load("/path/to/parameters.yml")
+```
+
+When your configuration files are more complex and contain credentials or templating Kedro's `OmegaConfigLoader` is better suited to load configuration, as described in more detail in [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code) and [How to load a data catalog with templating in code?](advanced_configuration.md#how-to-load-a-data-catalog-with-templating-in-code).
+
+In summary, while both `OmegaConf` and Kedro's `OmegaConfigLoader` provide ways to manage configurations, the latter is specifically tailored for Kedro projects with a focus on handling more intricate configuration structures and environments. The choice between them depends on the complexity of your configuration needs and whether you are working within the context of the Kedro framework.
+
 ## Configuration source
 The configuration source folder is [`conf`](../get_started/kedro_concepts.md#conf) by default. We recommend that you keep all configuration files in the default `conf` folder of a Kedro project.
 
@@ -86,6 +108,7 @@ This section contains a set of guidance for the most common configuration requir
 * [How to change the configuration source folder at runtime](#how-to-change-the-configuration-source-folder-at-runtime)
 * [How to read configuration from a compressed file](#how-to-read-configuration-from-a-compressed-file)
 * [How to access configuration in code](#how-to-access-configuration-in-code)
+* [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code)
 * [How to specify additional configuration environments](#how-to-specify-additional-configuration-environments)
 * [How to change the default overriding environment](#how-to-change-the-default-overriding-environment)
 * [How to use only one configuration environment](#how-to-use-only-one-configuration-environment)
@@ -159,6 +182,27 @@ conf_loader = OmegaConfigLoader(conf_source=conf_path)
 conf_catalog = conf_loader["catalog"]
 ```
 
+### How to load a data catalog with credentials in code?
+Assuming your project contains a catalog and credentials file each located in a `base` and `local` environment respectively, you can use the `OmegaConfigLoader` to load these configurations and then pass them on to a `DataCatalog` object to get access to the catalog entries with resolved credentials.
+```python
+from kedro.config import OmegaConfigLoader
+from kedro.framework.project import settings
+from kedro.io import DataCatalog
+
+# Instantiate an `OmegaConfigLoader` instance with the location of your project configuration.
+conf_path = str(project_path / settings.CONF_SOURCE)
+conf_loader = OmegaConfigLoader(
+    conf_source=conf_path, base_env="base", default_run_env="local"
+)
+
+# These lines show how to access the catalog and credentials configurations.
+conf_catalog = conf_loader["catalog"]
+conf_credentials = conf_loader["credentials"]
+
+# Fetch the catalog with resolved credentials from the configuration.
+catalog = DataCatalog.from_config(catalog=conf_catalog, credentials=conf_credentials)
+```
+
 ### How to specify additional configuration environments
 In addition to the two built-in `local` and `base` configuration environments, you can create your own. Your project loads `conf/base/` as the bottom-level configuration environment but allows you to overwrite it with any other environments that you create, such as `conf/server/` or `conf/test/`. To use additional configuration environments, run the following command:
 

From a0c3688421166ed58c1221fe062a3351aa688df3 Mon Sep 17 00:00:00 2001
From: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Date: Tue, 28 Nov 2023 14:56:55 +0000
Subject: [PATCH 2/5] Apply suggestions from code review

Co-authored-by: Jo Stichbury
Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
---
 docs/source/configuration/advanced_configuration.md |  2 +-
 docs/source/configuration/configuration_basics.md   | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/source/configuration/advanced_configuration.md b/docs/source/configuration/advanced_configuration.md
index 03b74dba34..041f23cd6e 100644
--- a/docs/source/configuration/advanced_configuration.md
+++ b/docs/source/configuration/advanced_configuration.md
@@ -135,7 +135,7 @@ Since both of the file names (`catalog.yml` and `catalog_globals.yml`) match the
 It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.
 
 ### How to load a data catalog with templating in code?
-If you want to directly load a data catalog that contains templating in code you can leverage the `OmegaConfigLoader`. Under the hood the `OmegaConfigLoader` will resolve any templates, so no further steps are required to load catalog entries properly.
+You can use the `OmegaConfigLoader` to directly load a data catalog that contains templating in code. Under the hood the `OmegaConfigLoader` will resolve any templates, so no further steps are required to load catalog entries properly.
 ```yaml
 # Example catalog with templating
 companies:
diff --git a/docs/source/configuration/configuration_basics.md b/docs/source/configuration/configuration_basics.md
index 66de454c45..cda1efd9f0 100644
--- a/docs/source/configuration/configuration_basics.md
+++ b/docs/source/configuration/configuration_basics.md
@@ -17,16 +17,16 @@ From Kedro 0.18.5 you can use the [`OmegaConfigLoader`](/kedro.config.OmegaConfi
 `OmegaConfigLoader` can load `YAML` and `JSON` files. Acceptable file extensions are `.yml`, `.yaml`, and `.json`. By default, any configuration files used by the config loaders in Kedro are `.yml` files.
 
 ### `OmegaConf` vs. Kedro's `OmegaConfigLoader`
-`OmegaConf` is a configuration management library in Python that allows you to manage hierarchical configurations. On the other hand, Kedro's `OmegaConfigLoader` is a component within the Kedro framework that utilises `OmegaConf` for handling configurations.
-This means that when you work with `OmegaConfigLoader` in Kedro, you are leveraging the capabilities of `OmegaConf` without directly interacting with it.
+`OmegaConf` is a configuration management library in Python that allows you to manage hierarchical configurations. Kedro's `OmegaConfigLoader` uses `OmegaConf` for handling configurations.
+This means that when you work with `OmegaConfigLoader` in Kedro, you are using the capabilities of `OmegaConf` without directly interacting with it.
 
-`OmegaConfigLoader` in Kedro is designed to handle more complex configuration setups commonly used in Kedro projects. It automates the process of merging configuration files, such as those for catalogs, and takes into account different environments, making it convenient for managing configurations in a structured way.
+`OmegaConfigLoader` in Kedro is designed to handle more complex configuration setups commonly used in Kedro projects. It automates the process of merging configuration files, such as those for catalogs, and accounts for different environments to make it convenient to manage configurations in a structured way.
 
 When you need to load configurations manually, such as for exploration in a notebook, you have two options:
 1. Use the `OmegaConfigLoader` class provided by Kedro.
 2. Directly use the `OmegaConf` library.
 
-If your use case involves loading only one configuration file and you don't have the complexity that Kedro's `OmegaConfigLoader` is designed to handle, it may be simpler to use `OmegaConf` directly.
+Kedro's `OmegaConfigLoader` is designed to handle complex project environments. If your use case involves loading only one configuration file and is straightforward, it may be simpler to use `OmegaConf` directly.
 
 ```python
 from omegaconf import OmegaConf
@@ -34,9 +34,9 @@ from omegaconf import OmegaConf
 parameters = OmegaConf.load("/path/to/parameters.yml")
 ```
 
-When your configuration files are more complex and contain credentials or templating Kedro's `OmegaConfigLoader` is better suited to load configuration, as described in more detail in [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code) and [How to load a data catalog with templating in code?](advanced_configuration.md#how-to-load-a-data-catalog-with-templating-in-code).
+When your configuration files are complex and contain credentials or templating, Kedro's `OmegaConfigLoader` is more suitable, as described in more detail in [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code) and [How to load a data catalog with templating in code?](advanced_configuration.md#how-to-load-a-data-catalog-with-templating-in-code).
 
-In summary, while both `OmegaConf` and Kedro's `OmegaConfigLoader` provide ways to manage configurations, the latter is specifically tailored for Kedro projects with a focus on handling more intricate configuration structures and environments. The choice between them depends on the complexity of your configuration needs and whether you are working within the context of the Kedro framework.
+In summary, while both `OmegaConf` and Kedro's `OmegaConfigLoader` provide ways to manage configurations, your choice depends on the complexity of your configuration and whether you are working within the context of the Kedro framework.
 
 ## Configuration source
 The configuration source folder is [`conf`](../get_started/kedro_concepts.md#conf) by default. We recommend that you keep all configuration files in the default `conf` folder of a Kedro project.

From 15c33fcc1f55fd921834d1762bf0c84359ffc327 Mon Sep 17 00:00:00 2001
From: Merel Theisen
Date: Tue, 28 Nov 2023 16:14:50 +0000
Subject: [PATCH 3/5] Fix lint

Signed-off-by: Merel Theisen
---
 docs/source/configuration/configuration_basics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/configuration/configuration_basics.md b/docs/source/configuration/configuration_basics.md
index cda1efd9f0..e2eacc3e6c 100644
--- a/docs/source/configuration/configuration_basics.md
+++ b/docs/source/configuration/configuration_basics.md
@@ -26,7 +26,7 @@ When you need to load configurations manually, such as for exploration in a note
 1. Use the `OmegaConfigLoader` class provided by Kedro.
 2. Directly use the `OmegaConf` library.
 
-Kedro's `OmegaConfigLoader` is designed to handle complex project environments. If your use case involves loading only one configuration file and is straightforward, it may be simpler to use `OmegaConf` directly.
+Kedro's `OmegaConfigLoader` is designed to handle complex project environments. If your use case involves loading only one configuration file and is straightforward, it may be simpler to use `OmegaConf` directly.
 
 ```python
 from omegaconf import OmegaConf
@@ -36,7 +36,7 @@ parameters = OmegaConf.load("/path/to/parameters.yml")
 
 When your configuration files are complex and contain credentials or templating, Kedro's `OmegaConfigLoader` is more suitable, as described in more detail in [How to load a data catalog with credentials in code?](#how-to-load-a-data-catalog-with-credentials-in-code) and [How to load a data catalog with templating in code?](advanced_configuration.md#how-to-load-a-data-catalog-with-templating-in-code).
 
-In summary, while both `OmegaConf` and Kedro's `OmegaConfigLoader` provide ways to manage configurations, your choice depends on the complexity of your configuration and whether you are working within the context of the Kedro framework.
+In summary, while both `OmegaConf` and Kedro's `OmegaConfigLoader` provide ways to manage configurations, your choice depends on the complexity of your configuration and whether you are working within the context of the Kedro framework.
 
 ## Configuration source
 The configuration source folder is [`conf`](../get_started/kedro_concepts.md#conf) by default. We recommend that you keep all configuration files in the default `conf` folder of a Kedro project.

From 608b9c35e731259c78adf7ca36dd6c1b1ed97c32 Mon Sep 17 00:00:00 2001
From: Merel Theisen
Date: Wed, 29 Nov 2023 11:54:35 +0000
Subject: [PATCH 4/5] Address review comments

Signed-off-by: Merel Theisen
---
 docs/source/configuration/configuration_basics.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/source/configuration/configuration_basics.md b/docs/source/configuration/configuration_basics.md
index e2eacc3e6c..6000c38acd 100644
--- a/docs/source/configuration/configuration_basics.md
+++ b/docs/source/configuration/configuration_basics.md
@@ -183,6 +183,10 @@ conf_catalog = conf_loader["catalog"]
 ```
 
 ### How to load a data catalog with credentials in code?
+```{note}
+It is not recommended to do load and manipulate a data catalog directly in a Kedro node. Nodes are designed to be pure functions and thus should remain agnostic of I/O.
+```
+
 Assuming your project contains a catalog and credentials file each located in a `base` and `local` environment respectively, you can use the `OmegaConfigLoader` to load these configurations and then pass them on to a `DataCatalog` object to get access to the catalog entries with resolved credentials.
 ```python
 from kedro.config import OmegaConfigLoader

From bfd589a4a4f4fd9a548bdfcb2ffe13f1cddd61e7 Mon Sep 17 00:00:00 2001
From: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Date: Thu, 30 Nov 2023 10:57:35 +0000
Subject: [PATCH 5/5] Apply suggestions from code review

Co-authored-by: Jo Stichbury
Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
---
 docs/source/configuration/configuration_basics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/configuration/configuration_basics.md b/docs/source/configuration/configuration_basics.md
index 6000c38acd..188d8c9c17 100644
--- a/docs/source/configuration/configuration_basics.md
+++ b/docs/source/configuration/configuration_basics.md
@@ -184,10 +184,10 @@ conf_catalog = conf_loader["catalog"]
 
 ### How to load a data catalog with credentials in code?
 ```{note}
-It is not recommended to do load and manipulate a data catalog directly in a Kedro node. Nodes are designed to be pure functions and thus should remain agnostic of I/O.
+We do not recommend that you load and manipulate a data catalog directly in a Kedro node. Nodes are designed to be pure functions and thus should remain agnostic of I/O.
 ```
 
-Assuming your project contains a catalog and credentials file each located in a `base` and `local` environment respectively, you can use the `OmegaConfigLoader` to load these configurations and then pass them on to a `DataCatalog` object to get access to the catalog entries with resolved credentials.
+Assuming your project contains a catalog and credentials file, each located in `base` and `local` environments respectively, you can use the `OmegaConfigLoader` to load these configurations, and pass them to a `DataCatalog` object to access the catalog entries with resolved credentials.
 ```python
 from kedro.config import OmegaConfigLoader
 from kedro.framework.project import settings