Make it easier to use Kedro as a library #3029

Closed
merelcht opened this issue Sep 13, 2023 · 5 comments
Labels: Stage: User Research 🔬 (Ticket needs to undergo user research before implementation), Type: Parent Issue

Comments

@merelcht
Member

merelcht commented Sep 13, 2023

Introduction

To use Kedro as a library, and specifically the DataCatalog and OmegaConfigLoader (or any config loader, for that matter), a user needs significant knowledge and understanding of Kedro as a Framework.
These two core components of Kedro contain assumptions that hold when they're used within the Framework workflow, e.g. when using Kedro through the CLI or KedroSession. However, these components also have a use case outside the Framework: in theory, the DataCatalog could be used as just a data catalog to easily load and save your data, and the config loader could be used to load any yml-type configuration. In practice, using these components has proven difficult for users who don't know the Framework intimately.

Background

For more detailed background see:

Proposals

To use the DataCatalog independently of the framework:

Add argument to provide data source to DataCatalog #2965

conf_catalog = config_loader["catalog"] # config
- conf_catalog = _convert_paths_to_absolute_posix(Path("../../").resolve(), conf_catalog) # config/catalog
- catalog = DataCatalog.from_config(conf_catalog) # catalog
+ catalog = DataCatalog.from_config(conf_catalog, source="../data")
catalog.load("example_data")
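To make the path-conversion step in the diff above concrete, here is a minimal standard-library sketch of what resolving catalog paths against a root could involve. It is not Kedro's implementation: `resolve_catalog_paths`, `PATH_KEYS`, and the example entries are hypothetical names chosen for illustration, and leaving remote URIs untouched is an assumption.

```python
from pathlib import Path, PurePosixPath
from typing import Any

# Assumption: these are the config keys that hold path-like values.
PATH_KEYS = ("filepath", "path")


def _is_relative_local(value: str) -> bool:
    """True for relative local paths; False for URIs (s3://...) and absolute paths."""
    return "://" not in value and not PurePosixPath(value).is_absolute()


def resolve_catalog_paths(root: Path, conf: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of conf with relative local paths anchored at root.

    Hypothetical helper for illustration only; not part of Kedro.
    """
    resolved: dict[str, Any] = {}
    for key, value in conf.items():
        if isinstance(value, dict):
            # Catalog entries are nested dicts, so recurse into them.
            resolved[key] = resolve_catalog_paths(root, value)
        elif key in PATH_KEYS and isinstance(value, str) and _is_relative_local(value):
            resolved[key] = (root / value).as_posix()
        else:
            resolved[key] = value
    return resolved


# Placeholder catalog config; the dataset type strings are not interpreted here.
conf_catalog = {
    "example_data": {"type": "pandas.CSVDataset", "filepath": "data/example.csv"},
    "remote_data": {"type": "pandas.CSVDataset", "filepath": "s3://bucket/example.csv"},
}
resolved_catalog = resolve_catalog_paths(Path("/project"), conf_catalog)
```

In this sketch the local entry ends up under the root while the `s3://` entry is passed through unchanged, which is one way a `source`-style argument could coexist with remote datasets.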

To use the OmegaConfigLoader independently of the framework:

Remove the environment default for the OmegaConfigLoader #2971

Currently, unless you have a base and a local environment, you are required to provide a base_env and default_run_env to your configuration loader. We propose to remove that assumption so that both base_env and default_run_env default to the specified conf_source itself (conf by default).

- conf_loader = OmegaConfigLoader(conf_source="conf", base_env=".", default_run_env=".")
+ conf_loader = OmegaConfigLoader(conf_source="conf")
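The effect of the proposed defaults can be sketched with `pathlib` alone. This is a simplified model of a layered loader, not Kedro's code: `config_dirs` is a hypothetical function, and the assumption is that each environment is simply a subdirectory of `conf_source`.

```python
from pathlib import Path


def config_dirs(
    conf_source: str, base_env: str = ".", default_run_env: str = "."
) -> list[Path]:
    """List the directories a layered loader would read, in order, without duplicates.

    Hypothetical helper for illustration only; not part of Kedro.
    """
    root = Path(conf_source)
    dirs: list[Path] = []
    for env in (base_env, default_run_env):
        # pathlib drops "." components, so root / "." is root itself.
        candidate = root / env
        if candidate not in dirs:
            dirs.append(candidate)
    return dirs


no_envs = config_dirs("conf")
with_envs = config_dirs("conf", base_env="base", default_run_env="local")
```

With both environments set to ".", the two layers collapse into the single `conf` directory, whereas "base" and "local" give two separate directories that are read in that order.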

From using just the OmegaConfigLoader to more of the Kedro Framework:

Opt-out of using environments.

By default the framework will assume you have a "base" and "local" environment. This is set in settings.py and you will have to change it if you don't want to use these environments:

Default settings

# settings.py 
CONFIG_LOADER_ARGS = {"base_env": "base", "default_run_env": "local"}

How to change the settings to use no environments (or different environments)

# settings.py 
CONFIG_LOADER_ARGS = {"base_env": ".", "default_run_env": "."} 

merelcht added the Stage: User Research 🔬 and Type: Parent Issue labels and self-assigned this on Sep 13, 2023

@merelcht
Member Author

merelcht commented Sep 13, 2023

Feedback from the internal session:

To use the DataCatalog independently of the framework:

1. Add argument to provide data source to DataCatalog #2965

The users liked this suggestion, but will need more clarity on what "source" means. In the user session there was some confusion about whether it refers to the data source, and how it would then work with remote data sources such as databases or S3.

2. Add a DataCatalog.from_file() method #2967

The users really liked this suggestion. One remaining question is whether from_file() would also get the source/root argument just like from_config().

To use the OmegaConfigLoader independently of the framework:

1. Remove the environment default for the OmegaConfigLoader #2971

Not a lot of vocal feedback on this suggestion. The users in the session were fine with it.

@astrojuanlu
Member

Update: #2967 has been discarded

@merelcht
Member Author

merelcht commented Oct 5, 2023

Feedback from the external session:

To use the DataCatalog independently of the framework:

1. Add argument to provide data source to DataCatalog #2965

Users in the group mentioned they use the Kedro ipython extension and so haven't struggled with using the DataCatalog. Nevertheless, they understood the proposal and generally liked it. There were some questions about how this would work with credentials and remote datasets.

To use the OmegaConfigLoader independently of the framework:

1. Remove the environment default for the OmegaConfigLoader #2971

There was a mix of users who really liked this proposal and some who didn't. The main points raised were:

  • This proposal makes it a lot clearer what happens with the OmegaConfigLoader and removes the "magic" assumptions of environments.
  • We should definitely keep the environment design, but this proposal makes sense for people getting started.
  • settings.py is a good place to keep the defaults, because it's visible to users and this group has experience with updating that file.
  • Environments are great and the fact that Kedro is opinionated is really nice, so actually it would be better to keep the assumption of environments in the OmegaConfigLoader. It forces users to learn about environments from the start, and if they only want one environment they can just add an empty local folder.

@merelcht
Member Author

After more discussion it was decided to only implement #2971. We will not add the extra argument proposed in #2965, but will instead improve the error messaging in that workflow.

@astrojuanlu
Member

astrojuanlu commented Oct 30, 2023

Summary:
