Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make registering custom resolvers easy #2407

Closed
merelcht opened this issue Mar 9, 2023 · 15 comments
Closed

Make registering custom resolvers easy #2407

merelcht opened this issue Mar 9, 2023 · 15 comments
Assignees
Labels
Component: Configuration Issue: Feature Request New feature or improvement to existing feature Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation

Comments

@merelcht
Copy link
Member

merelcht commented Mar 9, 2023

Description

Omegaconf provides functionality to register custom config resolvers: https://omegaconf.readthedocs.io/en/2.3_branch/usage.html#resolvers
At the moment it's possible to do this by subclassing the OmegaConfigLoader. Ideally it should be possible to do this without having to make a custom class.

Context

This feature would allows users to do more advanced variable interpolation in their config.

Possible Implementation

Allow users to provide custom resolvers through settings.py, adding a new constructor arg like custom_resolvers : Optional[Dict[str, Callable]]

Possible Alternatives

Perhaps users can achieve this as well by using hooks.

@merelcht merelcht added the Issue: Feature Request New feature or improvement to existing feature label Mar 9, 2023
@merelcht merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Mar 9, 2023
@datajoely
Copy link
Contributor

Awesome, here is a really scrappy version of what I think settings.py could look like:

import os
import polars as pl

CONFIG_LOADER_ARGS = {
   "custom_resolvers" : {
         "oc.env" : lambda x : os.environ[x],
         "polars" : lambda x : getattr(pl, x)
    }
}
  • oc.env we may want to actually enable by default since there is a built-in version of this.
  • polars is an attempt at addressing @astrojuanlu 's issue here where the user (I think could then do:
load_args:
     dtypes: 
       product_age: ${polars:Float64}
       group_identifier: ${polars:Utf8}

@marrrcin
Copy link
Contributor

A few thoughts:

  • should this setting enable all resolvers for all configs?
  • should this setting allow to "clear all and use only my resolvers", or "add my resolvers to the existing ones"?

I totally agree with:

oc.env we may want to actually enable by default since there is a built-in version of this.

This API would obviously allow to register e.g. oc.select as a resolver, but there might be a friction among users, because the oc.*s are more like "built-in", so people might ask.

@datajoely
Copy link
Contributor

Agreed on both points

  • I think we should protect the 'oc.*' namespace
  • We should have an additional option in CONFIG_LOADER_ARGS which handles the resolver overwriting question.

@datajoely
Copy link
Contributor

datajoely commented Mar 10, 2023

Could you elaborate on how you imagine this would work?

should this setting enable all resolvers for all configs?

Do you mean like scoping by env or something?

@merelcht
Copy link
Member Author

On the point of enabling oc.env by default: while I see valid reasons of using oc.env in other places than credentials, we also made a deliberate decision to not enable it for all config types, because we want to make it clear that you shouldn't just be using environment variables all over the place. Like for parameters etc.

When we conducted the config research in the summer (see #1847), the main use case for environment variables was credentials, and so we decided to enable it for just that to start with. An advanced user could overwrite this behaviour and just enable environment variables for everything, but I still believe it's a sensible choice not to enable it by default just to make sure users don't just abuse env vars because it's easy.

@marrrcin
Copy link
Contributor

Could you elaborate on how you imagine this would work?

should this setting enable all resolvers for all configs?

Do you mean like scoping by env or something?

I've meant what @merelcht is talking about above - right now, the oc.env is scoped only to credentials.


An advanced user could overwrite this behaviour and just enable environment variables for everything, but I still believe it's a sensible choice not to enable it by default just to make sure users don't just abuse env vars because it's easy.

+1 to that!

@MatthiasRoels
Copy link

Why not enable the built-in resolvers for all configuration types? I mean, if a user wants to go ahead and use environment variables in their config, why not? I do see added value for e.g. configuring different paths in catalog entries based on the env (dev vs qa vs prod).

@noklam
Copy link
Contributor

noklam commented Apr 24, 2023

In hydra they use a specific _target_ key for Dynamic Injection.

data_fab:
  type: data_fab.extras.DataSet
  save_args:
    if_exists: append
    dtype:
      simulation_payload:
        _target_: sqlalchemy.types.JSON
        _partial_: False

@astrojuanlu
Copy link
Member

By the way, registering custom resolvers does not even need subclassing OmegaConfigLoader:

# settings.py

if not OmegaConf.has_resolver("dummy"):
    OmegaConf.register_new_resolver("dummy", lambda x: x)

CONFIG_LOADER_CLASS = OmegaConfigLoader

apparently OmegaConf is a singleton 😬 or am I missing something?

@astrojuanlu
Copy link
Member

astrojuanlu commented May 16, 2023

(The if not OmegaConf.has_resolver(name) conditional is needed because apparently settings.py is executed twice by Kedro)

@noklam
Copy link
Contributor

noklam commented May 16, 2023

Why not enable the built-in resolvers for all configuration types? I mean, if a user wants to go ahead and use environment variables in their config, why not? I do see added value for e.g. configuring different paths in catalog entries based on the env (dev vs qa vs prod).

I guess this is exactly what Kedro's environment doing already.

@MatthiasRoels
Copy link

Why not enable the built-in resolvers for all configuration types? I mean, if a user wants to go ahead and use environment variables in their config, why not? I do see added value for e.g. configuring different paths in catalog entries based on the env (dev vs qa vs prod).

I guess this is exactly what Kedro's environment doing already.

I know, but if you can do it with an environment variable, you can simply add your catalog entry in conf/base without having to create "duplicates" in the different envs with only minor tweaks.

@noklam
Copy link
Contributor

noklam commented May 16, 2023

I see your point, however, we don't want to encourage users to abuse environment variables for everything. It's still a bit unclear how we should enable this feature.

I think for that purpose we should keep the discussion there. For this specific PR it's about how to enable registering custom resolver, and whether oc.env should be enabled by default is out of scope. I quite like @datajoely's suggestion.

An advanced user could overwrite this behaviour and just enable environment variables for everything, but I still believe it's a sensible choice not to enable it by default just to make sure users don't just abuse env vars because it's easy.

@ankatiyar
Copy link
Contributor

Summary of tech design session on 24/05:

Discussed two possible solutions -

  1. Document how users can register resolvers using OmegaConf singleton in settings.py -> Make registering custom resolvers easy #2407 (comment)
    Problems with this approach (comments by @noklam)
  1. It feels weird you need to import the underlying singleton from omegaconf to to change the settings, it works because it’s a singleton but doesn’t mean we should do it, in a way it’s a leaky abstraction
  2. In the context of a standalone component, the Config class should be functioning once it’s constructed. With this solution you need to first construct the class, then register the resolvers in a separate step, which feels wrong.
  1. Allow users to register custom resolvers using CONFIG_LOADER_ARGS -> Make registering custom resolvers easy #2407 (comment)
    This is the approach we'll go ahead with. The following things need to be done -
  • Activate all built-in resolvers by default for OmegaConfigLoader except for oc.env
  • Implement a very thin wrapper that will pass all the arguments straight to OmegaConf.register_new_resolver(). This will let users overwrite built-in resolvers using replace=True.
  • On the documentation side, add a Kedro specific example for using custom resolvers in config.

@ankatiyar
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Configuration Issue: Feature Request New feature or improvement to existing feature Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation
Projects
Archived in project
Development

No branches or pull requests

7 participants