Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OmegaConfigLoader does not follow ConfigLoader get() spec. #3092

Closed
antheas opened this issue Sep 28, 2023 · 10 comments
Closed

OmegaConfigLoader does not follow ConfigLoader get() spec. #3092

antheas opened this issue Sep 28, 2023 · 10 comments

Comments

@antheas
Copy link

antheas commented Sep 28, 2023

Description

The new OmegaConfigLoader does not follow the get() spec of the ConfigLoader and TemplatedConfigLoader and does not provide an alternative for loading custom config files or extending the file paths easily.

The spec is the following.

    def get(self, *patterns: str) -> dict[str, Any]:
        pass

The format of this get() function allows passing in file locations to load specific files from disk using the config loader of the project as well as custom patterns.

This forces hacks like the following for using custom patterns that can be overridden by the user:

if 'Omega' in str(type(context.config_loader)):
    if 'mlflow' not in context.config_loader.config_patterns:
        context.config_loader.config_patterns['mlflow'] = ["mlflow*", "mlflow*/**"]
    conf_mlflow_yml = context.config_loader.get("mlflow")
else:
    conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")

And loading custom files is impossible, since the OmegaConfigLoader is hardcoded to load from directory + pattern combinations. Ex. a variation of config_loader.get("/home/user/abc.yml") is impossible currently.

I package config files (catalog.yml and parameter.yml) as parts of modules so that they can be distributed to other projects and load them using paths. As it currently stands, when a Kedro project uses an OmegaConfigLoader I can not load them.

Not being able to use globals was another issue but I see that it has been fixed in the latest version.

Possible Solution

Restore the old get syntax with a deprecation warning when get() is called with multiple strings. Add a new format for loading patterns with a fallback and one for loading files. Example:

def get(self, *patterns: str):
    # raises deprecation warning

def get(self, files: Sequence[str]):
   # loads the files found in files. Throws an error if a file is not found (unlike with patterns where it's silent).

def get(self, tag: str, fallback: Sequence[str]):
   # If `tag` exists in `config_patterns` use the sequences found there, otherwise use the sequences in fallback.

The above spec should be supported by all config loaders, which should be easy enough as it's a variation of the existing get spec.
It will make the following possible while not being ambiguous and remaining extendable.

# Load configs for mlflow by using the fallback patterns that can be overridden by the user in their settings.py file.
config_loader.get("mlflow", ["mlflow*", "mlflow*/**"])

# Load files using the existing config loader
# The user can not override the resolution of abc.yml here with other variables unfortunately.
config_loader.get(["/home/user/abc.yml"])
  • Kedro version used (pip show kedro or kedro -V): 0.18.13
@merelcht
Copy link
Member

Hi @antheas , thank you for flagging this! Have you tried the solution using CONFIG_LOADER_ARGS described here: https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-ensure-non-default-configuration-files-get-loaded ? This should work in the same way for OmegaConfigLoader.

@antheas
Copy link
Author

antheas commented Sep 28, 2023

Hi, @merelcht
For custom config patterns for a project yes that works. But if distributing a library or extension for kedro it is not ideal. All of the users would have to override their CONFIG_LOADER_ARGS as well.

It also doesn't solve for loading custom files. I'll be staying on TemplatedConfigLoader for now.

@noklam
Copy link
Contributor

noklam commented Sep 28, 2023

@antheas great point. However get is never an API of the Abstract config loader. OmegaConfigLoader follow the spec of the Abstract class but not subclassing ConfigLoader

https://github.com/kedro-org/kedro/blob/main/kedro/config/abstract_config.py

The config pattern is in the OmegaConfigLoader signature, you can set defaults if you need to distribute it in library.

It may be benefit to still have a similar get API and PRs and discussion are welcomed!

@antheas
Copy link
Author

antheas commented Sep 28, 2023

@noklam My issue is more to the fact that TemplatedConfigLoader is deprecated with a warning and OmegaConfigLoader is not a suitable replacement currently, for at least 2 use cases that I fall into: loading files from disk and adding custom config types without monkey patching. Both of which are supported by both ConfigLoader and TemplatedConfigLoader.

As for the points you raise, I looked at the AbstractConfigLoader before posting the issue. It's kind of a rough part of the API because as it currently stands KedroContext.config_loader is of type ConfigLoader, not AbstractConfigLoader.

So there is a conflict there. Either OmegaConfigLoader should subclass ConfigLoader or the type definition of KedroContext needs to have a breaking change. If it doesn't change, according to the API of ConfigLoader, get() should be implemented with varargs.

However get is never an API of the Abstract config loader

Unfortunately that's not true, get() is implemented as part of UserDict so you get the behavior of get(key, default=None). So right now, if you do context.config_loader.get("mlflow*", "mlflow*/**") with an OmegaConfigLoader you get back "mlflow*/**".

The new spec I suggested follows the pythonic way of calling get(key, default=None) with a fallback. And it would work regardless of the choice of ConfigLoader the user makes.

The config pattern is in the OmegaConfigLoader signature, you can set defaults if you need to distribute it in library.

I think it's preferable for the users of a library to be able to choose their own config loader. Nevertheless that's true.

Of course, a discussion is welcome.

@noklam
Copy link
Contributor

noklam commented Sep 28, 2023

Thanks @antheas, all solid comments and many useful insight.

My issue is more to the fact that TemplatedConfigLoader is deprecated with a warning and OmegaConfigLoader is not a suitable replacement currently, for at least 2 use cases that I fall into: loading files from disk and adding custom config types without monkey patching.

Checking with my understanding, the former you are referring to config_loader.get(<pattern>). Using the get API may be confusing though since it follows a UserDict interface, overloading this with the typing dict.get() behavior may be confusing too. It may still be a useful feature and is a reasonable ask. I was never a fan of the get method and prefer more explicit name.

adding custom config types without monkey patching

I am not sure if I understand this, can you show me an example how are you doing it currently and why do you need this? On the other hand, I am quite interested the use of these library component outside of the framework. How are you using it is it mainly in a notebook for interactive use case? Thank you🙏🏼.

So there is a conflict there. Either OmegaConfigLoader should subclass ConfigLoader or the type definition of KedroContext needs to have a breaking change. If it doesn't change, according to the API of ConfigLoader, get() should be implemented with varargs.

Agree, probably an oversight that we should update.

@property
def config_loader(self):
"""Read-only property referring to Kedro's ``ConfigLoader`` for this
context.
Returns:
Instance of `ConfigLoader`.
Raises:
KedroContextError: Incorrect ``ConfigLoader`` registered for the project.
"""
return self._config_loader

@merelcht
Copy link
Member

My issue is more to the fact that TemplatedConfigLoader is deprecated with a warning and OmegaConfigLoader is not a suitable replacement currently, for at least 2 use cases that I fall into: loading files from disk and adding custom config types without monkey patching.

Would you be able to expand on that a bit more @antheas ? How is OmegaConfigLoader not allowing you to load files from disk? And what's the monkey patching you have to do now?

To be fully transparent, we have put a lot of work in making OmegaConfigLoader a worthy replacement without it being a fully 1-1 feature replacement for the TemplatedConfigLoader. Long story short: TemplatedConfigLoader was not built by the core Kedro team and over time the config loader classes became pretty monstrous. There was an effort to make an AbstractConfigLoader class, however this wasn't designed in the best way. We're trying to get rid of this technical debt and allow us to move forward with a better designed solution. I am sorry you have found there to be conflicts in our new design. Hopefully, we can find a good solution for the issues you are facing without too much friction.

@antheas
Copy link
Author

antheas commented Sep 28, 2023

Sure. So for context the library I'm developing is this: https://github.com/pasteur-dev/pasteur

Which is an early research prototype

Early on in development I had a dependency on kedro-mlflow, but as the project developed I dropped it because I developed custom behavior for mlflow. I still depend on the mlflow configs from the previous project though, so in my code I have the following line:

conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")

It can be found here.

I agree that this line is not ideal, as the user can not override the paths. However, with the new OmegaConfigLoader I can't get the mlflow configs without the user updating CONFIG_LOADER_ARGS. I have not looked at how the new version of kedro-mlflow handles this though.

The solution I came up is this, which is not pretty:

if 'Omega' in str(type(context.config_loader)):
    if 'mlflow' not in context.config_loader.config_patterns:
        context.config_loader.config_patterns['mlflow'] = ["mlflow*", "mlflow*/**"]
    conf_mlflow_yml = context.config_loader.get("mlflow")
else:
    conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")

My library allows users to package their datasets and dataset permutations (as Views) and distribute them with other users. An example of this is found in the directories pasteur.extras.datasets and pasteur.extras.views.

The datasets package yml catalogs with them and the views parameter ymls with hyperparameters.

Since catalog yml files use a simpler format without the . notation, I can load them myself and do simple path replacement.

However, for the parameters I'm relying on the kedro config loader, which requires me to be able to load a file from disk through it by specifying the direct path of the file. I will admit that due to the way the code is written in the previous config loaders (ie a hack), when calling get if a file name was provided, it was resolved. This is not true anymore though.

It would be nice to have omega loader to be able to load files from file paths, so I won't have to move to a custom solution and maybe integrate some OmegaLoader specific behavior in the future. Although that's probably not needed.

@merelcht
Copy link
Member

Right I see what you mean. I've had a look at kedro-mlflow and there the problem is resolved with:

 if "mlflow" not in context.config_loader.config_patterns.keys():
                context.config_loader.config_patterns.update(
                    {"mlflow": ["mlflow*", "mlflow*/**", "**/mlflow*"]}
                )
            conf_mlflow_yml = context.config_loader["mlflow"]

I guess the main difference is that all config loaders are treated the same there, so no special check for OmegaConfigLoader. I understand this looks pretty messy for the plugin developer, but our aim was to be more explicit for the end Kedro user, which is why we've opted for letting users set their custom config paths inside settings.py.

@antheas
Copy link
Author

antheas commented Sep 28, 2023

Oh, I did not notice that config_loader.config_patterns was shared among all config loaders while doing my fix. Perhaps then the following is better:

patterns = getattr(context.config_loader, "config_patterns", {})
 if 'mlflow' not in patterns:
    patterns['mlflow'] = ["mlflow*", "mlflow*/**"]
conf_mlflow_yml = context.config_loader.get("mlflow")

config_patterns is not part of the AbstractConfigLoader API either, so this solution considers that. Of course, the user having a custom parser without config_patterns will still cause it to crash if they don't fill in the mlflow key.

Using a stable parameters parser instead of the ConfigLoader could be a solution to my other issue as well. So for now I will go with those two.

I will leave the issue open for discussion. You may close it in the future.

@merelcht
Copy link
Member

Closing this as there hasn't been any more activity recently.

@merelcht merelcht closed this as not planned Won't fix, can't repro, duplicate, stale Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants