[KED-1740] Name-spaced Parameters #399
Comments
I've logged this in our backlog to take a look at. Thank you! @lorenabalan you might be interested in this/have a reply :)
Thank you @benhorsburgh for such a thoughtful writeup!! Your use case makes sense. Here are some of my thoughts, I'd love to hear more on this.
We had conflicting feedback about this at the time, with some users saying that modular pipelines more often differ by only one or two parameters, with everything else intact. I think it was also confusing because "." has a special meaning for parameters. Currently users can access nested parameters through
I hear your point about the strings! Currently that'd be hard to do even if we got rid of the "params:" prefix, because we use a jmespath-like syntax for accessing nested parameters with "." notation, which
This has now landed in
Description
First of all - I love Kedro 0.16.x namespacing!
When I create a new modular pipeline, I can very easily re-use it across different namespaces, which is intuitive thanks to the new conf sub-directories and pipeline structure. However, namespacing of parameters can only be achieved using messy workarounds.
Let me explain using an example.
Let's say I create
kedro pipeline create clean_timestamps
as a modular pipeline. In this, I create the following pipeline:
In my
conf/base/pipelines/clean_timestamps/catalog.yml
file I can make the datasets very obvious, along with namespaced versions of my datasets.
This gives me very clean and clear flexibility to keep the data that I need, and set up new namespaces. Using this, I can create the default and namespaced pipelines very easily.
and everything magically works because of how my config has been set up.
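The renaming that a namespace implies for datasets can be illustrated in plain Python. This is only a sketch of the naming convention, not Kedro's implementation: the helper `prefix_datasets`, the dataset names `raw_events` and `clean_events`, and the namespace `web` are all hypothetical. Parameter references are deliberately left untouched, which is the gap this issue is about.

```python
from typing import Dict, Iterable


def prefix_datasets(names: Iterable[str], namespace: str) -> Dict[str, str]:
    """Map each dataset name to its namespaced form, e.g. 'x' -> 'ns.x'.

    Illustration only: parameter references ('parameters' or 'params:...')
    are returned unchanged, as described in this issue.
    """
    renamed = {}
    for name in names:
        is_parameter = name == "parameters" or name.startswith("params:")
        renamed[name] = name if is_parameter else f"{namespace}.{name}"
    return renamed


# Hypothetical dataset and parameter names, for illustration.
mapping = prefix_datasets(["raw_events", "clean_events", "params:my_params"], "web")
print(mapping)
```

With a catalog that defines both the plain and the prefixed dataset names, the same pipeline definition can then serve the default and the namespaced case.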
Now let's try to do the same with parameters. Intuitively, I should set up my
conf/base/pipelines/clean_timestamps/parameters.yml
in exactly the same way, which might look like:
and in theory everything should work! Kedro of course does not support this yet. The current workaround is that I make use of the
parameters
argument on pipelines.
Context
This is important because failure to namespace parameters means that:
2.1. Increased learning curves for new team members on projects
2.2. Increased bugs due to not understanding a team convention
dict(param="my_param")
format. This is extremely useful, because it means the only strings in my code are things that are configurable, not placeholder variable names. It is much easier to read, and more maintainable.
Adding namespaced parameters will:
Possible Implementation
Continuing from the above example, I would suggest namespacing is handled in exactly the same way as for datasets, by prepending the namespace as a prefix to the expected parameter name.
With no namespace:
"params:my_params"
With namespace:
"params:namespace.my_params"
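The proposed naming rule can be sketched as a small helper. This is a hedged illustration in plain Python, not Kedro code: `namespaced_param` and `lookup` are hypothetical names, and the nested dictionary stands in for a parameters file that holds a default entry alongside a namespaced one.

```python
from typing import Any, Dict, Optional

PREFIX = "params:"


def namespaced_param(name: str, namespace: Optional[str] = None) -> str:
    """Insert the namespace right after the 'params:' prefix, if one is given."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a parameter reference: {name!r}")
    if not namespace:
        return name
    return f"{PREFIX}{namespace}.{name[len(PREFIX):]}"


def lookup(parameters: Dict[str, Any], reference: str) -> Any:
    """Resolve a 'params:a.b' reference by walking the nested dictionary."""
    value: Any = parameters
    for key in reference[len(PREFIX):].split("."):
        value = value[key]
    return value


# Hypothetical parameters: a default value and a namespaced override.
parameters = {"my_params": {"unit": "s"}, "namespace": {"my_params": {"unit": "ms"}}}

default_ref = namespaced_param("params:my_params")
scoped_ref = namespaced_param("params:my_params", "namespace")
print(default_ref, lookup(parameters, default_ref))
print(scoped_ref, lookup(parameters, scoped_ref))
```

Because the namespace becomes the first segment of the dotted path, the namespaced reference resolves through the same "." notation already used for nested parameters.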
Possible Alternatives
An alternative may be to append the namespace as a suffix, but this would be confusing given the dataset implementation.