
Parameter variations during single kedro execution instance (multiple Pipeline executions with different parameters) #282

Closed
Mar1cX opened this issue Mar 11, 2020 · 4 comments

Comments


Mar1cX commented Mar 11, 2020

What are you trying to do?

Hello. I'll start by saying that I'm still new to Kedro and haven't explored every part of it yet, but I have a grasp of how things broadly work. While developing my pipeline architecture, it seems there is no clear way of defining parameter variations. What I mean by that is that in the parameters.yml file I would want to define:

parameter1: [1, 2]
parameter2: [['a', 'aa'], ['b', 'bb']]

After that, when executing kedro run, the pipeline would be executed four times, with these parameter combinations:

parameter1: 1
parameter2: ['a', 'aa']

parameter1: 1
parameter2: ['b', 'bb']

parameter1: 2
parameter2: ['a', 'aa']

parameter1: 2
parameter2: ['b', 'bb']

This kind of requirement comes up when you try to simulate data manipulation with different parameters across multiple pipeline executions, similar to the hyperparameter search that tools like scikit-learn's GridSearchCV perform. I can also see small limitations with using GridSearchCV for this, for example. I might have missed a possible solution while exploring the documentation.
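To make the request concrete, here is a minimal sketch in plain Python (not a Kedro feature) of the cartesian-product expansion described above, using itertools.product:

```python
import itertools

# The parameters.yml values from the question, as plain Python objects.
parameter1 = [1, 2]
parameter2 = [['a', 'aa'], ['b', 'bb']]

# Each element of `runs` is one (parameter1, parameter2) combination;
# the request is for Kedro to trigger one full pipeline run per element.
runs = list(itertools.product(parameter1, parameter2))
for p1, p2 in runs:
    print(f"parameter1: {p1}, parameter2: {p2}")
```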

Thanks in advance for an answer.

@Mar1cX Mar1cX changed the title <Question> Parameter variations during single kedro execution instance (multiple Pipeline executions with different parameters) Mar 11, 2020

lorenabalan commented Mar 12, 2020

Hi @Mar1cX, you're right that one set of static configuration corresponds to a single run. If there were magic happening, like 4 different runs from one config, it would be hard to trace back a failure: which exact combination of parameters failed?
Instead you can define multiple environments, in which you overwrite the values of parameter1 and parameter2 (as per https://kedro.readthedocs.io/en/latest/04_user_guide/03_configuration.html#additional-configuration-environments).

combo1.yml

parameter1: 1
parameter2: ['a', 'aa']

etc.
and run kedro run --env combo1, kedro run --env combo2, and so on, to trigger the 4 runs.

I'm assuming you'd like to run the entire pipeline that many times, not just the node that uses those parameters. If it's the latter, then the problem becomes simpler.
There's also this issue with an example of hyperparameter tuning if that's at all helpful.
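The per-environment approach above could also be scripted; a hypothetical sketch, assuming the combo1 to combo4 environments from this thread exist (the `runner` parameter is only there to make the helper easy to exercise without an actual project):

```python
import subprocess

ENVS = ["combo1", "combo2", "combo3", "combo4"]

def run_all(envs, runner=subprocess.run):
    """Trigger one full `kedro run` per configuration environment."""
    for env in envs:
        # Each call is an independent Kedro run, so a failure is
        # traceable to exactly one parameter combination.
        runner(["kedro", "run", "--env", env], check=True)
```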


idanov commented Mar 12, 2020

Hi @Mar1cX, welcome to Kedro's community and thank you for your suggestion!
Kedro's main purpose is to help data scientists and data engineers define their high-level pipeline, regardless of what underlying libraries they use. For the particular use case you mention, we'd recommend using scikit-learn to do the hyperparameter search, or adding a for loop in your nodes that iterates through all the combinations you need. Users can still provide all possible configurations in parameters.yml, but they would need to set up their node to accept lists of options, rather than single instances of the parameters.

To extend your example, a node in your case needs to be defined as follows:

node(func1, ['parameter1', 'parameter2'], ...)

And you would prefer Kedro to automatically detect that the parameters are lists, so that your function could look like this:

def func1(parameter1, parameter2):
    print(parameter1, parameter2)  # do whatever you need with the parameters here
    # this function would be called 4 times, once per element of the
    # cartesian product of the parameters

Currently in Kedro you can achieve what you need by adding two lines to your function:

import itertools

def func1(parameter1, parameter2):
    for p1, p2 in itertools.product(parameter1, parameter2):
        print(p1, p2)  # do whatever you need with the parameters here

Kedro has no semantic understanding of the meaning of your parameters, and there are benefits in keeping it that way: it allows you to have nodes of all kinds, including nodes which expect parameters that are lists, dictionaries, or just numbers. Doing otherwise might lead to totally unexpected behaviour. For example, how would Kedro distinguish the use case illustrated above from the following one:

def func1(parameter1, parameter2):
    # use parameter1 for something here
    print(parameter1)
    # use parameter2 for something else here
    print(parameter2)

Both functions take lists as parameters, but one of them needs to iterate through the cartesian product of the parameters, while the other wants to use each parameter as a list separately. Kedro should not make any assumptions about how users need to use the parameters, and that's why we prefer to keep feeding parameters separately from the control flow of the pipeline.


Mar1cX commented Mar 20, 2020

@lorenabalan and @idanov Thank you both for those suggestions. They work in different scenarios, and both are worth keeping in mind for my future use of Kedro.

@Mar1cX Mar1cX closed this as completed Mar 20, 2020
nblumoe commented Jan 27, 2021

@idanov @lorenabalan thanks for your explanations above and the linked example.
With that, it is clear to me how to do hyperparameter tuning with scikit-learn within a single Kedro node.

Do you have an idea for how to do hyperparameter tuning across nodes? In scikit-learn you can build pipelines that cover not just the model training, but also data preparation, scaling, etc. One might want to include those steps in a hyperparameter search.
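For context, a sketch of the scikit-learn pattern being referred to, assuming scikit-learn is available: scaling and model training live in one sklearn Pipeline, so GridSearchCV can tune parameters of both steps at once. The node name and parameter grid here are hypothetical; note this approach collapses the steps into a single Kedro node, which is exactly the limitation being raised.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def train_node(X, y, param_grid):
    """Hypothetical Kedro node wrapping the whole sklearn pipeline."""
    pipe = Pipeline([("scale", StandardScaler()),
                     ("clf", LogisticRegression())])
    # GridSearchCV searches over parameters of *both* pipeline steps,
    # addressed via the step-name prefix (e.g. "clf__C").
    search = GridSearchCV(pipe, param_grid, cv=3)
    search.fit(X, y)
    return search.best_params_

X, y = make_classification(n_samples=60, random_state=0)
best = train_node(X, y, {"scale__with_mean": [True, False],
                         "clf__C": [0.1, 1.0]})
```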

Naturally, I would express the different steps (e.g. scaling and model training) as dedicated Kedro nodes. But this means I can no longer use scikit-learn's hyperparameter search.

Any suggestions for this scenario? It seems to me that adding a way to do hyperparameter search across multiple Kedro nodes could be very valuable.
