-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add before/after_pipelines_registered
hook OR before/after_pipeline_filtered
#3000
Comments
I think it would have be injected here - but I'm not sure kedro/kedro/framework/project/__init__.py Line 142 in 0293dc1
|
This place unfortunately has no KedroContext 😞 But yeah, it's the initial place to have this. |
Looks reasonable to me, what will be the minimal signatures for these two hooks? Do you just need to executing some code before/after or you need to mutate the pipeline itself? |
Mutating the pipeline while having KedroContext at hand would be great :D |
x-ref #1718 It's unclear to me how injecting this in |
The current workaround is just "execute the code in |
after/before_pipelines_registered
hookbefore/after_pipelines_registered
hook OR before/after_pipeline_filtered
New insights, see Updated on 2023-09-12 section in the first post. |
@marrrcin we have the |
I've had the need for such functionality, as I was implementing grouping node execution by tags using kedro-airflow plugin. I wanted to validate whether the tagging was correct (so no disjoint, co-dependent nodes were joined and create a cycle dependency), so I had hard coded special prefix for tag to treat it as a grouping tag i.e "airflow:". With that tag "airflow:preprocessing" would translate to an airflow node "preprocessing" containing all nodes tagged "airflow:processing". I wished that this tag validation were done at the step of pipeline creation/validation and issue warning about incorrect tagging, so I implemented it in pipeline registry. Then I wanted to source this hard coded prefix with other parts of code that were doing the grouping and translation, but there was no mechanism for that. I couldn't refer to parameters in pipeline registry in any way. This hook would help with that. TL;DR: I had a use case to refer to parameters during registering pipelines and validating custom behavior and was unable to do that, while this feature would solve that. |
Description
It would be great to have a hook / a good place to attach some code after all pipelines have been registered but before the execution starts. I would like to do this for the sake of performing some validations on the pipeline definitions (w.r.t node tags).
Context
When doing a "Kedro-pipeline" to an "orchestrator-pipeline" translation, e.g. Airflow, there is a great pefromance optimization opportunity to "group" multiple Kedro nodes together and map multiple Kedro nodes as a single orchestrator level task instead of doing 1:1 mapping as it's done e.g. by Kedro-AzureML / Kedro-VertexAI / Kedro-Airflow.
Grouping the nodes by any mechanism (e.g. by Kedro tags) requires some additional validation of pipeline structure - checking whether after grouping the pipeline will still be a DAG.
Possible Implementation
Add new hook specs like:
after_pipelines_registered
andbefore_pipelines_registered
to access the pipelines before they are executed. For starters they could be a read-only, but it would be extra nice to be able to modify them on the fly too - especially when there would be a access e.g. do Kedro Context (this slightly connects to dynamic pipelines case; maybe it's too much for this issue anyway).Possible Alternatives
Right now, we're plugging the custom grouping/validation code in the
register_pipelines
which has some downsides:Updated on 2023-09-12:
I think even better alternative would be to have
before/after_pipeline_filtered
hooks as shown below:I would expect those hooks to access:
+ those hooks should allow to return the modified pipeline object
The text was updated successfully, but these errors were encountered: