Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

You can't manually enforce the order of local and plugin-based hooks #1939

Closed
foxale opened this issue Oct 17, 2022 · 7 comments
Closed

You can't manually enforce the order of local and plugin-based hooks #1939

foxale opened this issue Oct 17, 2022 · 7 comments
Assignees

Comments

@foxale
Copy link

foxale commented Oct 17, 2022

Description & Context

Some hooks come from plugins (like kedro-mlflow), whereas some may come from local hooks (like those). Currently, if many plugins and local hooks implement e.g. after_pipeline_run, there is no way to manually enforce order in which all those functions will be invoked.

This causes problems such as this one.

Also, while writing this post I realized another problem, which is using the name "hook" for both hook functions (e.g. after_pipeline_run hook function) and hook classes (e.g. DataCatalogHook class)

Steps to Reproduce

Available in the linked post above.

Expected Result

I assume we should be able enforce the order of hooks, assign them some priority or rules.

Your Environment

  • kedro 0.18.3
  • Python 3.9.14
@foxale foxale changed the title You can't mix order of local hooks with plugin hooks You can't manually enforce the order of local and plugin-based hooks Oct 17, 2022
@merelcht
Copy link
Member

Hi @foxale, hooks have been designed to run in the LIFO of registration (see https://kedro.readthedocs.io/en/stable/hooks/introduction.html#registering-your-hook-implementations-with-kedro).

How would you envision being able to assign priory rules? Per Hook class or at the hook function level?

@foxale
Copy link
Author

foxale commented Oct 17, 2022

Yes, I saw that before — but from what I understood and observed, there are separate queues for local hooks vs. plugin-based hooks. You can influence the registration order indirectly and directly with hook function annotations.

I would want to decide, that kedro-mlflow's hook should be the last hook executed, and second to last my custom UploadLogsToMLflowHook. How to do that in a user friendly way, that's a tough question.

@noklam
Copy link
Contributor

noklam commented Oct 17, 2022

hook_manager = _create_hook_manager()
_register_hooks(hook_manager, settings.HOOKS)
_register_hooks_setuptools(hook_manager, settings.DISABLE_HOOKS_FOR_PLUGINS)
self._hook_manager = hook_manager

I believe this is how Kedro works currently (in-order)

  1. Register hooks settings.py which run in the LIFO queue (project based)
  2. Prepare the setup.py hooks entrypoints (installed plugins)
  3. Exclude any plugin specified in settings.py and register the result of 2&3.

For 2., I am not sure if they operate in a specific order - maybe it's ordered by the name but whatever it is, it's not by our design.

I think I had some discussion with @AntonyMilneQB about this before - there wasn't any clear conclusion. How does pytest plugins handle this? In general, I don't like the idea that the plugin needs to be aware of the existence of other plugins, this creates implicit dependencies and is hard to debug. However, I understand that mlflow (or quite a few experiment tracking tools also operates on some global session) so specifying the order is a reasonable thing to do.

The try_first argument suggested by @foxale could be a workaround, but this also creates issue if multiple plugins are declaring try_first and try_last, again I think we should check on how pytest is handling this.

@foxale
Copy link
Author

foxale commented Oct 17, 2022

Thanks for the context @noklam! I checked pytest and pluggy docs, and from what I see they only offer the default LIFO queue based on hook registration order, and try_first / try_last methods, each using its own LIFO queue.

One way would be implement a mechanism to put hooks in order on kedro side and then leverage pytest's LIFO order to get reproducible results. Here's one exemplary idea:

  • generate and constantly update the default order of project based + plugin based hooks (for each hook), so that the user is aware what exactly is going on with the project and in which order
  • allow user to enforce the order of all hooks via project config and automatically validate its correctness

Side note: that could actually be used to enrich kedro-viz execution graph, or generate a separate "pipeline execution flow" graph

Side note 2: that could fix #1935 as well.

@noklam
Copy link
Contributor

noklam commented Oct 17, 2022

@foxale I think this is supported already, although we don't have docs to spill this out clearly.

  • In general, I think the order should not matter, they should operate independently. Auto-registered hooks shouldn't need to be aware of the existence of other plugins - this would be the common case.
  • In special cases you need the fine grain control of order - you can import these hooks into your project directly and disable the automatic one.
# settings.py
from kedro_mlflow.hooks import MLflowHooks  # I make these import up but you get the point
DISABLE_HOOKS_FOR_PLUGINS = ("kedro-mlflow",)
HOOKS = (MLflowHook(), YourHook(),)  # This is always a LIFO queue - in fact the LIFO order is inherit from `pytest` already

One thing we may consider is to do the auto-register hook before the project hooks, but this is up for discussion.

@foxale
Copy link
Author

foxale commented Oct 17, 2022

Awesome. I guess that solves most cases (including mine), with the exception of those that require different orders for different hook functions.

@merelcht
Copy link
Member

merelcht commented Jan 3, 2023

Closing this issue as the problem is solved. Feel free to re-open and continue discussing if you encounter further issues with this!

@merelcht merelcht closed this as completed Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants