Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auto-registration for plugin hooks #29

Closed
Galileo-Galilei opened this issue Jul 21, 2020 · 4 comments · Fixed by #111
Closed

Add auto-registration for plugin hooks #29

Galileo-Galilei opened this issue Jul 21, 2020 · 4 comments · Fixed by #111
Assignees
Labels
enhancement New feature or request need-design-decision Several ways of implementation are possible and one must be chosen
Milestone

Comments

@Galileo-Galilei
Copy link
Owner

This is a new feature discussed in kedro-org/kedro#435 and recently added to master through kedro-org/kedro@49834dd.

This seems to be a better way to register the basic hooks to avoid messing up with the template. Some design decisions should be made:

  • Deal with retrocompatibility
  • Define where the arguments should be registered, likely in the mlflow.yml
@Galileo-Galilei Galileo-Galilei added enhancement New feature or request need-design-decision Several ways of implementation are possible and one must be chosen labels Jul 21, 2020
@Galileo-Galilei Galileo-Galilei self-assigned this Jul 21, 2020
@Galileo-Galilei
Copy link
Owner Author

In the current version (kedro-mlflow==0.2.1), the command kedro mlflow init modifies the src/<python-package>/run.py of the template of the project and adds to Hooks:

  • MlflowPipelineHook with 2 arguments, model_name and conda_env
  • MlflowNodeHook with 1 argument, flatten_dict_params

I feel that auto-registration would be much beter than messing up wih the project template because:

  • these Hooks are mandatory for the plugin to be used so they must be registered whatever happens
  • the kedro mlflow init command cannot modify the template if the user has made custom modifications in it, and therefore hook registration must be manual in this case
  • the kedro mlflow init command makes a check to see if the template has been modified, but the template vary between kedro versions, and it is hard to keep up (+we need to store different template versions).

As a consequence auto-registration would made maintenance much easier, increase compatibility with different kedro versions and improve user experience. I wish we integrate it. However:

  • we want to keep retrocompatibility with all kedro versions until 0.16. Basically, it means that we would keep the kedro mlflow init command to update the src/<python-package>/run.py but remove the message which ask to update the template if kedro version is above 0.16.4. Whatever, this message should be reworked because of Warning message appears when calling kedro mlflow init #14. @kaemo @akruszewski do you think we should enforce retrocompatibility this way (at the price of extra code) or not ? We could just pin kedro>=0.16.4 in the requirement for kedro-mlflow above 0.3.0 for instance.
  • we need to decide what to do with the Hooks parameters because the user wil no longer access them. In my opinion:
    • model_name and conda_env are only used if a PipelineML is registered. They have nothing to do in the Hook and they should be moved to pipeline_ml function. This would be more consistent to declare everything in the same place + it would enable to have several pipeline_ml in the same project (which is realistic).
    • flatten_dict_params is used to flatten parameters which are Dict because mlflow has a maximum length for parameters and big parameters dictionnary may be too long (e.g hyperparameters={param1: x, param2: y} is logged as hyperparameters.param1 and hyperparameters.param2). I don't know what is the default behaviour to enforce: flattening can deal with all situations but modifies slightly the names used which may be confusing for the user. Apart from the default beahaviour, we should give a way to modify it to the user. Maybe we can add an entry in the mlflow.yml file? @kaemo @akruszewski what do you think?

@turn1a
Copy link
Contributor

turn1a commented Sep 3, 2020

We've used automatic hook discovery feature in our (currently) inner source kedro-pandera plugin and it works well, although there's a bug that breaks kedro-viz which I submitted today so we should wait until it's resolved. When it's fixed we should definitely pursue automatic hook discovery route as it simplifies plugin setup massively.

In terms of retrocompatibility, I think that kedro-mlflow is at such an early stage at this point that I don't think it's needed. In my opinion, we should pin kedro version to the minimum required in 0.3.0 and make it clear in CHANGELOG that new version requires some migration activities for current users.

As for the PipelineML I agree that those parameters should go into pipeline_ml and multiple pipeline_ml's are definitely a realistic scenario which we should consider. flatten_dict_params in mlflow.yml sounds good too.

@Galileo-Galilei
Copy link
Owner Author

Thanks for the information regarding the kedro-viz bug, we'll wait until it is resolved.

Agreed with everything, I start separated PR.

@Galileo-Galilei
Copy link
Owner Author

kedro-org/kedro-viz#260 is now fixed, We can add this in the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request need-design-decision Several ways of implementation are possible and one must be chosen
Projects
Status: ✅ Done
Development

Successfully merging a pull request may close this issue.

2 participants