FIX #29 - Auto-register the kedro-mlflow hooks
Galileo-Galilei committed Nov 3, 2020
1 parent 1dda1e4 commit 69f96a2
Showing 5 changed files with 57 additions and 31 deletions.
23 changes: 11 additions & 12 deletions CHANGELOG.md
@@ -5,30 +5,29 @@
### Added

- `kedro-mlflow` now supports `kedro>=0.16.5` [#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62)
- `kedro-mlflow` hooks can now be declared in `.kedro.yml` or `pyproject.toml` by adding `kedro_mlflow.framework.hooks.mlflow_pipeline_hook` and `kedro_mlflow.framework.hooks.mlflow_node_hook` into the hooks entry. _Only for kedro>=0.16.5_ [#96](https://github.com/Galileo-Galilei/kedro-mlflow/issues/96)
- `kedro-mlflow` now supports configuring the project in `pyproject.toml` ([#96](https://github.com/Galileo-Galilei/kedro-mlflow/issues/96)) (_Only for kedro>=0.16.5_)
- `pipeline_ml_factory` now accepts that `inference` pipeline `inputs` may be in `training` pipeline `inputs` [#71](https://github.com/Galileo-Galilei/kedro-mlflow/issues/71)
- `pipeline_ml_factory` now automatically infers the schema of the input dataset to validate data at inference time. The output schema can be declared manually in the `model_signature` argument [#70](https://github.com/Galileo-Galilei/kedro-mlflow/issues/70)
- Add two Datasets for model logging and saving: `MlflowModelLoggerDataSet` and `MlflowModelSaverDataSet` ([#12](https://github.com/Galileo-Galilei/kedro-mlflow/issues/12))
- `MlflowPipelineHook` and `MlflowNodeHook` are now [auto-registered](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html#registering-your-hook-implementations-with-kedro) if you use `kedro>=0.16.4` ([#29](https://github.com/Galileo-Galilei/kedro-mlflow/issues/29))

### Fixed

- `get_mlflow_config` now uses the Kedro `ProjectContext` `ConfigLoader` to get configs [#66](https://github.com/Galileo-Galilei/kedro-mlflow/issues/66). This indirectly solves the following issues:
- `get_mlflow_config` now works in interactive mode if `load_context` is called with a path different from the working directory [#30](https://github.com/Galileo-Galilei/kedro-mlflow/issues/30)
- kedro_mlflow now works fine with kedro jupyter notebook independently of the working directory [#64](https://github.com/Galileo-Galilei/kedro-mlflow/issues/64)
- You can use global variables in `mlflow.yml` which is now properly parsed if you use a `TemplatedConfigLoader` [#72](https://github.com/Galileo-Galilei/kedro-mlflow/issues/72)
- `mlflow init` is now getting conf path from context.CONF_ROOT instead of hardcoded conf folder. This makes the package robust to Kedro changes.
- `MlflowMetricsDataset` now saves in the specified `run_id` instead of the current one when the prefix is not specified [#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62)
- `get_mlflow_config` now uses the Kedro `ProjectContext` `ConfigLoader` to get configs ([#66](https://github.com/Galileo-Galilei/kedro-mlflow/issues/66)). This indirectly solves the following issues:
- `get_mlflow_config` now works in interactive mode if `load_context` is called with a path different from the working directory ([#30](https://github.com/Galileo-Galilei/kedro-mlflow/issues/30))
- kedro_mlflow now works fine with kedro jupyter notebook independently of the working directory ([#64](https://github.com/Galileo-Galilei/kedro-mlflow/issues/64))
- You can use global variables in `mlflow.yml` which is now properly parsed if you use a `TemplatedConfigLoader` ([#72](https://github.com/Galileo-Galilei/kedro-mlflow/issues/72))
- `MlflowMetricsDataset` now saves in the specified `run_id` instead of the current one when the prefix is not specified ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62))
- Other bug fixes and documentation improvements ([#6](https://github.com/Galileo-Galilei/kedro-mlflow/issues/6), [#99](https://github.com/Galileo-Galilei/kedro-mlflow/issues/99))

### Changed

- Documentation reference to the plugin is now dynamic when necessary [#6](https://github.com/Galileo-Galilei/kedro-mlflow/issues/6)
- The test coverage now excludes `tests` and `setup.py` [#99](https://github.com/Galileo-Galilei/kedro-mlflow/issues/99)
- The `KedroPipelineModel` now unpacks the result of the `inference` pipeline and no longer returns a dictionary with the name in the `DataCatalog` but only the predicted value [#93](https://github.com/Galileo-Galilei/kedro-mlflow/issues/93)
- The `PipelineML.extract_pipeline_catalog` is renamed `PipelineML._extract_pipeline_catalog` to indicate it is a private method and is not intended to be used directly by end users [#100](https://github.com/Galileo-Galilei/kedro-mlflow/issues/100)
- The `KedroPipelineModel` now unpacks the result of the `inference` pipeline and no longer returns a dictionary with the name in the `DataCatalog` but only the predicted value ([#93](https://github.com/Galileo-Galilei/kedro-mlflow/issues/93))
- The `PipelineML.extract_pipeline_catalog` is renamed `PipelineML._extract_pipeline_catalog` to indicate it is a private method and is not intended to be used directly by end users who should rely on `PipelineML.extract_pipeline_artifacts` ([#100](https://github.com/Galileo-Galilei/kedro-mlflow/issues/100))

### Removed

- `kedro mlflow init` command is no longer declaring hooks in `run.py`. You must now [register your hooks manually](docs/source/03_tutorial/02_setup.md#declaring-kedro-mlflow-hooks) in the ``run.py`` (kedro > 0.16.0), ``.kedro.yml`` (kedro >= 0.16.5) or ``pyproject.toml`` (kedro >= 0.16.5) ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62))
- `kedro mlflow init` command no longer declares hooks in `run.py`. You must now [register your hooks manually](docs/source/03_tutorial/02_setup.md#declaring-kedro-mlflow-hooks) in ``run.py`` if you use `kedro>=0.16.0, <=0.16.3` ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62)).
- Remove `pipeline_ml` function which was deprecated in 0.3.0. It is now replaced by `pipeline_ml_factory` ([#105](https://github.com/Galileo-Galilei/kedro-mlflow/issues/105))
- Remove `MlflowDataSet` dataset which was deprecated in 0.3.0. It is now replaced by `MlflowArtifactDataSet` ([#105](https://github.com/Galileo-Galilei/kedro-mlflow/issues/105))

16 changes: 14 additions & 2 deletions docs/source/02_hello_world_example/01_example_project.md
@@ -1,27 +1,36 @@
# Example project

## Check your installation

Create a conda environment and install ``kedro-mlflow`` (this will automatically install ``kedro>=0.16.0``).

```console
conda create -n km_example python=3.6.8 --yes
conda activate km_example
pip install kedro-mlflow
```

## Install the toy project

For this end to end example, we will use the [kedro starter](https://kedro.readthedocs.io/en/latest/02_getting_started/05_starters.html#creating-new-projects-with-kedro-starters) with the [iris dataset](https://github.com/quantumblacklabs/kedro-starter-pandas-iris).

We use this project because:

- it covers most of the common use cases
- it is compatible with older versions of ``Kedro``, so newcomers are used to it
- it is maintained by ``Kedro`` maintainers and therefore enforces some best practices.

### Installation with ``kedro>=0.16.3``

The default starter is now called "pandas-iris". In a new console, enter:

```console
kedro new --starter=pandas-iris
```

Answer ``Kedro Mlflow Example``, ``km-example`` and ``km_example`` to the three setup questions of a new kedro project:
```

```console
Project Name:
=============
Please enter a human readable name for your new project.
@@ -46,6 +55,7 @@ Lowercase is recommended. Package name must start with a letter or underscore.
### Installation with ``kedro>=0.16.0, <=0.16.2``

With older versions of ``Kedro``, the starter option is not available, but ``kedro new`` provides an "Include example" question. Answer ``y`` to this question to get the same starter as above. In a new console, enter:

```console
kedro new
```
@@ -83,7 +93,8 @@ Good for first-time users. (default=N)
# Install dependencies

Move to the project directory:
```

```console
cd km-example
```

@@ -93,4 +104,5 @@ Install the project dependencies:
pip install -r src/requirements.txt
pip install --upgrade kedro-mlflow==0.3.0
```

**Warning: Do not use the ``kedro install`` command: it does not seem to install the packages in your activated environment.**
43 changes: 27 additions & 16 deletions docs/source/03_tutorial/02_setup.md
@@ -1,7 +1,10 @@
# Setup your Kedro project

## Check the installation

Type ``kedro info`` in a terminal to check if the plugin is properly discovered by ``Kedro``. If the installation has succeeded, you should see the following ascii art:
```

```console
_ _
| | _____ __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
@@ -16,33 +19,39 @@ the Kedro initiative at QuantumBlack.
Installed plugins:
kedro_mlflow: <kedro-mlflow-version> (hooks:global,project)
```

The version ``<kedro-mlflow-version>`` of the plugin is installed and has both global and project commands.

That's it! You are now ready to go!
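
As a complementary check, the installed versions can also be read from Python. The snippet below is a minimal sketch, assuming Python 3.8 or later (for the standard-library `importlib.metadata` module); the helper name `installed_version` is hypothetical and is not part of ``kedro-mlflow``.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper used for illustration only.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report what is installed in the currently activated environment.
    for name in ("kedro", "kedro-mlflow"):
        print(name, "->", installed_version(name) or "not installed")
```

If ``kedro-mlflow`` is reported as not installed, the most likely cause is that the command was run outside the environment where the plugin was installed.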

## Create a kedro project

This plugin must be used in an existing kedro project. If you do not have a kedro project yet, you can create one with the ``kedro new`` command. [See the kedro docs for a tutorial](https://kedro.readthedocs.io/en/latest/02_getting_started/03_new_project.html).

For this tutorial, if you do not have a real-world project, I strongly suggest that you include the proposed example so that you can try this plugin out of the box.

## Activate `kedro-mlflow` in your kedro project

In order to use the ``kedro-mlflow`` plugin, you need to set up its configuration and declare its hooks. These two actions are detailed in the following paragraphs.

### Setting up the kedro-mlflow configuration file
``kedro-mlflow`` is [configured](../05_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the kedro-mlflow CLI](../05_python_objects/04_CLI.md).

``kedro-mlflow`` is [configured](../05_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the kedro-mlflow CLI](../05_python_objects/04_CLI.md). **It is mandatory for the plugin to work.**

Set the working directory at the root of your kedro project (i.e. the folder with the ``.kedro.yml`` file)

```console
$ cd path/to/your/project
cd path/to/your/project
```

Run the init command:

```console
$ kedro mlflow init
kedro mlflow init
```

You should see the following message:

```console
'conf/base/mlflow.yml' successfully updated.
```
@@ -51,7 +60,13 @@ you should see the following message:

``kedro_mlflow`` hook implementations must be registered with Kedro. There are three ways of registering [hooks](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html?highlight=hooks).

#### - Declaring hooks through code, in ``ProjectContext``
**Note that you must register the two hooks provided by kedro-mlflow** (``MlflowPipelineHook`` and ``MlflowNodeHook``) for the plugin to work.

#### - Declaring hooks through auto-discovery (for `kedro>=0.16.4`)

If you use `kedro>=0.16.4`, `kedro-mlflow` hooks are auto-registered by default, without any action on your side. You can [disable this behaviour](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html#disable-auto-registered-plugins-hooks) in your `.kedro.yml` or your `pyproject.toml` file.

#### - Declaring hooks through code, in ``ProjectContext`` (for `kedro>=0.16.0, <=0.16.3`)

By declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``(src/package_name/run.py) ProjectContext``:

@@ -70,31 +85,27 @@ class ProjectContext(KedroContext):
mlflow_node_hook
)
```
#### - Declaring hooks through static configuration in `.kedro.yml` or `pyproject.toml` **[Only for kedro >= 0.16.5]**

By declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``.kedro.yml`` :
#### - Declaring hooks through static configuration in `.kedro.yml` or `pyproject.toml` **[Only for kedro >= 0.16.5 if you have disabled auto-registration]**

```
If you have disabled auto-registered hooks for the plugin, you can add them manually by declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``.kedro.yml``:

```yaml
context_path: km_example.run.ProjectContext
project_name: "km_example"
project_version: "0.16.5"
package_name: "km_example"
hooks:
- km_example.hooks.project_hooks
- <your-project>.hooks.project_hooks
- kedro_mlflow.framework.hooks.mlflow_pipeline_hook
- kedro_mlflow.framework.hooks.mlflow_node_hook
```
Or by declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``pyproject.toml`` :

```
# <your_project>/pyproject.toml
```toml
# <your-project>/pyproject.toml
[tool.kedro]
hooks=["kedro_mlflow.framework.hooks.mlflow_pipeline_hook",
"kedro_mlflow.framework.hooks.mlflow_node_hook"]
```
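
The three registration routes depend on the installed Kedro version. The sketch below encodes only the version bounds stated in this section (auto-discovery for `kedro>=0.16.4`, ``ProjectContext`` for `kedro>=0.16.0, <=0.16.3`, static configuration for `kedro>=0.16.5`); the function names are hypothetical and are not part of ``kedro-mlflow`` or ``Kedro``.

```python
def parse_version(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints, e.g. (0, 16, 4)."""
    return tuple(int(part) for part in version.split("."))


def registration_options(kedro_version):
    """List the hook registration routes described in this section.

    Hypothetical helper: it only restates the documented version bounds.
    """
    v = parse_version(kedro_version)
    if v < (0, 16, 0):
        raise ValueError("this section only covers kedro>=0.16.0")

    options = []
    if v >= (0, 16, 4):
        # Hooks are picked up automatically unless disabled.
        options.append("auto-discovery")
    if v <= (0, 16, 3):
        # Hooks are declared by hand in src/<package_name>/run.py.
        options.append("ProjectContext in run.py")
    if v >= (0, 16, 5):
        # Fallback when auto-registration has been disabled.
        options.append("static configuration (.kedro.yml or pyproject.toml)")
    return options


if __name__ == "__main__":
    for candidate in ("0.16.2", "0.16.4", "0.16.5"):
        print(candidate, "->", registration_options(candidate))
```

The helper assumes plain numeric versions; pre-release tags such as `0.17.0rc1` would need a real version parser.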

#### - Declaring hooks through auto-discovery **[Coming soon]**


**Note that you must register both hooks for the plugin to work**
2 changes: 1 addition & 1 deletion requirements/requirements.txt
@@ -1,2 +1,2 @@
mlflow>=1.0.0, <2.0.0
kedro>=0.16.0, <=0.16.5
kedro>=0.16.0, <=0.17.0
4 changes: 4 additions & 0 deletions setup.py
@@ -46,6 +46,10 @@ def _parse_requirements(path, encoding="utf-8"):
"kedro.global_commands": [
"kedro_mlflow = kedro_mlflow.framework.cli.cli:commands"
],
"kedro.hooks": [
"mlflow_pipeline_hook = kedro_mlflow.framework.hooks.pipeline_hook:mlflow_pipeline_hook",
"mlflow_node_hooks = kedro_mlflow.framework.hooks.node_hook:mlflow_node_hook",
],
},
zip_safe=False,
keywords="kedro plugin, mlflow, model versioning, model packaging, pipelines, machine learning, data pipelines, data science, data engineering",
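
The `kedro.hooks` entry-point group added in this hunk is what makes auto-registration possible: an installed package advertises objects under a group name, and a host application can look them up at run time. The snippet below is a minimal sketch of that lookup, assuming Python 3.8 or later; it is not Kedro's actual discovery code, and `list_entry_points` is a hypothetical helper.

```python
from importlib.metadata import entry_points


def list_entry_points(group="kedro.hooks"):
    """Return the entry points that installed packages declare under `group`."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+: the result can be filtered with select().
        return list(all_entry_points.select(group=group))
    # Python 3.8 and 3.9: a plain dict mapping group name to entry points.
    return list(all_entry_points.get(group, []))


if __name__ == "__main__":
    # With kedro-mlflow installed, the two hooks declared in setup.py
    # would be expected to appear here; otherwise nothing is printed.
    for entry_point in list_entry_points():
        print(entry_point.name, "=", entry_point.value)
```

Calling `entry_point.load()` on one of the results imports the target module and returns the referenced object, which is how a host application would obtain the hook instance to register.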
