[KED-1959] Implement hook integration for other library components: ConfigLoader (#761)
Lorena Bălan authored Aug 25, 2020
1 parent 66e3a13 commit 5484b9b
Showing 16 changed files with 438 additions and 263 deletions.
3 changes: 1 addition & 2 deletions .circleci/config.yml
@@ -222,8 +222,7 @@ commands:
name: Install dependencies
command: |
conda activate kedro_builder
cat *requirements.txt | Select-String -Pattern behave,psutil,requests[^\-],^pandas[^\-],cachetools,pluggy,toposort,yaml | %{ $_ -Replace "#.*", "" } > e2e.txt
pip install -r e2e.txt
pip install -r features/windows_reqs.txt
choco install make
- run:
name: Run e2e tests
6 changes: 5 additions & 1 deletion RELEASE.md
@@ -11,8 +11,12 @@
# Upcoming Release 0.16.5

## Major features and improvements
* Added `register_pipelines()`, a new hook to register a project's pipelines. The order of execution is: plugin hooks, `.kedro.yml` hooks, hooks in `ProjectContext.hooks`.
* Added support for `pyproject.toml` to configure Kedro. `pyproject.toml` is used if `.kedro.yml` doesn't exist (Kedro configuration should be under `[tool.kedro]` section).
* Projects created with this version will have no `pipeline.py`, having been replaced by `hooks.py`.
* Added a set of registration hooks, as the new way of registering library components with a Kedro project:
* `register_pipelines()`, to replace `_get_pipelines()`
* `register_config_loader(conf_paths)`, to replace `_create_config_loader()`
These can be defined in `src/<package-name>/hooks.py` and added to `.kedro.yml` (or `pyproject.toml`). The order of execution is: plugin hooks, `.kedro.yml` hooks, hooks in `ProjectContext.hooks`.

## Bug fixes and other changes
* `project_name`, `project_version` and `package_name` now have to be defined in `.kedro.yml` for the projects generated using Kedro 0.16.5+.
7 changes: 4 additions & 3 deletions docs/source/04_kedro_project_setup/02_configuration.md
@@ -71,14 +71,15 @@ export KEDRO_ENV=test
## Templating configuration

Kedro also provides an extension [TemplatedConfigLoader](/kedro.config.TemplatedConfigLoader) class that allows you to template values in your configuration files. `TemplatedConfigLoader` is available in `kedro.config`. To apply templating to your `ProjectContext` in `src/<project-name>/run.py`, you will need to overwrite the `_create_config_loader` method as follows:
Kedro also provides an extension [TemplatedConfigLoader](/kedro.config.TemplatedConfigLoader) class that allows you to template values in your configuration files. `TemplatedConfigLoader` is available in `kedro.config`. To apply templating to your project, you will need to update the `register_config_loader` hook implementation in your `src/<project-name>/hooks.py`:

```python
from kedro.config import TemplatedConfigLoader # new import


class ProjectContext(KedroContext):
def _create_config_loader(self, conf_paths: Iterable[str]) -> TemplatedConfigLoader:
class ProjectHooks:
@hook_impl
def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
return TemplatedConfigLoader(
conf_paths,
globals_pattern="*globals.yml", # read the globals dictionary from project config
1 change: 1 addition & 0 deletions docs/source/07_extend_kedro/04_hooks.md
@@ -50,6 +50,7 @@ The naming convention for error hooks is `on_<noun>_error`, in which:
In addition, Kedro defines Hook specifications to register certain library components to be used with the project. This is where users can define their custom class implementations. Currently, the following Hook specifications are provided:

* `register_pipelines`
* `register_config_loader`

The naming convention for registration hooks is `register_<library_component>`.
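
As a rough illustration of the `register_<library_component>` convention, the sketch below lists the registration hooks that a hooks class exposes by looking at method names only. `ProjectHooks` here is a bare stand-in with placeholder bodies, and `registration_hook_names` is a hypothetical helper that is not part of Kedro; real implementations would also carry the `hook_impl` decorator.

```python
from typing import List


class ProjectHooks:
    """Stand-in hooks class; the method bodies are placeholders."""

    def register_pipelines(self):
        # Placeholder return value, not a real Pipeline mapping.
        return {"__default__": []}

    def register_config_loader(self, conf_paths):
        # Placeholder return value, not a real ConfigLoader.
        return list(conf_paths)

    def before_pipeline_run(self, run_params, pipeline, catalog):
        """A lifecycle hook, which does not use the registration naming."""


def registration_hook_names(hooks: object) -> List[str]:
    """Return the callables named ``register_<library_component>`` (hypothetical helper)."""
    return sorted(
        name
        for name in dir(hooks)
        if name.startswith("register_") and callable(getattr(hooks, name))
    )


names = registration_hook_names(ProjectHooks())
print(names)
```

Only the two `register_*` methods are reported; the lifecycle hook is skipped because its name follows a different convention.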

5 changes: 5 additions & 0 deletions features/steps/hooks_template.py
@@ -28,6 +28,7 @@

import pandas as pd

from kedro.config import ConfigLoader
from kedro.framework.hooks import hook_impl
from kedro.pipeline import Pipeline, node

@@ -80,5 +81,9 @@ def register_pipelines(self): # pylint: disable=no-self-use

return {"__default__": example_pipeline}

@hook_impl
def register_config_loader(self, conf_paths): # pylint: disable=no-self-use
return ConfigLoader(conf_paths)


project_hooks = ProjectHooks()
12 changes: 12 additions & 0 deletions features/windows_reqs.txt
@@ -0,0 +1,12 @@
# same versions as `test_requirements`
# e2e tests on Windows are slow but we don't need to install
# everything, so just this subset will be enough for CI
behave==1.2.6
cachetools~=4.1
jmespath>=0.9.5, <1.0
pandas>=0.24.0, <1.0.4
pluggy~=0.13.0
psutil==5.6.6
requests~=2.20
toposort~=1.5
PyYAML>=4.2, <6.0
23 changes: 7 additions & 16 deletions kedro/config/templated_config.py
@@ -55,34 +55,25 @@ class TemplatedConfigLoader(ConfigLoader):
wrapped in brackets like: ${...}, to be automatically formatted
based on the configs.
The easiest way to use this class is by incorporating it into the
``KedroContext``. This can be done by extending the ``KedroContext`` and overwriting
the config_loader method, making it return a ``TemplatedConfigLoader``
object instead of a ``ConfigLoader`` object.
For this method to work, the context_path variable in `.kedro.yml` (if exists) or
in `pyproject.toml` under `[tool.kedro]` section needs to be pointing at this newly
created class. The `run.py` script has an extension of the ``KedroContext`` by default,
called the ``ProjectContext``.
The easiest way to use this class is by registering it into the
``KedroContext`` using hooks. This can be done by updating the
hook implementation `register_config_loader` in `hooks.py`, making it return
a ``TemplatedConfigLoader`` object instead of a ``ConfigLoader`` object.
Example:
::
>>> from kedro.framework.context import KedroContext, load_context
>>> from kedro.config import TemplatedConfigLoader
>>>
>>>
>>> class MyNewContext(KedroContext):
>>>
>>> def _create_config_loader(self, conf_paths: Iterable[str]) -> TemplatedConfigLoader:
>>> class ProjectHooks:
>>> @hook_impl
>>> def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
>>> return TemplatedConfigLoader(
>>> conf_paths,
>>> globals_pattern="*globals.yml",
>>> globals_dict={"param1": "pandas.CSVDataSet"}
>>> )
>>>
>>> my_context = load_context(Path.cwd(), env=env)
>>> my_context.run(tags, runner, node_names, from_nodes, to_nodes)
The contents of the dictionary resulting from the `globals_pattern` get
merged with the ``globals_dict``. In case of conflicts, the keys in
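
To make the `${...}` substitution described in this docstring concrete, here is a simplified sketch of the idea in plain Python. It is not Kedro's implementation: `apply_templates` is a hypothetical helper, the `folder` key and the sample catalog entry are made up for illustration, and the sketch only handles `${key}` placeholders inside strings nested in dicts and lists.

```python
import re
from typing import Any, Dict

# Matches ${key} and captures the key name.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def apply_templates(config: Any, globals_dict: Dict[str, Any]) -> Any:
    """Recursively replace ``${key}`` placeholders with values from ``globals_dict``."""
    if isinstance(config, dict):
        return {key: apply_templates(value, globals_dict) for key, value in config.items()}
    if isinstance(config, list):
        return [apply_templates(item, globals_dict) for item in config]
    if isinstance(config, str):
        # Unknown keys are left untouched in this sketch.
        return _PLACEHOLDER.sub(
            lambda match: str(globals_dict.get(match.group(1), match.group(0))),
            config,
        )
    return config


catalog = {"cars": {"type": "${param1}", "filepath": "data/${folder}/cars.csv"}}
formatted = apply_templates(
    catalog, {"param1": "pandas.CSVDataSet", "folder": "01_raw"}
)
print(formatted)
```

The sketch deliberately takes a single dictionary of values and does not model how values found via `globals_pattern` are combined with `globals_dict`.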
10 changes: 7 additions & 3 deletions kedro/framework/context/context.py
@@ -252,10 +252,10 @@ def __init__(

self.env = env or "local"
self._extra_params = deepcopy(extra_params)
self._setup_logging()

# setup hooks
self._register_hooks(auto=True)
# we need a ConfigLoader registered in order to be able to set up logging
self._setup_logging()

@property
def static_data(self) -> Dict[str, Any]:
@@ -521,7 +521,11 @@ def _create_config_loader( # pylint: disable=no-self-use
Instance of `ConfigLoader`.
"""
return ConfigLoader(conf_paths)
hook_manager = get_hook_manager()
config_loader = hook_manager.hook.register_config_loader( # pylint: disable=no-member
conf_paths=conf_paths
)
return config_loader or ConfigLoader(conf_paths) # for backwards compatibility

def _get_config_loader(self) -> ConfigLoader:
"""A hook for changing the creation of a ConfigLoader instance.
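
The fallback added to `_create_config_loader` can be sketched without Kedro installed. The classes below are minimal stand-ins for `ConfigLoader` and `TemplatedConfigLoader`, and `create_config_loader` with its `registered_hook` parameter is a hypothetical simplification of the hook-manager call. The sample paths are illustrative. The point is only the `or` fallback, which keeps projects without a `register_config_loader` implementation working.

```python
from typing import Callable, Iterable, List, Optional


class ConfigLoader:
    """Stand-in for kedro.config.ConfigLoader; it only stores the paths."""

    def __init__(self, conf_paths: Iterable[str]):
        self.conf_paths: List[str] = list(conf_paths)


class TemplatedConfigLoader(ConfigLoader):
    """Stand-in for a custom loader returned by a project's hook."""


HookImpl = Callable[[Iterable[str]], Optional[ConfigLoader]]


def create_config_loader(
    conf_paths: Iterable[str], registered_hook: Optional[HookImpl] = None
) -> ConfigLoader:
    # With no hook implementation registered, the hook call yields None.
    config_loader = registered_hook(conf_paths) if registered_hook else None
    # Fall back to the default loader, mirroring the backwards-compatible `or`.
    return config_loader or ConfigLoader(conf_paths)


paths = ["conf/base", "conf/local"]
default_loader = create_config_loader(paths)
custom_loader = create_config_loader(paths, registered_hook=TemplatedConfigLoader)
print(type(default_loader).__name__, type(custom_loader).__name__)
```

This also suggests why the constructor now registers hooks before calling `_setup_logging()`: per the comment in the diff, a `ConfigLoader` has to be registered before logging can be set up.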
3 changes: 2 additions & 1 deletion kedro/framework/hooks/manager.py
@@ -32,7 +32,7 @@
from pluggy import PluginManager

from .markers import HOOK_NAMESPACE
from .specs import DataCatalogSpecs, NodeSpecs, PipelineSpecs
from .specs import DataCatalogSpecs, NodeSpecs, PipelineSpecs, RegistrationSpecs

_hook_manager = None

@@ -44,6 +44,7 @@ def _create_hook_manager() -> PluginManager:
manager.add_hookspecs(NodeSpecs)
manager.add_hookspecs(PipelineSpecs)
manager.add_hookspecs(DataCatalogSpecs)
manager.add_hookspecs(RegistrationSpecs)
return manager


38 changes: 27 additions & 11 deletions kedro/framework/hooks/specs.py
@@ -31,8 +31,9 @@
[Pluggy's documentation](https://pluggy.readthedocs.io/en/stable/#specs)
"""
# pylint: disable=too-many-arguments
from typing import Any, Dict
from typing import Any, Dict, Iterable

from kedro.config import ConfigLoader
from kedro.io import DataCatalog
from kedro.pipeline import Pipeline
from kedro.pipeline.node import Node
@@ -156,16 +157,6 @@ def on_node_error(
class PipelineSpecs:
"""Namespace that defines all specifications for a pipeline's lifecycle hooks."""

@hook_spec
def register_pipelines(self) -> Dict[str, Pipeline]:
"""Hook to be invoked to register a project's pipelines.
Returns:
A mapping from a pipeline name to a ``Pipeline`` object.
"""
pass

@hook_spec
def before_pipeline_run(
self, run_params: Dict[str, Any], pipeline: Pipeline, catalog: DataCatalog
@@ -261,3 +252,28 @@ def on_pipeline_error(
catalog: The ``DataCatalog`` used during the run.
"""
pass


class RegistrationSpecs:
"""Namespace that defines all specifications for hooks registering
library components with a Kedro project.
"""

@hook_spec
def register_pipelines(self) -> Dict[str, Pipeline]:
"""Hook to be invoked to register a project's pipelines.
Returns:
A mapping from a pipeline name to a ``Pipeline`` object.
"""
pass

@hook_spec(firstresult=True)
def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
"""Hook to be invoked to register a project's config loader.
Returns:
An instance of a ``ConfigLoader``.
"""
pass
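
`register_config_loader` is declared with `firstresult=True`, which in pluggy means the call returns a single value, the first non-`None` result, instead of a list of results from every implementation. The sketch below imitates that behaviour in plain Python. `call_all`, `call_first_result` and the two sample implementations are hypothetical, and the sketch does not reproduce pluggy's actual call ordering.

```python
from typing import Any, Callable, Iterable, List, Optional

Impl = Callable[..., Optional[Any]]


def call_all(impls: Iterable[Impl], **kwargs: Any) -> List[Any]:
    """Imitate a regular hook call: collect every non-None result."""
    results = [impl(**kwargs) for impl in impls]
    return [result for result in results if result is not None]


def call_first_result(impls: Iterable[Impl], **kwargs: Any) -> Optional[Any]:
    """Imitate a firstresult hook call: stop at the first non-None result."""
    for impl in impls:
        result = impl(**kwargs)
        if result is not None:
            return result
    return None


def plugin_impl(conf_paths):
    # This implementation declines to provide a loader.
    return None


def project_impl(conf_paths):
    # A tuple stands in for a config loader object.
    return ("loader", tuple(conf_paths))


impls = [plugin_impl, project_impl]
first = call_first_result(impls, conf_paths=["conf/base"])
everything = call_all(impls, conf_paths=["conf/base"])
print(first, everything)
```

Because the caller gets one object (or `None`) rather than a list, the context can write `config_loader or ConfigLoader(conf_paths)` directly.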
@@ -27,8 +27,9 @@
# limitations under the License.

"""Project hooks."""
from typing import Dict
from typing import Dict, Iterable

from kedro.config import ConfigLoader
from kedro.framework.hooks import hook_impl
from kedro.pipeline import Pipeline
{%- if cookiecutter.include_example == "True" %}
@@ -57,5 +58,9 @@ def register_pipelines(self) -> Dict[str, Pipeline]:
}
{%- else -%}return {"__default__": Pipeline([])}{%- endif %}

@hook_impl
def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
return ConfigLoader(conf_paths)


project_hooks = ProjectHooks()
1 change: 1 addition & 0 deletions setup.cfg
@@ -67,3 +67,4 @@ forbidden_modules =
ignore_imports=
kedro.runner.parallel_runner -> kedro.framework.context.context
kedro.framework.context.context -> kedro.config
kedro.framework.hooks.specs -> kedro.config
1 change: 1 addition & 0 deletions test_requirements.txt
@@ -31,6 +31,7 @@ pyarrow>=0.12.0, <1.0.0
pylint>=2.5.2, <3.0
pyspark~=2.2; python_version < '3.8'
pytest-cov~=2.5
pytest-lazy-fixture~=0.6.3
pytest-mock>=1.7.1,<2.0
pytest~=5.0
requests-mock~=1.6