create line magic to debug a node in notebook workflow (#3510)
* update notes

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* add some basic structure for the debugging magic

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* add demo code

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* lint

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* implement the _find_node and _prepare_node_inputs function

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* implement _prepare_imports

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* clean up the import function to use the real function

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* separate the cells into different part

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* update dependencies

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* add test structure

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add some tests and placeholders

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add more tests

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Even more tests

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* And more tests

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove placeholders

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* test remove condition

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* add more docstring

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* add logs

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* refactor the test and fix imports

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* more tests fixed

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* refactor tests with list of strings with """

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Fix node

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* replace test with triple quotes string

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* rename function to the _prepare pattern

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* fix more test

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* skip tests

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add ipylab to test requirements

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix missing syntax

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Apply suggestion from code review

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove redundant TODOs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix handling node with lambda function

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Try pin pluggy

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* refactor the find_node method with pipeline as argument and tests

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Update kedro/ipython/__init__.py

Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Re-import mocked object

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove try-catch

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Rename overwritten variable

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add warnings and simplify tests (#3568)

* Simplify mocking

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Check node func names

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Naive fix for return statements

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Handle nested case

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change pipelines fixture type

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove unnecessary TODO

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Check node func names"

This reverts commit 63ee194.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Replace commented return statements with a display() statement

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add warning about node name when node not found

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add line about debugging inputs in catalog

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Change wording

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Replace commented return statements with a display() statement"

This reverts commit ad63afc.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Revert "Naive fix for return statements"

This reverts commit 04c022e.

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update tests

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

---------

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add universal warnings

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Change to copy full function definition instead of just function body

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* 1 down, 4 more tests to fix

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* fix extra empty space

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* fix tests

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Update release notes and some typos

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Co-authored-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
4 people authored Feb 2, 2024
1 parent a0abbd1 commit 99348e6
Showing 5 changed files with 383 additions and 5 deletions.
1 change: 1 addition & 0 deletions RELEASE.md
@@ -1,6 +1,7 @@
# Upcoming Release 0.19.3

## Major features and improvements
* Create the debugging line magic `%load_node` for Jupyter Notebook and Jupyter Lab.

## Bug fixes and other changes
* Updated CLI Command `kedro catalog resolve` to work with dataset factories that use `PartitionedDataset`.
140 changes: 139 additions & 1 deletion kedro/ipython/__init__.py
@@ -4,11 +4,13 @@
"""
from __future__ import annotations

import inspect
import logging
import sys
import typing
import warnings
from pathlib import Path
-from typing import Any
+from typing import Any, Callable

import IPython
from IPython.core.magic import needs_local_scope, register_line_magic
@@ -19,11 +21,13 @@
from kedro.framework.cli.utils import ENV_HELP, _split_params
from kedro.framework.project import (
    LOGGING,  # noqa: F401
    _ProjectPipelines,
    configure_project,
    pipelines,
)
from kedro.framework.session import KedroSession
from kedro.framework.startup import _is_project, bootstrap_project
from kedro.pipeline.node import Node

logger = logging.getLogger(__name__)

@@ -36,6 +40,9 @@ def load_ipython_extension(ipython: Any) -> None:
    See https://ipython.readthedocs.io/en/stable/config/extensions/index.html
    """
    ipython.register_magic_function(magic_reload_kedro, magic_name="reload_kedro")
    logger.info("Registered line magic 'reload_kedro'")
    ipython.register_magic_function(magic_load_node, magic_name="load_node")
    logger.info("Registered line magic 'load_node'")

    if _find_kedro_project(Path.cwd()) is None:
        logger.warning(
@@ -178,3 +185,134 @@ def _find_kedro_project(current_dir: Path) -> Any:  # pragma: no cover
        current_dir = current_dir.parent

    return None


@typing.no_type_check
@magic_arguments()
@argument(
    "node",
    type=str,
    help=("Name of the Node."),
    nargs="?",
    default=None,
)
def magic_load_node(node: str) -> None:
    """The line magic %load_node <node_name>
    Currently it only supports Jupyter Notebook (>7.0) and Jupyter Lab. This line magic
    will generate code in multiple cells to load datasets from `DataCatalog`, import
    relevant functions and modules, define the node function and call it.
    """
    cells = _load_node(node, pipelines)
    from ipylab import JupyterFrontEnd

    app = JupyterFrontEnd()

    def _create_cell_with_text(text: str) -> None:
        # Note that this only works with Notebook >7.0 or Jupyter Lab. It doesn't work
        # with VS Code Notebook due to incompatible backends.
        app.commands.execute("notebook:insert-cell-below")
        app.commands.execute("notebook:replace-selection", {"text": text})

    for cell in cells:
        _create_cell_with_text(cell)


def _load_node(node_name: str, pipelines: _ProjectPipelines) -> list[str]:
    """Prepare the code to load datasets from the catalog, the import statements,
    the function definition and the function call.
    Args:
        node_name (str): The name of the node.
    Returns:
        list[str]: A list of strings containing the generated code; each string
            represents a notebook cell.
    """
    warnings.warn(
        "This is an experimental feature, only Jupyter Notebook (>7.0) & Jupyter Lab "
        "are supported. If you encounter unexpected behaviour or would like to suggest "
        "feature enhancements, add it under this github issue https://github.com/kedro-org/kedro/issues/3580"
    )
    node = _find_node(node_name, pipelines)
    node_func = node.func

    node_inputs = _prepare_node_inputs(node)
    imports = _prepare_imports(node_func)
    function_definition = _prepare_function_body(node_func)
    function_call = _prepare_function_call(node_func)

    cells: list[str] = []
    cells.append(node_inputs)
    cells.append(imports)
    cells.append(function_definition)
    cells.append(function_call)
    return cells


def _find_node(node_name: str, pipelines: _ProjectPipelines) -> Node:
    for pipeline in pipelines.values():
        try:
            found_node: Node = pipeline.filter(node_names=[node_name]).nodes[0]
            return found_node
        except ValueError:
            continue
    # If this point is reached, the node was not found in the project
    raise ValueError(
        f"Node with name='{node_name}' not found in any pipelines. Remember to specify the node name, not the node function."
    )


def _prepare_imports(node_func: Callable) -> str:
    """Prepare the import statements for loading a node."""
    python_file = inspect.getsourcefile(node_func)
    logger.info(f"Loading node definition from {python_file}")

    # Confirm source file was found
    if python_file:
        import_statement = []
        with open(python_file) as file:
            # Collect every line that starts with `from` or `import`
            for line in file.readlines():
                if line.startswith("from") or line.startswith("import"):
                    import_statement.append(line.strip())

        clean_imports = "\n".join(import_statement).strip()
        return clean_imports
    else:
        raise FileNotFoundError(f"Could not find {node_func.__name__}")


def _prepare_node_inputs(node: Node) -> str:
    node_func = node.func
    signature = inspect.signature(node_func)

    node_inputs = node.inputs
    func_params = list(signature.parameters)

    statements = [
        "# Prepare necessary inputs for debugging",
        "# All debugging inputs must be defined in your project catalog",
    ]

    for node_input, func_param in zip(node_inputs, func_params):
        statements.append(f'{func_param} = catalog.load("{node_input}")')

    input_statements = "\n".join(statements)
    return input_statements


def _prepare_function_body(func: Callable) -> str:
    source_lines, _ = inspect.getsourcelines(func)
    body = "".join(source_lines)
    return body


def _prepare_function_call(node_func: Callable) -> str:
    """Prepare the text for the function call."""
    func_name = node_func.__name__
    signature = inspect.signature(node_func)
    func_params = list(signature.parameters)

    # Construct the statement func_name(a, b, c), passing each parameter by position
    func_args = ", ".join(func_params)
    body = f"""{func_name}({func_args})"""
    return body
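
The text generation in the helpers above can be tried without a Kedro project. The following is a minimal sketch under stated assumptions, not the code from this commit: `split_data`, the dataset names, and the helper names `prepare_inputs`, `prepare_call` and `collect_imports` are hypothetical, and a plain list stands in for `node.inputs`. It shows how `inspect.signature` and `zip` produce the `catalog.load` lines and the call line, and how top-level import lines can be filtered out of source text.

```python
import inspect


def split_data(data, parameters):
    """Hypothetical node function, used only for illustration."""
    return data, parameters


def prepare_inputs(func, dataset_names):
    # Pair each dataset name with the function parameter at the same position.
    # Assumes the node inputs are an ordered list; `catalog` only appears inside
    # the generated text and is never evaluated here.
    params = list(inspect.signature(func).parameters)
    return [
        f'{param} = catalog.load("{name}")'
        for name, param in zip(dataset_names, params)
    ]


def prepare_call(func):
    # Build a positional call that reuses the parameter names as variable names.
    params = ", ".join(inspect.signature(func).parameters)
    return f"{func.__name__}({params})"


def collect_imports(source_text):
    # Keep top-level lines that begin an import statement. The trailing space in
    # the prefixes is a deviation from the diff, which matches "from"/"import"
    # without it.
    return [
        line.strip()
        for line in source_text.splitlines()
        if line.startswith(("from ", "import "))
    ]


# Hypothetical dataset names and module source.
input_lines = prepare_inputs(split_data, ["model_input_table", "params:model_options"])
call_line = prepare_call(split_data)
import_lines = collect_imports("import os\nfrom pathlib import Path\n\nimportant = 1\n")

print("\n".join(import_lines + input_lines + [call_line]))
```

Because the filter works line by line, a parenthesised import that spans several lines would only contribute its first line. The same appears to hold for the `startswith` check in `_prepare_imports`.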
9 changes: 7 additions & 2 deletions pyproject.toml
@@ -26,7 +26,7 @@ dependencies = [
    "more_itertools>=8.14.0",
    "omegaconf>=2.1.1",
    "parse>=1.19.0",
-    "pluggy>=1.0",
+    "pluggy>=1.0, <1.4.0",
    "pre-commit-hooks",
    "PyYAML>=4.2,<7.0",
    "rich>=12.0,<14.0",
@@ -56,6 +57,7 @@ test = [
    "behave==1.2.6",
    "coverage[toml]",
    "import-linter==2.0",
    "ipylab>=1.0.0",
    "ipython>=7.31.1, <8.0; python_version < '3.8'",
    "ipython~=8.10; python_version >= '3.8'",
    "jupyterlab_server>=2.11.1",
@@ -102,7 +103,11 @@ docs = [
    "sphinx-favicon",
    "sphinxcontrib-youtube",
]
-all = [ "kedro[test,docs]" ]
+jupyter = [
+    "ipylab>=1.0.0",
+    "notebook>=7.0.0"  # requires the new shared backend of Notebook and Lab
+]
+all = [ "kedro[test,docs,jupyter]" ]

[project.urls]
Homepage = "https://kedro.org"
2 changes: 1 addition & 1 deletion tests/framework/project/test_pipeline_registry.py
@@ -29,7 +29,7 @@ def test_pipelines_without_configure_project_is_empty(
    mock_package_name_with_pipelines_file,
):
    # Reimport `pipelines` from `kedro.framework.project` to ensure that
-    # it was not set by a pior call to the `configure_project` function.
+    # it was not set by a prior call to the `configure_project` function.
    del sys.modules["kedro.framework.project"]
    from kedro.framework.project import pipelines
