create line magic to debug a node in notebook workflow #3510

Merged

62 commits
04410b2
update notes
noklam Jan 15, 2024
a79386b
add some basic structure for the debugging magic
noklam Jan 15, 2024
a93e5cc
add demo code
noklam Jan 16, 2024
b251fb7
lint
noklam Jan 16, 2024
4b31eb9
implement the _find_node and _prepare_node_inputs function
noklam Jan 16, 2024
6c47fe4
implement _prepare_imports
noklam Jan 16, 2024
feba68b
clean up the import function to use the real function
noklam Jan 16, 2024
6dd21fb
separate the cells into different part
noklam Jan 16, 2024
d0c0633
update dependencies
noklam Jan 17, 2024
ba8fb09
add test structure
noklam Jan 17, 2024
7b041a9
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Jan 18, 2024
702e017
Lint
Jan 18, 2024
36a0e2a
Merge branch '2009-create-line-magic-to-debug-a-node-in-notebook-work…
Jan 18, 2024
dbac325
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Jan 18, 2024
ff10816
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Jan 18, 2024
0145154
Add some tests and placeholders
Jan 19, 2024
8a86403
Add more tests
Jan 19, 2024
474c0fc
Even more tests
Jan 19, 2024
be99c0e
And more tests
Jan 19, 2024
72411be
Remove placeholders
Jan 19, 2024
b4be9a7
test remove condition
Jan 19, 2024
1819261
add more dcostring
noklam Jan 22, 2024
3c343b2
add logs
noklam Jan 22, 2024
12e7f04
refactor the test and fix imports
noklam Jan 22, 2024
4e95bde
more tests fixed
noklam Jan 22, 2024
e1a6f39
refacto tests with list of string with """
noklam Jan 22, 2024
17e47d1
Fix node
noklam Jan 22, 2024
e8c85c9
replace test with triple quotes string
noklam Jan 22, 2024
a46139e
rename function to the _prepare pattern
noklam Jan 22, 2024
9487911
fix more test
noklam Jan 22, 2024
6c69a36
skip tests
noklam Jan 22, 2024
d1bb8e1
Lint
Jan 24, 2024
80e06ad
Add ipylab to test requirements
Jan 24, 2024
43a70b6
Fix missing syntax
Jan 24, 2024
68c7b97
Apply suggestion from code review
Jan 24, 2024
d3163ed
Remove redundant TODOs
Jan 24, 2024
9adf9ab
Fix handling node with lambda function
Jan 24, 2024
a92c462
Try pin pluggy
Jan 24, 2024
c5fc1fc
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Jan 24, 2024
688a4e6
refactor the find_node method with pipeline as argument and tests
noklam Jan 24, 2024
87b43a2
Merge branch '2009-create-line-magic-to-debug-a-node-in-notebook-work…
noklam Jan 24, 2024
25dec21
Update kedro/ipython/__init__.py
noklam Jan 25, 2024
fae8aa7
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
noklam Jan 26, 2024
70dc53a
Re-import mocked object
Jan 29, 2024
7581c56
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Jan 29, 2024
28ba1ba
Remove try-catch
Jan 29, 2024
0d69fc6
Rename overwritten varaible
Jan 29, 2024
9d12c8d
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Jan 29, 2024
78ff496
Add warnings and simplify tests (#3568)
AhdraMeraliQB Feb 1, 2024
dbbe81b
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Feb 1, 2024
e39942c
Add universal warnings
noklam Feb 1, 2024
efb6fa8
Change to copy full function definition instead of just function body
noklam Feb 1, 2024
de6198d
1 down, 4 more tests to fix
noklam Feb 1, 2024
f4c69cd
fix extra empty space
noklam Feb 1, 2024
c4cb40c
fix tests
noklam Feb 1, 2024
3e9cbf1
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
noklam Feb 1, 2024
b16dda9
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
AhdraMeraliQB Feb 1, 2024
c9dc606
Update release notes and some typos
noklam Feb 2, 2024
bb50792
Merge branch '2009-create-line-magic-to-debug-a-node-in-notebook-work…
noklam Feb 2, 2024
d4d14d0
Merge branch 'main' into 2009-create-line-magic-to-debug-a-node-in-no…
noklam Feb 2, 2024
fa2951c
update docstring
noklam Feb 2, 2024
90fe58f
Merge branch '2009-create-line-magic-to-debug-a-node-in-notebook-work…
noklam Feb 2, 2024
RELEASE.md (1 addition, 0 deletions)
@@ -1,6 +1,7 @@
# Upcoming Release 0.19.3

## Major features and improvements
* Create the debugging line magic `%load_node` for Jupyter Notebook and Jupyter Lab.

## Bug fixes and other changes
* Updated CLI Command `kedro catalog resolve` to work with dataset factories that use `PartitionedDataset`.
kedro/ipython/__init__.py (139 additions, 1 deletion)
@@ -4,11 +4,13 @@
"""
from __future__ import annotations

import inspect
import logging
import sys
import typing
import warnings
from pathlib import Path
-from typing import Any
+from typing import Any, Callable

import IPython
from IPython.core.magic import needs_local_scope, register_line_magic
@@ -19,11 +21,13 @@
from kedro.framework.cli.utils import ENV_HELP, _split_params
from kedro.framework.project import (
    LOGGING,  # noqa: F401
    _ProjectPipelines,
    configure_project,
    pipelines,
)
from kedro.framework.session import KedroSession
from kedro.framework.startup import _is_project, bootstrap_project
from kedro.pipeline.node import Node

logger = logging.getLogger(__name__)

@@ -36,6 +40,9 @@ def load_ipython_extension(ipython: Any) -> None:
    See https://ipython.readthedocs.io/en/stable/config/extensions/index.html
    """
    ipython.register_magic_function(magic_reload_kedro, magic_name="reload_kedro")
    logger.info("Registered line magic 'reload_kedro'")
    ipython.register_magic_function(magic_load_node, magic_name="load_node")
    logger.info("Registered line magic 'load_node'")

    if _find_kedro_project(Path.cwd()) is None:
        logger.warning(
@@ -178,3 +185,134 @@ def _find_kedro_project(current_dir: Path) -> Any:  # pragma: no cover
        current_dir = current_dir.parent

    return None


@typing.no_type_check
@magic_arguments()
@argument(
    "node",
    type=str,
    help=("Name of the Node."),
    nargs="?",
    default=None,
)
def magic_load_node(node: str) -> None:
    """The line magic %load_node <node_name>
    Currently it only supports Jupyter Notebook (>7.0) and Jupyter Lab. This line magic
    will generate code in multiple cells to load datasets from `DataCatalog`, import
    relevant functions and modules, node function definition and a function call.
    """
    cells = _load_node(node, pipelines)
    from ipylab import JupyterFrontEnd

Member

Trying this on IPython (after naming my node, see above) and got an error:

ModuleNotFoundError: No module named 'ipylab'

I would find it confusing if kedro.ipython didn't work on IPython, given its name. On the other hand, I don't think it makes sense for this to be in IPython at all, because the point is to bring in the source code and edit it in a cell, something that can't happen in a REPL.

If this is to be a Jupyter-only extension, I think we should have it somewhere else. Maybe kedro.jupyter? Or even a separate kedro-jupyter package?

Member

Sorry, there are two different things here:

  • I did not reinstall kedro so I didn't have ipylab, but I see it's there when the user does pip install kedro[jupyter] 👍🏽
  • After installing it, I did %load_node clean_statuses on IPython and nothing happened.

My node function is not in dir():

In [4]: "clean_statuses" in dir()
Out[4]: False

so I stand by my previous point: since this is pointless in IPython, I'd move it somewhere else.
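The `ModuleNotFoundError` reported above comes from the unguarded `from ipylab import JupyterFrontEnd` inside the magic. As a minimal sketch, an optional dependency could instead fail with an install hint. This assumes the hint should point at the `jupyter` extra added to pyproject.toml in this PR. `import_optional` is a hypothetical helper and is not part of Kedro:

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str = "jupyter") -> ModuleType:
    """Import `module_name`, or raise ModuleNotFoundError with a pip install hint.

    `extra` is the pip extra assumed to provide the module (here the `jupyter`
    extra from pyproject.toml); this helper is illustrative, not Kedro API.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ModuleNotFoundError(
            f"'{module_name}' is required for %load_node. "
            f'Install it with: pip install "kedro[{extra}]"'
        ) from exc


# Possible use inside the magic (hypothetical):
# JupyterFrontEnd = import_optional("ipylab").JupyterFrontEnd
```

This does not address the second point. Even with `ipylab` installed, terminal IPython has no notebook frontend to receive the inserted cells.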

Contributor Author

#3536

There is a separate ticket for IPython support. Originally I thought the same, but it appears that some people, particularly anti-notebook users, would still like to have IPython support. It will work slightly differently because IPython doesn't have the concept of "cells". It will still be useful if the data can be preloaded (or if the magic just generates the code to load the corresponding data from the catalog). If you have comments about the design of the feature, feel free to leave them there instead.

Re: kedro.ipython vs kedro.jupyter: I agree it's a bit confusing, but I think it's okay to leave it in kedro.ipython since I ultimately plan to add support for IPython. In addition, we have used IPython and Jupyter almost interchangeably because their features have always been at parity, until now.
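On the point about preloading data: the input cell that the magic generates is plain text. It is built by pairing node inputs with function parameters by position, which is the `zip` in `_prepare_node_inputs`. The sketch below reproduces that pairing without Kedro. The node function `clean_statuses` and the dataset names are hypothetical. The generated lines assume a `catalog` variable already exists in the session, as it does after `%reload_kedro`:

```python
from __future__ import annotations

import inspect
from typing import Callable


def input_load_statements(func: Callable, dataset_names: list[str]) -> str:
    """Generate `param = catalog.load("dataset")` lines, paired by position.

    Mirrors the zip() in _prepare_node_inputs: the i-th dataset name is bound to
    the i-th parameter of `func`; extras on either side are silently dropped.
    """
    params = list(inspect.signature(func).parameters)
    lines = ["# Prepare necessary inputs for debugging"]
    for param, dataset in zip(params, dataset_names):
        lines.append(f'{param} = catalog.load("{dataset}")')
    return "\n".join(lines)


def clean_statuses(raw_statuses, params):
    """Hypothetical node function; only its signature matters here."""
    return raw_statuses


print(input_load_statements(clean_statuses, ["raw_statuses", "params:cleaning"]))
```

Running this prints the comment line followed by `raw_statuses = catalog.load("raw_statuses")` and `params = catalog.load("params:cleaning")`. In a terminal IPython session, the same text could be executed rather than pasted into a cell.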

Member

Sounds good. I don't know how feasible this is, but if it's possible to show a warning like "IPython is not yet supported", that would be golden.
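One possible way to emit such a warning is to check the class of the running IPython shell. This is a heuristic and not something this PR implements. `ZMQInteractiveShell` is the shell class used by Jupyter kernels, and `TerminalInteractiveShell` is the class used by the terminal REPL. However, kernel-backed clients such as VS Code also report `ZMQInteractiveShell`, so the check cannot tell them apart from Notebook or Lab. The names `frontend_kind` and `warn_if_unsupported` are hypothetical:

```python
from __future__ import annotations

import warnings
from typing import Any, Optional


def frontend_kind(shell: Optional[Any]) -> str:
    """Classify the running IPython shell by its class name (heuristic)."""
    if shell is None:
        return "none"  # plain Python: IPython.get_ipython() returned None
    name = type(shell).__name__
    if name == "ZMQInteractiveShell":
        return "kernel"  # Jupyter Notebook/Lab, but also VS Code and other kernel clients
    if name == "TerminalInteractiveShell":
        return "terminal"  # the IPython REPL
    return "other"


def warn_if_unsupported(shell: Optional[Any]) -> bool:
    """Warn unless the shell looks kernel-backed; return True if a warning was issued."""
    if frontend_kind(shell) != "kernel":
        warnings.warn(
            "%load_node is not yet supported outside Jupyter Notebook and JupyterLab."
        )
        return True
    return False
```

In a live session, this would be called as `warn_if_unsupported(get_ipython())`, with `get_ipython` imported from `IPython`.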


    app = JupyterFrontEnd()

    def _create_cell_with_text(text: str) -> None:
        # Note: this only works with Notebook >7.0 or Jupyter Lab. It doesn't work with
        # VS Code Notebook due to incompatible backends.
        app.commands.execute("notebook:insert-cell-below")
        app.commands.execute("notebook:replace-selection", {"text": text})

    for cell in cells:
        _create_cell_with_text(cell)


def _load_node(node_name: str, pipelines: _ProjectPipelines) -> list[str]:
    """Prepare the code to load datasets from the catalog, import statements and function body.

    Args:
        node_name (str): The name of the node.
        pipelines (_ProjectPipelines): The project pipelines to search for the node.

    Returns:
        list[str]: A list of strings containing the generated code; each string
            represents a notebook cell.
    """
    warnings.warn(
        "This is an experimental feature, only Jupyter Notebook (>7.0) & Jupyter Lab "
        "are supported. If you encounter unexpected behaviour or would like to suggest "
        "feature enhancements, add it under this github issue https://github.com/kedro-org/kedro/issues/3580"
    )
    node = _find_node(node_name, pipelines)
    node_func = node.func

    node_inputs = _prepare_node_inputs(node)
    imports = _prepare_imports(node_func)
    function_definition = _prepare_function_body(node_func)
    function_call = _prepare_function_call(node_func)

    cells: list[str] = []
    cells.append(node_inputs)
    cells.append(imports)
    cells.append(function_definition)
    cells.append(function_call)
    return cells


def _find_node(node_name: str, pipelines: _ProjectPipelines) -> Node:
    for pipeline in pipelines.values():
        try:
            found_node: Node = pipeline.filter(node_names=[node_name]).nodes[0]
            return found_node
        except ValueError:
            continue
    # If we reach this point, the node was not found in any pipeline
    raise ValueError(

Contributor

I wonder if we want to support func_name at some point too? 🤔

Contributor Author

@noklam, Feb 2, 2024

This was briefly discussed in the last TD. @AhdraMeraliQB originally had an implementation.

It runs into issues if a function is used twice: node names are unique, but functions do not have to be. Fundamentally, we want to fix this with the default node name, but it was decided to fix that later.

@AhdraMeraliQB I can't find your original PR/commit, feel free to supplement.

Contributor

really good reasoning, perhaps we could support full classpath
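The sketch below illustrates the `func_name` lookup discussed here and shows why reuse is the sticking point. The lookup has to collect every node whose function matches and refuse to guess when more than one matches. It uses a hypothetical `FakeNode` stand-in and a plain dict of node lists in place of Kedro's `Node` and `_ProjectPipelines`. Matches are keyed by node name because the same node can appear in several registered pipelines, for example in `__default__` as well as in its own pipeline:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for kedro.pipeline.node.Node (name + function only)."""

    name: str
    func: Callable


def find_by_func_name(pipelines: dict[str, list[FakeNode]], func_name: str) -> FakeNode:
    """Return the single node whose function is called `func_name`.

    Raises ValueError if no node matches, or if several distinct nodes reuse the
    same function (the ambiguity described above).
    """
    # Key by node name: the same node may be registered in several pipelines.
    matches = {
        node.name: node
        for nodes in pipelines.values()
        for node in nodes
        if node.func.__name__ == func_name
    }
    if not matches:
        raise ValueError(f"No node uses a function named '{func_name}'.")
    if len(matches) > 1:
        raise ValueError(
            f"Function '{func_name}' is used by several nodes: {sorted(matches)}. "
            "Specify the node name instead."
        )
    return next(iter(matches.values()))
```

Matching on a module-qualified path (`f"{func.__module__}.{func.__qualname__}"`) would separate same-named functions from different modules. It would still not separate two nodes that reuse the same function.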

        f"Node with name='{node_name}' not found in any pipelines. Remember to specify the node name, not the node function."
    )


def _prepare_imports(node_func: Callable) -> str:
    """Prepare the import statements for loading a node."""
    python_file = inspect.getsourcefile(node_func)
    logger.info(f"Loading node definition from {python_file}")

    # Confirm source file was found
    if python_file:
        import_statement = []
        with open(python_file) as file:
            # Collect every line that starts with "from" or "import"
            for line in file.readlines():
                if line.startswith("from") or line.startswith("import"):
                    import_statement.append(line.strip())

        clean_imports = "\n".join(import_statement).strip()
        return clean_imports
    else:
        raise FileNotFoundError(f"Could not find {node_func.__name__}")


def _prepare_node_inputs(node: Node) -> str:
    node_func = node.func
    signature = inspect.signature(node_func)

    node_inputs = node.inputs
    func_params = list(signature.parameters)

    statements = [
        "# Prepare necessary inputs for debugging",
        "# All debugging inputs must be defined in your project catalog",
    ]

    for node_input, func_param in zip(node_inputs, func_params):
        statements.append(f'{func_param} = catalog.load("{node_input}")')

    input_statements = "\n".join(statements)
    return input_statements


def _prepare_function_body(func: Callable) -> str:
    source_lines, _ = inspect.getsourcelines(func)
    body = "".join(source_lines)
    return body


def _prepare_function_call(node_func: Callable) -> str:
    """Prepare the text for the function call."""
    func_name = node_func.__name__
    signature = inspect.signature(node_func)
    func_params = list(signature.parameters)

    # Construct the statement func_name(a, b, c) from the parameter names
    func_args = ", ".join(func_params)
    body = f"""{func_name}({func_args})"""
    return body
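`_prepare_imports` keeps every line of the node's module that starts with `from` or `import`. `_prepare_function_call` joins the parameter names into a call. The sketch below reproduces both on an in-memory string so the behaviour is easy to check. It also shows two edge cases of the prefix scan. First, a parenthesised multi-line import is cut after its first line. Second, an ordinary statement such as `fromage = ...` is picked up. `imports_by_ast` is an alternative based on the standard-library `ast` module and is offered only for comparison; it is not what this PR does. All function names here are hypothetical:

```python
from __future__ import annotations

import ast
import inspect
from typing import Callable


def imports_by_prefix(source: str) -> list[str]:
    """Line-based scan, mirroring the startswith() check in _prepare_imports."""
    return [
        line.strip()
        for line in source.splitlines()
        if line.startswith("from") or line.startswith("import")
    ]


def imports_by_ast(source: str) -> list[str]:
    """Alternative for comparison: top-level import statements found via ast."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]


def call_statement(func: Callable) -> str:
    """Mirror _prepare_function_call: name(param1, param2, ...)."""
    params = ", ".join(inspect.signature(func).parameters)
    return f"{func.__name__}({params})"


SOURCE = '''\
import pandas as pd
from typing import (
    Any,
    Dict,
)
fromage = "brie"
'''
```

On `SOURCE`, the prefix scan returns `import pandas as pd`, the dangling `from typing import (` and the unrelated `fromage` assignment. The `ast` version returns the two complete import statements.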
pyproject.toml (7 additions, 2 deletions)
@@ -26,7 +26,7 @@ dependencies = [
    "more_itertools>=8.14.0",
    "omegaconf>=2.1.1",
    "parse>=1.19.0",
-    "pluggy>=1.0",
+    "pluggy>=1.0, <1.4.0",
merelcht marked this conversation as resolved.
    "pre-commit-hooks",
    "PyYAML>=4.2,<7.0",
    "rich>=12.0,<14.0",
@@ -56,6 +56,7 @@ test = [
    "behave==1.2.6",
    "coverage[toml]",
    "import-linter==2.0",
+    "ipylab>=1.0.0",
    "ipython>=7.31.1, <8.0; python_version < '3.8'",
    "ipython~=8.10; python_version >= '3.8'",
    "jupyterlab_server>=2.11.1",
@@ -102,7 +103,11 @@ docs = [
    "sphinx-favicon",
    "sphinxcontrib-youtube",
]
-all = [ "kedro[test,docs]" ]
+jupyter = [
+    "ipylab>=1.0.0",
+    "notebook>=7.0.0"  # requires the new shared backend of Notebook and JupyterLab
+]
+all = [ "kedro[test,docs,jupyter]" ]

[project.urls]
Homepage = "https://kedro.org"
tests/framework/project/test_pipeline_registry.py (1 addition, 1 deletion)
@@ -29,7 +29,7 @@ def test_pipelines_without_configure_project_is_empty(
    mock_package_name_with_pipelines_file,
):
    # Reimport `pipelines` from `kedro.framework.project` to ensure that
-    # it was not set by a pior call to the `configure_project` function.
+    # it was not set by a prior call to the `configure_project` function.
    del sys.modules["kedro.framework.project"]
    from kedro.framework.project import pipelines
