Create line/cell magic to debug a node in notebook workflow #2009

Closed
merelcht opened this issue Nov 9, 2022 · 3 comments · Fixed by #3510
Labels
Component: Jupyter/IPython Issue/PR relevant for Jupyter Notebooks, IPython sessions and the interactive workflow in Kedro

Comments

@merelcht
Member

merelcht commented Nov 9, 2022

Description

Implement an IPython line magic that loads a node and the datasets it needs, and copies the node code into a notebook cell.

  • No variables should be available except the datasets/parameters consumed by the node, with their data already loaded
  • If possible, the variable names should be the same as the node's function parameters

Context

See #1832 for context

Implementation

  • get_ipython().set_next_input(s) might be useful for copying the node code into the notebook
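
A minimal sketch of how `set_next_input` might be used for this, under some assumptions: `build_debug_cell` and `push_to_next_cell` are hypothetical helpers, and the generated text assumes a `catalog` variable is available in the session (as it is when the Kedro IPython extension is loaded).

```python
def build_debug_cell(param_to_dataset, node_source):
    """Compose cell text: one load line per input, then the node's source."""
    load_lines = [
        f"{param} = catalog.load({dataset!r})"
        for param, dataset in param_to_dataset.items()
    ]
    return "\n".join(load_lines + ["", node_source.rstrip()]) + "\n"


def push_to_next_cell(text):
    """Pre-fill the next input cell under IPython; return True on success."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    if shell is None:  # plain Python interpreter, no interactive shell
        return False
    shell.set_next_input(text)
    return True


cell = build_debug_cell(
    {"X_train": "X_train", "y_train": "y_train"},
    "def train_model(X_train, y_train):\n    ...\n",
)
push_to_next_cell(cell)
```

Outside IPython the second helper simply returns `False`, which keeps the text-building step testable on its own.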
@noklam
Contributor

noklam commented Jan 15, 2024

To make this easier, can we assume the data is persisted, so that we don't have to think about how to populate data that exists only in memory?

@noklam
Contributor

noklam commented Jan 15, 2024

                    INFO     Completed 8 out of 11 tasks                                                                                                                                                                                     sequential_runner.py:90
                    INFO     Loading data from model_input_table@pandas (ParquetDataset)...                                                                                                                                                      data_catalog.py:483
[01/15/24 15:20:57] INFO     Loading data from params:model_options (MemoryDataset)...                                                                                                                                                           data_catalog.py:483
                    INFO     Running node: split_data_node: split_data([model_input_table@pandas;params:model_options]) -> [X_train;X_test;y_train;y_test]                                                                                               node.py:340
                    INFO     Saving data to X_train (MemoryDataset)...                                                                                                                                                                           data_catalog.py:525
                    INFO     Saving data to X_test (MemoryDataset)...                                                                                                                                                                            data_catalog.py:525
                    INFO     Saving data to y_train (MemoryDataset)...                                                                                                                                                                           data_catalog.py:525
                    INFO     Saving data to y_test (MemoryDataset)...                                                                                                                                                                            data_catalog.py:525
                    INFO     Completed 9 out of 11 tasks                                                                                                                                                                                     sequential_runner.py:90
                    INFO     Loading data from X_train (MemoryDataset)...                                                                                                                                                                        data_catalog.py:483
                    INFO     Loading data from y_train (MemoryDataset)...                                                                                                                                                                        data_catalog.py:483
                    INFO     Running node: train_model_node: train_model([X_train;y_train]) -> [regressor]                                                                                                                                               node.py:340
                    INFO     Saving data to regressor (PickleDataset)...                                                                                                                                                                         data_catalog.py:525
                    INFO     Completed 10 out of 11 tasks                                                                                                                                                                                    sequential_runner.py:90
                    INFO     Loading data from regressor (PickleDataset)...                                                                                                                                                                      data_catalog.py:483
                    INFO     Loading data from X_test (MemoryDataset)...                                                                                                                                                                         data_catalog.py:483
                    INFO     Loading data from y_test (MemoryDataset)...                                                                                                                                                                         data_catalog.py:483
                    INFO     Running node: evaluate_model_node: evaluate_model([regressor;X_test;y_test]) -> [metrics]                                                                                                                                   node.py:340
                    ERROR    Node evaluate_model_node: evaluate_model() ->  failed with error:                                                                                                                                                           node.py:365
                             No active exception to reraise                                                                                                                                                                                                         
                    WARNING  There are 1 nodes that have not run.                                                                                                                                                                                      runner.py:218
                             You can resume the pipeline run from the nearest nodes with persisted inputs by adding the following argument to your previous command:                                                                                                
                               --from-nodes ""                                                                                                                                                                                                                      
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/Nok_Lam_Chan/miniconda3/envs/kedro/bin/kedro:8 in <module>                                │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/framework/cli/cli.py:199 in main                             │
│                                                                                                  │
│   196 │   """                                                                                    │
│   197 │   _init_plugins()                                                                        │
│   198 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
│ ❱ 199 │   cli_collection()                                                                       │
│   200                                                                                            │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/miniconda3/envs/kedro/lib/python3.10/site-packages/click/core.py:1157 in     │
│ __call__                                                                                         │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/framework/cli/cli.py:127 in main                             │
│                                                                                                  │
│   124 │   │   )                                                                                  │
│   125 │   │                                                                                      │
│   126 │   │   try:                                                                               │
│ ❱ 127 │   │   │   super().main(                                                                  │
│   128 │   │   │   │   args=args,                                                                 │
│   129 │   │   │   │   prog_name=prog_name,                                                       │
│   130 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/miniconda3/envs/kedro/lib/python3.10/site-packages/click/core.py:1078 in     │
│ main                                                                                             │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/miniconda3/envs/kedro/lib/python3.10/site-packages/click/core.py:1688 in     │
│ invoke                                                                                           │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/miniconda3/envs/kedro/lib/python3.10/site-packages/click/core.py:1434 in     │
│ invoke                                                                                           │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/miniconda3/envs/kedro/lib/python3.10/site-packages/click/core.py:783 in      │
│ invoke                                                                                           │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/framework/cli/project.py:226 in run                          │
│                                                                                                  │
│   223 │   with KedroSession.create(                                                              │
│   224 │   │   env=env, conf_source=conf_source, extra_params=params                              │
│   225 │   ) as session:                                                                          │
│ ❱ 226 │   │   session.run(                                                                       │
│   227 │   │   │   tags=tuple_tags,                                                               │
│   228 │   │   │   runner=runner_obj(is_async=is_async),                                          │
│   229 │   │   │   node_names=tuple_node_names,                                                   │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/framework/session/session.py:393 in run                      │
│                                                                                                  │
│   390 │   │   )                                                                                  │
│   391 │   │                                                                                      │
│   392 │   │   try:                                                                               │
│ ❱ 393 │   │   │   run_result = runner.run(                                                       │
│   394 │   │   │   │   filtered_pipeline, catalog, hook_manager, session_id                       │
│   395 │   │   │   )                                                                              │
│   396 │   │   │   self._run_called = True                                                        │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/runner/runner.py:117 in run                                  │
│                                                                                                  │
│   114 │   │   │   self._logger.info(                                                             │
│   115 │   │   │   │   "Asynchronous mode is enabled for loading and saving data"                 │
│   116 │   │   │   )                                                                              │
│ ❱ 117 │   │   self._run(pipeline, catalog, hook_or_null_manager, session_id)  # type: ignore[a   │
│   118 │   │                                                                                      │
│   119 │   │   self._logger.info("Pipeline execution completed successfully.")                    │
│   120                                                                                            │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/runner/sequential_runner.py:75 in _run                       │
│                                                                                                  │
│   72 │   │                                                                                       │
│   73 │   │   for exec_index, node in enumerate(nodes):                                           │
│   74 │   │   │   try:                                                                            │
│ ❱ 75 │   │   │   │   run_node(node, catalog, hook_manager, self._is_async, session_id)           │
│   76 │   │   │   │   done_nodes.add(node)                                                        │
│   77 │   │   │   except Exception:                                                               │
│   78 │   │   │   │   self._suggest_resume_scenario(pipeline, done_nodes, catalog)                │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/runner/runner.py:332 in run_node                             │
│                                                                                                  │
│   329 │   if is_async:                                                                           │
│   330 │   │   node = _run_node_async(node, catalog, hook_manager, session_id)                    │
│   331 │   else:                                                                                  │
│ ❱ 332 │   │   node = _run_node_sequential(node, catalog, hook_manager, session_id)               │
│   333 │                                                                                          │
│   334 │   for name in node.confirms:                                                             │
│   335 │   │   catalog.confirm(name)                                                              │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/runner/runner.py:425 in _run_node_sequential                 │
│                                                                                                  │
│   422 │   )                                                                                      │
│   423 │   inputs.update(additional_inputs)                                                       │
│   424 │                                                                                          │
│ ❱ 425 │   outputs = _call_node_run(                                                              │
│   426 │   │   node, catalog, inputs, is_async, hook_manager, session_id=session_id               │
│   427 │   )                                                                                      │
│   428                                                                                            │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/runner/runner.py:391 in _call_node_run                       │
│                                                                                                  │
│   388 │   │   │   is_async=is_async,                                                             │
│   389 │   │   │   session_id=session_id,                                                         │
│   390 │   │   )                                                                                  │
│ ❱ 391 │   │   raise exc                                                                          │
│   392 │   hook_manager.hook.after_node_run(                                                      │
│   393 │   │   node=node,                                                                         │
│   394 │   │   catalog=catalog,                                                                   │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/runner/runner.py:381 in _call_node_run                       │
│                                                                                                  │
│   378 │   session_id: str | None = None,                                                         │
│   379 ) -> dict[str, Any]:                                                                       │
│   380 │   try:                                                                                   │
│ ❱ 381 │   │   outputs = node.run(inputs)                                                         │
│   382 │   except Exception as exc:                                                               │
│   383 │   │   hook_manager.hook.on_node_error(                                                   │
│   384 │   │   │   error=exc,                                                                     │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/pipeline/node.py:371 in run                                  │
│                                                                                                  │
│   368 │   │   │   │   str(exc),                                                                  │
│   369 │   │   │   │   extra={"markup": True},                                                    │
│   370 │   │   │   )                                                                              │
│ ❱ 371 │   │   │   raise exc                                                                      │
│   372 │                                                                                          │
│   373 │   def _run_with_no_inputs(self, inputs: dict[str, Any]) -> Any:                          │
│   374 │   │   if inputs:                                                                         │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/pipeline/node.py:357 in run                                  │
│                                                                                                  │
│   354 │   │   │   elif isinstance(self._inputs, str):                                            │
│   355 │   │   │   │   outputs = self._run_with_one_input(inputs, self._inputs)                   │
│   356 │   │   │   elif isinstance(self._inputs, list):                                           │
│ ❱ 357 │   │   │   │   outputs = self._run_with_list(inputs, self._inputs)                        │
│   358 │   │   │   elif isinstance(self._inputs, dict):                                           │
│   359 │   │   │   │   outputs = self._run_with_dict(inputs, self._inputs)                        │
│   360                                                                                            │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/kedro/pipeline/node.py:402 in _run_with_list                       │
│                                                                                                  │
│   399 │   │   │   │   f"{sorted(inputs.keys())}."                                                │
│   400 │   │   │   )                                                                              │
│   401 │   │   # Ensure the function gets the inputs in the correct order                         │
│ ❱ 402 │   │   return self._func(*(inputs[item] for item in node_inputs))                         │
│   403 │                                                                                          │
│   404 │   def _run_with_dict(                                                                    │
│   405 │   │   self, inputs: dict[str, Any], node_inputs: dict[str, str]                          │
│                                                                                                  │
│ /Users/Nok_Lam_Chan/dev/kedro/tmp/debug-kedro/src/debug_kedro/pipelines/data_science/nodes.py:54 │
│ in evaluate_model                                                                                │
│                                                                                                  │
│   51 │   """                                                                                     │
│   52 │   y_pred = regressor.predict(X_test)                                                      │
│   53 │   score = r2_score(y_test, y_pred)                                                        │
│ ❱ 54 │   raise                                                                                   │
│   55 │   mae = mean_absolute_error(y_test, y_pred)                                               │
│   56 │   me = max_error(y_test, y_pred)                                                          │
│   57 │   logger = logging.getLogger(__name__)                                                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: No active exception to reraise

There are also challenges for discovery, since the traceback is often very long and users overlook the resume suggestion in the warnings.
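
For reference, the `RuntimeError` at the bottom of the traceback is what Python raises for a bare `raise` executed outside an `except` block, which is how the failure in `evaluate_model` was triggered here. A standalone reproduction (the function name is made up for illustration):

```python
def fails_like_the_node():
    raise  # bare raise with no exception being handled


try:
    fails_like_the_node()
except RuntimeError as exc:
    message = str(exc)  # same text as the last line of the traceback
```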

@noklam noklam changed the title Create line magic to debug a node in notebook workflow Create line/cell magic to debug a node in notebook workflow Jan 15, 2024
@noklam
Contributor

noklam commented Jan 15, 2024

Todo:

  1. Find the node code / copy&paste
  2. Prepare node inputs
  3. Prepare the Import statement
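
Step 3 could be approached by parsing the module that defines the node and keeping only its top-level imports. This is a sketch under assumptions: `extract_imports` is a hypothetical helper, and the module source is passed in as a string (in practice it might come from `inspect.getsource` on the node function's module).

```python
import ast


def extract_imports(module_source):
    """Return the top-level import statements of a module as source text."""
    tree = ast.parse(module_source)
    return [
        ast.get_source_segment(module_source, stmt)
        for stmt in tree.body
        if isinstance(stmt, (ast.Import, ast.ImportFrom))
    ]


# Illustrative module text; parsing does not import anything.
source = '''import logging
from sklearn.metrics import r2_score

def evaluate_model(regressor, X_test, y_test):
    return r2_score(y_test, regressor.predict(X_test))
'''
imports = extract_imports(source)
```

Copying every top-level import is the simplest choice; it may bring in names the node does not use, which is harmless in a debugging cell.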

If we have time:

  1. Optimise for the format
  2. See if we can hide the catalog load code
  3. Handle MemoryDataset

@noklam noklam moved this from To Do to In Progress in Kedro Framework Jan 15, 2024
@noklam noklam linked a pull request Jan 15, 2024 that will close this issue
7 tasks
@AhdraMeraliQB AhdraMeraliQB moved this from In Progress to In Review in Kedro Framework Jan 23, 2024
@AhdraMeraliQB AhdraMeraliQB moved this from In Review to In Progress in Kedro Framework Jan 23, 2024
@astrojuanlu astrojuanlu moved this from In Progress to In Review in Kedro Framework Feb 2, 2024
@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Feb 2, 2024

3 participants