Streamline debugging documentation (#3608)
* Add first draft

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Remove outdated kedro jupyter convert docs

Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>

* Suggestion: Review edits

Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>

* Update FAQs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Edit jupyter ipython debug section

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Change link to section that does not exist anymore

Signed-off-by: L. R. Couto <laurarccouto@gmail.com>

* Change link to section that does not exist anymore

Signed-off-by: L. R. Couto <laurarccouto@gmail.com>

* Change wording and formatting

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Lint

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Update docs/source/notebooks_and_ipython/kedro_and_notebooks.md

Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>

* Update docs/source/notebooks_and_ipython/kedro_and_notebooks.md

Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>

* Changes to the wording, remove unnecessary section

Signed-off-by: lrcouto <laurarccouto@gmail.com>

* Move docs on debugging with hooks to hooks section

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add links to main debugging page

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Make notebook debugging an independent section

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update link in FAQs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Apply suggestions from code review - adjust wording

Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>

* Capitalise Hooks

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Reorder links on debugging page

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Use markdown admonitions

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add short explanations to debugging page

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

---------

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: L. R. Couto <laurarccouto@gmail.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Co-authored-by: lrcouto <laurarccouto@gmail.com>
Co-authored-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
4 people authored Feb 12, 2024
1 parent f54c6fb commit 80ad182
Showing 4 changed files with 90 additions and 87 deletions.
85 changes: 7 additions & 78 deletions docs/source/development/debugging.md
@@ -1,83 +1,12 @@
# Debugging

## Introduction
:::note

If you're running your Kedro pipeline from the CLI or you can't/don't want to run Kedro from within your IDE debugging framework, it can be hard to debug your Kedro pipeline or nodes. This is particularly frustrating because:
Our debugging documentation has moved. Please see our existing guides:

* If you have long running nodes or pipelines, inserting `print` statements and running them multiple times quickly becomes time-consuming.
* Debugging nodes outside the `run` session isn't very helpful because getting access to the local scope within the `node` can be hard, especially if you're dealing with large data or memory datasets, where you need to chain a few nodes together or re-run your pipeline to produce the data for debugging purposes.
:::

This guide provides examples on [how to instantiate a post-mortem debugging session](https://docs.python.org/3/library/pdb.html#pdb.post_mortem) with [`pdb`](https://docs.python.org/3/library/pdb.html) using [Kedro Hooks](../hooks/introduction.md) when an uncaught error occurs during a pipeline run. [ipdb](https://pypi.org/project/ipdb/) could be integrated in the same manner.

For guides on how to set up debugging with IDEs, please visit the [guide for debugging in VSCode](./set_up_vscode.md#debugging) and the [guide for debugging in PyCharm](./set_up_pycharm.md#debugging).

## Debugging a node

To start a debugging session when an uncaught error is raised within your `node`, implement the `on_node_error` [Hook specification](/api/kedro.framework.hooks):

```python
import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBNodeDebugHook:
    """A Hook class that starts a post-mortem debugging session with the PDB debugger
    whenever an error is triggered within a node. The local scope from when the
    exception occurred is available within this debugging session.
    """

    @hook_impl
    def on_node_error(self):
        _, _, traceback_object = sys.exc_info()

        # Print the traceback information for debugging ease
        traceback.print_tb(traceback_object)

        # Drop you into a post mortem debugging session
        pdb.post_mortem(traceback_object)
```

You can then register this `PDBNodeDebugHook` in your project's `settings.py`:

```python
HOOKS = (PDBNodeDebugHook(),)
```

## Debugging a pipeline

To start a debugging session when an uncaught error is raised within your `pipeline`, implement the `on_pipeline_error` [Hook specification](/api/kedro.framework.hooks):

```python
import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBPipelineDebugHook:
    """A Hook class that starts a post-mortem debugging session with the PDB debugger
    whenever an error is triggered within a pipeline. The local scope from when the
    exception occurred is available within this debugging session.
    """

    @hook_impl
    def on_pipeline_error(self):
        # We don't need the actual exception since it is within this stack frame
        _, _, traceback_object = sys.exc_info()

        # Print the traceback information for debugging ease
        traceback.print_tb(traceback_object)

        # Drop you into a post mortem debugging session
        pdb.post_mortem(traceback_object)
```

You can then register this `PDBPipelineDebugHook` in your project's `settings.py`:

```python
HOOKS = (PDBPipelineDebugHook(),)
```
* [Debugging a Kedro project within a notebook](../notebooks_and_ipython/kedro_and_notebooks.md#debugging-a-kedro-project-within-a-notebook) for information on how to launch an interactive debugger in your notebook.
* [Debugging in VSCode](./set_up_vscode.md#debugging) for information on how to set up VSCode's built-in debugger.
* [Debugging in PyCharm](./set_up_pycharm.md#debugging) for information on using PyCharm's debugging tool.
* [Debugging in the CLI with Kedro Hooks](../hooks/common_use_cases.md#use-hooks-to-debug-your-pipeline) for information on how to automatically launch an interactive debugger in the CLI when an error occurs in your pipeline run.
2 changes: 1 addition & 1 deletion docs/source/faq/faq.md
@@ -14,7 +14,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]

## Working with Jupyter

* [How can I debug a Kedro project in a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md#debugging-with-debug-and-pdb)?
* [How can I debug a Kedro project in a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md#debugging-a-kedro-project-within-a-notebook)?
* [How do I connect a Kedro project kernel to other Jupyter clients like JupyterLab](../notebooks_and_ipython/kedro_and_notebooks.md#ipython-jupyterlab-and-other-jupyter-clients)?

## Kedro project development
76 changes: 75 additions & 1 deletion docs/source/hooks/common_use_cases.md
@@ -201,7 +201,7 @@ HOOKS = (AzureSecretsHook(),)
Note: `DefaultAzureCredential()` is Azure's recommended approach to authorise access to data in your storage accounts. For more information, consult the [documentation about how to authenticate to Azure and authorize access to blob data](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python).
```

## Use a Hook to read `metadata` from `DataCatalog`
## Use Hooks to read `metadata` from `DataCatalog`
Use the `after_catalog_created` Hook to access `metadata` to extend Kedro.

```python
@@ -214,3 +214,77 @@ class MetadataHook:
        for dataset_name, dataset in catalog.datasets.__dict__.items():
            print(f"{dataset_name} metadata: \n {str(dataset.metadata)}")
```

## Use Hooks to debug your pipeline
You can use [Kedro Hooks](../hooks/introduction.md) to launch a [post-mortem debugging session](https://docs.python.org/3/library/pdb.html#pdb.post_mortem) with [`pdb`](https://docs.python.org/3/library/pdb.html) when an error occurs during a pipeline run. [ipdb](https://pypi.org/project/ipdb/) could be integrated in the same manner.

### Debugging a node

To start a debugging session when an uncaught error is raised within your `node`, implement the `on_node_error` [Hook specification](/api/kedro.framework.hooks):

```python
import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBNodeDebugHook:
    """A Hook class that starts a post-mortem debugging session with the PDB debugger
    whenever an error is triggered within a node. The local scope from when the
    exception occurred is available within this debugging session.
    """

    @hook_impl
    def on_node_error(self):
        _, _, traceback_object = sys.exc_info()

        # Print the traceback information for debugging ease
        traceback.print_tb(traceback_object)

        # Drop you into a post mortem debugging session
        pdb.post_mortem(traceback_object)
```

You can then register this `PDBNodeDebugHook` in your project's `settings.py`:

```python
HOOKS = (PDBNodeDebugHook(),)
```
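
To illustrate why the Hook can rely on `sys.exc_info()`, the sketch below reproduces the mechanism with the standard library only. It assumes, as the Hook above does, that the code runs while the exception is still being handled. `failing_node` and `innermost_locals` are hypothetical names used for illustration and are not part of Kedro.

```python
import sys


def failing_node(values):
    """Hypothetical node function that fails after computing a local variable."""
    total = sum(values)
    return total / 0  # raises ZeroDivisionError


def innermost_locals(traceback_object):
    """Return the local variables of the frame in which the exception was raised."""
    # Follow the traceback chain down to the last (innermost) frame.
    while traceback_object.tb_next is not None:
        traceback_object = traceback_object.tb_next
    return dict(traceback_object.tb_frame.f_locals)


try:
    failing_node([1, 2, 3])
except ZeroDivisionError:
    # Inside the handler, sys.exc_info() returns the active exception's traceback,
    # the same object the Hook passes to pdb.post_mortem().
    _, _, traceback_object = sys.exc_info()
    locals_at_failure = innermost_locals(traceback_object)
```

`pdb.post_mortem` starts its session at this same innermost frame, so variables such as `total` can be inspected at the debugger prompt.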

### Debugging a pipeline

To start a debugging session when an uncaught error is raised within your `pipeline`, implement the `on_pipeline_error` [Hook specification](/api/kedro.framework.hooks):

```python
import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBPipelineDebugHook:
    """A Hook class that starts a post-mortem debugging session with the PDB debugger
    whenever an error is triggered within a pipeline. The local scope from when the
    exception occurred is available within this debugging session.
    """

    @hook_impl
    def on_pipeline_error(self):
        # We don't need the actual exception since it is within this stack frame
        _, _, traceback_object = sys.exc_info()

        # Print the traceback information for debugging ease
        traceback.print_tb(traceback_object)

        # Drop you into a post mortem debugging session
        pdb.post_mortem(traceback_object)
```

You can then register this `PDBPipelineDebugHook` in your project's `settings.py`:

```python
HOOKS = (PDBPipelineDebugHook(),)
```
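
Since `ipdb` could be integrated in the same manner, one option is to pass the debugger entry point in as a callable rather than duplicating the class. This is a sketch under assumptions: `PostMortemHook` and its `post_mortem` parameter are hypothetical, the `@hook_impl` decorator is omitted so that the block runs without Kedro installed, and the `try`/`except` at the bottom stands in for Kedro calling the Hook while the error is being handled.

```python
import pdb
import sys
import traceback


class PostMortemHook:
    """Hypothetical Hook class with a pluggable post-mortem function."""

    def __init__(self, post_mortem=pdb.post_mortem):
        # Pass ipdb.post_mortem here to use ipdb instead (assumes ipdb is installed).
        self._post_mortem = post_mortem

    # In a Kedro project this method would be decorated with @hook_impl.
    def on_pipeline_error(self):
        _, _, traceback_object = sys.exc_info()

        # Print the traceback, then hand it to the chosen debugger.
        traceback.print_tb(traceback_object)
        self._post_mortem(traceback_object)


# Stand-in for a failing pipeline run: record the traceback instead of
# opening an interactive debugger.
received = []
hook = PostMortemHook(post_mortem=received.append)
try:
    raise ValueError("simulated pipeline failure")
except ValueError:
    hook.on_pipeline_error()
```

Injecting the callable also makes the Hook easy to exercise in a test, because no interactive prompt is opened.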
14 changes: 7 additions & 7 deletions docs/source/notebooks_and_ipython/kedro_and_notebooks.md
@@ -209,14 +209,8 @@ You don't need to restart the kernel for the `catalog`, `context`, `pipelines` a

For more details, run `%reload_kedro?`.

## Useful to know (for advanced users)
Each Kedro project has its own Jupyter kernel so you can switch between Kedro projects from a single Jupyter instance by selecting the appropriate kernel.

To ensure that a Jupyter kernel always points to the correct Python executable, if one already exists with the same name `kedro_<package_name>`, then it is replaced.

You can use the `jupyter kernelspec` set of commands to manage your Jupyter kernels. For example, to remove a kernel, run `jupyter kernelspec remove <kernel_name>`.

### Debugging with %debug and %pdb
## Debugging a Kedro project within a notebook

You can use the `%debug` [line magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-debug) to launch an interactive debugger in your Jupyter notebook. Declare it before a single-line statement to step through the execution in debug mode. You can use the argument `--breakpoint` or `-b` to provide a breakpoint.
The following sequence occurs when `%debug` runs immediately after an error occurs:
@@ -264,6 +258,12 @@ Some examples of the possible commands that can be used to interact with the ipd

For more information, use the `help` command in the debugger, or take a look at the [ipdb repository](https://github.com/gotcha/ipdb) for guidance.

## Useful to know (for advanced users)
Each Kedro project has its own Jupyter kernel so you can switch between Kedro projects from a single Jupyter instance by selecting the appropriate kernel.

To ensure that a Jupyter kernel always points to the correct Python executable, if one already exists with the same name `kedro_<package_name>`, then it is replaced.

You can use the `jupyter kernelspec` set of commands to manage your Jupyter kernels. For example, to remove a kernel, run `jupyter kernelspec remove <kernel_name>`.

### Managed services
