[Parent] - Notebook/IPython Debug line magic `%load_node` feature discussion thread #3535

noklam · 2024-01-19T17:14:24Z

Description

This ticket is created to track and discuss the development of this feature.

What does this feature do?

In short, it tries to recreate the context of a pipeline error by eliminating manual steps as much as possible to provide better UX. It loads up the data, imports (and) function body in notebook cells, allowing user to quickly start debugging, plotting the data etc.

Tracked Tickets:

create line magic to debug a node in notebook workflow #3510 is the initial PR for the implementation
Refactor jupyter magic unit tests with conftest #3585
Document how users can debug Kedro inside notebooks using the %load_node magic #2011
Support kedro ipython for the debugging line magic #3536
e2e-test for %load_node magic #3528 - Add e2e test, blocked by an issue that running notebook programatically yield different result.

Comments

I have collected some comments from a few users and these are some quotes categorised by themes.

IPython

But I agree, ipython version, since you're likely in your IDE, no need to bring in the function body. You can open the right file and dev there
HI. very nice! is it something that could work in kedro ipython also direclty ?

Support more platform

support platform:
- Mainly notbeook & Lab
- VSCode Notebook (not yet)
- Databricks (not tested)

Two-way sync

Would be handy to debug, but you should do this in the IDE instead of a notebook. If you change the code, you still have to manually sync the changes to VCS
Look awesome
@Nok
. Would there be a path back from notebook into the project’s files (as it was with kedro jupyter convert )?

Recursive definition of function body

Also how would this work if your node function imports other functions which may import other functions? You only show where depth is 1?

Edge Case improvement

Handle imports that are not from top of the script, i.e. functions define in the same file as the node function which lives in locals().
Intermediate MemoryDataset

The text was updated successfully, but these errors were encountered:

datajoely · 2024-01-19T17:26:50Z

As stretch goal a runner like --runner=SequentialDebug where the last failed node spins up a notebook with the function body and data available to work out why it failed 🚀

astrojuanlu · 2024-01-23T16:15:26Z

As stretch goal a runner like --runner=SequentialDebug where the last failed node spins up a notebook with the function body and data available to work out why it failed 🚀

This is actually a very cool idea - alternatively, instead of a notebook it could be a (pdb) session

datajoely · 2024-01-23T18:33:34Z

ipdb!

AhdraMeraliQB · 2024-01-31T13:03:40Z

Context for Tech Design - 31/1/24

Feature motivation - #1832

To summarise, debugging Kedro projects in notebooks is a painful process. One of the suggestions to make this easier for users included the proposal of a %load_node line magic, that would load a node from a user's Kedro project pipeline into notebook cells for easier debugging. The current MVP for this feature lives in #3510.

Steps to test the MVP

Clone the PR an install your local Kedro version
Navigate to an existing test project, or a directory where you wish to create one and follow kedro new. Make sure to select example code if not using a starter.
Navigate into your project root
Run kedro jupyter lab
In the notebook, test the line magic with %load_node <node-name>

The current implementation

The line magic currently will load any imports in the source file of the node, the node's inputs (assumed to be declared in the catalog), and the body of the node function.

This serves as an excellent starting point, the implementation successfully resolves:

Nested functions
One line functions
Messy/scattered import statements

However, it does not account/allow for:

References to objects out of node function scope but within the source-file
MemoryDataset inputs
kedro ipython and other platforms
Unnamed nodes (node name is not required when defining a node)

This brings us to the points of our discussion.

What makes a complete feature?

The focus of this discussion is to iron out the expectations for the first version of this feature, and possible future extensions. In this, there are four primary themes to consider. In this discussion, we must determine if a particular use case should be addressed now, later, much later, or never. Additionally, if something is to be done later, much later, or never, what will we do in the interim?

Consider: Extending the current implementation

Several edge cases are not handled by the line magic:

1. If a node input is a MemoryDataset

The current implementation assumes all inputs exist in the catalogue, and running the cell that loads the node inputs will result in an error if the input is not defined in the catalog. We have several options when addressing this:

a. Ignore it - expect users to understand the error.
Pro: No technical requirement. Con: Invites friction and confusion

b. Add a comment to the imports block explaining the above.
Pro: No technical requirement. Con: Requires users to edit their project catalog to continue debugging with the feature.

c. Resolve it with the line magic.
Pro: Frictionless for users. Con: Technically challenging - requires it's own investigation into how this could be implemented.

2. Node references an object defined within the source-file but outside of node scope

Including only imports and the node function means we don't have access to any helper functions defined inside the file but outside of the node function. Any references to these when the node function is loaded will result in an error.

a. Ignore it - expect users to understand the error.
Pro: No technical requirement. Con: Invites friction and confusion

b. Add from <node.source.module> import *
Pro: Frictionless for users. Con: Technically challenging - requires it's own investigation into how this could be implemented.

3. Return statements make unrunnable cells

When loading a node function, we omit the defining line. Including return statements then results in a syntax error when the cell is run.

a. Ignore it - expect users to understand the error.
Pro: No technical requirement. Con: Invites friction and confusion

b. Comment out the return statement
Pro: Simple Con: Introduces syntax errors if returns are within conditions, see:

x = yyy
if <some_flag):
  # return x
else:
  # return None

c. Comment out return statement and introduce print statement as replacement.
Pro: Simple to introduce - exists on #3568, eliminates errors. Con: Modifies node, user will have to fix this if copying the code back into their project files

Consider: Supporting Other Platforms

#3510 Only supports jupyter lab/notebook. #3536 introduces support for kedro ipython (Support across notebook/lab most IDE and platforms except databricks). What should we do about:

Databricks notebooks
VS Code

And if we choose not to support any platforms in the current iteration, how do we inform users? (Multiple selections allowed)

Docs
Release Notes
Warn when trying to use the line magic in unsupported platform

Consider: Supporting the node-finding process by making changes to the default node name

As mentioned briefly above, node names are not required when defining a node, and are often omitted by users. However, the current implementation of the line magic searches for the specified by filtering the pipeline for matching node names, and so if the name is not defined, the %load_node will not be able to find the node.

We could address this by:

Mentioning this in the ValueNotFoundError
Pro: Trivial to add. Makes user aware of problem. Con: Requires user to modify project files to continue debugging in notebook
Search all node.func.__name__ if node name is not found - see %load_node experiments #3568
Pro: Solves problem. Easy to implement. Con: Ugly complexity. Behaviour diverges from kedro run --nodes/--to-nodes/--from-nodes
Modify default node name - see Spike: proposal for better default node names #3575
Pro: Solves issue across Kedro, not just for line magic. Con: technically breaking. Can we ship these changes together?

[Extra] Consider: The future of notebook debugging

There were some suggestions for the future of this feature, for completeness I am listing them here

2-way conversion (%load_node and %save_node ?) - do we bring back kedro jupyter convert? xref
A SequentialDebug runner - xref
Populating problematic functions when error results from function call - xref

References: main PR: #3510, original discussion: #1832, experiments: #3568, related: #3575, #3536

astrojuanlu · 2024-01-31T14:56:26Z

My 2 cents:

Extending the current implementation

2c. Bringing the full contents of the corresponding file, including the function definitions themselves
3d. Do not unpack the function body, and retain the function definition (see also point above)

Supporting Other Platforms

Frankly I'd ignore kedro ipython, kedro jupyter and shift the focus to "make this work where %load_ext kedro.ipython works". This doesn't work for pure REPLs, and also it's unclear how other non-Jupyter notebooks would behave (they're all mostly based on IPython, but then frontend code might differ significantly). But the point is that I wouldn't consider kedro ipython/jupyter a platform because of #2777

The future of notebook debugging

I would say let's not bring back kedro jupyter convert, it didn't work back then and there's no reason it will work now.

Having a %save_node is promising, but maybe we should broaden our perspective and have a %save_code that is more generic? Note that this is a big, unsolved problem in the Jupyter <-> IDE interaction as a whole.

deepyaman · 2024-01-31T15:09:37Z

Extending the current implementation

Nit on 3c: should be a display() call?
@astrojuanlu's suggestion 3d also makes sense to me

Supporting Other Platforms

For a lot of these edge cases, as well as for other platforms, I think I'm OK with having some friction (and perhaps marking the feature as experimental for now). I think there's a risk of going too deep in this feature, without understanding how many people actually use it/how they use it.

AhdraMeraliQB · 2024-01-31T16:34:40Z

Tech Design Summary

Link to Miro board

Now (part of #3510):

Create parent issue to track "later" actions and to be referenced in universal warning (below): %load_node line magic improvements #3580
Warn users when preparing node inputs that inputs need to be explicitly declared in the catalog
~~Handle return statements by commenting out and substituting ~~print~~ display()~~ Include function definition and function call with node inputs - see comment below for why
Add a line when node not found about node name vs node function
Add a platform agnostic warning that this is an experimental feature and is only supports jupyter and ipython

Later

from node.source.file import * (for helper functions)
Support other platforms
Resolving memory datasets (can be ignored fully)
Update default node name - will have to go in next major release, needs to be unique. More discussion required.

datajoely · 2024-01-31T17:40:32Z

Sad I couldn't join the design meeting but this looks great - thanks @noklam for initial innovation which now feels like a tangible reality and also @AhdraMeraliQB for making the decisions made easy to grok!

astrojuanlu · 2024-01-31T18:26:01Z

One salient point I just realized about unpacking the function body and transforming return statements into print or display - we would be at risk of changing the logic of the code.

A silly example:

def foo():
    if True:
        return 1
    subprocess.check_output("rm -rf /")  # but `True is True` always right??

so foo() will absolutely always return 1.

However:

if True:
    display(1)
subprocess.check_output("rm -rf /")  # but `True is True` always right??

will nuke your filesystem.

Could we reconsider my proposal above

3d. Do not unpack the function body, and retain the function definition (see also point above)

? I think we never got to discuss it during the call (my fault, the conversation drifted exactly at that point and I forgot to bring it up)

noklam · 2024-03-05T13:04:58Z

Closing this as this is released in 0.19.3. Follow up improvement discussion will be document here: #3580

noklam self-assigned this Jan 19, 2024

merelcht added this to the Improve the usability and debugging experience for Jupyter notebooks milestone Jan 22, 2024

astrojuanlu mentioned this issue Jan 31, 2024

create line magic to debug a node in notebook workflow #3510

Merged

7 tasks

github-actions bot mentioned this issue Feb 1, 2024

Monthly issue metrics report #3582

Closed

noklam changed the title ~~[Parent] - Jupyter Debug line magic thread~~ [Parent] - Jupyter Debug line magic %load_node thread and bug reports Feb 1, 2024

noklam changed the title ~~[Parent] - Jupyter Debug line magic %load_node thread and bug reports~~ [Parent] - Jupyter Debug line magic %load_node feature discussion thread Feb 1, 2024

noklam changed the title ~~[Parent] - Jupyter Debug line magic %load_node feature discussion thread~~ [Parent] - Notebook/IPython Debug line magic %load_node feature discussion thread Feb 19, 2024

This comment was marked as outdated.

Sign in to view

This comment was marked as off-topic.

Sign in to view

noklam closed this as completed Mar 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Parent] - Notebook/IPython Debug line magic `%load_node` feature discussion thread #3535

[Parent] - Notebook/IPython Debug line magic `%load_node` feature discussion thread #3535

noklam commented Jan 19, 2024 •

edited by merelcht

Loading

datajoely commented Jan 19, 2024

astrojuanlu commented Jan 23, 2024

datajoely commented Jan 23, 2024

AhdraMeraliQB commented Jan 31, 2024 •

edited by noklam

Loading

astrojuanlu commented Jan 31, 2024 •

edited

Loading

deepyaman commented Jan 31, 2024

AhdraMeraliQB commented Jan 31, 2024 •

edited by merelcht

Loading

datajoely commented Jan 31, 2024

astrojuanlu commented Jan 31, 2024

This comment was marked as outdated.

This comment was marked as off-topic.

noklam commented Mar 5, 2024

[Parent] - Notebook/IPython Debug line magic %load_node feature discussion thread #3535

[Parent] - Notebook/IPython Debug line magic %load_node feature discussion thread #3535

Comments

noklam commented Jan 19, 2024 • edited by merelcht Loading

Description

What does this feature do?

Comments

IPython

Support more platform

Two-way sync

Recursive definition of function body

Edge Case improvement

datajoely commented Jan 19, 2024

astrojuanlu commented Jan 23, 2024

datajoely commented Jan 23, 2024

AhdraMeraliQB commented Jan 31, 2024 • edited by noklam Loading

Context for Tech Design - 31/1/24

Feature motivation - #1832

Steps to test the MVP

The current implementation

What makes a complete feature?

Consider: Extending the current implementation

1. If a node input is a MemoryDataset

2. Node references an object defined within the source-file but outside of node scope

3. Return statements make unrunnable cells

Consider: Supporting Other Platforms

Consider: Supporting the node-finding process by making changes to the default node name

[Extra] Consider: The future of notebook debugging

astrojuanlu commented Jan 31, 2024 • edited Loading

Extending the current implementation

Supporting Other Platforms

The future of notebook debugging

deepyaman commented Jan 31, 2024

Extending the current implementation

Supporting Other Platforms

AhdraMeraliQB commented Jan 31, 2024 • edited by merelcht Loading

Tech Design Summary

Now (part of #3510):

Next:

Later

datajoely commented Jan 31, 2024

astrojuanlu commented Jan 31, 2024

This comment was marked as outdated.

This comment was marked as off-topic.

noklam commented Mar 5, 2024

[Parent] - Notebook/IPython Debug line magic `%load_node` feature discussion thread #3535

[Parent] - Notebook/IPython Debug line magic `%load_node` feature discussion thread #3535

noklam commented Jan 19, 2024 •

edited by merelcht

Loading

AhdraMeraliQB commented Jan 31, 2024 •

edited by noklam

Loading

astrojuanlu commented Jan 31, 2024 •

edited

Loading

AhdraMeraliQB commented Jan 31, 2024 •

edited by merelcht

Loading