
This issue was moved to a discussion.

You can continue the conversation there.


%load_node line magic improvements #3580

Closed
2 of 5 tasks
AhdraMeraliQB opened this issue Jan 31, 2024 · 7 comments

Comments

@AhdraMeraliQB
Contributor

AhdraMeraliQB commented Jan 31, 2024

Description

In #3510 we introduced a new line magic, aimed at improving the process of debugging Kedro projects in notebooks. This feature is experimental; this issue should be used to collect suggestions for extending and improving it. Add a suggestion in the comments or, if it has already been mentioned, bump its priority with a 👍.

(edited by Nok)

(Copied from previous issue)

Two-way sync

Would be handy for debugging, but you should do this in the IDE instead of a notebook. If you change the code, you still have to sync the changes to VCS manually.
Looks awesome, @Nok. Would there be a path back from the notebook into the project's files (as there was with kedro jupyter convert)?

Recursive definition of function body

Also, how would this work if your node function imports other functions, which may in turn import other functions? You only show the case where the depth is 1.
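A minimal sketch of one way to handle depth greater than 1, assuming the helpers live in the same module as the node function: parse the function's source with `ast`, look up every bare name in the function's globals, and recurse into any plain function defined in that same module. `collect_helpers` and `helper_source` are hypothetical names, not part of Kedro. Attribute calls such as `utils.clean(...)` and helpers defined in other modules are not followed; those would need an extra rule, for example a package-prefix check on `__module__`.

```python
import ast
import inspect
import textwrap
from types import FunctionType
from typing import Callable, List, Set


def collect_helpers(func: Callable) -> List[FunctionType]:
    """Return ``func`` and every same-module function it reaches by name,
    ordered so that helpers come before the functions that use them."""
    visited: Set[str] = set()
    ordered: List[FunctionType] = []

    def visit(fn: FunctionType) -> None:
        if fn.__qualname__ in visited:
            return  # already collected (also stops on recursive helpers)
        visited.add(fn.__qualname__)
        tree = ast.parse(textwrap.dedent(inspect.getsource(fn)))
        # Every bare name used in the source, e.g. `_is_true` in `_is_true(x)`.
        names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
        for name in sorted(names):
            candidate = fn.__globals__.get(name)
            # Follow only plain functions defined in the same module as `fn`.
            if isinstance(candidate, FunctionType) and candidate.__module__ == fn.__module__:
                visit(candidate)
        ordered.append(fn)

    visit(func)
    return ordered


def helper_source(func: Callable) -> str:
    """Source of ``func`` plus its helpers, ready to paste into a notebook cell."""
    return "\n\n".join(textwrap.dedent(inspect.getsource(fn)) for fn in collect_helpers(func))
```

Executing the string returned by `helper_source` in a notebook would define the node function and all of its same-module helpers in one go.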

@AhdraMeraliQB
Contributor Author

Add support for other platforms. It currently only supports JupyterLab/Jupyter Notebook (#3510) and IPython (#3536). Consider including:

  • Databricks
  • VSCode

@AhdraMeraliQB
Contributor Author

AhdraMeraliQB commented Jan 31, 2024

Add an import statement that runs import * from the node's source file. This allows nodes with helper functions to run in notebooks without having to go back to the source files and copy-paste the code over.

Edited by Nok below:
If we can use inspect.getsourcefile, we can import the module directly with importlib, then use from <module> import * to make sure everything is loaded.
https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
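A minimal sketch of this idea, following the "importing a source file directly" recipe linked above. `load_source_module` and `star_import` are hypothetical helper names. One caveat: a plain `from <module> import *` skips names that start with an underscore unless they are listed in `__all__`, so a private helper (for example one named `_is_true`) would still be missing. The sketch therefore has an `include_private` switch.

```python
import importlib.util
import inspect
import sys
from types import ModuleType
from typing import Any, Callable, Dict


def load_source_module(func: Callable) -> ModuleType:
    """Re-import the module that defines ``func`` straight from its source file."""
    path = inspect.getsourcefile(func)
    if path is None:
        raise ValueError(f"No source file found for {func!r}")
    spec = importlib.util.spec_from_file_location(func.__module__, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # Register under the usual name (replacing any copy already imported).
    sys.modules[func.__module__] = module
    spec.loader.exec_module(module)
    return module


def star_import(module: ModuleType, namespace: Dict[str, Any], include_private: bool = False) -> None:
    """Copy a module's names into ``namespace``, like ``from <module> import *``.

    With ``include_private=True`` underscore-prefixed names are copied too;
    dunder names such as ``__name__`` are always skipped.
    """
    if include_private:
        names = [n for n in vars(module) if not (n.startswith("__") and n.endswith("__"))]
    else:
        names = getattr(module, "__all__", None)
        if names is None:
            names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
```

In a notebook this could be called as `star_import(load_source_module(node_func), globals(), include_private=True)`, where `node_func` stands for the node's function.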

@AhdraMeraliQB
Contributor Author

Resolve MemoryDatasets so that users don't have to add them to the catalog to access them as node inputs.
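One possible reading of this suggestion (an assumption, the comment does not specify a mechanism) is to work out which upstream nodes must run to rebuild an in-memory input, stopping at datasets that can be loaded from the catalog. The sketch below uses a hypothetical `NodeSpec` stand-in instead of Kedro's own node and pipeline classes, and a depth-first walk that returns the upstream nodes dependency-first.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class NodeSpec:
    """Hypothetical, minimal stand-in for a pipeline node."""
    name: str
    inputs: Sequence[str]
    outputs: Sequence[str]


def nodes_to_run(target: NodeSpec, nodes: Sequence[NodeSpec], persisted: Set[str]) -> List[NodeSpec]:
    """Upstream nodes, dependency-first, needed to rebuild ``target``'s in-memory inputs."""
    producer: Dict[str, NodeSpec] = {out: n for n in nodes for out in n.outputs}
    ordered: List[NodeSpec] = []
    visited: Set[str] = set()

    def require(dataset: str) -> None:
        if dataset in persisted:
            return  # loadable from the catalog, nothing to run
        node = producer.get(dataset)
        if node is None:
            raise KeyError(f"{dataset!r} is neither in the catalog nor produced by any node")
        if node.name in visited:
            return
        visited.add(node.name)
        for upstream in node.inputs:
            require(upstream)
        ordered.append(node)  # appended after its own dependencies

    for dataset in target.inputs:
        require(dataset)
    return ordered
```

The returned nodes could then be run in order in the notebook before the node being debugged.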

@noklam
Contributor

noklam commented Feb 15, 2024

If we can use inspect.getsourcefile, we can import the module directly with importlib, then use from <module> import * to make sure everything is loaded.
https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
Cc @DimedS

It works well on my side! This new functionality is fantastic! It would be helpful to include instructions in the documentation on how to use this new command in different environments. Additionally, perhaps we should consider also loading related functions when loading specific nodes. Currently, in spaceflights-pandas, a node is loaded but cannot be run because it raises NameError: name '_is_true' is not defined, indicating that the function _is_true was not loaded.
#3604 (review)

@noklam
Contributor

noklam commented Feb 15, 2024

I found a couple of things to improve while helping a user debug on 0.18.x:

  • When running %load_node in IPython, if the code block is long enough, the top of it "disappears" unless you press the arrow key. At first I thought the function was broken. This is confusing because the function most likely will not run (the user is presumably debugging an error), so they will not see variable declarations such as data_a = catalog.load("xxx") in the terminal.
  • It seems to break if the function comes from a wheel or is wrapped somehow. I cannot reproduce this yet, but the error is "FileNotFoundError: [Errno 2] No such file or directory: '<boltons.funcutils.FunctionBuilder-115>'"
  • Maybe prepare a standalone script so that we can test on 0.18.x.
  • The generated function call always uses the full function signature, which is problematic if some of the parameters are optional.

For example:
def node_func(a, b, c=None):
    return ...

It should be valid to have a node node(node_func, inputs=["data_a", "data_b"], ...). Currently the resulting code block is:

node_func(a,b,c)

This raises a NameError because c is not defined; it works once c is deleted from the resulting code block.
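A minimal sketch of a fix for list-style inputs, assuming only ordinary positional parameters: map the inputs onto the parameters in order and leave any unmatched parameter out of the call, so its default applies. `generate_call_code` is a hypothetical name, not the existing %load_node implementation, and it quotes dataset names with `repr`, so they come out in single quotes.

```python
import inspect
from typing import Callable, List

# Parameter kinds that a list of node inputs can fill, in order.
_POSITIONAL = (
    inspect.Parameter.POSITIONAL_ONLY,
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
)


def generate_call_code(func: Callable, inputs: List[str]) -> str:
    """Emit notebook code that loads ``inputs`` and calls ``func`` with them.

    Parameters without a matching input are left out of the call, so optional
    parameters such as ``c=None`` simply keep their defaults.
    """
    params = [p for p in inspect.signature(func).parameters.values() if p.kind in _POSITIONAL]
    if len(inputs) > len(params):
        raise ValueError("More inputs than positional parameters (*args not handled here)")
    missing = [p.name for p in params[len(inputs):] if p.default is inspect.Parameter.empty]
    if missing:
        raise ValueError(f"No node input for required parameter(s): {missing}")

    used = params[: len(inputs)]
    lines = [f"{p.name} = catalog.load({dataset!r})" for p, dataset in zip(used, inputs)]
    lines.append(f"{func.__name__}({', '.join(p.name for p in used)})")
    return "\n".join(lines)
```

For node_func above with inputs ["data_a", "data_b"], the last generated line is `node_func(a, b)`, so c falls back to None.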

@noklam
Contributor

noklam commented Feb 19, 2024

Better handling of *args and **kwargs: currently %load_node uses simple logic to map a node's inputs to function parameters.

The idea is to use inspect.Signature.bind and inspect.Parameter to identify the special arguments (e.g. VAR_POSITIONAL and VAR_KEYWORD).

For example:

def dummy(a, b, c, *args, **kwargs):
    ...

node(dummy, ["data_1", "data_2", "data_3", "dummy1","dummy2","dummy3"], ...)

should translate to

a = catalog.load("data_1")
b = catalog.load("data_2")
c = catalog.load("data_3")
dummy1 = catalog.load("dummy1")
dummy2 = catalog.load("dummy2")
dummy3 = catalog.load("dummy3")
args = [dummy1, dummy2, dummy3]  # Note: the variable names dummy1, dummy2, dummy3 are arbitrary
dummy(a, b, c, *args)
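A sketch of the Signature.bind approach described above. Binding the dataset names themselves, rather than loaded data, lets Python's own argument-matching rules decide which inputs land in *args or **kwargs. The helper variables are named args_0, args_1, ... because the names are arbitrary and not every dataset name is a valid identifier. `generate_node_code` is a hypothetical name, and dict-style inputs are assumed to map parameter names to dataset names.

```python
import inspect
from typing import Callable, Dict, List, Union


def _load(dataset: str) -> str:
    """Source text that loads ``dataset`` from a ``catalog`` variable."""
    return f"catalog.load({dataset!r})"


def generate_node_code(func: Callable, inputs: Union[List[str], Dict[str, str]]) -> str:
    """Turn a node's inputs into notebook code, honouring *args and **kwargs."""
    sig = inspect.signature(func)
    by_name = isinstance(inputs, dict)
    # Surplus positional inputs go to *args; unknown keywords go to **kwargs.
    bound = sig.bind(**inputs) if by_name else sig.bind(*inputs)

    lines: List[str] = []
    call_args: List[str] = []
    for name, value in bound.arguments.items():
        kind = sig.parameters[name].kind
        if kind is inspect.Parameter.VAR_POSITIONAL:
            helpers = [f"{name}_{i}" for i in range(len(value))]
            for helper, dataset in zip(helpers, value):
                lines.append(f"{helper} = {_load(dataset)}")
            lines.append(f"{name} = [{', '.join(helpers)}]")
            call_args.append(f"*{name}")
        elif kind is inspect.Parameter.VAR_KEYWORD:
            items = ", ".join(f"{key!r}: {_load(ds)}" for key, ds in value.items())
            lines.append(f"{name} = {{{items}}}")
            call_args.append(f"**{name}")
        else:
            lines.append(f"{name} = {_load(value)}")
            # Pass by keyword when bound by name or when the parameter is keyword-only.
            if by_name or kind is inspect.Parameter.KEYWORD_ONLY:
                call_args.append(f"{name}={name}")
            else:
                call_args.append(name)
    lines.append(f"{func.__name__}({', '.join(call_args)})")
    return "\n".join(lines)
```

For the dummy example above this produces the same shape of code as the expected translation, ending in `dummy(a, b, c, *args)`.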

@noklam
Contributor

noklam commented Feb 19, 2024

Consider running the before_node_run and after_node_run hooks. If a user mutates the inputs with hooks, the current logic does not reflect that. For example, some users have a custom ConfigLoader that only instantiates objects in before_node_run, so catalog.load returns only a dict rather than the instantiated class.
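A minimal sketch of replaying hooks around the node call. The hook shapes here are simplified assumptions, not Kedro's actual hook specifications. The one behaviour borrowed from Kedro is that a before_node_run hook may return a dictionary whose values replace the node's inputs. `run_with_hooks` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Sequence

# Hypothetical hook shapes for this sketch (not Kedro's real signatures):
# a "before" hook sees the inputs and may return overrides; an "after" hook
# sees the final inputs and the node's result.
BeforeHook = Callable[[Dict[str, Any]], Optional[Dict[str, Any]]]
AfterHook = Callable[[Dict[str, Any], Any], None]


def run_with_hooks(
    func: Callable[..., Any],
    dataset_names: Sequence[str],
    loaded: Dict[str, Any],
    before_hooks: Iterable[BeforeHook] = (),
    after_hooks: Iterable[AfterHook] = (),
) -> Any:
    """Call ``func`` positionally on ``dataset_names``, letting hooks replace inputs first."""
    inputs = {name: loaded[name] for name in dataset_names}
    for hook in before_hooks:
        overrides = hook(inputs)
        if overrides:
            # Returned values replace the loaded ones, e.g. a dict swapped for an object.
            inputs.update(overrides)
    result = func(*(inputs[name] for name in dataset_names))
    for hook in after_hooks:
        hook(inputs, result)
    return result
```

This matches the ConfigLoader case: the catalog hands back a plain dict, and a before hook turns it into an object before the node function sees it.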

noklam changed the title from "[PARENT] %load_node line magic improvements" to "%load_node line magic improvements" on Mar 28, 2024
kedro-org locked and limited the conversation to collaborators on Mar 28, 2024
merelcht converted this issue into discussion #3754 on Mar 28, 2024

