Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lay foundation to pass notebook names to kernel at startup. #656

Closed
wants to merge 1 commit into from

Conversation

Carreau
Copy link
Member

@Carreau Carreau commented Jun 7, 2021

This has been a controversial topic from some time:

jupyter/notebook#1000
https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html
https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name
https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometime critical to linter, and tab completion to know
current name.

Of course current answer is that the question is ill-defined,
there might not be a file associated with the current kernel, there
might be multiple files, files might not be on the same system, it could
change through the execution and many other gotchas.

This suggest to add an JPY_ASSOCIATED_FILE env variable which is not
too visible, but give an escape hatch which should mostly be correct
unless the notebook is renamed or kernel attached to a new one.

Do do so this handles the new associated_file parameters in a few
function of the kernel manager. On jupyter_server this one line change
make the notebook name available using typical local installs:

--- a/jupyter_server/services/sessions/sessionmanager.py
+++ b/jupyter_server/services/sessions/sessionmanager.py
@@ -96,7 +96,12 @@ class SessionManager(LoggingConfigurable):
         """Start a new kernel for a given session."""
         # allow contents manager to specify kernels cwd
         kernel_path = self.contents_manager.get_kernel_path(path=path)
-        kernel_id = await self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
+
+        kernel_id = await self.kernel_manager.start_kernel(
+            path=kernel_path, kernel_name=kernel_name, associated_file=name
+        )
         return kernel_id

Of course only launchers that will pass forward this value will allow
the env variable to be set.

I'm thinking that various kernels may use this and expose it in
different ways. like notebook_name if it ends with .ipynb in
ipykernel.

@Carreau
Copy link
Member Author

Carreau commented Jun 7, 2021

@goanpeca pointed out to me that it could be a good idea to include the associated file in each execute request metadata in order to take into account multiple connected clients and renaming, though that's a more invasive change.

@davidbrochart
Copy link
Member

As you said, it only makes sense in specific situations, the most obvious one being when the kernel is used by a single notebook. But in this case, it looks to me like it would be better handled by e.g. the Jupyter Server, which is well aware of the notebook's name and its associated kernel. It could execute some code in the kernel at startup and each time the notebook is renamed, something like:

__notebook_name__ = "my_notebook.ipynb"

@SylvainCorlay
Copy link
Member

Note that there is some notion of "temporary" filename for the debug protocol.

@kevin-bates
Copy link
Member

This sounds like another item for the kernel parameterization cart, although one that is more of a system-owned parameter.

I agree that using an environment variable is the way to go, but I'd like to better understand the use-case here. The referenced notebook issue talks of a high availability scenario, while others talk about creating files with differing suffixes, where the notebook file's basename is the common association.

  • What kind of use-case is this driving at?
  • Can the "association" be something that is pure metadata that survives rename operations, etc. (as suggested here)? A GUID isn't user-friendly and if there needs to be a "visual association", not appropriate.
  • What happens to this use case when the associated file is renamed or multiple files are associated? Will it lead to errors for faulty behavior?

The fact that this value can change after a kernel has started is troubling. At best, this is merely a hint, so if this were to be supported, I would recommend this environment variable be named to convey that aspect: JPY_ASSOCIATED_FILE_HINT or JPY_FILE_HINT or JPY_INITIAL_FILE.

I like the idea of providing system-owned environment variables and the JPY_ prefix seems good (and has precedence). (For reference, Enterprise Gateway flows any env prefixed with KERNEL_ to the kernel.)

If we were to proceed with this, I would suggest introducing the environment variable in the session manager directly and utilizing the existing env keyword argument (dict). This should flow through the kernel management layer as-is. Applications relying on core projects like nbclient could then benefit from this (assuming nbclient is updated to set the env as well).

@Carreau
Copy link
Member Author

Carreau commented Jun 15, 2021

As you said, it only makes sense in specific situations, the most obvious one being when the kernel is used by a single notebook. But in this case, it looks to me like it would be better handled by e.g. the Jupyter Server, which is well aware of the notebook's name and its associated kernel. It could execute some code in the kernel at startup and each time the notebook is renamed, something like:

__notebook_name__ = "my_notebook.ipynb"

While that is true for the Python kernel this is not a cross language solution, the advantage of using an env variable is that it's kernel agnostic and does not risk breaking anything,

I agree that __notebook_name__, would be great but imho this should be handled at the protocol level, for example in metadata in the execute request and let the kernel do "the right things" depending on the language.

This sounds like another item for the kernel parameterization cart, although one that is more of a system-owned parameter.

I agree that using an environment variable is the way to go, but I'd like to better understand the use-case here. The referenced notebook issue talks of a high availability scenario, while others talk about creating files with differing suffixes, where the notebook file's basename is the common association.
...

I'm not sure this is parameterization, but I see where you are coming from. I think that having this as an env variable is a cheap way to at least get started. Having the name can be used to better reference error messages (instead of having <IPython-input-12> or similar in traceback. I'm not in generala huge fan of it, but as you saw in above issues these is a growing number of people asking for it, and workaround

The fact that this value can change after a kernel has started is troubling. At best, this is merely a hint, so if this were to be supported, I would recommend this environment variable be named to convey that aspect: JPY_ASSOCIATED_FILE_HINT or JPY_FILE_HINT or JPY_INITIAL_FILE.

I like the idea of providing system-owned environment variables and the JPY_ prefix seems good (and has precedence). (For reference, Enterprise Gateway flows any env prefixed with KERNEL_ to the kernel.)

I don't have particular feeling for how to name the env variable, HINT in fine, JPY_ prefix sounds great. I want to avoid NOTEBOOK as it's increasingly note a notebook.

My goal here is to have a stopgap – the env variable will in most of the case be the right value, if we call it INITIAL as you pointed that's enough to say that it might not be correct. But if we get that in it works for all the kernel, and then we can discuss what to add in execute request metadata , for example file type (notebook, plain, other), and the filename or full path.

Those metadata can be interpreted by each kernel to provide language specific way to access those values, maybe __notebook_name__, or __notebook_path__ either as str, or objects... and then there is no issues to make it change at each execution requests, or have those objects have events if the notebook name changes. I want to also keep in mind that resources might not be on a filesystem so I would be fine with this value being s3://...., or anything else that the kernel can understand.

if that can be harmonized with the debugger I'm all for it.

Now I believe the first two questions we want to address are:

  • Do we agree that accessing in some way the current file path/name/url/uuid is something we want to allow ?
  • if so: Do we want to start working on a complete solution right now which is correct in all the cases or do it in steps with maybe something which is correct in most of the cases and then refine ?

@MSeal
Copy link
Contributor

MSeal commented Jun 16, 2021

Do we agree that accessing in some way the current file path/name/url/uuid is something we want to allow ?

I think it's asked for often enough that having a path for this that's reasonably safe guarded makes sense.

if so: Do we want to start working on a complete solution right now which is correct in all the cases or do it in steps with maybe something which is correct in most of the cases and then refine ?

I think if the solution is scoped to "When executing in situation A, then B will be available to the runtime" is sufficient. Solving perfectly for all cases is going to a major headache with little upside. Good documentation and naming convention here would probably make it clear around scope of use imho.

I don't have particular feeling for how to name the env variable, HINT in fine, JPY_ prefix sounds great. I want to avoid NOTEBOOK as it's increasingly note a notebook.

What about instead talking about it as kernel session naming? When you launch a kernel the kernel can accept a session name that's then set to an env variable (or whatever is decided). e.g. KERNEL_SESSION_NAME="my_notebook.ipynb", or or any string including hashes / UUID strings. This would leave the solution somewhat generic but we could then have the jupyter server state that whenever it launches a notebook file it follows a specific convention for setting the session name. e.g. server always sets it to the initial notebook file name, but doesn't change that name after kernel launch. Unless I am missing some context of the problem I believe this would also get around the conversation about every client execute request sending a file name. The field isn't a client field, it's a launcher field for naming the overall repl session for it's given lifecycle independent of users of that session.

@kevin-bates
Copy link
Member

I think couching this in terms of a session name is a great idea - thanks @MSeal. Such an approach removes all sorts of implications that other names might suggest.

For my own understanding, is this value just the notebook filename or an actual path? This comment seems to imply that, at least initially, the kernel can expect the value of this env to be a tangible resource (and not just a contextual string).

I want to also keep in mind that resources might not be on a filesystem so I would be fine with this value being s3://...., or anything else that the kernel can understand.

@Carreau
Copy link
Member Author

Carreau commented Jun 22, 2021

My reasoning for a path instead of a filename, is that sufficiently enough people cd around in their notebook that the filename might not be sufficient. But I agree that if we go with a session name this ambiguity is resolved as we don't guarantee it's the curent file.

@Carreau Carreau force-pushed the current-file-name branch from c8dae7c to 4afe680 Compare June 22, 2021 10:15
@Carreau
Copy link
Member Author

Carreau commented Jun 23, 2021

As a note, one of the use case this was privately requested to me was reporting which file are containing deprecated functions when using kernels scheduled on remote machines. There are hooks installed on a distributed system that log when deprecated functions are used and send user regular reports; problem is that the kernels do not know the file names and the logs are a bit useless as the only thing thay can mention is <IPython-input-XXXX>

@blink1073
Copy link
Contributor

Do we want to target this for 7.0?

@kevin-bates
Copy link
Member

If I'm understanding things correctly, and assuming we went with an env similar to KERNEL_SESSION_NAME, I think this boils down to the various applications setting that value into the env keyword argument of their corresponding "start kernel" request and it should just flow. That is, I don't think there'd be any work here, and if there was, I suspect it nothing API-breaking from a release versioning perspective.

Carreau added a commit to Carreau/jupyter_notebook that referenced this pull request Sep 14, 2021
Unfortunately this does requires upstream, because even if most
functions takes **kw, those in the end get passed to Popen() which does
not like unkonwn keywords
@Carreau
Copy link
Member Author

Carreau commented Sep 14, 2021

I've reworked this on top of the 7.x branch.

I'm a bit worried of a couple of things.

  1. you need to thread the session name down to the providers via kwargs.
  2. the current code is not super clear as to which kw goes where, and seem to rely on kwargs mutability being visible to caller.

I've sent jupyter/notebook#6180 to show how this can be used.

it's also unclear where the name of the env variable should be documented. I can document in it in Jupyter_client, though I guess this is more interesting for end-users so maybe somewhere else ?

This has been a controversial topic from some time:

jupyter/notebook#1000
https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html
https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name
https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometime critical to linter, and tab completion to know
current name.

Of course current answer is that the question is ill-defined,
there might not be a file associated with the current kernel, there
might be multiple files, files might not be on the same system, it could
change through the execution and many other gotchas.

This suggest to add an JPY_KERNEL_SESSION_NAME env variable which is not
too visible, but give an escape hatch which should mostly be correct
unless the notebook is renamed or kernel attached to a new one.

Do do so this handles the new associated_file parameters in a few
function of the kernel manager. On jupyter_server this one line change
make the notebook name available using typical local installs:

```diff
diff --git a/notebook/services/sessions/sessionmanager.py b/notebook/services/sessions/sessionmanager.py
index 92b2a7345..f7b4011ce 100644
--- a/notebook/services/sessions/sessionmanager.py
+++ b/notebook/services/sessions/sessionmanager.py
@@ -108,7 +108,9 @@ class SessionManager(LoggingConfigurable):
         # allow contents manager to specify kernels cwd
         kernel_path = self.contents_manager.get_kernel_path(path=path)
         kernel_id = yield maybe_future(
-            self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
+            self.kernel_manager.start_kernel(
+                path=kernel_path, kernel_name=kernel_name, session_name=path
+            )
         )
         # py2-compat
         raise gen.Return(kernel_id)
```diff

Of course only launchers that will pass forward this value will allow
the env variable to be set.

I'm thinking that various kernels may use this and expose it in
different ways. like __notebook_name__ if it ends with `.ipynb` in
ipykernel.
@kevin-bates
Copy link
Member

kevin-bates commented Sep 15, 2021

I agree that plumbing this as a specific (and new) keyword argument is painful. I guess I figured this would simply be incorporated in the existing env:dict keyword argument on the start_kernel request. I believe that would just flow. Has that approach been attempted?

it's also unclear where the name of the env variable should be documented. I can document in it in Jupyter_client, though I guess this is more interesting for end-users so maybe somewhere else ?

I think jupyter_client is the correct location for this. When/if we ever parameterize kernel launches, this would/should be viewed as a system parameter/property, and those should probably be documented at the lowest level (IMHO).

Carreau added a commit to Carreau/jupyter_notebook that referenced this pull request Jan 25, 2022
There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter#6180. It will need to be ported
to jupyter_server as well.
Carreau added a commit to Carreau/jupyter_server that referenced this pull request Jan 25, 2022
There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter/notebook#6180. It will need to be ported
to jupyter_server as well.

This is the mirror commit of
jupyter/notebook#6279 on main notebook.
Carreau added a commit to Carreau/jupyter_server that referenced this pull request Jan 26, 2022
There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter/notebook#6180. It will need to be ported
to jupyter_server as well.

This is the mirror commit of
jupyter/notebook#6279 on main notebook.
Carreau added a commit to Carreau/jupyter_notebook that referenced this pull request Jan 26, 2022
There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter#6180. It will need to be ported
to jupyter_server as well.
@@ -151,6 +151,9 @@ def pre_start_kernel(
constructor_kwargs = {}
if self.kernel_spec_manager:
constructor_kwargs["kernel_spec_manager"] = self.kernel_spec_manager

if 'session_name' in kwargs:
constructor_kwargs['session_name'] = kwargs.copy().pop('session_name')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just kwargs['session_name'] ?

@arogozhnikov
Copy link

arogozhnikov commented May 3, 2022

From my experience:

  • Common usecases are management of resources associated with notebook / auto-saving artifacts like plots in "right place" / sending notifications / saving current notebook or stripped version of current notebook.
  • Case which I am currently exploring is custom pycharm-like remote kernels, which assosicated with notebook name. So even if laptop was rebooted, pycharm still can connect to the right kernel for the notebook on remote server. This requires kernel to know name of a notebook.

Already-proposed-and-discussed implementation with just passing properly-named envvar (containing notebook path, not name) suffice for above usecases.

I'd be very supportive in getting it incorporated.

@arogozhnikov
Copy link

@Carreau @kevin-bates do you think you converged on how this should work or are there any alternatives you still consider?

@kevin-bates
Copy link
Member

@arogozhnikov - thanks for the ping.

Reading back through this, I think we're good with using an env named JPY_SESSION_NAME (although if not KERNEL_SESSION_NAME, the gateway integration will need to be adjusted to flow JPY_-prefixed variables as well).

The one thing I'd rather see is to introduce the env kwarg when starting the kernel relative to the session. This way, nothing in jupyter_client needs to change and it's purely a jupyter_server play.

@arogozhnikov
Copy link

arogozhnikov commented May 11, 2022

ok, @Carreau

  1. Do you preferJPY_SESSION_NAME or KERNEL_SESSION_NAME ?
  2. should it belong here or jupyter-server as Kevin suggested in previous comment?

vidartf pushed a commit to jupyter-server/jupyter_server that referenced this pull request May 27, 2022
There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter/notebook#6180. It will need to be ported
to jupyter_server as well.

This is the mirror commit of
jupyter/notebook#6279 on main notebook.
vidartf pushed a commit to vidartf/jupyter_server that referenced this pull request Jul 13, 2022
There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter/notebook#6180. It will need to be ported
to jupyter_server as well.

This is the mirror commit of
jupyter/notebook#6279 on main notebook.
blink1073 pushed a commit to jupyter-server/jupyter_server that referenced this pull request Jul 13, 2022
)

* Inject session identifier into environment variable.

There are many use case where users want to know the current notebook
name/path. This help by adding a session identifier (to not really say
this is the current notebook name), and by default make it the full path
to the notebook document that created the session.

This will of course not work if the notebook get renamed, but we can
tackle this later.

See also jupyter/jupyter_client#656,
jupyter/notebook#6180. It will need to be ported
to jupyter_server as well.

This is the mirror commit of
jupyter/notebook#6279 on main notebook.

* Make start_kernel env extend os.environ (#859)

* Make start_kernel env extend os.environ

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Co-authored-by: Matthias Bussonnier <bussonniermatthias@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@emsi
Copy link

emsi commented Nov 22, 2022

As this is still open I'll ask the question: why I do see something like:

JPY_SESSION_NAME=git/notebooks/1b7dcc20-a919-1d16-9efe-13f1d420ee93

with the uuid rather than actual file name of the notebook?

I was going to use the current notebook name as a way to add it to artifacts that are preserve along with other outputs but without the file name it's of no use and there seems to be no other sane, portable way to do so (https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name) :(

p.s. I'm using Jupyter notebooks under JupyterLab.

@kevin-bates
Copy link
Member

Hi @emsi. Yeah, this was realized the other day as well and an issue opened here: jupyter-server/jupyter_server#1059. I would suggest we keep the discussion there.

Seems like this issue should be closed since the desired behavior was implemented (in jupyter_server), just that something caused the desired behavior to be changed. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants