Lay foundation to pass notebook names to kernel at startup. #656

Carreau · 2021-06-07T14:34:19Z

This has been a controversial topic from some time:

jupyter/notebook#1000
https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html
https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name
https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometime critical to linter, and tab completion to know
current name.

Of course current answer is that the question is ill-defined,
there might not be a file associated with the current kernel, there
might be multiple files, files might not be on the same system, it could
change through the execution and many other gotchas.

This suggest to add an JPY_ASSOCIATED_FILE env variable which is not
too visible, but give an escape hatch which should mostly be correct
unless the notebook is renamed or kernel attached to a new one.

Do do so this handles the new associated_file parameters in a few
function of the kernel manager. On jupyter_server this one line change
make the notebook name available using typical local installs:

--- a/jupyter_server/services/sessions/sessionmanager.py
+++ b/jupyter_server/services/sessions/sessionmanager.py
@@ -96,7 +96,12 @@ class SessionManager(LoggingConfigurable):
         """Start a new kernel for a given session."""
         # allow contents manager to specify kernels cwd
         kernel_path = self.contents_manager.get_kernel_path(path=path)
-        kernel_id = await self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
+
+        kernel_id = await self.kernel_manager.start_kernel(
+            path=kernel_path, kernel_name=kernel_name, associated_file=name
+        )
         return kernel_id

Of course only launchers that will pass forward this value will allow
the env variable to be set.

I'm thinking that various kernels may use this and expose it in
different ways. like notebook_name if it ends with .ipynb in
ipykernel.

Carreau · 2021-06-07T15:13:09Z

@goanpeca pointed out to me that it could be a good idea to include the associated file in each execute request metadata in order to take into account multiple connected clients and renaming, though that's a more invasive change.

davidbrochart · 2021-06-07T17:47:43Z

As you said, it only makes sense in specific situations, the most obvious one being when the kernel is used by a single notebook. But in this case, it looks to me like it would be better handled by e.g. the Jupyter Server, which is well aware of the notebook's name and its associated kernel. It could execute some code in the kernel at startup and each time the notebook is renamed, something like:

__notebook_name__ = "my_notebook.ipynb"

SylvainCorlay · 2021-06-10T09:59:07Z

Note that there is some notion of "temporary" filename for the debug protocol.

kevin-bates · 2021-06-10T15:29:25Z

This sounds like another item for the kernel parameterization cart, although one that is more of a system-owned parameter.

I agree that using an environment variable is the way to go, but I'd like to better understand the use-case here. The referenced notebook issue talks of a high availability scenario, while others talk about creating files with differing suffixes, where the notebook file's basename is the common association.

What kind of use-case is this driving at?
Can the "association" be something that is pure metadata that survives rename operations, etc. (as suggested here)? A GUID isn't user-friendly and if there needs to be a "visual association", not appropriate.
What happens to this use case when the associated file is renamed or multiple files are associated? Will it lead to errors for faulty behavior?

The fact that this value can change after a kernel has started is troubling. At best, this is merely a hint, so if this were to be supported, I would recommend this environment variable be named to convey that aspect: JPY_ASSOCIATED_FILE_HINT or JPY_FILE_HINT or JPY_INITIAL_FILE.

I like the idea of providing system-owned environment variables and the JPY_ prefix seems good (and has precedence). (For reference, Enterprise Gateway flows any env prefixed with KERNEL_ to the kernel.)

If we were to proceed with this, I would suggest introducing the environment variable in the session manager directly and utilizing the existing env keyword argument (dict). This should flow through the kernel management layer as-is. Applications relying on core projects like nbclient could then benefit from this (assuming nbclient is updated to set the env as well).

Carreau · 2021-06-15T16:30:59Z

As you said, it only makes sense in specific situations, the most obvious one being when the kernel is used by a single notebook. But in this case, it looks to me like it would be better handled by e.g. the Jupyter Server, which is well aware of the notebook's name and its associated kernel. It could execute some code in the kernel at startup and each time the notebook is renamed, something like:
__notebook_name__ = "my_notebook.ipynb"

While that is true for the Python kernel this is not a cross language solution, the advantage of using an env variable is that it's kernel agnostic and does not risk breaking anything,

I agree that __notebook_name__, would be great but imho this should be handled at the protocol level, for example in metadata in the execute request and let the kernel do "the right things" depending on the language.

This sounds like another item for the kernel parameterization cart, although one that is more of a system-owned parameter.

I agree that using an environment variable is the way to go, but I'd like to better understand the use-case here. The referenced notebook issue talks of a high availability scenario, while others talk about creating files with differing suffixes, where the notebook file's basename is the common association.
...

I'm not sure this is parameterization, but I see where you are coming from. I think that having this as an env variable is a cheap way to at least get started. Having the name can be used to better reference error messages (instead of having <IPython-input-12> or similar in traceback. I'm not in generala huge fan of it, but as you saw in above issues these is a growing number of people asking for it, and workaround

The fact that this value can change after a kernel has started is troubling. At best, this is merely a hint, so if this were to be supported, I would recommend this environment variable be named to convey that aspect: JPY_ASSOCIATED_FILE_HINT or JPY_FILE_HINT or JPY_INITIAL_FILE.

I like the idea of providing system-owned environment variables and the JPY_ prefix seems good (and has precedence). (For reference, Enterprise Gateway flows any env prefixed with KERNEL_ to the kernel.)

I don't have particular feeling for how to name the env variable, HINT in fine, JPY_ prefix sounds great. I want to avoid NOTEBOOK as it's increasingly note a notebook.

My goal here is to have a stopgap – the env variable will in most of the case be the right value, if we call it INITIAL as you pointed that's enough to say that it might not be correct. But if we get that in it works for all the kernel, and then we can discuss what to add in execute request metadata , for example file type (notebook, plain, other), and the filename or full path.

Those metadata can be interpreted by each kernel to provide language specific way to access those values, maybe __notebook_name__, or __notebook_path__ either as str, or objects... and then there is no issues to make it change at each execution requests, or have those objects have events if the notebook name changes. I want to also keep in mind that resources might not be on a filesystem so I would be fine with this value being s3://...., or anything else that the kernel can understand.

if that can be harmonized with the debugger I'm all for it.

Now I believe the first two questions we want to address are:

Do we agree that accessing in some way the current file path/name/url/uuid is something we want to allow ?
if so: Do we want to start working on a complete solution right now which is correct in all the cases or do it in steps with maybe something which is correct in most of the cases and then refine ?

MSeal · 2021-06-16T06:05:49Z

Do we agree that accessing in some way the current file path/name/url/uuid is something we want to allow ?

I think it's asked for often enough that having a path for this that's reasonably safe guarded makes sense.

if so: Do we want to start working on a complete solution right now which is correct in all the cases or do it in steps with maybe something which is correct in most of the cases and then refine ?

I think if the solution is scoped to "When executing in situation A, then B will be available to the runtime" is sufficient. Solving perfectly for all cases is going to a major headache with little upside. Good documentation and naming convention here would probably make it clear around scope of use imho.

I don't have particular feeling for how to name the env variable, HINT in fine, JPY_ prefix sounds great. I want to avoid NOTEBOOK as it's increasingly note a notebook.

What about instead talking about it as kernel session naming? When you launch a kernel the kernel can accept a session name that's then set to an env variable (or whatever is decided). e.g. KERNEL_SESSION_NAME="my_notebook.ipynb", or or any string including hashes / UUID strings. This would leave the solution somewhat generic but we could then have the jupyter server state that whenever it launches a notebook file it follows a specific convention for setting the session name. e.g. server always sets it to the initial notebook file name, but doesn't change that name after kernel launch. Unless I am missing some context of the problem I believe this would also get around the conversation about every client execute request sending a file name. The field isn't a client field, it's a launcher field for naming the overall repl session for it's given lifecycle independent of users of that session.

kevin-bates · 2021-06-16T16:34:42Z

I think couching this in terms of a session name is a great idea - thanks @MSeal. Such an approach removes all sorts of implications that other names might suggest.

For my own understanding, is this value just the notebook filename or an actual path? This comment seems to imply that, at least initially, the kernel can expect the value of this env to be a tangible resource (and not just a contextual string).

I want to also keep in mind that resources might not be on a filesystem so I would be fine with this value being s3://...., or anything else that the kernel can understand.

Carreau · 2021-06-22T10:09:28Z

My reasoning for a path instead of a filename, is that sufficiently enough people cd around in their notebook that the filename might not be sufficient. But I agree that if we go with a session name this ambiguity is resolved as we don't guarantee it's the curent file.

Carreau · 2021-06-23T15:14:03Z

As a note, one of the use case this was privately requested to me was reporting which file are containing deprecated functions when using kernels scheduled on remote machines. There are hooks installed on a distributed system that log when deprecated functions are used and send user regular reports; problem is that the kernels do not know the file names and the logs are a bit useless as the only thing thay can mention is <IPython-input-XXXX>

blink1073 · 2021-07-30T09:36:44Z

Do we want to target this for 7.0?

kevin-bates · 2021-07-30T15:32:04Z

If I'm understanding things correctly, and assuming we went with an env similar to KERNEL_SESSION_NAME, I think this boils down to the various applications setting that value into the env keyword argument of their corresponding "start kernel" request and it should just flow. That is, I don't think there'd be any work here, and if there was, I suspect it nothing API-breaking from a release versioning perspective.

Unfortunately this does requires upstream, because even if most functions takes **kw, those in the end get passed to Popen() which does not like unkonwn keywords

Carreau · 2021-09-14T21:47:34Z

I've reworked this on top of the 7.x branch.

I'm a bit worried of a couple of things.

you need to thread the session name down to the providers via kwargs.
the current code is not super clear as to which kw goes where, and seem to rely on kwargs mutability being visible to caller.

I've sent jupyter/notebook#6180 to show how this can be used.

it's also unclear where the name of the env variable should be documented. I can document in it in Jupyter_client, though I guess this is more interesting for end-users so maybe somewhere else ?

This has been a controversial topic from some time: jupyter/notebook#1000 https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/ This is also sometime critical to linter, and tab completion to know current name. Of course current answer is that the question is ill-defined, there might not be a file associated with the current kernel, there might be multiple files, files might not be on the same system, it could change through the execution and many other gotchas. This suggest to add an JPY_KERNEL_SESSION_NAME env variable which is not too visible, but give an escape hatch which should mostly be correct unless the notebook is renamed or kernel attached to a new one. Do do so this handles the new associated_file parameters in a few function of the kernel manager. On jupyter_server this one line change make the notebook name available using typical local installs: ```diff diff --git a/notebook/services/sessions/sessionmanager.py b/notebook/services/sessions/sessionmanager.py index 92b2a7345..f7b4011ce 100644 --- a/notebook/services/sessions/sessionmanager.py +++ b/notebook/services/sessions/sessionmanager.py @@ -108,7 +108,9 @@ class SessionManager(LoggingConfigurable): # allow contents manager to specify kernels cwd kernel_path = self.contents_manager.get_kernel_path(path=path) kernel_id = yield maybe_future( - self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name) + self.kernel_manager.start_kernel( + path=kernel_path, kernel_name=kernel_name, session_name=path + ) ) # py2-compat raise gen.Return(kernel_id) ```diff Of course only launchers that will pass forward this value will allow the env variable to be set. I'm thinking that various kernels may use this and expose it in different ways. like __notebook_name__ if it ends with `.ipynb` in ipykernel.

kevin-bates · 2021-09-15T00:56:35Z

I agree that plumbing this as a specific (and new) keyword argument is painful. I guess I figured this would simply be incorporated in the existing env:dict keyword argument on the start_kernel request. I believe that would just flow. Has that approach been attempted?

it's also unclear where the name of the env variable should be documented. I can document in it in Jupyter_client, though I guess this is more interesting for end-users so maybe somewhere else ?

I think jupyter_client is the correct location for this. When/if we ever parameterize kernel launches, this would/should be viewed as a system parameter/property, and those should probably be documented at the lowest level (IMHO).

There are many use case where users want to know the current notebook name/path. This help by adding a session identifier (to not really say this is the current notebook name), and by default make it the full path to the notebook document that created the session. This will of course not work if the notebook get renamed, but we can tackle this later. See also jupyter/jupyter_client#656, jupyter#6180. It will need to be ported to jupyter_server as well.

There are many use case where users want to know the current notebook name/path. This help by adding a session identifier (to not really say this is the current notebook name), and by default make it the full path to the notebook document that created the session. This will of course not work if the notebook get renamed, but we can tackle this later. See also jupyter/jupyter_client#656, jupyter/notebook#6180. It will need to be ported to jupyter_server as well. This is the mirror commit of jupyter/notebook#6279 on main notebook.

There are many use case where users want to know the current notebook name/path. This help by adding a session identifier (to not really say this is the current notebook name), and by default make it the full path to the notebook document that created the session. This will of course not work if the notebook get renamed, but we can tackle this later. See also jupyter/jupyter_client#656, jupyter#6180. It will need to be ported to jupyter_server as well.

arogozhnikov · 2022-05-03T19:00:28Z

jupyter_client/multikernelmanager.py

@@ -151,6 +151,9 @@ def pre_start_kernel(
        constructor_kwargs = {}
        if self.kernel_spec_manager:
            constructor_kwargs["kernel_spec_manager"] = self.kernel_spec_manager
+
+        if 'session_name' in kwargs:
+            constructor_kwargs['session_name'] = kwargs.copy().pop('session_name')


Maybe just kwargs['session_name'] ?

arogozhnikov · 2022-05-03T19:18:16Z

From my experience:

Common usecases are management of resources associated with notebook / auto-saving artifacts like plots in "right place" / sending notifications / saving current notebook or stripped version of current notebook.
Case which I am currently exploring is custom pycharm-like remote kernels, which assosicated with notebook name. So even if laptop was rebooted, pycharm still can connect to the right kernel for the notebook on remote server. This requires kernel to know name of a notebook.

Already-proposed-and-discussed implementation with just passing properly-named envvar (containing notebook path, not name) suffice for above usecases.

I'd be very supportive in getting it incorporated.

arogozhnikov · 2022-05-08T03:53:43Z

@Carreau @kevin-bates do you think you converged on how this should work or are there any alternatives you still consider?

kevin-bates · 2022-05-09T15:48:44Z

@arogozhnikov - thanks for the ping.

Reading back through this, I think we're good with using an env named JPY_SESSION_NAME (although if not KERNEL_SESSION_NAME, the gateway integration will need to be adjusted to flow JPY_-prefixed variables as well).

The one thing I'd rather see is to introduce the env kwarg when starting the kernel relative to the session. This way, nothing in jupyter_client needs to change and it's purely a jupyter_server play.

arogozhnikov · 2022-05-11T07:07:17Z

ok, @Carreau

Do you preferJPY_SESSION_NAME or KERNEL_SESSION_NAME ?
should it belong here or jupyter-server as Kevin suggested in previous comment?

There are many use case where users want to know the current notebook name/path. This help by adding a session identifier (to not really say this is the current notebook name), and by default make it the full path to the notebook document that created the session. This will of course not work if the notebook get renamed, but we can tackle this later. See also jupyter/jupyter_client#656, jupyter/notebook#6180. It will need to be ported to jupyter_server as well. This is the mirror commit of jupyter/notebook#6279 on main notebook.

) * Inject session identifier into environment variable. There are many use case where users want to know the current notebook name/path. This help by adding a session identifier (to not really say this is the current notebook name), and by default make it the full path to the notebook document that created the session. This will of course not work if the notebook get renamed, but we can tackle this later. See also jupyter/jupyter_client#656, jupyter/notebook#6180. It will need to be ported to jupyter_server as well. This is the mirror commit of jupyter/notebook#6279 on main notebook. * Make start_kernel env extend os.environ (#859) * Make start_kernel env extend os.environ * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Matthias Bussonnier <bussonniermatthias@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

emsi · 2022-11-22T14:37:25Z

As this is still open I'll ask the question: why I do see something like:

JPY_SESSION_NAME=git/notebooks/1b7dcc20-a919-1d16-9efe-13f1d420ee93

with the uuid rather than actual file name of the notebook?

I was going to use the current notebook name as a way to add it to artifacts that are preserve along with other outputs but without the file name it's of no use and there seems to be no other sane, portable way to do so (https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name) :(

p.s. I'm using Jupyter notebooks under JupyterLab.

kevin-bates · 2022-11-22T16:01:27Z

Hi @emsi. Yeah, this was realized the other day as well and an issue opened here: jupyter-server/jupyter_server#1059. I would suggest we keep the discussion there.

Seems like this issue should be closed since the desired behavior was implemented (in jupyter_server), just that something caused the desired behavior to be changed. Closing.

Carreau force-pushed the current-file-name branch from c8dae7c to 4afe680 Compare June 22, 2021 10:15

blink1073 added the enhancement label Aug 5, 2021

Carreau force-pushed the current-file-name branch from 4afe680 to 95f01a7 Compare September 14, 2021 21:37

Carreau force-pushed the current-file-name branch from 95f01a7 to a94cfb2 Compare September 14, 2021 21:40

Carreau force-pushed the current-file-name branch from a94cfb2 to d0ee6c9 Compare September 14, 2021 21:51

marcinwrochna mentioned this pull request Sep 23, 2021

Not passing token to NotebookApp spoils notebook discovery. jupyterhub/jupyterhub#3605

Open

Carreau mentioned this pull request Jan 25, 2022

Inject session identifier into environment variable. jupyter/notebook#6279

Closed

Carreau mentioned this pull request Jan 25, 2022

Inject session identifier into environment variable. jupyter-server/jupyter_server#679

Merged

arogozhnikov reviewed May 3, 2022

View reviewed changes

shahinrostami mentioned this pull request Oct 26, 2022

Add Jupyter integration page to Docs datapane/datapane#312

Merged

2 tasks

kevin-bates closed this Nov 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lay foundation to pass notebook names to kernel at startup. #656

Lay foundation to pass notebook names to kernel at startup. #656

Carreau commented Jun 7, 2021

Carreau commented Jun 7, 2021

davidbrochart commented Jun 7, 2021

SylvainCorlay commented Jun 10, 2021

kevin-bates commented Jun 10, 2021

Carreau commented Jun 15, 2021

MSeal commented Jun 16, 2021

kevin-bates commented Jun 16, 2021

Carreau commented Jun 22, 2021

Carreau commented Jun 23, 2021

blink1073 commented Jul 30, 2021

kevin-bates commented Jul 30, 2021

Carreau commented Sep 14, 2021

kevin-bates commented Sep 15, 2021 •

edited

Loading

arogozhnikov May 3, 2022

arogozhnikov commented May 3, 2022 •

edited

Loading

arogozhnikov commented May 8, 2022

kevin-bates commented May 9, 2022

arogozhnikov commented May 11, 2022 •

edited

Loading

emsi commented Nov 22, 2022

kevin-bates commented Nov 22, 2022

Lay foundation to pass notebook names to kernel at startup. #656

Lay foundation to pass notebook names to kernel at startup. #656

Conversation

Carreau commented Jun 7, 2021

Carreau commented Jun 7, 2021

davidbrochart commented Jun 7, 2021

SylvainCorlay commented Jun 10, 2021

kevin-bates commented Jun 10, 2021

Carreau commented Jun 15, 2021

MSeal commented Jun 16, 2021

kevin-bates commented Jun 16, 2021

Carreau commented Jun 22, 2021

Carreau commented Jun 23, 2021

blink1073 commented Jul 30, 2021

kevin-bates commented Jul 30, 2021

Carreau commented Sep 14, 2021

kevin-bates commented Sep 15, 2021 • edited Loading

arogozhnikov May 3, 2022

Choose a reason for hiding this comment

arogozhnikov commented May 3, 2022 • edited Loading

arogozhnikov commented May 8, 2022

kevin-bates commented May 9, 2022

arogozhnikov commented May 11, 2022 • edited Loading

emsi commented Nov 22, 2022

kevin-bates commented Nov 22, 2022

kevin-bates commented Sep 15, 2021 •

edited

Loading

arogozhnikov commented May 3, 2022 •

edited

Loading

arogozhnikov commented May 11, 2022 •

edited

Loading