Poor performance when saving a notebook over SSH #939

sparseinference · 2016-01-09T03:49:20Z

I use the Jupyter notebook on a local Linux system (Ubuntu 15.10) and I run the server and kernel on a remote Linux VPS (also Ubuntu) through an SSH connection. The server is Python 3.5 (Anaconda).

The connection upload speed is much slower than the download speed and I've noticed that the time to save a notebook is very long (several minutes).
The network connection upload rate spikes to about half the available bandwidth and the browser (Chrome 47) slows down so that normal browsing (for documentation for example) is also slowed down to the point of being unusable.

I use Bokeh plots in the notebook, and it is only when I have Bokeh plot output in the notebook that I've noticed that this occurs.

Could the notebook save function also be sending all the Bokeh plot output back to the server?
That seems unnecessary to me.

Can anyone confirm that this is happening?
This could be related to Issue #650. In this case too, large plots are present in the notebook output. In my case I don't really want the plot output to be saved because I can always regenerate them and if I want a permanent output I can save an image.

takluyver · 2016-01-09T17:14:12Z

At the moment, saving is quite naive, and it does indeed send all the data back to the server. This is because only the frontend, not the server, keeps track of the notebook state. Some of the work @Carreau is doing for real time collaboration should make smarter saving possible by holding document state in the server.
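To get a rough feel for how much of each save is output data, the notebook file can be inspected offline. The following is only a sketch, assuming the nbformat v4 JSON layout (a top-level `cells` list, with an `outputs` list on code cells); the function name `output_share` and the sample notebook are made up for illustration.

```python
import json


def output_share(notebook: dict) -> tuple:
    """Return (total_bytes, output_bytes) for a notebook model.

    Assumes the nbformat v4 layout: a top-level "cells" list in which
    code cells carry an "outputs" list.
    """
    total = len(json.dumps(notebook).encode("utf-8"))
    outputs = sum(
        len(json.dumps(cell.get("outputs", [])).encode("utf-8"))
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    )
    return total, outputs


# Hypothetical notebook: one short code cell with a large embedded HTML output,
# standing in for a Bokeh plot.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "id": "plot-cell",
            "source": "show(figure)",
            "metadata": {},
            "execution_count": 1,
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    "data": {"text/html": "x" * 1_000_000},
                }
            ],
        }
    ],
}

total, out = output_share(example)
print(f"{out / total:.1%} of the payload is cell output")
```

For a real file, load it with `json.load(open("notebook.ipynb"))` and pass the result to `output_share`. If nearly all of the payload is output, the slow upload on save is explained by the plots rather than by the code.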

sparseinference · 2016-01-09T17:26:25Z

OK thanks - I suspected that was the case.

Until the collaboration work is ready then, would it be a bad idea to simply avoid sending any cell output back to the server if a flag was enabled somewhere?

Maybe it could work like first clearing all outputs before saving - but without requiring any computed data in variables to be recomputed, of course, since that would make the effort useless.

minrk · 2016-06-16T10:00:55Z

An extension can clear outputs automatically on save, or even exclude output from every save if desired. I don't think we should be baking this in as a configurable by default, though, as there is a cost to every configurable option.

sparseinference · 2016-06-17T03:06:51Z

Amazing ... 6 months after I asked a simple question, the issue is closed without even attempting to address the original problem or to enter into any dialog at all about it.

ellisonbg · 2016-06-17T03:30:04Z

Apologies for that - not sure why it was closed. Can someone provide some background on why this was closed?

I do know that we are waaaay behind on issue triage though

Carreau · 2016-06-17T03:55:40Z

"Amazing ... 6 months after I asked a simple question, the issue is closed without even attempting to address the original problem or to enter into any dialog at all about it."

Sorry about that. The "close and comment" button is next to the "comment" one, and it's common to misclick, especially when navigating quickly with the keyboard. It is so annoying that there are even extensions to move it.

It's also true that we are a small team and literally get hundreds of notifications each day; just today ~50 issues were opened, with many notifications per issue, on GitHub alone. We try our best, though, and we would like to be given the benefit of the doubt.

The comment "without even attempting to address the original problem or to enter into any dialog at all about it" seems a bit harsh, as 2 developers from the team did write comments explaining the current situation and proposing to write an extension.

Anyway, looking again at the issue and the comments already given: yes, it is an issue we are aware of; we need incremental saving/synchronization, which is not going to happen soon but is being worked on. Like the second developer, I agree that a configuration option is likely not the right way to tackle it, but yes, a custom extension for the notebook can perfectly well handle this case and strip Bokeh plots at save time.

We are likely not going to special-case Bokeh, and I know of a few people who will pursue us to the ends of the earth with chainsaws if we strip large outputs by default.

So here is the status quo: there is likely nothing that is going to be done for this specific case right now, at least not in core, though it will improve with time.
custom.js and various extensions can achieve that if needed, but we don't provide a stable API for it.
So your best chance is to go with a custom API; resources online are plentiful, and if you need a hand we'll be happy to provide more information.

Thanks,

sparseinference · 2016-06-17T06:00:39Z

@Carreau @ellisonbg Thanks for responding.

I understand that you are overwhelmed with issues.
When I first asked the question I guess you missed that I was asking for advice on how I could help to solve the problem.
Since then, I rearranged things to work locally instead of remotely because it was very difficult to be productive on a remote server.
Since I don't know the code base, I was looking for a way to work around the problem without introducing any more problems. I didn't know anything about extensions or even that they existed.
I also wasn't trying to fix a special case with Bokeh. The problem exists wherever a large quantity of cell output is produced.

OK, I will search for the online resources about extensions and see how that works. A requirement to work remotely will undoubtedly appear again soon, so it would be nice to be ready.

Thanks,

minrk · 2016-06-17T08:08:49Z

Sorry for closing prematurely. It was not meant to shut down discussion, just to indicate that this does not represent a task to do on this project. I mentioned extensions as a possible solution, but should have given more detail. As an example, I wrote this one that removes output from the data that is to be saved. I hope that helps.
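The linked example is not reproduced in this thread. As a rough illustration only, the transformation such an extension applies to the notebook model before it is sent can be sketched in Python. A real extension would run in the browser in JavaScript, and the function name `strip_outputs` and the sample notebook below are hypothetical. The sketch assumes the nbformat v4 layout, where code cells carry an `outputs` list.

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook model with all code-cell outputs removed.

    The input model is left untouched, which mirrors the goal stated above:
    the saved data shrinks while the live document keeps its outputs.
    """
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
    return stripped


# Hypothetical two-cell notebook used only to exercise the function.
sample = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "id": "a",
            "source": "print(1 + 1)",
            "metadata": {},
            "execution_count": 1,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "2\n"}],
        },
        {"cell_type": "markdown", "id": "b", "source": "Some notes", "metadata": {}},
    ],
}

to_save = strip_outputs(sample)
print(len(to_save["cells"][0]["outputs"]), len(sample["cells"][0]["outputs"]))
```

Running the same function on the server, for example in a save hook, would shrink the file on disk but would not reduce the upload, because by then the full model has already crossed the network. The stripping has to happen in the frontend to help with a slow uplink.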

sparseinference · 2016-06-17T16:47:02Z

@minrk Thank you very much for the example. That helps a lot to get started.

JamiesHQ · 2017-04-27T02:05:49Z

Hi @sparseinference : just checking in to see how things went with the extension minrk suggested above and if there's anything else we can help you with on this issue. thanks!

sparseinference · 2017-05-11T18:00:32Z

Hello @JamiesHQ Thanks for the reminder - I intend to resume looking into that soon.

TuranTimur · 2017-11-16T13:46:07Z

+1

It seems that it uploads 717b on each upload.
@JamiesHQ would there be any workaround?

shahbazbaig · 2020-03-31T13:00:20Z

I am also facing the same issue. Each operation/command takes a very long time to execute. Any suggestions for overcoming this problem?

LustigePerson · 2021-08-12T14:43:47Z

For some time now, notebooks have unique cell IDs, right? Perhaps it would even be possible to only sync changed cells this way?
Working on a remote server really becomes painful when you start having some plots in your notebook.
Is there any solution for this in JupyterHub? They must be facing this problem even more acutely.
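The idea of syncing only changed cells can be sketched as a comparison of two notebook models keyed by cell ID. This is a sketch under assumptions, not a description of any existing Jupyter API: it assumes every cell has an `id` field (part of the notebook format since nbformat 4.5), and the function name `changed_cell_ids` and the sample data are hypothetical.

```python
def changed_cell_ids(saved: dict, current: dict) -> dict:
    """Compare two notebook models by cell id.

    Returns the ids of cells that were added, removed, or modified in
    `current` relative to `saved`. Reordering of cells is not detected.
    """
    saved_cells = {cell["id"]: cell for cell in saved.get("cells", [])}
    current_cells = {cell["id"]: cell for cell in current.get("cells", [])}

    added = [cid for cid in current_cells if cid not in saved_cells]
    removed = [cid for cid in saved_cells if cid not in current_cells]
    modified = [
        cid
        for cid, cell in current_cells.items()
        if cid in saved_cells and cell != saved_cells[cid]
    ]
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical before/after models: cell "b" is edited and cell "c" is new.
saved_nb = {
    "cells": [
        {"id": "a", "cell_type": "markdown", "source": "Intro", "metadata": {}},
        {"id": "b", "cell_type": "code", "source": "x = 1", "metadata": {},
         "execution_count": 1, "outputs": []},
    ]
}
current_nb = {
    "cells": [
        {"id": "a", "cell_type": "markdown", "source": "Intro", "metadata": {}},
        {"id": "b", "cell_type": "code", "source": "x = 2", "metadata": {},
         "execution_count": 2, "outputs": []},
        {"id": "c", "cell_type": "code", "source": "plot(x)", "metadata": {},
         "execution_count": 3, "outputs": []},
    ]
}

print(changed_cell_ids(saved_nb, current_nb))
```

With a comparison like this, a client would only need to send the added and modified cells plus the list of removed IDs, so an unchanged cell holding a large plot would not be uploaded again.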

minrk added this to the no action milestone Jun 16, 2016

minrk closed this as completed Jun 16, 2016

Carreau reopened this Jun 17, 2016

This was referenced Jun 17, 2018

Define a standard between Jupyter and RStudio mwouts/jupytext#1

Closed

Support for text only notebooks (python scripts, R markdown) in Jupyter #3694

Open
