Speed up loading of REMARKs using JupyterHub or alternative #5

Open
4 tasks
shaunagm opened this issue Feb 28, 2019 · 23 comments

@shaunagm

shaunagm commented Feb 28, 2019

Update: Google Colab is looking very promising. Next steps:

  • SGM tries example with full LaTeX - how much does it slow down loading?
  • check all notebooks work in Colab (will need to at least add import statements) (SGM can do this by forking REMARKs repo)

If we decide on Colab, we will then:

  • update how we direct people on the website
  • reformat REMARKs to remove now-unnecessary mybinder structure

Currently, mybinder takes several minutes to load our REMARKs and other notebooks. We'd like anyone loading a notebook from the econARK site to be able to access it quickly, definitely in under 30 seconds and ideally faster. Not sure whether we need special hosting, caching, etc. to make this happen. Relatedly, our current setup causes issues with dependency management (see issue #12).

Use cases:

  • giving a presentation or sharing with a colleague, when you need to launch your own notebook quickly and interactively
  • teaching a workshop, with dozens of individuals needing to interact with a notebook
  • random visitors to the website

The above use cases have different limitations. A lot of the platforms I've investigated have access control such that it would be hard to provide this service to anonymous internet people, but easy enough for anyone we approve of who can spare a few minutes of initial setup. It's possible we could set up a two-tiered system where community members (and the students/workshop attendees of community members) can launch notebooks quickly and internet strangers continue to use the mybinder system. (Although this doesn't address the dependency management aspect, just the performance aspect.)

Potential approaches:

  1. find a hosting platform that allows us to do this (trying this first, since it's the easier option; see list below)
  2. set up our own hosting system, perhaps on AWS (OTS did this for us briefly last year, for a specific event)

People to consult:

  • OTS
  • QuantEcon (Though they appear to use the same mybinder.org-based solution that we currently do, with corresponding 30 second launch times. What did they customize?)

Hosting Platforms & Notes

  • Google Colab
  • CodeOcean - major roadblock here is that you have to log in to run notebooks interactively
@shaunagm
Author

@llorracc, @mnwhite - I'm going to check out Google Colab sometime in the next week or so, using one or two of the existing notebooks as tests. Are they all about the same in terms of computational power needed? If not, can you suggest a notebook that's on the computation-heavy end, but not an outlier?

@llorracc
Contributor

llorracc commented Mar 20, 2019 via email

@shaunagm
Author

shaunagm commented Mar 21, 2019

I'm liking Google CoLab so far. Loading a notebook stored on GitHub was as simple as using a customized URL, which takes the format:

colab.research.google.com/github/$our_organization_name/$repository_name/blob/master/$relative_path_to_notebook.ipynb

It loaded fairly fast, about 10 seconds by my count, and it looks like the LaTeX is all there.
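
For anyone generating these links in bulk (say, for the website), here is a minimal sketch of building the URL from its parts. The helper name `colab_url` and its `branch` parameter are made up for illustration; only the URL pattern itself comes from the comment above.

```python
from urllib.parse import quote

COLAB_GITHUB_BASE = "https://colab.research.google.com/github"


def colab_url(organization, repository, notebook_path, branch="master"):
    """Build a Colab launch URL for a notebook stored on GitHub.

    Hypothetical helper: it only fills in the URL pattern quoted above.
    """
    # Percent-encode the path but keep "/" so sub-folders survive intact.
    path = quote(notebook_path.lstrip("/"), safe="/")
    return "/".join(
        [COLAB_GITHUB_BASE, quote(organization), quote(repository),
         "blob", quote(branch), path]
    )


print(colab_url("shaunagm", "REMARK",
                "REMARKs/BufferStockTheory/BufferStockTheory.ipynb"))
```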

The hosted runtime options seem fairly limited - only Python 2.7 and 3.6 are available, and they come with certain libraries pre-installed, which means we don't have as much flexibility in choosing the environment we want the notebooks to run in. I think that means we're stuck with a little cell at the top of all our notebooks that looks something like:

!pip install econ-ark
!pip install matplotlib==1.2  # made up example for if we need to use a different version

But the notebooks currently have set-up cells anyway, so. Anyway, here's a list of issues I encountered and their solutions:

  1. TkAgg error
ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

This is due to trying to load get_ipython().run_line_magic('matplotlib', 'auto') instead of get_ipython().run_line_magic('matplotlib', 'inline') (inline is the only matplotlib backend supported on CoLab) (no I don't know what that means). This should be easily fixable by rejiggering that if/else statement to catch the CoLab environment in the if, but for now I just replace 'auto' with 'inline' manually.
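
A rough sketch of what that rejiggered if/else could look like. It assumes Colab can be detected by checking whether the `google.colab` module is already loaded, which is a common convention rather than something confirmed in this thread; the function names are hypothetical.

```python
import sys


def in_colab(loaded_modules=None):
    """Return True when the Colab runtime appears to be active.

    Assumption: Colab has the `google.colab` module loaded at startup.
    """
    modules = sys.modules if loaded_modules is None else loaded_modules
    return "google.colab" in modules


def pick_matplotlib_backend(loaded_modules=None):
    """Use 'inline' on Colab (headless), otherwise keep 'auto'."""
    return "inline" if in_colab(loaded_modules) else "auto"


# get_ipython() only exists inside IPython/Jupyter, so guard the call.
try:
    ipython = get_ipython()
except NameError:
    ipython = None

if ipython is not None:
    ipython.run_line_magic("matplotlib", pick_matplotlib_backend())
```

Keeping the decision in a plain function means it can be checked outside a notebook, and the magic is only run when a kernel is actually present.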

  2. Import HARK error

As expected I then got told that HARK didn't exist, and added !pip install econ-ark to the top of the first cell.

  3. LaTeX display issue

In cell 25 the LaTeX on lines 22-26 (I think; annoyingly, it doesn't give line numbers) is causing an error. I deleted the four lines and it ran fine, but without the necessary output. I don't understand what's going on well enough to fix it. Maybe the LaTeX didn't load as well as I thought? Same/similar issue with the last code cell.

Anyway, that was surprisingly straightforward, but I'm also not familiar enough with the notebook to know if there's content errors or missing pieces. Chris & Matt, you should definitely take a look!

@DrDrij
Member

DrDrij commented Mar 28, 2019

Great summary @shaunagm! It's exactly where QuantEcon is at too - including a cell with pip install commands to get the requirements. I think it's a good solution: the requirements are transparent, and it means the notebooks can run standalone.

@llorracc
Contributor

llorracc commented Mar 28, 2019 via email

@shaunagm
Author

We chatted about this during our meeting, but I'm going to test how fast Colab is when loading LaTeX (Chris, do you know what package that is?). If that works, the next steps are: a) making sure existing notebooks all work in Colab; b) changing how we refer to the notebooks on the website to point to Colab instead of mybinder; and c) re-organizing the REMARKs repo to remove the mybinder stuff, since it will no longer be necessary.

@shaunagm
Author

Update: the issue with \overline appears to be a minor syntax error. There's a cell in the original notebook that uses the syntax \overline c instead of \overline{c}. Once I fixed that, and replaced \underline with \underbar, I ceased to get errors, although I can't verify that the output is what's desired beyond saying "yup sure does include some underlines and overlines".

I'm finding the \underline issue deeply confusing, because isn't \underline in basic LaTeX? Why would we need to import anything?

I tried adding !pip install jupyterlab_latex and it doesn't change initial load time at all, since it's not executed until the cell is run. When you do run the cell, it adds another 3-4 seconds, which is not great but not terrible. Importing jupyterlab_latex did not solve the underline issue though.
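
To hunt for other cells with the same unbraced-argument slip, something like the following could be run over a notebook's text. This is only a sketch: the function name is hypothetical, and the regular expression covers just the two macros mentioned above.

```python
import re

# Matches \overline or \underline that is NOT followed by an opening brace.
UNBRACED_MACRO = re.compile(r"\\(?:overline|underline)(?![A-Za-z])(?!\s*\{)")


def find_unbraced_macros(text):
    """Return (line_number, macro) pairs for unbraced uses in `text`."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in UNBRACED_MACRO.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


sample = "$\\overline{c}$ is fine\n$\\overline c$ is not"
print(find_unbraced_macros(sample))
```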

@mnwhite
Contributor

mnwhite commented Mar 28, 2019 via email

@shaunagm
Author

I've got no idea - I've barely ever used LaTeX, and I'm finding the documentation hard to parse. Unfortunately Colaboratory's documentation is not great either (documentation & user support has always been a weak point for Google) so I'm not sure what's even running in the notebook. How do you know if you're in math mode? How can I check what LaTeX extensions would have underline implemented?

Anyway, here's a version of the notebook with the fixes described above:
https://colab.research.google.com/github/shaunagm/REMARK/blob/master/REMARKs/BufferStockTheory/BufferStockTheory.ipynb#scrollTo=cB71h4tn1dC0

@mnwhite
Contributor

mnwhite commented Mar 28, 2019 via email

@shaunagm
Author

Okay, I think I was misunderstanding what math mode even is. Yeah, that may be the issue. If swapping underline for underbar doesn't work for aesthetic reasons, I can explore the issue further, but for now I'll hold off.

@llorracc
Contributor

llorracc commented Mar 28, 2019 via email

@shaunagm
Author

I'm not super familiar with matplotlib. I'm sure I could fix the display issues given enough time but it may be more efficient to have a student work through it.

I'm going to try to generate a list of the libraries and extensions we need. I suppose I could just use everything in the binder folder's requirements.txt, but I think that will include some extra stuff. But your "jupyter_contrib_nbextensions" is in there, so it's a good start.

re: configuring for both mybinder and colab - we should be able to do that, the question is whether we're okay with the potential added complexity. For instance, we could have a line in the notebook which checks whether something's already installed and only installs it if it isn't.
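
A minimal sketch of that "install only if missing" check, assuming the setup cell knows both the import name and the pip name of each dependency. The function names and the `dry_run` flag are hypothetical; the `HARK` / `econ-ark` pairing comes from the earlier comment.

```python
import importlib.util
import subprocess
import sys


def missing_packages(requirements):
    """`requirements` maps an importable module name to its pip name.

    Returns the pip names whose modules cannot be found. Use top-level
    module names only (e.g. "HARK", not "HARK.utilities").
    """
    return [
        pip_name
        for module_name, pip_name in requirements.items()
        if importlib.util.find_spec(module_name) is None
    ]


def install_missing(requirements, dry_run=False):
    """pip-install whatever is missing; return the list that was needed."""
    to_install = missing_packages(requirements)
    if to_install and not dry_run:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", *to_install]
        )
    return to_install


# Example: on mybinder HARK is already present, so nothing is installed;
# on Colab it is missing, so econ-ark would be installed.
# install_missing({"HARK": "econ-ark"})
```

The added complexity is a handful of lines per notebook, and environments that already have everything skip the install step entirely.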

@shaunagm
Author

From Chris's email, it seems like one of the biggest barriers with regard to CoLab is installing LaTeX. I'm going to take a look and see if there's a way to install a much smaller subset of the library.

@shaunagm
Author

shaunagm commented Apr 4, 2019

Update from the weekly meeting: Chris has tried the approach from this StackOverflow answer in this notebook (on colab) but it's not currently working - I'm going to try to debug.

There also appears to be an issue where notebooks won't run if you aren't logged in to a Google account, which I need to look into.

@llorracc
Contributor

llorracc commented Apr 29, 2019

There is now a revision of BufferStockTheory.ipynb that works on MyBinder, CoLab, and on my Mac using the local jupyter notebook server that comes with Anaconda.

The notebook requires LaTeX tools from the American Mathematical Society's amsmath package in order for matplotlib to be able to render all the figures, and the solution to this problem is painful: The first (code) cell installs all needed dependencies from scratch, which can be very slow if LaTeX is not installed (e.g., for either MyBinder or CoLab's default environments).

The notebook starts by testing whether LaTeX is installed on the machine. If not, it tests whether the machine is running Ubuntu. MyBinder and CoLab both default to Ubuntu, so if you're on Ubuntu, it installs the full version of LaTeX:

!apt-get install texlive dvipng texlive-xetex texlive-latex-extra

In CoLab this seems to take about 2-3 minutes, which is painful but not intolerable. MyBinder can take 10 minutes or more, if it works at all (it seems to fail altogether about half the time). This kind of defeats the purpose of "live" notebooks. (Even when MyBinder says it "found built image" it might say "Launching server ... Launch attempt 1 failed, retrying ...")
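
For readers who want the shape of that first cell without opening the notebook, here is a rough sketch of the decision logic. It is not the notebook's actual code: the function names are hypothetical, Ubuntu detection via `/etc/os-release` is an assumption, and the package list mirrors the command quoted above (using Ubuntu's `texlive-latex-extra` package name).

```python
import platform
import shutil

# Package list from the apt-get command quoted in this comment.
LATEX_APT_PACKAGES = ["texlive", "dvipng", "texlive-xetex", "texlive-latex-extra"]


def latex_installed():
    """True if a `latex` executable is on the PATH."""
    return shutil.which("latex") is not None


def is_ubuntu(os_release_path="/etc/os-release"):
    """Best-effort Ubuntu check (assumption: ID=ubuntu in os-release)."""
    if platform.system() != "Linux":
        return False
    try:
        with open(os_release_path) as release_file:
            return any(
                line.strip().lower().startswith("id=") and "ubuntu" in line.lower()
                for line in release_file
            )
    except OSError:
        return False


def latex_install_command(has_latex, on_ubuntu):
    """Return the apt-get command to run, or None if nothing should run."""
    if has_latex or not on_ubuntu:
        return None
    return ["apt-get", "install", "-y", *LATEX_APT_PACKAGES]


print(latex_install_command(latex_installed(), is_ubuntu()))
```

On a machine that already has LaTeX (or is not Ubuntu) this prints `None`, so the slow install is skipped.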

It appears that MyBinder allows you to use prespecified Docker images instead of their default setup, but it is not clear to me whether that would be any faster. And at this point it looks like CoLab doesn't let you use your own docker images?

If there is not now a way to "pre-cook" or "pre-cache" a VM image to speed up loading, I'm guessing that's not an accident. At some point MyBinder needs a revenue model, and I'd totally be willing to pay something to reduce the loading time from 5 minutes to 30 seconds. I just wish they'd roll out their pricing scheme and let me pay them for this!

PS. The installations are not necessary (and therefore a waste of time) if the libraries are already available. But the overriding goal was to have a single notebook that works everywhere, and so the installation stuff all has to be in that first cell since CoLab does not have a mechanism like MyBinder for prespecifying requirements.

@shaunagm
Author

@llorracc can you add a link to the "revision of BufferStockTheory.ipynb that works on MyBinder, CoLab, and on my Mac using the local jupyter notebook server that comes with Anaconda"?

@llorracc
Contributor

There WAS a link -- it just didn't work! That's what I get for trying to construct the link myself. I've now fixed it -- just click on BufferStockTheory.ipynb above

@shaunagm
Author

shaunagm commented May 6, 2019

I've heard from a couple of folks here at PyCon, including @MridulS (one of our sprinters), that many projects use MathJax to load subsets of LaTeX fast. Here's some info on configuration options. Still need to research this.

@llorracc
Contributor

llorracc commented May 6, 2019

Hmmm, I had thought of MathJax as more of a rendering engine (it draws the characters on your bitmap) than a tool that can read packages and interpret things like \underline. But \underline is part of the amsmath package, and the configuration link you sent seems to be reading in something called amsmath.js, which presumably is a JavaScript version of the amsmath package. So maybe it will work (if matplotlib plays nice with MathJax ...)

@shaunagm shaunagm mentioned this issue May 13, 2019
7 tasks
@shaunagm
Author

shaunagm commented Jun 13, 2019

@llorracc, what do you need done by the 24th? Specifically, which notebooks do you need to be ready, and what are the ideal, and maximum acceptable, times for them to load?

I know you said you wanted Bufferstock Theory ready - any others?

@llorracc
Contributor

llorracc commented Jun 13, 2019 via email

@llorracc
Contributor

llorracc commented Jun 16, 2019 via email
