Write an estimate how long it takes to load #830

Merged
merged 1 commit into from
Apr 25, 2019

Conversation

certik
Contributor

@certik certik commented Apr 25, 2019

It can take a long time to load the Jupyter notebook. On average it takes about 30s for me. For new users who have never used Binder, it would be nice to give them an estimate of the expected waiting time, so that they do not give up. This commit tries to do that.

Member

betatim commented Apr 25, 2019

Thanks for adding this! Let's try it out; it is better than what we had before (nothing), so I'm merging this. We can then iterate on it in new PRs. Or maybe someone can extend it with some clever JS.

@betatim betatim merged commit 9ea3a2b into jupyterhub:master Apr 25, 2019
yuvipanda pushed a commit to jupyterhub/helm-chart that referenced this pull request Apr 25, 2019
Contributor Author

certik commented Apr 25, 2019

Thanks @betatim. I just tested the load time on Binder and here are the times from clicking the button to seeing a Jupyter notebook (I clicked, waited to load, then clicked again, waited to load, and so on, 10 times):

253s
13s
66s
12s
21s
16s
24s
15s
39s
13s

My experience is that the first time takes longer, usually between 30s and 60s, and then the subsequent times can be as fast as 10s, although it varies.

In the above experiment, the first time took several minutes, but that's an anomaly, and the subsequent times range between 12s and 66s.
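
For reference, the ten measurements above can be summarized with a short script. This is only a sketch over the numbers listed in this comment; the variable names are illustrative, and ten samples are too few to say much about the real distribution.

```python
from statistics import mean, median

# Load times in seconds, copied from the ten measurements listed above.
load_times = [253, 13, 66, 12, 21, 16, 24, 15, 39, 13]

# Treat the first launch separately, since it is described as an anomaly.
first, rest = load_times[0], load_times[1:]

summary = {
    "median_all": median(load_times),
    "mean_all": mean(load_times),
    "first": first,
    "min_rest": min(rest),
    "max_rest": max(rest),
    "median_rest": median(rest),
}

for name, value in summary.items():
    print(f"{name}: {value}s")
```

The median over all ten runs is 18.5s, while the mean is pulled up to 47.2s by the single 253s launch, so a median is probably the better number to quote to new users.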

In all the above I assume the docker image is already built, which it was in my case. Of course, if it has to be built the very first time, then it takes several minutes, but that's only a one-time thing, which I trigger myself by clicking the link whenever I push to my notebook repository.

I assume the reason the first time is slower is that it needs to create a pod and download the docker image, so an improvement over my PR would be to show more detailed log messages like "starting a pod" and "downloading docker image", and then show a progress bar for the docker image download, etc.

Member

betatim commented Apr 25, 2019

There are a few things that happen when you click a badge like yours:

  1. we build the image because there isn't an already built image in our cache (very slow)
  2. a node in the cluster is selected based on some algorithm. Roughly: it prefers nodes that already have lots of pods running on them
  3. pod is created on that node
  4. if the node doesn't have the docker image it pulls it (slow)
  5. pod starts

While the starting of the pod could take "a while" (users could do crazy long computations as part of startup or the node is mega busy because people are training neural networks like crazy) I think the difference is small. So in practice it'll be "quick".

This leaves pulling images as a culprit, but doesn't explain why there is so much variation. I would expect a bimodal distribution: needs pulling (cluster around a slow time) and doesn't need pulling (cluster around a fast time), with "needs pulling" happening at the start, because afterwards you should always end up on the same node.

There is one last thing: we check if the pod is ready to go (and redirect the user's browser) with an increasing back off. We wait an initial amount, then wait twice as long, check, double the wait, check, double the wait, check, etc. (Actually the real wait increase might be t^1.41 not t^2.) I've been wondering for a while if a "large part" of the slow start up is because we don't wait long enough for the pod to get ready and then spend a lot of time backing off. It's a bit tricky to investigate because it depends on the "load" and cluster setup. If you run BinderHub locally on your laptop it will be like a one node cluster with a super fast docker registry. So timings will be different. A cool project for a long winter's evening :)
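
To make the back off concern concrete, here is a toy model of the polling loop described above. It is not BinderHub's actual code: the function name `detection_time`, the 1s initial wait, and the multiplicative `factor` are all assumptions chosen for illustration. It only shows how long after the pod becomes ready a poller with growing waits would notice.

```python
def detection_time(ready_at, initial_wait=1.0, factor=2.0):
    """Return the time at which a polling loop first sees the pod as ready.

    Hypothetical model: sleep `initial_wait` seconds, check, multiply the
    wait by `factor`, sleep again, check, and so on. `ready_at` is the time
    (in seconds) at which the pod actually becomes ready.
    """
    if initial_wait <= 0 or factor < 1:
        raise ValueError("initial_wait must be > 0 and factor must be >= 1")
    elapsed = 0.0
    wait = initial_wait
    while True:
        elapsed += wait          # sleep for the current wait
        if elapsed >= ready_at:  # check: is the pod ready by now?
            return elapsed
        wait *= factor           # back off before the next check


# With doubling, checks happen at 1s, 3s, 7s, 15s, 31s, ...
for ready_at in (7, 8, 16):
    seen = detection_time(ready_at)
    print(f"ready at {ready_at}s, noticed at {seen}s, wasted {seen - ready_at}s")
```

In this model a pod that is ready at 8s is only noticed at 15s, and one that is ready at 16s is noticed at 31s, so the time spent backing off can be almost as long as the startup itself. A smaller `factor` shrinks that gap at the cost of more checks.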

Member

betatim commented Apr 25, 2019

A much shorter point: exposing where in the mentioned steps you are as a user would already be good feedback (I think). Even if we don't really know how long it will take to finish pulling the image, giving people a sense of "You are at step 3 of 4" etc would be good already.
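
A minimal sketch of what such feedback could look like, assuming the five steps listed in the earlier comment. The `LaunchStage` names and the `progress_message` helper are hypothetical and do not correspond to anything in BinderHub.

```python
from enum import IntEnum


class LaunchStage(IntEnum):
    """Hypothetical stage names, mirroring the five steps listed earlier."""
    BUILDING_IMAGE = 1
    SELECTING_NODE = 2
    CREATING_POD = 3
    PULLING_IMAGE = 4
    STARTING_POD = 5


def progress_message(stage):
    """Format a 'step N of M' message for the given launch stage."""
    total = len(LaunchStage)
    label = stage.name.replace("_", " ").lower()
    return f"Step {stage.value} of {total}: {label}"


print(progress_message(LaunchStage.PULLING_IMAGE))
```

Even without a time estimate per stage, a message like this would tell the user that the launch is still moving forward.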

3 participants