Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventStream disconnects are "harmless errors" #743

Open
betatim opened this issue Nov 30, 2018 · 7 comments
Open

EventStream disconnects are "harmless errors" #743

betatim opened this issue Nov 30, 2018 · 7 comments

Comments

@betatim
Copy link
Member

betatim commented Nov 30, 2018

ref: jupyterhub/repo2docker#412 (comment)

A frequent source of "my repo doesn't build" comments comes from when the BinderHub loading page loses connection to the EventStream. This happens sometimes for builds (short and long :-/) as well as when you snooze your laptop. The visual feedback currently is unambiguous: your build failed.

You have to know that this error is one of these errors that you can ignore and fix by refreshing the page.

Can we detect losing connection to the event stream and distinguish it from the stream ending because the build is over? That way we could either reload automatically or present a different message to the user that encourages them to do so.

@minrk
Copy link
Member

minrk commented Dec 4, 2018

I really want to implement the proper event-stream api where connecting to an event stream never has any effects, and exclusively retrieves events. Then there's no need to notify the user of lost event-stream connections, just constantly reconnect. As it is now, reconnecting the event stream causes a new request to build and launch.

@betatim
Copy link
Member Author

betatim commented Dec 4, 2018

What do you think of splitting the current /build/<spec> endpoint into two:

  1. PUT /trigger_build/<spec> to kick off a build and
  2. GET /build_logs/<spec> to see a stream of the current logs

The PUT can return a 201 if the repo has never been built before and a 200 (or 204?) otherwise. If the return code is not a 201 there would be no promise that the log endpoint contains anything. It however might contain stuff if a previously triggered build is still in progress.

This would let our "client website" connect to the log endpoint all the time like you suggested and didn't have to tell the user.

One thing I don't know how to do: where do we keep the Queue that is created for the builder to put log messages? We'd need some shared place that both the build trigger and build log handler can access. Would it be an attribute of the tornado app instance?


Do we need to go from API v2 to v3 for this? Maybe we can stay at v2 and keep /build/ as is but stop using it ourselves and persuade thebelab/juniper/nbinteract maintainers to switch over to the new endpoints?

@minrk
Copy link
Member

minrk commented Dec 5, 2018

We've sketched this out in #358, but that's basically how it works. The trick is in designing the API such that:

  1. any state lives in the token, so that multiple BinderHub instances can handle each of the the now multiple requests (this works already for build logs, but maybe not launch)
  2. event-stream handler is purely side-effect-free
  3. when is the launch requested?

Here's a sketch:

  1. consumer does POST /api/v2/build/:spec. This triggers the build if needed and selects everything that will be used throughout the process (build id, username, image name)
  2. reply contains a jwt with information about a username and build id, and whether a build has been launched
  3. GET /api/v2/events/:jwt parses the token and hooks up logs of the running build, notifying when build is complete (build may be complete before this request is made)
  4. POST /api/v2/launch/:jwt actually does the launching and replies with the running server url and token (should error appropriately if build is incomplete or failed, which would mean the request was made too early)

@rheiland
Copy link

rheiland commented Apr 9, 2019

In lieu of any serious fix, is it possible to just append a message to the existing "Failed to connect to event stream" to have the user simply try refreshing the page?

@betatim
Copy link
Member Author

betatim commented Apr 9, 2019

I think that would be a good first step. Especially considering this issue has been here since November last year and no one has had time to work on it. Do you want to give extending the error message a go @rheiland ?

@gaelforget
Copy link

gaelforget commented Jan 14, 2020

Not sure if this is the exact same issue but I get the dreaded failed to connect to event stream for this repo: https://github.com/gaelforget/MeshArrayNotebooks. I tried from a different branch and get the same result. However, mybinder.org does build fine when I use this identical fork: https://github.com/hubgcm/MeshArrayNotebooks

What am I missing? How do I get back to being able to use binder with the original repo?

Please advise.

ps, 2020/01/15. Chrome did not show the same issue as Safari does.

@betatim
Copy link
Member Author

betatim commented Jan 25, 2020

This is an issue that is very hard to debug for us. It seems to depend more on the user than the repo. Evidence of this is that repositories users report as broken with this error message "just work" for me. Sometimes switching browsers or location (other network provider) fixes things for the users as well.

This means we rely on someone who has this problem to debug it or figure out how to make it reproducible for others.

A PR that adds "try reloading the page" to the error message would be welcome and (empirically) would help a large fraction of users who encounter this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants