
WebSocket long-polling in a Kibana cluster may cause hard-to-diagnose errors #29653

Closed
chrisdavies opened this issue Jan 30, 2019 · 8 comments
Labels
Feature:Canvas, Team:Presentation (Presentation Team for Dashboard, Input Controls, and Canvas)

Comments

@chrisdavies
Contributor

In a clustered environment, with long-polling, each new long-poll request could theoretically hit a different Kibana server than the one previously used. This means that the server-side state for expression evaluation might get lost.
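To make the failure mode concrete, here is a minimal Python sketch, assuming a round-robin load balancer with no session affinity and a server that keeps expression-evaluation state in local memory between long-poll requests. `KibanaInstance`, `RoundRobinBalancer`, and the session-id scheme are hypothetical stand-ins, not Kibana or socket.io APIs.

```python
import itertools
import uuid


class KibanaInstance:
    """Hypothetical server that keeps evaluation state in process memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> partial evaluation state

    def start(self, expression):
        # Begin an evaluation and remember its state on this instance only.
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = {"expression": expression, "step": 1}
        return session_id

    def poll(self, session_id):
        # A follow-up long-poll only succeeds on the instance holding the state.
        state = self.sessions.get(session_id)
        if state is None:
            return {"error": f"unknown session on {self.name}"}
        state["step"] += 1
        return {"ok": True, "step": state["step"], "served_by": self.name}


class RoundRobinBalancer:
    """Hypothetical load balancer without sticky sessions."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def route(self):
        return next(self._cycle)


balancer = RoundRobinBalancer([KibanaInstance("kibana-1"), KibanaInstance("kibana-2")])
sid = balancer.route().start("hypothetical expression")  # served by kibana-1
result = balancer.route().poll(sid)  # routed to kibana-2, which has no state for sid
print(result)
```

The second request fails not because anything crashed, but because the state it depends on lives on a different process, which is why the resulting errors are hard to diagnose.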

It's probably a pretty rare edge case, but one worth bringing up and discussing.

@chrisdavies added the Team:Presentation and canvasGA_0 labels on Jan 30, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-canvas

@epixa
Contributor

epixa commented Jan 30, 2019

Why do you think that would be rare? I would expect this to effectively break expressions on all multi-instance Kibana installs.

@chrisdavies
Contributor Author

I suspect sticky sessions are the norm, so long-polling requests will usually reconnect to the same server. There was a discussion about this in the Canvas channel, and it sounds like Canvas is mostly stateless: when a long-poll request completes, the state has already been transferred back to the client. If I understand correctly, that means the client can usually connect to any server for the next long-poll request, because it sends the context along with it.
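The stateless approach described above can be sketched the same way: the server returns the intermediate context to the client, and the client sends it back with the next request, so whichever instance receives it can continue. As before, `StatelessInstance` and the context shape are hypothetical, and the round-robin routing is an assumption standing in for a load balancer without sticky sessions.

```python
import itertools


class StatelessInstance:
    """Hypothetical server that keeps no per-client state between requests."""

    def __init__(self, name):
        self.name = name

    def handle(self, context):
        # All state arrives with the request and leaves with the response.
        next_context = {"expression": context["expression"], "step": context["step"] + 1}
        return {"context": next_context, "served_by": self.name}


instances = [StatelessInstance("kibana-1"), StatelessInstance("kibana-2")]
route = itertools.cycle(instances)  # no sticky sessions: each request may hit another instance

context = {"expression": "hypothetical expression", "step": 0}
served_by = []
for _ in range(3):
    response = next(route).handle(context)
    context = response["context"]  # the client carries the context into the next request
    served_by.append(response["served_by"])

print(context["step"], served_by)
```

The trade-off is payload size: the full context travels over the wire on every request, which is the price of letting any instance serve any request.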

This is more likely to cause problems if socket.io multiplexes requests over a single long-polling connection. I suspect it does; otherwise it would quickly use up all of the browser's available connections. If it is multiplexing, we'll see this issue more often.

So you may be right. It may not be rare, but it's hard to say without concrete testing of the scenario.

@epixa
Contributor

epixa commented Jan 31, 2019

We've actively discouraged people from using sticky sessions in the past, and we certainly don't encourage people to use them. In fact, we have features in the product (e.g. security.encryptionKey) that exist specifically to ensure folks do not need to use sticky sessions. I don't believe Cloud uses them at all.
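Why a shared key removes the need for sticky sessions can be sketched with a signed cookie: if every instance uses the same configured key (in kibana.yml the full setting name is `xpack.security.encryptionKey`), a cookie issued by one instance validates on any other, whereas per-instance random keys only validate where they were issued. This uses HMAC signing from Python's standard library as a stand-in for Kibana's actual cookie encryption; `make_cookie`, `validate`, and the key values are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_cookie(key: bytes, session: str) -> str:
    # Sign the session payload so any holder of the key can verify it.
    sig = hmac.new(key, session.encode(), hashlib.sha256).hexdigest()
    return f"{session}.{sig}"


def validate(key: bytes, cookie: str) -> bool:
    # Recompute the signature with this instance's key and compare in constant time.
    session, _, sig = cookie.rpartition(".")
    expected = hmac.new(key, session.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)


# No shared key configured: each instance generates its own random key at startup.
random_keys = {"kibana-1": secrets.token_bytes(32), "kibana-2": secrets.token_bytes(32)}
cookie = make_cookie(random_keys["kibana-1"], "user=alice")
random_keys_ok = [validate(k, cookie) for k in random_keys.values()]

# Shared key configured: every instance holds the same (placeholder) key.
shared_key = b"placeholder-key-shared-by-all-kibana-instances"
shared_keys = {"kibana-1": shared_key, "kibana-2": shared_key}
cookie = make_cookie(shared_keys["kibana-1"], "user=alice")
shared_keys_ok = [validate(k, cookie) for k in shared_keys.values()]

print(random_keys_ok, shared_keys_ok)
```

With random keys only the issuing instance accepts the cookie, so a user would be logged out whenever the load balancer switched instances; with a shared key every instance accepts it.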

So at the least, we must assume sticky sessions are not in play.

@stacey-gammon
Contributor

WebSockets have been removed!

@Randy-312

With what version will the WebSockets be removed?

@cqliu1
Contributor

cqliu1 commented Mar 13, 2019

> With what version will the WebSockets be removed?

It will be removed in 6.7.0.

@Randy-312

Great. Next week?
I'm deciding how much time to spend on this problem, although if it is something else, I should keep working on it.

I wasn't able to find any documentation stating that Canvas requires sticky sessions, but this socket.io issue does appear to confirm that they are still required.

So, with this fix, will we still need sticky sessions in a load balancer between Kibana and ES?
Any ECE insights on this?


7 participants