Cluster state is serialized on the network thread #39806
Comments
Pinging @elastic/es-distributed
This sounds reasonable. Even if the state is 100MB or so, it shouldn't take that long to serialise (I hope, at least :)), so this seems like a reasonable improvement. Though I wonder: why not just run this serialization on the generic thread pool, to be safe (we could still cache to save some CPU)? That seems like the cleaner approach, and the cost of the context switches is probably well worth not having to see this in profiling on the transport thread when trying to work out where the next slowness/blocker lives :)
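To make the thread pool suggestion concrete, here is a minimal sketch of handing an expensive computation to a separate executor so the calling (network) thread is not blocked. It uses plain `java.util.concurrent` as a stand-in rather than Elasticsearch's own thread pool classes, and the names `OffloadedSizeComputation` and `computeAsync` are hypothetical, made up for this illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper, not an Elasticsearch class: runs an expensive
// computation on a separate pool instead of on the calling thread.
final class OffloadedSizeComputation {
    // Stand-in for a "generic" pool that is allowed to do slow work.
    private final ExecutorService genericPool;

    OffloadedSizeComputation(ExecutorService genericPool) {
        this.genericPool = genericPool;
    }

    // Returns immediately; the supplier runs on genericPool, so the caller
    // (e.g. a network thread) stays free to handle other requests.
    CompletableFuture<Integer> computeAsync(Supplier<Integer> expensiveSizeComputation) {
        return CompletableFuture.supplyAsync(expensiveSizeComputation, genericPool);
    }
}
```

The caller would attach a callback to the returned future to send the response once the size is known, at the cost of one extra context switch per request.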
Today we compute the size of the compressed cluster state for each `cluster:monitor/state` action. We do so by serializing the whole cluster state and compressing it, and this happens on the network thread. This calculation can be rather expensive if the cluster state is large, and these actions can be rather frequent particularly if there are sniffing transport clients in use. Also the calculation is a simple function of the cluster state, so there is a lot of duplicated work here. This change introduces a small cache for this size computation to avoid all this duplicated work, and to avoid blocking network threads. Fixes elastic#39806.
@original-brownbear I was missing knowledge of
We discussed this and decided we'd like to remove the reporting of the compressed cluster state size here. Ideally we will deprecate this in 6.7 and remove it in 7.0 by following these steps:
The BWC involved in the first step seems to be tricky, because we process I'm currently stuck trying to work out how best to permit (but not require) this warning to happen only if sending requests via a 6.x node.
Today when we respond to a `cluster:monitor/state` action we serialize the current cluster state on the network thread in order to return its size when compressed. This serialization can be expensive if the cluster state is very large. Yet the size we return is not a useful number to report to clients, so we plan to remove it from the cluster state response. This is the first step, in which the size computation becomes optional and deprecated. Relates elastic#39806
Closed by the PRs above.
Today when we respond to a `cluster:monitor/state` action we serialize the current cluster state:
elasticsearch/server/src/main/java/org/elasticsearch/action/admin/cluster/state/TransportClusterStateAction.java
Line 186 in 4cab8ec
However, we do so on the network thread:
elasticsearch/server/src/main/java/org/elasticsearch/action/admin/cluster/state/TransportClusterStateAction.java
Lines 59 to 60 in 4cab8ec
This is a problem if the cluster state is large, because it consumes a network thread doing non-network things. It's particularly a problem if using lots of transport clients with sniffing enabled since each client will, by default, call this method every 5 seconds.
Yet we do this serialization to get the compressed size of the cluster state so that we can report it (see #3415). Many clients will not care about this. Maybe we can omit it? Alternatively, note that it only depends on the cluster state so maybe we can cache it and invalidate the cache on each cluster state update.
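As a rough illustration of the caching idea, the sketch below remembers the compressed size for one cluster state version and only serializes and compresses on a miss. It is not the actual Elasticsearch code: `CompressedSizeCache`, its methods, and the use of `DeflaterOutputStream` as the compressor are all assumptions made for this example, and the serializer is passed as a plain `Supplier<byte[]>`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;
import java.util.zip.DeflaterOutputStream;

// Hypothetical single-entry cache: compressed size of the latest cluster state.
final class CompressedSizeCache {
    private static final class Entry {
        final long version;
        final int compressedSize;
        Entry(long version, int compressedSize) {
            this.version = version;
            this.compressedSize = compressedSize;
        }
    }

    private final AtomicReference<Entry> cached = new AtomicReference<>();
    private final AtomicInteger computations = new AtomicInteger();

    // Serializes and compresses only when no size is cached for this version.
    int compressedSize(long version, Supplier<byte[]> serializer) {
        Entry entry = cached.get();
        if (entry != null && entry.version == version) {
            return entry.compressedSize; // cache hit: no serialization at all
        }
        int size = compress(serializer.get());
        computations.incrementAndGet();
        cached.set(new Entry(version, size));
        return size;
    }

    // Intended to be called on every cluster state update.
    void invalidate() {
        cached.set(null);
    }

    // Number of times the expensive path actually ran.
    int computations() {
        return computations.get();
    }

    private static int compress(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size(); // closing the stream above flushed all compressed bytes
    }
}
```

Two concurrent misses may both compute the size, which wastes a little CPU but is harmless because the result depends only on the state. Note that a cache miss would still run on whichever thread calls `compressedSize`.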