Cluster performance degradation over time #977
Do me a favor and look in the log for notices similar to:
and
If it takes more than 2 seconds, the above would be reported as an error; below that, as a notice. I am wondering if the RAFT log compaction is starting to degrade the performance. Based on that, we will see what the next step is.
Hello. Thank you for the quick reply.
So that's not it..
Let me answer that (I'm working with @qtheya):
For your use case, I wonder if a key-value store would not be a better fit. Regardless, since you use a memory store (we don't support this at the configuration level, but you bypass that by using tmpfs as the filestore), I see that you are not defining the node IDs and the list of peers. How do you initialize the cluster?
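For reference, explicitly defining the node ID and peer list on each streaming server would look roughly like this (node names, store directory, and paths below are purely illustrative):

```sh
# Node "a" of a 3-node cluster; "b" and "c" would each get their own node_id and peer list.
nats-streaming-server -store file -dir /data/stan \
  -clustered -cluster_node_id a -cluster_peers b,c \
  -cluster_log_path /data/stan/raft
```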
Key-value store doesn't work in this use case, while NATS Streaming fits really nicely with minimal effort. I believe we are using https://github.com/nats-io/nats-operator and https://github.com/nats-io/nats-streaming-operator with the number of nodes in the cluster set to 3 for both. But our primary issue is not with cluster survival (though we did have many spot instances leaving during a day and the cluster seems to survive them), but with the performance we get out of it. We resorted to tmpfs primarily because the I/O load on the disk subsystem was very high and caused all kinds of issues for nodes in our cluster when we tried it. tmpfs was the next logical option considering our use case. We thought that may have been the bottleneck causing the performance degradation, but it didn't help (much?).
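For reference, deploying through those operators with a cluster size of 3 boils down to custom resources roughly like the following (names and CRD apiVersions are approximate, from memory, and only meant as a sketch):

```yaml
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "nats-cluster"
spec:
  size: 3
---
apiVersion: "streaming.nats.io/v1alpha1"
kind: "NatsStreamingCluster"
metadata:
  name: "nats-streaming-cluster"
spec:
  size: 3
  natsSvc: "nats-cluster"    # service of the NatsCluster above
```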
But my point is that if you have several nodes being brought down and up again, each time with different names, and the leader is still alive, it will keep trying to replicate to the previously known nodes. I need to check with Wally how the operator works in terms of node ID/peers specification (from your config, it seems that you don't specify any).
nats-operator and nats-streaming-operator bring the nodes up as a StatefulSet, so the names are always the same, i.e.:
Sorry for the delay. So if I wanted to reproduce:
Again, I want to insist on the cluster deployment point, even if it is not the problem at hand. Yes, the operator deploys as a StatefulSet, but those pod names are not the streaming node IDs, which means a random name is generated and the cluster is initially formed; but if a node goes away (and since you don't use "persistent" storage), then even if the pod restarts as, say, "nats-streaming-2", its configuration does not have a cluster node ID specified, and since there is no store, it will start as a fresh new node with a unique node ID. Again, the leader will have the old cluster node ID in its list of peers and will continuously try to replicate to it. Something to keep in mind...
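For illustration only, if the streaming pods were run from a hand-rolled StatefulSet rather than the operator, the RAFT identity could be pinned to the stable pod name via the downward API, so a restarted pod rejoins as the same peer (a sketch with hypothetical names and paths, not what the operator generates):

```yaml
# Hypothetical container spec for a 3-node StatefulSet named "nats-streaming".
containers:
  - name: nats-streaming
    image: nats-streaming:latest
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    args:
      - "-store=file"
      - "-dir=/data/stan"
      - "-clustered"
      - "-cluster_node_id=$(POD_NAME)"   # e.g. nats-streaming-2, stable across restarts
      - "-cluster_peers=nats-streaming-0,nats-streaming-1,nats-streaming-2"
```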
But that's a single node, and I assume this is low-level NATS messages in/out (since we don't have this metric in Streaming monitoring). In the cluster, in/out will be way higher, since messages (and metadata: connection/subscription create/close, msg X sent to subscription Y, subscription Y ack'ing msg X, etc.) are replicated to all nodes, and on top of that there are the RAFT protocol messages (quite a lot). Should I assume that the load would be the same in cluster mode? So if I wanted to simulate the load, what would I have to do? You gave me information about the number of connections and the options to use for the pubs, but what should I do?
There is a lot I am missing to even try to reproduce the behavior you are experiencing. If you were running NATS Streaming with the embedded NATS Server, there would be the option to enable profiling (-profile <port>), which allows the use of Go tooling to capture a profile (CPU or memory) while the server is experiencing that high CPU usage. That may have given a clue about what it is doing at that time.
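For example, with profiling enabled (say on port 6060, purely as an example), the standard Go pprof tooling could be pointed at the pod while the CPU spike is happening:

```sh
# Assumes the server was started with the embedded NATS Server and -profile 6060.
go tool pprof http://<streaming-pod>:6060/debug/pprof/profile   # ~30s CPU profile
go tool pprof http://<streaming-pod>:6060/debug/pprof/heap      # heap snapshot
```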
Hello. We have:
Setup 1:
Single instance of nats-streaming in k8s on c5.2xlarge
In-Memory store
Setup 2:
nats-operator 3x nodes
nats-streaming-operator 3x nodes
in k8s on c5.2xlarge
Filestore mounted to emptyDir.medium: Memory (tmpfs); a rough volume sketch is included after this list
In top it was at 400-500% CPU usage
Our usual load looks like this:
k8s nodes have antiAffinity and podAntiAffinity too
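For reference, the tmpfs-backed filestore mentioned above is the usual emptyDir-in-memory pattern, roughly as follows (volume name and mount path are illustrative):

```yaml
spec:
  containers:
    - name: nats-streaming
      volumeMounts:
        - name: stan-store
          mountPath: /data/stan          # the filestore directory
  volumes:
    - name: stan-store
      emptyDir:
        medium: Memory                   # tmpfs: fast, but contents lost on pod restart
```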
Questions:
0) As you can see, performance in cluster mode is OK for some period after the cluster starts, but over time it starts "eating" CPU and, as a result, ack time increases. Do you have any idea why this happens? Our single-node setup in that environment has worked like a charm for 3 months.
Thnx