-
Hello, I have a distributed deployment with multiple sensors, a manager, and a search node, using many Elastic Agent integrations. The manager has a 200 GB disk, and the search node has multiple TBs. Overnight the manager's /nsm partition hit 100% usage and was completely full. This shut down Elasticsearch via the disk watermark; my errors were identical to this: https://discuss.elastic.co/t/how-to-solve-we-couldnt-log-you-in-please-try-again-error-in-kibana/332658/4

I see in /nsm/elasticsearch/indices/ that full indices are being stored on the manager, but my understanding was that only the search nodes should be storing them. How does 2.4 handle deleting logs based on directory size? In 2.3 this was handled by salt/curator/action/delete.yml, but looking at that file in 2.4 it seems entirely broken: log_size_limit isn't being set properly, and the search nodes don't have Curator installed on them at all.

My manager is a VM and I had a snapshot, so I was able to roll back to before this error, though that means I've lost the SO logs from that period. I'm assuming this problem will arise again as the /nsm directory on the manager keeps growing. So, is there a way to make sure this directory doesn't reach 100% storage utilization, or a way to make the manager not store indices and just utilize the search nodes?
-
Can you run sudo so-elasticsearch-query _cat/shards and share the output?
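For reference, a quick way to tally where the started shards live from that command's output. The sample lines and node names below are hypothetical, just to make the pipeline runnable standalone; in practice you would pipe the real `sudo so-elasticsearch-query _cat/shards` output straight into the awk stage (column 4 is the shard state, column 8 is the node name):

```shell
# Hypothetical _cat/shards output; replace this heredoc with the real command.
cat <<'EOF' > /tmp/shards.txt
logs-zeek.conn-default 0 p STARTED 120000 5gb 10.0.0.2 so-searchnode
logs-zeek.conn-default 0 r STARTED 120000 5gb 10.0.0.1 so-manager
logs-zeek.dns-default  0 p STARTED  80000 2gb 10.0.0.2 so-searchnode
EOF
# Count STARTED shards per node; any shards listed under the manager
# confirm that data is being allocated there.
awk '$4 == "STARTED" {print $8}' /tmp/shards.txt | sort | uniq -c | sort -rn
```

If the manager shows up in that tally at all, shards are being allocated to it rather than staying on the search node.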
-
The search node is automatically relocating shards to the manager. I caught this happening in the act, but I'm unsure what's causing it.
A fix to stop the automatic relocation would be great, as it will eventually fill up the manager's /nsm disk, hit the Elasticsearch flood-stage watermark, and therefore stop ingesting logs, even though the search node still has plenty of space.
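For anyone hitting the flood stage in the meantime, the disk watermarks themselves are ordinary cluster settings, so they can be raised as a stopgap while the root cause is sorted out. A sketch (the percentages here are illustrative examples, not tuned recommendations):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
```

This only buys headroom; it doesn't stop shards from landing on the manager in the first place.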
-
Another update: After running
-
Wanted to give this thread a bump... I'm having the same issue on a fresh SO 2.4.30 distributed install. As reported above, some Zeek data streams are being stored on the manager node, which led to /nsm reaching 100% and caused manager services to fail. Modifying the Elasticsearch watermarks (thanks @EddieN17) appears to be preventing further /nsm space issues. Is there any way to prevent Zeek data streams from being stored on the manager?
The following should keep indices and data streams from being allocated to your manager. Replace the IP with the IP address of your manager.
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
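One caveat worth noting: transient cluster settings are cleared on a full cluster restart. If you want the exclusion to survive restarts, the persistent form of the same setting should work (same placeholder IP as above):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
```

You can confirm it took effect with GET _cluster/settings; existing shards on the excluded node should then drain back to the search node.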