etcd creates excessive number of WAL files #10885
Can you set the max-wals number lower than 128? The purging is a periodic operation.
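For readers who land here from search, a minimal sketch of setting this flag on the etcd command line (the data directory and the value of 64 are placeholders for illustration, not a recommendation from this thread):

```sh
# Cap retained WAL files at 64 instead of the 128 used by the reporter.
# Note: --max-wals=0 means "retain all WAL files", so use a positive value.
etcd --data-dir /var/lib/etcd \
  --max-wals=64
```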
Yes, we're going to reduce I have two concerns:
Setting
We reduced
This is from an etcd node that started with a blank disk and had been running for less than three hours, on a fairly small Kubernetes cluster (fewer than 800 pods across 20 nodes).
Can you get the logs from the etcd server?
Sure, @xiang90. The logs will likely be very large and we'll have to pull a large amount of data from Splunk; are there any particular lines or patterns you are interested in?
I've pulled 3 hours of etcd logs from our Splunk. The logs start just before we begin maintenance on the 3 etcd nodes to change the
Potential problems:
We've implemented a few changes:
Thanks all for the advice! Closing this issue; will reopen if needed.
Hello everyone. My team had the same issue. In our case we are running etcd in Docker, and the excessive WAL files were not only consuming the assigned disk but also driving up memory consumption, which caused the container to be stopped once it exceeded 4 GB of memory. We found that "etcd was taking snapshots every 10,000 records by default, and the WAL files couldn't be deleted", and we identified that our container was not creating any snapshots at all. We decided to modify our etcd container with the following values: --max-snapshots=2 --max-wals=5 --enable-v2 --auto-compaction-retention 1 --snapshot-count 5000. After these changes, the container started creating snapshots every 25 minutes and the WAL files are being deleted, so the node now keeps 2 snapshots and just 5 WALs. The disk is no longer being overloaded, and memory consumption rises from 1.2 GB to at most 2.3 GB and then drops back to 1.2 GB; as a result, the container no longer gets stuck, since memory stays below 4 GB. I hope this comment is useful. Regards, Julian Gomez
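For anyone who wants to try the same tuning, here is a rough sketch of the container invocation described above; the image tag, container name, and host volume path are assumptions added for illustration, not details from the original comment:

```sh
# Sketch only: run etcd in Docker with more aggressive snapshot/WAL retention.
# A lower --snapshot-count makes snapshots (and hence WAL purging) happen sooner.
docker run -d --name etcd \
  -v /var/lib/etcd:/etcd-data \
  quay.io/coreos/etcd:v3.3.10 \
  etcd --data-dir /etcd-data \
    --max-snapshots=2 \
    --max-wals=5 \
    --enable-v2 \
    --auto-compaction-retention=1 \
    --snapshot-count=5000
```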
The problem still exists. Could we reopen this issue? |
Could you raise a new issue and provide the following info?
Same issue in our cluster.
Since max-wals is not declared, the default value (5) is used, but the number of WAL files exceeds 150 and the storage is full. Please reopen this issue.
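If you hit this, a quick way to confirm how much space the retained WAL segments are actually consuming (the path below assumes the default member layout under /var/lib/etcd; substitute your own --data-dir):

```sh
# Measure WAL disk usage and list the oldest segments still being retained.
du -sh /var/lib/etcd/member/wal
ls -lt /var/lib/etcd/member/wal | tail -n 5
```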
Do we have a new issue to track this?
Since v3.2, the default value of --snapshot-count has changed from 10,000 to 100,000. If snapshots are too infrequent, there can be more than --max-wals=5 WAL files on disk, because file-system level locks protect the files and prevent them from being deleted too early. Maybe that is the cause here? Changing --snapshot-count to a smaller value makes snapshots happen sooner, so WAL files can be cleaned up more often.
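A hedged sketch of the workaround described above, i.e. lowering --snapshot-count back toward the pre-v3.2 default so that snapshots, and therefore WAL purges, happen more often (the value 10000 is only an example, not a maintainer recommendation):

```sh
# Take a snapshot every 10,000 applied entries instead of the v3.2+ default
# of 100,000, so older WAL segments become purgeable sooner.
etcd --data-dir /var/lib/etcd \
  --snapshot-count=10000 \
  --max-wals=5
```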
/mark |
We have a 3-node etcd cluster running etcd 3.3.10 for a Kubernetes cluster. This etcd cluster runs on nodes with high-performance but low-capacity volumes. We've set max-wals to 128, but we consistently see the number of WALs exceeding 128 for extended periods of time. (We do see WAL purging in our logs, so purging is occurring eventually.) Restarting etcd purges the WALs back down to 128. How do we troubleshoot why so many WALs are being created? Can we take further steps to limit the number of WALs and reduce the chance of running out of storage space on these smaller nodes?
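As a starting point for that troubleshooting, one simple check is to count the WAL segments directly on disk and watch whether the purge loop keeps up (the path assumes the default member layout under your --data-dir):

```sh
# Count WAL segments currently on disk; with --max-wals=128 this should
# settle back to roughly 128 shortly after each purge cycle.
ls /var/lib/etcd/member/wal/*.wal | wc -l
```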