etcd creates excessive number of WAL files #10885
Can you set the max-wals number lower than 128? The purging is a periodic operation.
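For readers who land here from search, a minimal sketch of setting this flag on the etcd command line (the data directory and the value of 64 are placeholders for illustration, not a recommendation from this thread):

```sh
# Cap retained WAL files at 64 instead of the 128 used by the reporter.
# Note: --max-wals=0 means "retain all WAL files", so use a positive value.
etcd --data-dir /var/lib/etcd \
  --max-wals=64
```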
Yes, we're going to reduce I have two concerns:
Setting
We reduced
This is from an etcd node that started with a blank disk and had been running for less than three hours, on a fairly small Kubernetes cluster (fewer than 800 pods across 20 nodes).
Can you get the logs from the etcd server?
Sure, @xiang90. The logs will likely be very large and we'll have to pull a large amount of data from Splunk; are there any particular lines or patterns you are interested in?
I've pulled 3 hours of etcd logs from our Splunk. The logs start just before we begin maintenance on the 3 etcd nodes to change the
Potential problems:
We've implemented a few changes:
Thanks all for the advice! Closing this issue; will reopen if needed.
Hello everyone. My team had the same issue. In our case we are running etcd in Docker, and the excessive WAL files were not only consuming the assigned disk but also driving up memory consumption, which caused the container to be stopped once it exceeded 4 GB of memory. We found that "etcd was taking snapshots every 10,000 records by default, and the WAL files couldn't be deleted", and we identified that our container was not creating any snapshots at all. We decided to modify our etcd container with the following values: --max-snapshots=2 --max-wals=5 --enable-v2 --auto-compaction-retention 1 --snapshot-count 5000. After these changes, the container started creating snapshots every 25 minutes and the WAL files are being deleted, so the node now keeps 2 snapshots and just 5 WALs. The disk is no longer being overloaded, and memory consumption rises from 1.2 GB to at most 2.3 GB and then drops back to 1.2 GB; as a result, the container no longer gets stuck, since memory stays below 4 GB. I hope this comment is useful. Regards, Julian Gomez
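For anyone who wants to try the same tuning, here is a rough sketch of the container invocation described above; the image tag, container name, and host volume path are assumptions added for illustration, not details from the original comment:

```sh
# Sketch only: run etcd in Docker with more aggressive snapshot/WAL retention.
# A lower --snapshot-count makes snapshots (and hence WAL purging) happen sooner.
docker run -d --name etcd \
  -v /var/lib/etcd:/etcd-data \
  quay.io/coreos/etcd:v3.3.10 \
  etcd --data-dir /etcd-data \
    --max-snapshots=2 \
    --max-wals=5 \
    --enable-v2 \
    --auto-compaction-retention=1 \
    --snapshot-count=5000
```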
The problem still exists. Could we reopen this issue? |
Could you raise a new issue and provide the following info?
Same issue in our cluster.
Since max-wals is not declared, the default value (5) is used, but the number of WAL files exceeds 150 and the storage is full. Please reopen this issue.
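If you hit this, a quick way to confirm how much space the retained WAL segments are actually consuming (the path below assumes the default member layout under /var/lib/etcd; substitute your own --data-dir):

```sh
# Measure WAL disk usage and list the oldest segments still being retained.
du -sh /var/lib/etcd/member/wal
ls -lt /var/lib/etcd/member/wal | tail -n 5
```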
Do we have a new issue to track this?
Since v3.2, the default value of --snapshot-count has changed from 10,000 to 100,000. If snapshots are too infrequent, there can be more than --max-wals=5 WAL files on disk, because file-system level locks protect the files and prevent them from being deleted too early. Maybe that is the cause here? Changing --snapshot-count to a smaller value makes snapshots happen sooner, so WAL files can be cleaned up more often.
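A hedged sketch of the workaround described above, i.e. lowering --snapshot-count back toward the pre-v3.2 default so that snapshots, and therefore WAL purges, happen more often (the value 10000 is only an example, not a maintainer recommendation):

```sh
# Take a snapshot every 10,000 applied entries instead of the v3.2+ default
# of 100,000, so older WAL segments become purgeable sooner.
etcd --data-dir /var/lib/etcd \
  --snapshot-count=10000 \
  --max-wals=5
```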
/mark |
We have a 3-node etcd cluster running etcd 3.3.10 for a Kubernetes cluster. This etcd cluster runs on nodes with high-performance but low-capacity volumes. We've set max-wals to 128, but we consistently see the number of WALs exceeding 128 for extended periods of time. (We do see WAL purging in our logs, so purging is occurring eventually.) Restarting etcd purges the WALs back down to 128. How do we troubleshoot why so many WALs are being created? Can we take further steps to limit the number of WALs and reduce the chance of running out of storage space on these smaller nodes?
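As a starting point for that troubleshooting, one simple check is to count the WAL segments directly on disk and watch whether the purge loop keeps up (the path assumes the default member layout under your --data-dir):

```sh
# Count WAL segments currently on disk; with --max-wals=128 this should
# settle back to roughly 128 shortly after each purge cycle.
ls /var/lib/etcd/member/wal/*.wal | wc -l
```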