Swarm's tasks.db takes up lots of disk space #2367
@joelchen can you provide more information about the services you're running? |
@nishanttotla Just had the same issue happen. The node was a worker running about 10 stacks, making for 30+ services. The Docker process was using about 10 GB of memory and I decided to restart it; I went with the following procedure:
Upon restarting I noticed two things:
After digging around I thought the task DB was probably corrupted and checked it. The node then proceeded to properly join the swarm and start the correct version of the containers, but now I have ghost tasks that are marked as being on that node even though that node has no information about them. I saved a copy of the |
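For anyone wanting to try something similar, here is a minimal sketch of the stop/back-up/restart steps described above, assuming a systemd-managed daemon and the default data root under /var/lib/docker (both are assumptions; adjust for your setup):

```sh
# Stop the daemon so the Bolt file is no longer locked (Bolt allows only one opener).
sudo systemctl stop docker

# Keep a copy of the worker task database before touching anything else.
sudo cp /var/lib/docker/swarm/worker/tasks.db /root/tasks.db.backup

# Start the daemon again; the node should rejoin the swarm on its own.
sudo systemctl start docker
```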
page 2282443: unreachable unfreed
1 errors found
invalid value
Aggregate statistics for 1 buckets
Page count statistics
Number of logical branch pages: 18
Number of physical branch overflow pages: 0
Number of logical leaf pages: 1257
Number of physical leaf overflow pages: 222
Tree statistics
Number of keys/value pairs: 11531
Number of levels in B+tree: 5
Page size utilization
Bytes allocated for physical branch pages: 73728
Bytes actually used for branch data: 34195 (46%)
Bytes allocated for physical leaf pages: 6057984
Bytes actually used for leaf data: 3324514 (54%)
Bucket statistics
Total number of buckets: 5321
Total number on inlined buckets: 4873 (91%)
Bytes used for inlined buckets: 1018794 (30%)
1 ==========
1 TYPE
19 branch
2276487 free
1 freelist
1259 leaf
2 meta
So it seems to be an issue with Docker never compacting the database? |
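For reference, the output above looks like what the standalone Bolt CLI prints. A sketch of commands that produce similar output, assuming the bbolt tool (go.etcd.io/bbolt/cmd/bbolt; command names may differ in older bolt builds) and that the daemon is stopped so the file isn't locked:

```sh
# Consistency check; reports unreachable/unfreed pages like the one above.
bbolt check tasks.db

# Page, tree, and bucket statistics (the "Aggregate statistics" section above).
bbolt stats tasks.db

# Count pages by type; the second column of the pages listing is the page type,
# which matches the "free", "leaf", "branch", "meta" counts above.
bbolt pages tasks.db | awk '{print $2}' | sort | uniq -c
```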
@nishanttotla We are also seeing the same issue. /var/lib/docker/swarm/worker/tasks.db is 5 GB. |
Any updates on this matter, guys? I would like to recover some hard disk space on a small experimental Docker swarm I'm running right now. |
Hi, same for me... can we just stop the daemon, remove this file, and restart? |
@Myself |
I believe I'm running into this issue as well. /var/lib/docker/swarm/worker/tasks.db has grown to 12 GB and it seems that it's never going to stop. I'd rather not stop the daemon, delete the tasks.db file, and then start the daemon again if possible. Is it possible to determine what's filling up this tasks database? Could the containers that I'm running be leaving stale tasks in the database? The bolt commands that @christopherobin used above are a bit of a mystery to me. Some Google searches have me thinking that this tasks.db file is a Bolt database. Docker version 17.06.0-ce, build 02c1d87 |
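I'm not certain of the exact bucket layout Swarm uses inside tasks.db, but a rough way to see where the volume is would be to point the bbolt CLI (go.etcd.io/bbolt) at a copy of the file taken while the daemon is stopped; `<bucket-name>` below is a placeholder for whatever the first command reports:

```sh
# Work on a copy so the live file is never touched.
sudo cp /var/lib/docker/swarm/worker/tasks.db /tmp/tasks.db

# List the top-level buckets, then count the keys in each to see which one is big.
bbolt buckets /tmp/tasks.db
bbolt keys /tmp/tasks.db <bucket-name> | wc -l
```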
Just a note. Here's an even safer approach, but it requires some downtime. Even longer than just removing
This should do the trick in case your swarm has tasks scheduled, and you don't want to risk them, but can shut down a manager node for the time being (especially if it's redundant). Bolt databases cannot be shared between multiple processes (by design), so while the Docker daemon is alive there is no way of compacting it. |
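A sketch of what that offline compaction could look like, assuming the bbolt CLI, a systemd-managed daemon, and the default paths (all assumptions; keep the original file until you're sure the node is healthy again):

```sh
# The daemon must be down: a Bolt file cannot be opened by two processes.
sudo systemctl stop docker

cd /var/lib/docker/swarm/worker

# Write a compacted copy, keep the original as a backup, then swap the files.
sudo bbolt compact -o tasks.db.compacted tasks.db
sudo mv tasks.db tasks.db.orig
sudo mv tasks.db.compacted tasks.db

sudo systemctl start docker
```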
Seeing this again. I've updated to Docker version 18.03.1-ce, build 9ee9f40. |
Same bug with Docker 18.03.1-ce on Ubuntu 16. |
For those of you who have seen this, which kind of workloads are you running on Swarm? The only situation I can imagine is some service which is constantly crashing, so Swarm is all the time scheduling new containers to be created (creating new tasks). That can easily be tested by creating a broken service with a command like: That situation can easily be avoided by using Btw, you can fix this issue by leaving the swarm and joining again. That will clean up everything from /var/lib/docker/swarm/ |
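A sketch of the kind of reproduction and cleanup described above, using standard Docker CLI flags (I can't speak for the exact commands the comment had in mind; service names and the join token are placeholders):

```sh
# A service whose container exits immediately, so Swarm keeps creating new tasks.
docker service create --name crashloop --restart-delay 5s busybox false

# Capping restarts keeps the task churn bounded.
docker service create --name bounded --restart-max-attempts 3 busybox false

# The cleanup mentioned above: leave the swarm and rejoin, which recreates
# /var/lib/docker/swarm/ from scratch.
docker swarm leave --force
docker swarm join --token <worker-token> <manager-ip>:2377
```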
Met the same issue on AWS EC2 Swarm nodes. It is continuously eating up disk space. For some long-running tasks, it would be a disaster :(. |
@lybroman to be able to fix this, we first need to understand why only some users are seeing this. Can you tell us more about which kind of workloads you have? (See my earlier message.) |
@olljanat I am trying to investigate this possibility. BTW, is there any recommended tool to inspect the tasks.db file? That may help me figure out the potential issue. |
I was able to look inside of it using some general boltdb viewer, but new records are only created there when new tasks are scheduled, so you should be able to see much more useful data with Or, if you like a UI, you can also use for example Portainer to see those tasks. |
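Setting the Bolt viewer aside, the Docker CLI itself can show the task history that ends up in tasks.db; a couple of standard commands (the service and node names are placeholders):

```sh
# Task history for one service, including shut-down tasks from earlier restarts.
docker service ps --no-trunc <service-name>

# Tasks that have already reached the "shutdown" desired state on a given node.
docker node ps --filter desired-state=shutdown <node-name>
```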
I face this issue too. Yes, Swarm restarts some containers every few minutes for recurring tasks. It can of course be done another way, but unfortunately, as far as I know, Swarm still doesn't have a convenient way to schedule recurring containers. |
I came here for that bug. Thanks @olljanat for the tip,
So I have a crashing service Another (brutally stupid) approach to the same problem is Another clue is with the crashing service: it gets a new text GUID each restart, so I look for those in the
but it looks like 2–4 records per restart, every 5–10 seconds.
|
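A crude sketch of that kind of search against a copy of the raw file, using nothing but standard tools; the task ID is a placeholder you would take from `docker service ps --no-trunc`:

```sh
# Copy the live file first; reading it while the daemon is writing to it is risky.
sudo cp /var/lib/docker/swarm/worker/tasks.db /tmp/tasks.db

# Count how many records mention a given task ID.
strings /tmp/tasks.db | grep -c '<task-id>'

# Or get a rough feel for which identifiers dominate the file.
strings /tmp/tasks.db | sort | uniq -c | sort -rn | head -20
```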
I forgot version info. The Ubuntu might be a bit of a dog's dinner of partial upgrades.
|
I've also got this issue, unfortunately on prod with a 16 GB tasks.db.
On Ubuntu 16.04 Xenial |
> I've also got this issue, unfortunately on prod with a 16 GB tasks.db.
I can't give production-grade advice on dealing with the symptoms, so the best help I can offer is:
1. get a dev node
2. give it a few dozen services that fail persistently, and a couple that you want to keep up (see the sketch after this comment)
3. wait an hour, so you have a mess to clear up
4. take the service down and back up the mess
5. ...try clearing up the mess various ways.
I don't know what the symptoms of a failed clear-up could be.
It's possible that delving into the `tasks.db` would give clues to ways to tidy up without shutting down. I was not systematic in my examination; I only looked for some keyword matches.
|
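If anyone wants to follow that testing plan, a throwaway sketch for step 2 on a dev node (the names are made up, and `false` as the command makes the containers fail persistently):

```sh
# A few dozen services that fail persistently...
for i in $(seq 1 24); do
  docker service create --detach --name fail-$i --restart-delay 5s busybox false
done

# ...and a couple that should stay up.
docker service create --detach --name keeper-1 nginx:alpine
docker service create --detach --name keeper-2 redis:alpine
```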
Hello! I have the same issue in one of my environments:
|
Please note that the fix for this bug was released as part of version 19.03.9: https://docs.docker.com/engine/release-notes/#19039 @thaJeztah this issue can be closed. |
Thanks; yes, looks like it was fixed through #2938 |
Servers running a couple of Docker containers on Swarm have tasks.db files using a few GBs of disk space. The Docker containers are deployed globally on 2 servers, both Swarm managers, and their tasks.db files gradually fill up 8 GB of each server's disk space. The Docker version is 17.06.1-ce.
What could be taking up such enormous space in tasks.db? How do I prevent tasks.db from growing?