Tasks stuck in PENDING with containers stuck in Created state #306
Comments
Hi @trharris78, thanks for providing logs. Your Docker logs show a panic that looks related to moby/moby#18481. The Agent logs show lines like this:
The Agent logs are consistent with what I'd expect from that Docker issue: the Agent is attempting to talk to the Docker daemon, but the daemon is not responding. Can you provide the output of
I'm unable to provide the requested additional logs because we rearchitected our system to use services instead of tasks. Perhaps in the future we will revisit this.
I'm using RunTask to schedule hundreds of tasks on a small ECS cluster. The tasks are short-lived, a minute or two at most. Tasks that I can't schedule with a RunTask call are left on a queue, and I try the RunTask call again later. This seems to work for a while, but eventually the cluster gets bogged down and tasks get stuck in the PENDING state. Running
docker ps -a
shows tons of containers in the Created state, and they seem to be stuck indefinitely.
I'm also using DescribeTasks to check on task states, and I'm getting DockerTimeoutError and CannotInspectContainerError fairly often. These start to happen at about the same time the containers start to get stuck in the Created state. Also,
docker ps -a
becomes unresponsive, sometimes sitting there for 10 minutes or more before I hit Ctrl+C.
I originally thought that the Docker daemon was getting overwhelmed with hundreds of exited containers, so I built the amazon-ecs-agent dev branch to try the new ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION variable. Containers now get cleaned up after a few minutes, but the PENDING problem persists.
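A rough sketch of the queue-and-retry pattern described above may help anyone trying to reproduce this. It is an illustration under stated assumptions, not the code actually in use: `submit` is a hypothetical callable standing in for a wrapper around the RunTask call that returns True when the task was placed, and the names `drain_queue` and `max_passes` are invented for this example.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_queue(pending: Deque[str],
                submit: Callable[[str], bool],
                max_passes: int) -> List[str]:
    """Try each queued task once per pass and requeue the ones that fail.

    `submit` is a hypothetical stand-in for a wrapper around RunTask that
    returns True when the task was placed and False otherwise.
    Makes at most `max_passes` passes and returns the tasks that were placed.
    """
    placed: List[str] = []
    for _ in range(max_passes):
        if not pending:
            break
        # Walk only the tasks queued at the start of this pass, so a
        # requeued task waits for the next pass instead of looping forever.
        for _ in range(len(pending)):
            task = pending.popleft()
            if submit(task):
                placed.append(task)
            else:
                pending.append(task)
    return placed
```

A real scheduler would presumably sleep between passes; that delay is left out here to keep the sketch short.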
I also tried updating to Docker 1.10.0 but that didn't help, either.
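One way to quantify how many containers are stuck is to tally container states from the output of `docker ps -a --format '{{.Status}}'`. The helper below is a minimal sketch: the name `tally_states` is made up for this example, and it assumes each status line begins with the state word (for instance "Created", "Exited (0) 2 minutes ago", or "Up 5 minutes"). It only parses text that has already been captured, so it does not help while `docker ps -a` itself is hanging.

```python
from collections import Counter
from typing import Dict


def tally_states(status_lines: str) -> Dict[str, int]:
    """Count containers by the first word of each status line.

    Assumes the input is the captured text of
    `docker ps -a --format '{{.Status}}'`, one status per line.
    """
    counts: Counter = Counter()
    for line in status_lines.splitlines():
        line = line.strip()
        if line:
            # The first word is taken to be the state, e.g. "Created".
            counts[line.split()[0]] += 1
    return dict(counts)
```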
I'm using the ECS-optimized AMI in us-east-1, ami-cb2305a1, which includes Docker 1.9.1. I see similar issues #296, #300, and #305 that might be related.
Docker and ecs-agent logs are attached.
logs.zip