[Docker 1.7.1] Task stuck at PENDING #300

Closed
shibai opened this issue Feb 1, 2016 · 12 comments
Comments

@shibai

shibai commented Feb 1, 2016

Hi there,

I am running DynamoDB Cross-Region Replication from the AWS walkthrough (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.Walkthrough.Step2.html), but with the changes you provided in #277, in which it uses ecs-init with Docker 1.7.1. The problem appears after it has been running for about 3 days.

The problem is:
1. One of the EC2 instances crashes (or stops); it shuts down the task running on it, but it doesn't de-register itself from ECS.
2. ECS starts a new task on that failing instance.
3. Auto Scaling terminates the old instance and launches a new one. The new one registers.

In step 2, the task is always stuck in PENDING:
(Five screenshots attached, showing the task stuck in the PENDING state.)

This issue also happens when I manually stop one of the instances from the EC2 console.
Thanks,
Shibai
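
For reference, the stuck state described above can be inspected from the AWS CLI. The following is a minimal sketch, not taken from this thread; the cluster name "my-cluster" is a placeholder:

aws ecs list-tasks --cluster my-cluster --desired-status PENDING   # tasks that were placed but never started
aws ecs describe-container-instances --cluster my-cluster \
    --container-instances $(aws ecs list-container-instances --cluster my-cluster \
        --query 'containerInstanceArns[]' --output text) \
    --query 'containerInstances[].[ec2InstanceId,status,agentConnected]' --output table
# An instance that has crashed but never de-registered shows agentConnected=false; it can be
# removed by hand so the scheduler stops placing tasks on it:
# aws ecs deregister-container-instance --cluster my-cluster --container-instance <arn> --force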

@davidkelley

We're also seeing this problem frequently. Running around 200 tasks, there are always about 10-20 tasks stuck in PENDING due to the same problem described above.


@ghost

ghost commented Feb 4, 2016

Running only seven tasks on a t2.small container instance, I am seeing this on nearly every attempted deployment / service update. At the same time, "docker ps" is taking nearly forever to complete:

real 1m21.857s
user 0m0.024s
sys 0m0.000s

There is an issue for Docker 1.9 that may be related. Could the agent's frequent API calls be causing this? We are running ami-9886a0f2 on the container instance with Docker 1.9.1.
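
A rough way to separate a slow Docker daemon from the agent's own polling (a sketch; the log paths assume the ECS-optimized Amazon Linux AMI defaults):

time docker ps                                            # daemon API latency as seen by any client
time docker info > /dev/null                              # a second daemon call, to rule out container listing itself
grep -i timeout /var/log/ecs/ecs-agent.log | tail -n 20   # agent-side timeouts on its Docker calls
tail -n 50 /var/log/docker                                # daemon-side errors around the same time

If "docker ps" stays slow even with the agent stopped (sudo stop ecs on that AMI), the bottleneck is the daemon itself rather than the agent's API call volume.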

@samuelkarp
Contributor

@davidkelley @grantatsyncbak Are you seeing this with Docker 1.7.1 (like @shibai) or with Docker 1.9.1?

@gkeiser

gkeiser commented Feb 18, 2016

We are using ami-9886a0f2 with Docker 1.9.1.

@gkeiser

gkeiser commented Feb 18, 2016

This has been the case for some time, and Docker is unresponsive; it will not even stop with "sudo service docker stop".

The latest log entries:

D, Known Sent: NONE"
2016-02-16T10:55:52Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-east-1:501027711207:task/185eaaf6-7d17-4d45-9fb7-fe0ddbb0b06a ImageServiceDev -> STOPPED, Reason CannotPullContainerError: dial unix /var/run/docker.sock: too many open files, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-east-1:501027711207:task/185eaaf6-7d17-4d45-9fb7-fe0ddbb0b06a ImageServiceDev -> STOPPED, Reason CannotPullContainerError: dial unix /var/run/docker.sock: too many open files, Known Sent: NONE"
2016-02-16T10:55:52Z [INFO] Saving state! module="statemanager"
2016-02-16T10:55:52Z [ERROR] Error saving state; could not create temp file to save state module="statemanager" err="open /data/tmp_ecs_agent_data840630121: too many open files"
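
The "too many open files" errors suggest the agent process has exhausted its file-descriptor limit. A rough check (run as root; a sketch that assumes the agent container is named "ecs-agent" as on the stock ECS-optimized AMI and that the daemon still answers "docker inspect"):

agent_pid=$(docker inspect --format '{{.State.Pid}}' ecs-agent)
ls /proc/$agent_pid/fd | wc -l                                     # descriptors the agent currently holds
awk '/Max open files/ {print "soft:", $4, "hard:", $5}' /proc/$agent_pid/limits
cat /proc/sys/fs/file-nr                                           # system-wide: allocated vs. maximum

A count near the soft limit would be consistent with the CannotPullContainerError and statemanager failures in the log above.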

@samuelkarp
Contributor

@gkeiser Can you open a separate issue? (Or maybe what you're seeing is related to #313?) I'd like to keep this issue focused on @shibai's issue with Docker 1.7.1.

samuelkarp changed the title from "Task stuck at PENDING" to "[Docker 1.7.1] Task stuck at PENDING" on Feb 19, 2016
@aaithal
Contributor

aaithal commented May 2, 2016

@shibai Please let us know if you are still seeing this issue on the latest AMI.

@ghost

ghost commented Jun 13, 2016

@shibai Thank you for reporting this. We are aware of an issue where the ECS scheduler may place tasks on instances that are in the process of stopping. We are investigating this issue and will provide an update as soon as we have more to share.

@jawang35

jawang35 commented Dec 7, 2016

@MarcelvR Is there any news on this? I am experiencing the same problem; my task instances are in an unstable state, cycling up and down.

@milla

milla commented Jun 9, 2017

I am experiencing the same problem.

@zaakiy

zaakiy commented Jul 24, 2017

I am also experiencing this same problem on ECS. It had been working fine for 2 months, but I am now seeing the issue exactly as described above.
Agent version 1.14.3
Docker version 17.03.1-ce

Update: I suspect it was a problem with the ECS container host. I launched another container host into the cluster and it handled things fine (i.e., no long delays in the PENDING state). I terminated the old host, and now I don't seem to be having problems.
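
For anyone replacing a host this way on a reasonably recent agent, the old instance can be put into DRAINING before it is terminated so the scheduler stops placing new tasks on it. A sketch, with the cluster name and instance ARN as placeholders:

aws ecs update-container-instances-state --cluster my-cluster \
    --container-instances <container-instance-arn> --status DRAINING
# Service tasks on the draining instance are rescheduled elsewhere before you terminate it.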

aaithal assigned and then unassigned richardpen on Jul 25, 2017
@adnxn
Contributor

adnxn commented Nov 28, 2017

@shibai The original issue appears to be related to reaping zombie tasks on unresponsive instances. I'm closing this in favor of #1115.

adnxn closed this as completed on Nov 28, 2017