Cannot pull private AWS ECR image from controller node #620
Comments
Hi, I haven't seen a problem with DaemonSets specifically, but when I did have issues with ~0.7 I found the journalctl logs on the nodes pretty informative. They normally said why the image couldn't be pulled, e.g. DNS or authentication. Does the image pull work if you manually specify your own ImagePullSecret?
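A minimal sketch of how to inspect those logs on a node, assuming SSH access to the instance and that the kubelet runs as a systemd unit named `kubelet.service` (the unit name may differ between kube-aws/CoreOS versions):

```sh
# SSH to the controller (key path and address are placeholders)
ssh -i ~/.ssh/your-key.pem core@<controller-public-ip>

# Follow the kubelet logs and filter for image-pull / ECR related lines
journalctl -u kubelet.service --no-pager | grep -iE 'ecr|pull|credential'

# The Docker daemon logs can also show the registry error directly
journalctl -u docker.service --no-pager | tail -n 100
```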
Good ideas, thanks. I checked the controller journalctl logs and found this:
At first it seems like it can't find the image. Obviously the fact that the DaemonSet manages to pull the exact same definition on the workers means this is not true, but I heard on the AWS forums that there are still a few discrepancies between the Docker Registry API and the AWS ECR implementation. Out of curiosity I checked one of the workers and strangely it has this log line as well but still manages to pull the image:
I also tried to set the ImagePullSecret as you mentioned. I logged into ECR locally, grabbed the auth out of the resulting login, and used it as a pull secret, which worked. This seems to indicate a specific problem with ECR auth for DaemonSets on the controller.
Good news @c-knowles - a workaround at least. The ECR tokens cycle every 12 hours in every region, so you'll need to update the secret often. You can fetch the token from the ECR API. If you would like to automate that, you can use a script like this to update a Secret, either before each deployment or as a scheduled job. You can then reference the Secret name with imagePullSecrets.
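A minimal sketch of such a refresh script, assuming AWS CLI v2 (`aws ecr get-login-password`; older CLI versions used `aws ecr get-login` instead) and placeholder values for the region, account ID and Secret name:

```sh
#!/usr/bin/env bash
set -euo pipefail

REGION="eu-west-1"                # placeholder region
ACCOUNT_ID="123456789012"         # placeholder AWS account ID
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

# Fetch a fresh ECR token (valid for roughly 12 hours)
TOKEN="$(aws ecr get-login-password --region "${REGION}")"

# Recreate the pull secret with the new token
kubectl delete secret ecr-pull-secret --ignore-not-found
kubectl create secret docker-registry ecr-pull-secret \
  --docker-server="https://${REGISTRY}" \
  --docker-username=AWS \
  --docker-password="${TOKEN}"
```

The DaemonSet's pod template would then reference the secret by name under `imagePullSecrets`.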
Yeah, it's an OK workaround for now. I'd like to see if we can work out the root cause; should I report this in the Kubernetes project as well? I wondered whether reporting in both places is standard practice for kube-aws users when something looks like a core issue like this.
I think this issue is due to kube-aws setting the ECR permissions on the IAMWorker role, but not on the IAMController role. That's probably why setting imagePullSecrets manually worked for you.
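For reference, a sketch of the kind of read-only ECR policy document involved; the exact action list kube-aws grants to the worker role is an assumption here:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}
```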
@ewang I've just tested that; it seems like it's something more involved, unfortunately. A shame, as editing the stack-template.json before deploy would have been a better workaround. On my existing cluster, I first ensured the master could not pull the private image for the DaemonSet. Then I manually added the below to the
@c-knowles is your ECR registry in the same region as your cluster? Otherwise the endpoint would still be accessible, but the kubelet will be unable to fetch the credentials from the metadata service. If you check out the kubelet logs (
Afraid that's not it; they are in the same region (both
@c-knowles: Adding the necessary argument to the argument list of the kubelet.service file on the controller and adding a policy to the IAM role of the controller worked for me. I'm running 0.7.1 or 0.8.0, I can't remember. Same problem as #518
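Assuming the argument in question is `--cloud-provider=aws` (a common requirement for ECR credential fetching on AWS at the time, not confirmed in this thread), the edit on the controller would look roughly like this:

```sh
# Append the flag to the kubelet arguments on the ExecStart line
# (unit path and flag are assumptions, not confirmed by this thread)
sudo vi /etc/systemd/system/kubelet.service   # add --cloud-provider=aws to the kubelet args

sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
```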
I had to rebuild the cluster today with the newest version and I ran into the same problem as c-knowles.
Should I give that a go, or are you saying that workaround no longer works on the latest version (0.8.1)?
@c-knowles It started working after a couple of hours. I guess Kubernetes tries fetching the credentials every couple of hours, even if the previous attempt failed. TL;DR: it works with the added policy and the required argument (which is there in the latest version of coreos-kubernetes). So with the default installation, you just have to add the IAM policy to the controller role and then wait or restart the controller.
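A sketch of attaching such a policy from the CLI; the role and policy names are placeholders, and `ecr-pull-policy.json` would hold a document like the one sketched earlier:

```sh
# Attach an inline ECR read policy to the controller's IAM role (names are placeholders)
aws iam put-role-policy \
  --role-name my-cluster-IAMRoleController \
  --policy-name ecr-pull \
  --policy-document file://ecr-pull-policy.json

# Then wait for the kubelet to retry fetching ECR credentials, or restart it on the controller
sudo systemctl restart kubelet.service
```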
Thanks, I've just tried that and it looks like it works. The bit I had not tried before was waiting a little while; I had expected it to just work. This time around I had already updated the
Goes some way to resolve coreos/coreos-kubernetes#620
I'm going to close this issue as it seems like it has been resolved. Please let me know if it should be re-opened.
I'm not sure if this is a kube-aws issue or a Kubernetes issue, so hopefully someone can shed some light on it. I'm using kube-aws 0.8.0 and, when launching a DaemonSet with a private AWS ECR image, it seems that the controller node won't pull the image but all worker nodes will. The error I get in the pod logs is:
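As a sketch of how to surface image-pull errors (the pod name is a placeholder; pull failures usually show up in the pod's events rather than its logs):

```sh
# Shows ErrImagePull / ImagePullBackOff details in the Events section
kubectl describe pod <daemonset-pod-name>

# Or list recent events across all namespaces
kubectl get events --all-namespaces
```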
Steps to reproduce
I will try with the latest kube-aws version soon just to check whether it's still an issue.
Current workaround
None that I know of that will get the private image running on the controller. Currently I added a label to the workers and then used nodeSelector in the DaemonSet to avoid launching it on the controller at all; a sketch of that is below. In my case the daemon is monitoring-related, so I need it to run on the controller as well.
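A minimal sketch of that workaround, using a made-up label key/value (`node-role=worker`) and a placeholder image URL:

```sh
# Label each worker node (node name, label key and value are placeholders)
kubectl label node <worker-node-name> node-role=worker
```

```yaml
# Excerpt of the DaemonSet pod template; only pods matching the nodeSelector are scheduled
spec:
  template:
    spec:
      nodeSelector:
        node-role: worker
      containers:
        - name: monitoring-agent
          image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/monitoring-agent:latest
```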