-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pods are sometimes assigned to the incorrect IAM role #46
Comments
I guess it's possible that the pod tried to access its IAM role before the |
After investigating a little bit it seems that the precision on this race condition is quite tight (i.e. sub-second).
There are two edge cases I can think of here:
To fix both of these at once I can only think of adding a static delay of somewhere between 0 and 1 second. Exponential backoff isn't really useful to fix the second case. Hopefully it won't have too much of an impact on apps either, since they should be caching credentials. We could add this delay to replace (or augment) the existing exponential backoff in server.getRole. |
Thanks for looking into this. I guess adding the static delay is the best option for now. It would be great if that could be implemented. |
I don't think we'd want to add a static delay before getting the IAM role but maybe I misunderstand what you mean. There is another issue that getting the role on-demand (when a request for credentials comes in) triggers timeouts in some AWS clients. The issue from @smelchior here is specifically about fallback i.e. "Using fallback role for IP 10.233.109.12" so I think an exponential backoff wrapping the block that retrieves from the |
IndexInformer will probably fix this too, so maybe we should wait. |
I agree with Jerome - sleeps are not the best way. The best course of action at the moment is to refactor it to use indexed informer, but it will not magically fix this issue (if I understand it correctly). p.s. I'm sorry if I misunderstood the issue and this is completely off. |
Looking at the code for updating a pod (OnAdd/OnDelete) and the store logic: isn't the easiest fix here to extend the coverage of the store mutex so that it covers the whole of OnAdd/OnDelete? Now there are points in the process where Get would see an inconsistent state. |
What @ankon said is correct, I didn't even think of that. It will work unless the request for the role is received before the api update is sent. As he said, I suspect they are received almost in tandem right now, and the mutex covering more space might always allow the api server event to complete firing first. |
BTW looks like
|
Good catch! Although unlikely to be the root cause of this issue as it was opened before namespaces were added. I don't think the store mutex is going to cover everything e.g. pod making trying to get IAM credentials before it has been added to the store but it would be a good improvement already. The exponential backoff strategy is actually probably not the right answer either because it would delay the fallback even when fallback is desired. I am not sure the channel approach would solve the underlying issue either. I think the underlying issue is the same as #31. Prefetching credentials would solve the issue here as well, there was a PR originally but I rejected it because it could potentially DDOS the AWS API limit and create too many routines in kube2iam. We may want to review the approach and coming up with a gentler approach that prefetches credentials and refresh them. |
I'm pretty sure the channel approach will at least reduce the response latency. It is instantaneous "push" as soon as information is available instead of polling with exponential backoff. |
There is no polling with exponential backoff here, it's reading directly from a map which is instantaneous. I was proposing to add exponential backoff instead of the sleep proposed by @struz but as said above, I don't think that's the right strategy either. The only exponential backoff in the code is around this block for the entire |
One problem is that it will fall back to the default role even when a role is specified sometimes. Because the role is asked for before the map is finished updating the system just assumes it has no role and serves it the default one with no back off |
It uses a mutex to read and write from the map so I don't think that's correct:
|
Sorry you're right for the update case, there could be a race condition that a mutex around delete and add could solve. |
We had this problem in one of our production clusters just over a week ago- we noticed that some processes would go unhealthy periodically and through the logs noticed they were attempting to use credentials for a different role (and thus being denied access to the AWS resources they needed). We'd been fine for a few months but we crossed some kind of cluster load that meant stuff started breaking. I started working on moving code around to support broader locks but made enough changes (both to address this and to incorporate prefetching and other bits) it ended up easier to just start from scratch. To address this (the incorrect IAM role) issue we found a thread-safe cache inside kiam (which I'd like to think is something like the child of kube2iam 😄 ) is at https://github.com/uswitch/kiam if its helpful. |
Ah... I just added a lock to the update of pods and namespaces as originally pointed out by @ankon. That should be enough to fix the problem. @pingles I'd be interested to know how many pods you have running on your cluster with the prefetch of credentials and if you're hitting issues with AWS API limits. It's one of the things I'm wary of with this, especially when the daemon starts and need to fetch the credentials for all the pods (potentially on multiple nodes), as after a cursory glance at your code, it doesn't seem you have a rate limiter or backoff with exponential retries. |
@jtblin we're running 10s of nodes in each cluster (around 15-30 depending on load) with 6,600 pods running now, although a far smaller number would have role annotations (we don't attach default roles to stuff that doesn't need AWS access). We've not had any errors from AWS API limits as yet. As you say, the number of API calls grows as We were planning on wrapping the AWS client with a rate limiter/retry gateway but we needed something we could run asap and we didn't hit the problem so pushed it down the priority. |
Thanks for fixing this @jtblin! Can you tell already when there will be a new release that includes the fix? |
Hi Was this fixed? |
@sibrahim001982 The known causes for this situation have been fixed and integrated, yes. There have been a few explicit improvements to the way that Kube2iam handles the pod inventory, but there may be times when there is a very large spike in pods that would still result in some observed issues. If you are experiencing a specific issue I would recommend opening up a ticket and providing as much information as possible about the issue. Some things to do beforehand are:
|
In our cluster sometimes the pods that have a proper "iam.amazonaws.com/role" annotation do not receive their role when they start up. kube2iam returns the default role to them which in our case does not have any permissions. After some time of the application requests the credentials again, it gets the proper assignment.
Relevant log messages:
I am not really sure how to debug this further, it might be related to the issues described in #32
The text was updated successfully, but these errors were encountered: