vpc.amazonaws.com/pod-ips annotation unset by aws-node for statefulset pods #2281
Comments
@sahilbadla thank you for the thorough debugging! On CNI ADD, we patch the annotation. I was under the impression that kubelet ordered ADDs and DELs for the same pod name such that the DEL would always be delivered first, but your findings contradict that. Essentially, our decision to patch is not granular enough. To fix this, I think we should make sure on DEL that the pod IP we are deallocating matches the annotation we are deleting (or something similar). This is a much simpler and lighter-weight solution than adding any caching. I will mark this as a good first issue and add it to our internal backlog. If you are interested and willing to implement and test the fix, please let me know! Otherwise, no worries, we will find someone.
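The proposed guard can be sketched as a small pure function. This is a minimal Go sketch with illustrative names (`podAnnotations`, `clearPodIPsIfMatch`), not the actual VPC CNI types or code path:

```go
package main

import "fmt"

// podAnnotations models the subset of pod metadata the CNI patches.
// These names are illustrative, not the real amazon-vpc-cni-k8s types.
type podAnnotations map[string]string

const podIPsKey = "vpc.amazonaws.com/pod-ips"

// clearPodIPsIfMatch removes the pod-ips annotation only when the IP
// being deallocated on CNI DEL matches the annotated value. A stale DEL
// (arriving after the ADD for a recreated pod of the same name) carries
// the old pod's IP and is therefore ignored, leaving the new pod's
// annotation intact.
func clearPodIPsIfMatch(ann podAnnotations, deallocatedIP string) bool {
	if ann[podIPsKey] != deallocatedIP {
		return false // stale DEL for a previous pod instance; keep annotation
	}
	delete(ann, podIPsKey)
	return true
}

func main() {
	ann := podAnnotations{podIPsKey: "10.195.15.31"}

	// Stale DEL carrying the old pod's IP: annotation survives.
	fmt.Println(clearPodIPsIfMatch(ann, "10.195.14.7"))

	// Matching DEL: annotation is cleared.
	fmt.Println(clearPodIPsIfMatch(ann, "10.195.15.31"))
}
```

The point of the check is that it makes the DEL idempotent with respect to pod recreation, without the CNI having to cache any ADD/DEL ordering state.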
@jdn5126 Looks like there's more to it. Even if I start fresh, I see this problem on my statefulset pods: the annotation gets set and then unset. As you can see in the logs, the annotation is set at 2:29:29:01 and unset at 2:31:49:655. This behavior is pretty consistent, as I tried multiple times. I do see the log message when the annotation gets set ("Annotates pod dev-druid-usw2-qal with vpc.amazonaws.com/pod-ips: 10.195.15.31") but nothing when it gets unset. I feel it has something to do with these log statements from data_store.go.
Please help us answer these two questions: 1) why does the annotation get unset and never reconciled? 2) where does it get unset, and how can I find the logs?
Well that is very strange, as the only way that annotation gets set is via the patch API call that I linked to. Is it possible that you are running a custom webhook in your cluster that rejects or reverts this? Otherwise, there has been some refactoring in this area, so would you be willing to try reproducing this with the latest VPC CNI (currently v1.12.5)? I have not been able to reproduce this, but I still think there is an issue when pod DELs and ADDs for the same pod name are ordered with DELs last. That issue we are planning to fix, but this new information is quite confusing. If it is not reproducible with v1.12.5, that would imply that the refactoring helped resolve this, but that refactoring was done to prevent unnecessary retries, and I do not see any retries here (otherwise we would see the relevant logs).
@jdn5126 I am able to reproduce it with v1.12.5 as well.
Thank you, we will work on the fix soon and target it for a future release.
@jdn5126 Thanks for getting the ball rolling on this. Will there be a release anytime soon to pick up this change?
@sahilbadla no release date planned currently, I will have to check on this before making any promises.
Closing as fixed by #2328
What happened:
I am running a druid statefulset with 2 replicas that are always spawned with the same names, historicals-0 and historicals-1. While testing, I've had issues where network calls to another service in the same namespace would fail during graceful shutdown using a preStop hook. Example error:
java.util.concurrent.ExecutionException: java.net.UnknownHostException: druid-exporter.dev-druid-usw2-qal.svc.cluster.local
Graceful shutdown time is being honored, but it felt like the pod network was getting removed before the preStop hook had finished. Upon digging further into aws-node logs, I found that my pods have the
vpc.amazonaws.com/pod-ips
annotation empty, although in the AWS CNI config we have it set to true. That led me to investigate why the annotation is missing on my statefulset pods while other replicaset pods have it set. I looked at the k8s audit logs and realized the annotation is set by aws-node (via patch) when the pod is spawned, but within a couple of seconds the annotation gets unset by aws-node.
Patch call where annotation is set:
Patch call where annotation is unset:
Upon digging deeper into the cni logs to see why the annotation is unset, I was able to find some delete-network request logs around the same time.
Now looking at this rpc_handler code https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/rpc_handler.go#L276, I understand the logic that removed the annotation, but the question "why delete network" remains. Then I looked closely at the containerID in the cni logs: it points to b92a369fa0a0390676d31bdb9e7e79bff66ed823309c4286ccc4465beafadd8e, which doesn't exist anymore. It was the containerID of the previous historicals-1 pod that I had deleted. The new containerID for the historicals-1 pod is 6d0e8a780f636f40438eee7e1371a195f14980428814e1ecffb82b40a1d00d2d (as seen in the audit log below):
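The failure mode described here is a DEL keyed only by pod name catching up with a recreated pod. A minimal Go sketch of the safe behavior, using illustrative types (`datastore`, `ipEntry`) rather than the actual amazon-vpc-cni-k8s datastore, keys releases by containerID so a late DEL for the old container cannot touch the new container's entry:

```go
package main

import "fmt"

// ipEntry models a datastore record for one pod sandbox. Illustrative
// only; the real amazon-vpc-cni-k8s datastore has different types.
type ipEntry struct {
	podName string
	ip      string
}

// datastore maps containerID -> entry, so entries for two incarnations
// of the same pod name never collide.
type datastore map[string]ipEntry

// handleDel releases the IP for the given containerID. A DEL carrying
// a stale containerID (e.g. the previous historicals-1 sandbox) finds
// no entry and is a no-op, leaving the recreated pod's state alone.
func (ds datastore) handleDel(containerID string) (ipEntry, bool) {
	e, ok := ds[containerID]
	if !ok {
		return ipEntry{}, false // unknown/stale container: nothing to release
	}
	delete(ds, containerID)
	return e, true
}

func main() {
	// Shortened containerIDs for readability.
	ds := datastore{
		"6d0e8a78": {podName: "historicals-1", ip: "10.195.15.31"}, // new sandbox
	}

	// Late DEL for the already-deleted old sandbox is a no-op.
	_, ok := ds.handleDel("b92a369f")
	fmt.Println(ok)

	// The new pod's entry is untouched.
	fmt.Println(ds["6d0e8a78"].ip)
}
```

The design point is that pod name alone is ambiguous for statefulsets (the name is reused immediately), while the containerID is unique per sandbox, so it is the only safe key for DEL-side cleanup.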
Looks like there's a caching/sync problem with statefulset pods that are recreated quickly with the same name. For some reason the cni keeps referring to the old containerID that got deleted, and the new pod spec never gets annotated with
vpc.amazonaws.com/pod-ips
set to its IP. That's why I see failures in the preStop hook: with this annotation empty, the cni removes the network from the pod as soon as it enters the terminating state. This is making our pipelines unstable and causing problems in our druid workloads. Please help with the same.
What you expected to happen:
Statefulset pods should be recreated with the right pod annotations to work properly. The preStop hook should have network access until its script succeeds/returns.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): EKS v1.22.16-eks-ffeb93d
- OS (e.g. from `cat /etc/os-release`):
  NAME="Amazon Linux"
  VERSION="2"
  ID="amzn"
  ID_LIKE="centos rhel fedora"
- Kernel (e.g. `uname -a`): Linux ip-xx 5.4.228-131.415.amzn2.x86_64 #1 SMP Tue Dec 20 12:51:02 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux