IP Address allocation fails #256
Comments
Yes, we recently observed the same, but with IPVLAN. I have no idea how this is possible at first glance, because this error is thrown by the kernel itself, not by DANM.
Sorry, your issue might be a little bit different, as it indeed comes from IPAM, not the kernel. Our error is thrown when we try to assign an already reserved IP to an IPVLAN slave, and the kernel says it cannot do that.
So, our issue is different, and I solved it in the other PR. Your issue I can't reproduce at all in my environment: I deploy a network with an allocation pool of 3 IPs, create 3 Deployments with one Pod each from the same file, pin all three to the same node, and ask for a different static IP for each of the three Pods, and I have a 100% success rate. Please share the output of the kubectl get danmeps command when the issue happens.
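For reference, a minimal sketch of how the requested listing can be captured (the output file names are arbitrary; kubectl get danmeps is the command asked for above):

# Dump every DanmEp so stale entries that still hold a reserved IP can be spotted
kubectl get danmeps --all-namespaces -o yaml > danmeps.yaml
# Capture the running Pods alongside it, so DanmEps can be matched against live Pods
kubectl get pods --all-namespaces -o wide > pods.txt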
This issue is observed once in a while. I will capture the output of the command you asked for when the issue is reproduced.
@sriramec any news? The reason the DanmEp listing would be interesting is that I suspect we might be dealing with a "normal" case of syncing issues.
Yes, so this basically confirmed what I explained above. The TL;DR version is that you should install DANM Cleaner in your environment, and you won't see this issue.
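For anyone landing here, a hedged sketch of installing the Cleaner; the manifest directory comes from the nokia/danm-utils repository referenced later in this thread, while the exact file name depends on your cluster's default CNI:

# Cleaner manifests live under integration/manifests/cleaner/ in nokia/danm-utils;
# replace <variant> with the file matching your setup (e.g. the calico variant
# linked further down in this thread)
kubectl apply -f https://raw.githubusercontent.com/nokia/danm-utils/master/integration/manifests/cleaner/<variant>.yaml
# Check that the cleaner Pods came up, then follow what they do with stale DanmEps
kubectl get pods -n kube-system | grep danm-cleaner
kubectl logs -f <danm-cleaner-pod-name> -n kube-system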
I installed the danm-cleaner in the setup where the issue is seen and redeployed the pods, but I am still seeing the same issue.
Please find the logs of the danm-cleaner pods.
danm-cleaner running on controller-1
danm-cleaner running on controller-0: kubectl logs -f danm-cleaner-s7zxz -n kube-system
@eMGabriel "2021/06/30 08:35:47 WARNING: Danmep '0023eea8-f0aa-49ff-8426-154c65d3e683' in namespace 'default' with network type 'sriov' could not be cleaned because of error: unable to release ipv4 IP because: no releaseIP Service selected" |
@eMGabriel I think the problem is that the Cleaner Service Framework implementation assumes DanmEps always have a V4 address, so it always tries to find an implementation for both V4 and V6, and treats it as an error when the DANM API returns a "no" for an empty address.
@sriramec hopefully the following PR addresses your issue: nokia/danm-utils#30. In case of any issues, let's move the discussion under that PR!
We observed an IP leak issue in one of our systems. To solve the IP leak, I deployed the cleaner pods (with the fix suggested in the PR) and reinstalled the other pod. Although the IP leak is no longer seen, we see these messages and the pod is not getting deployed:
Warning FailedCreatePodSandBox 32s kubelet, controller-1 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bb4e568f406cd516ec765d00eb5eb360159e6cf83a48722c9a7eedb959c4ff3c": CNI network could not be set up: CNI operation for network:sriov-oranu failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:sriov because:OS exec call failed:netplugin failed with no error message
Please let me know if further information is required.
"CNI operation for network:sriov-o1c-host1 failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:sriov because:OS exec call failed:netplugin failed with no error message": that part is not thrown by DANM. You need to look into why the invoked CNI plugin returns an error to us.
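A rough sketch of where to start looking on the affected node; the log source and paths below are common defaults and assumptions, not details taken from this thread:

# The delegated sriov plugin's stderr typically ends up in the kubelet journal on the node
journalctl -u kubelet --since "30 min ago" | grep -i -e sriov -e cni
# Confirm the sriov CNI binary DANM delegates to is actually present
# (/usr/libexec/cni is where the danm binary sits in this report; /opt/cni/bin is the usual default)
ls -l /usr/libexec/cni /opt/cni/bin 2>/dev/null | grep -i sriov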
One of the pods that uses SR-IOV interfaces configured via the DANM meta CNI is stuck in the ContainerCreating state.
If I describe that pod, I see something like the output below.
There is an IP leak for two Cluster Networks, sriov-e1c and sriov-o1c-host0. I have deployed the danm-cleaner pods in the kube-system namespace.
I have deployed danm-cleaner using this manifest yaml. Let me know where the problem behind the IP leak could be. I didn't see these two IPs (2001:4000:aa:34::1 and 2005:11:6:1::1) being used in any other pods. One more question: I have not deployed danm-cleaner for Calico (https://github.com/nokia/danm-utils/blob/master/integration/manifests/cleaner/cleaner-for-calico.yaml). I am attaching the output of danmeps.
Let me know if any further information is required.
Is this a BUG REPORT or FEATURE REQUEST?:
bug
What happened:
Deployed an SR-IOV network with 3 IPs and attached IPs statically from that network to every pod. Sometimes I have observed an issue where one of the pods doesn't come up (the error is: IP address is already in use).
What you expected to happen:
All three pods should come up.
How to reproduce it:
Deploy the SR-IOV network with 3 IPs (a sketch of such manifests is shown after these steps).
Deploy three pods.
One of the pods doesn't come up; it says "IP address is already in use".
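The following is only an illustrative sketch of the kind of manifests involved, not the exact ones from this cluster: the network name, CIDR, allocation pool bounds, and SR-IOV device pool are assumptions, and the field names follow the DanmNet and annotation schema as I understand it.

# Hypothetical DanmNet with a 3-address allocation pool (values made up for illustration)
cat <<'EOF' | kubectl apply -f -
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: sriov-test
spec:
  NetworkID: sriov-test
  NetworkType: sriov
  Options:
    device_pool: "nokia.k8s.io/sriov_ens1f0"   # assumed VF resource pool name
    cidr: 10.208.122.0/24
    allocation_pool:
      start: 10.208.122.88
      end: 10.208.122.90
EOF
# Each Pod then asks for one fixed address from that pool through the DANM
# interfaces annotation (annotation format as I understand it):
#   danm.k8s.io/interfaces: '[{"network":"sriov-test", "ip":"10.208.122.88/24"}]'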
Anything else we need to know?:
I have not deployed any pods in any other namespace that use this IP "10.208.122.88". This issue is only seen sometimes.
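When the error appears, a quick way to check whether a stale DanmEp is still holding the address is to search the DanmEp listing for it (a sketch; the grep context size is arbitrary):

# Find which DanmEp, if any, still reserves the address the new Pod is asking for
kubectl get danmeps --all-namespaces -o yaml | grep -i -B5 -A5 '10.208.122.88'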
Environment:
DANM version (use danm --version):
controller-1:~# /usr/libexec/cni/danm --version
2021/05/20 07:22:22 DANM binary was built from release: v4.2.1
2021/05/20 07:22:22 DANM binary was built from commit: abd3c48d_dirty
Kubernetes version (use kubectl version):
controller-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"archive", BuildDate:"2020-08-05T05:08:32Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:30:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
DANM configuration (K8s manifests, kubeconfig files, CNI config file):
Kernel (e.g. uname -a):
controller-1:~# uname -a
Linux controller-1 4.18.0-147.3.1.rt24.96.el8_1.tis.8.x86_64 #1 SMP PREEMPT RT Wed Aug 5 06:21:07 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Others: