Failed to parse Node PodCIDR #993

Closed
pierluigilenoci opened this issue Jul 23, 2021 · 15 comments
Labels
bug Something isn't working

Comments

@pierluigilenoci

pierluigilenoci commented Jul 23, 2021

Describe the bug

The network-costs pod produces thousands of logs like this:

I0723 15:44:10.014782       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address: 

How can we solve this problem?

To Reproduce

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    meta.helm.sh/release-name: cost-analyzer
    meta.helm.sh/release-namespace: [REDACTED]
  labels:
    app: cost-analyzer
    app.kubernetes.io/instance: cost-analyzer
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cost-analyzer
    helm.sh/chart: cost-analyzer-1.83.2
  name: cost-analyzer-network-costs
  namespace: [REDACTED]
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cost-analyzer-network-costs
  template:
    metadata:
      labels:
        app: cost-analyzer-network-costs
    spec:
      containers:
      - env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: HOST_PORT
          value: "3001"
        - name: TRAFFIC_LOGGING_ENABLED
          value: "false"
        - name: GODEBUG
          value: madvdontneed=1
        image: gcr.io/kubecost1/kubecost-network-costs:v15.4
        imagePullPolicy: IfNotPresent
        name: cost-analyzer-network-costs
        ports:
        - containerPort: 3001
          hostPort: 3001
          name: http-server
          protocol: TCP
        resources:
          requests:
            cpu: 50m
            memory: 20Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /net
          name: nf-conntrack
        - mountPath: /netfilter
          name: netfilter
        - mountPath: /network-costs/config
          name: network-costs-config
      dnsPolicy: ClusterFirst
      hostNetwork: true
      priorityClassName: addon-priority
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cost-analyzer
      serviceAccountName: cost-analyzer
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      - effect: PreferNoSchedule
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: network-costs-config
        name: network-costs-config
      - hostPath:
          path: /proc/net
          type: ""
        name: nf-conntrack
      - hostPath:
          path: /proc/sys/net/netfilter
          type: ""
        name: netfilter
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Expected behavior

Fewer error messages

Screenshots

Not relevant

Collect logs (please complete the following information):

  • Run helm ls and paste the output here:
helm ls -n k8s-kubecost
NAME         	NAMESPACE   	REVISION	UPDATED                                	STATUS  	CHART               	APP VERSION
cost-analyzer	[REDACTED]	17      	2021-07-23 15:19:27.083695032 +0000 UTC	deployed	cost-analyzer-1.83.2	1.83.2
  • If the pod is stuck in init, run kubectl logs <kubecost-cost-analyzer pod name> -n kubecost -c cost-analyzer-init and paste output here:
I0723 15:50:31.492116       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:21.425348       1 networktraffic.go:200] Removed 13 expired entries.
I0723 15:51:25.539630       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:35.610065       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:35.625338       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:41.751209       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:51.797186       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:52:05.802549       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:52:42.003523       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:52:56.163739       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:12.099341       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:22.146107       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:22.146878       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:36.462039       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:42.231947       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:54:12.316837       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:54:46.907980       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:

gz#595

@pierluigilenoci added the bug label on Jul 23, 2021
@kbrwn
Contributor

kbrwn commented Jul 23, 2021

Hi @pierluigilenoci, thanks for this report. Which CNI/network plugin is in use by the cluster?

@andronux

I'm experiencing the same issue, and I'm running Calico on my EKS cluster.

@dwbrown2
Contributor

cc @mbolt35 in case he has not seen this.

@mbolt35
Contributor

mbolt35 commented Jul 26, 2021

@pierluigilenoci @andronux v15.5 of the image was cut a while back to address this noisy log - I'll make sure the helm chart is updated to this version.

Just for what it's worth, this is nothing more than a really verbose log - it occurs when a pod/node comes up and isn't immediately assigned an IP.

If you upgrade to v15.5 and this is still occurring, let me know!
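
For context, the trailing blank in the message is what Go's net.ParseCIDR produces when it is handed an empty string, which is what the watcher would see while a Node's spec.podCIDR is unset. A minimal sketch (an illustration only, not the actual network-costs source):

package main

import (
	"log"
	"net"
)

func main() {
	podCIDR := "" // what a watcher would read from a Node whose spec.podCIDR is unset
	if _, _, err := net.ParseCIDR(podCIDR); err != nil {
		// Prints: Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
		log.Printf("Warning: Failed to parse Node PodCIDR: %s due to: %v", podCIDR, err)
	}
}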

@pierluigilenoci
Author

@kbrwn on AKS clusters we use Azure CNI [1] as the network plugin and Calico for network policy, configured using the configurations managed by Azure.

kubectl get pod -n calico-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-688cf4bc8b-lz65q   1/1     Running   0          16d
calico-node-7zvsl                          1/1     Running   0          17d
calico-node-hgfzs                          1/1     Running   0          17d
calico-node-m6zn9                          1/1     Running   0          16d
calico-node-vhqs2                          1/1     Running   0          17d
calico-node-zqztt                          1/1     Running   0          17d
calico-typha-95dddd9cf-9bm69               1/1     Running   0          17d
calico-typha-95dddd9cf-chccb               1/1     Running   0          16d
calico-typha-95dddd9cf-m5kqm               1/1     Running   0          17d
kubectl get pod -n tigera-operator
NAME                               READY   STATUS    RESTARTS   AGE
tigera-operator-5cc64b87bd-mfqhl   1/1     Running   6          17d

On EKS clusters we use the Amazon VPC CNI plugin [2] as the network plugin and Calico for network policy. The network policy uses the configurations managed by AWS; Calico is deployed using the official AWS Helm chart [3]. The only change to the chart's default configuration is enabling PodSecurityPolicy.

kubectl get pods -n kube-system
NAME                                                  READY   STATUS    RESTARTS   AGE
aws-node-bwtbc                                        1/1     Running   0          17d
aws-node-dbxb7                                        1/1     Running   0          17d
aws-node-npnqb                                        1/1     Running   0          17d
aws-node-scpx5                                        1/1     Running   0          17d
aws-node-wmpk5                                        1/1     Running   0          17d
calico-node-22scv                                     1/1     Running   0          63s
calico-node-4x2c9                                     1/1     Running   0          17d
calico-node-7pkjl                                     1/1     Running   0          63s
calico-node-c72dm                                     1/1     Running   0          63s
calico-node-nljm5                                     1/1     Running   0          63s
calico-typha-76cddff5d8-rjzkd                         1/1     Running   0          17d
calico-typha-horizontal-autoscaler-57f4c9d57d-8ptgg   1/1     Running   0          17d

We see the same warning messages on both sides, Azure and AWS.

[1] https://docs.microsoft.com/en-us/azure/aks/configure-azure-cni
[2] https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html
[3] https://github.com/aws/eks-charts/tree/master/stable/aws-calico

@pierluigilenoci
Author

If you upgrade to v15.5 and this is still occurring, let me know!

@mbolt35 I'm sorry but I have no good news.

I upgraded the AKS cluster and the logs are still there:

I0726 10:43:44.092459       1 conntrackwatcher.go:112] Initial Load: 2804 entries
I0726 10:43:47.897901       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:27.929924       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:37.944101       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:37.977465       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:44.422083       1 cachingnetworkmap.go:144] Removing Cached Pod: [REDACTED]/redis-cluster-5
I0726 10:44:47.965558       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:09.484462       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:27.977721       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:28.081084       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:28.684958       1 cachingnetworkmap.go:144] Removing Cached Pod: [REDACTED]/redis-cluster-4

The same in EKS:

I0726 11:07:27.329934       1 watchcontroller.go:202] Starting *v1.Service controller
I0726 11:07:27.330975       1 netroutes.go:61] +----------------------- Routing Table -----------------------------
I0726 11:07:27.330987       1 netroutes.go:63] | Destination: 10.241.0.0, Route: 0.0.0.0
I0726 11:07:27.330991       1 netroutes.go:63] | Destination: 10.241.11.226, Route: 0.0.0.0
I0726 11:07:27.330995       1 netroutes.go:63] | Destination: 10.241.13.24, Route: 0.0.0.0
I0726 11:07:27.330999       1 netroutes.go:63] | Destination: 10.241.16.22, Route: 0.0.0.0
I0726 11:07:27.331003       1 netroutes.go:63] | Destination: 10.241.31.65, Route: 0.0.0.0
I0726 11:07:27.331007       1 netroutes.go:63] | Destination: 169.254.169.254, Route: 0.0.0.0
I0726 11:07:27.331010       1 netroutes.go:63] | Destination: 0.0.0.0, Route: 10.241.0.1
I0726 11:07:27.331015       1 netroutes.go:65] +-------------------------------------------------------------------
I0726 11:07:32.354791       1 conntrackwatcher.go:112] Initial Load: 1036 entries
I0726 11:07:35.646003       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:07:47.960608       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:07:57.933567       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:08:12.544521       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:08:13.910989       1 cachingnetworkmap.go:144] Removing Cached Pod: kube-system/ebs-csi-node-j567n
I0726 11:08:20.191671       1 cachingnetworkmap.go:144] Removing Cached Pod: [REDACTED]/csi-secrets-store-provider-aws-v4g6v
I0726 11:12:05.392821       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:13.710993       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:37.114505       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:49.174769       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:59.214960       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:13:20.188612       1 cachingnetworkmap.go:144] Removing Cached Pod: kube-system/ebs-csi-controller-68cf7bd986-zjp5d
I0726 11:13:21.044803       1 cachingnetworkmap.go:144] Removing Cached Pod: kube-system/ebs-csi-node-n4jsp

@mbolt35
Contributor

mbolt35 commented Jul 26, 2021

@pierluigilenoci Ok, this would take care of the pod-specific logs, but I'm curious: do your Nodes not have the PodCIDR block populated? Seems odd, as I'm pretty sure this is part of the CNI contract.

@mbolt35
Contributor

mbolt35 commented Jul 26, 2021

By the way, if it's continuing to spam logs, then that's a problem, but the mere existence of the log is OK, especially on the Node. I will reduce the severity, but I don't expect that log to be as noisy as the pod-specific log.

@pierluigilenoci
Author

@mbolt35 our AKS and EKS clusters do not have podCIDR configured, because it is supported only by kubenet and the two clouds have their own network plugins (Azure CNI [1] and Amazon VPC CNI [2]).

[1] https://docs.microsoft.com/en-us/azure/templates/microsoft.containerservice/managedclusters?tabs=json#containerservicenetworkprofile-object
[2] aws/containers-roadmap#315
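
For anyone else hitting this, a quick way to confirm that the Nodes really carry no PodCIDR is to list them and print spec.podCIDR. The client-go sketch below does that (the kubeconfig path is an assumption; kubectl describe node shows the same field):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig location; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		cidr := n.Spec.PodCIDR
		if cidr == "" {
			// Expected on AKS with Azure CNI and on EKS with the VPC CNI plugin.
			cidr = "<unset>"
		}
		fmt.Printf("%-40s podCIDR=%s\n", n.Name, cidr)
	}
}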

@mbolt35
Contributor

mbolt35 commented Aug 2, 2021

@pierluigilenoci Sigh, yeah, I've caught up a bit on those implementations - definitely an oversight on my end. This was added as just a secondary mechanism for identifying traffic, but isn't a hard requirement. Are you seeing any unusual classifications of network traffic?

I'll update the logging severity to avoid spamming. Thanks for the feedback!

@mbolt35
Contributor

mbolt35 commented Aug 3, 2021

I've cut the network-costs image v15.6 with the log severity reduced. You should no longer be seeing this Warning.

@kirbsauce
Contributor

Thanks @mbolt35! @pierluigilenoci, let us know when you've had a chance to confirm!

@mbolt35
Contributor

mbolt35 commented Aug 3, 2021

Resolved in #1011

@pierluigilenoci
Author

@kirbsauce the warning messages disappeared with version v15.6.

@kirbsauce
Contributor

Thanks @pierluigilenoci!

Closing.
