Failed to parse Node PodCIDR #993

Closed
pierluigilenoci opened this issue Jul 23, 2021 · 15 comments
Labels
bug Something isn't working

Comments

@pierluigilenoci

pierluigilenoci commented Jul 23, 2021

Describe the bug

The network-costs pod produces thousands of logs like this:

I0723 15:44:10.014782       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address: 

How can we solve this problem?

To Reproduce

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    meta.helm.sh/release-name: cost-analyzer
    meta.helm.sh/release-namespace: [REDACTED]
  labels:
    app: cost-analyzer
    app.kubernetes.io/instance: cost-analyzer
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cost-analyzer
    helm.sh/chart: cost-analyzer-1.83.2
  name: cost-analyzer-network-costs
  namespace: [REDACTED]
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cost-analyzer-network-costs
  template:
    metadata:
      labels:
        app: cost-analyzer-network-costs
    spec:
      containers:
      - env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: HOST_PORT
          value: "3001"
        - name: TRAFFIC_LOGGING_ENABLED
          value: "false"
        - name: GODEBUG
          value: madvdontneed=1
        image: gcr.io/kubecost1/kubecost-network-costs:v15.4
        imagePullPolicy: IfNotPresent
        name: cost-analyzer-network-costs
        ports:
        - containerPort: 3001
          hostPort: 3001
          name: http-server
          protocol: TCP
        resources:
          requests:
            cpu: 50m
            memory: 20Mi
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /net
          name: nf-conntrack
        - mountPath: /netfilter
          name: netfilter
        - mountPath: /network-costs/config
          name: network-costs-config
      dnsPolicy: ClusterFirst
      hostNetwork: true
      priorityClassName: addon-priority
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cost-analyzer
      serviceAccountName: cost-analyzer
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      - effect: PreferNoSchedule
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: network-costs-config
        name: network-costs-config
      - hostPath:
          path: /proc/net
          type: ""
        name: nf-conntrack
      - hostPath:
          path: /proc/sys/net/netfilter
          type: ""
        name: netfilter
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

Expected behavior

Fewer error messages

Screenshots

Not relevant

Collect logs (please complete the following information):

  • Run helm ls and paste the output here:
helm ls -n k8s-kubecost
NAME         	NAMESPACE   	REVISION	UPDATED                                	STATUS  	CHART               	APP VERSION
cost-analyzer	[REDACTED]	17      	2021-07-23 15:19:27.083695032 +0000 UTC	deployed	cost-analyzer-1.83.2	1.83.2
  • If the pod is stuck in init, run kubectl logs <kubecost-cost-analyzer pod name> -n kubecost -c cost-analyzer-init and paste output here:
I0723 15:50:31.492116       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:21.425348       1 networktraffic.go:200] Removed 13 expired entries.
I0723 15:51:25.539630       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:35.610065       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:35.625338       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:41.751209       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:51:51.797186       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:52:05.802549       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:52:42.003523       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:52:56.163739       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:12.099341       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:22.146107       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:22.146878       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:36.462039       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:53:42.231947       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:54:12.316837       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0723 15:54:46.907980       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:

gz#595

@pierluigilenoci added the bug label on Jul 23, 2021
@kbrwn
Contributor

kbrwn commented Jul 23, 2021

Hi @pierluigilenoci, thanks for this report. Which CNI/network plugin is in use by the cluster?

@andronux

I'm experiencing the same issue, and I'm running Calico on my EKS cluster.

@dwbrown2
Contributor

cc @mbolt35 in case he has not seen this.

@mbolt35
Contributor

mbolt35 commented Jul 26, 2021

@pierluigilenoci @andronux v15.5 of the image was cut a while back to address this noisy log - I'll make sure the helm chart is updated to this version.

Just for what it's worth, this is nothing more than a really verbose log - it occurs when a pod/node comes up and isn't immediately assigned an IP.

If you upgrade to v15.5 and this is still occurring, let me know!
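
For context, the trailing blank in the message is what Go's net.ParseCIDR produces when it is handed an empty string, which is what the watcher would see while a Node's spec.podCIDR is unset. A minimal sketch (an illustration only, not the actual network-costs source):

package main

import (
	"log"
	"net"
)

func main() {
	podCIDR := "" // what a watcher would read from a Node whose spec.podCIDR is unset
	if _, _, err := net.ParseCIDR(podCIDR); err != nil {
		// Prints: Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
		log.Printf("Warning: Failed to parse Node PodCIDR: %s due to: %v", podCIDR, err)
	}
}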

@pierluigilenoci
Author

@kbrwn on AKS clusters we use Azure CNI [1] as the network plugin and Calico for network policy, configured using the configurations managed by Azure.

kubectl get pod -n calico-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-688cf4bc8b-lz65q   1/1     Running   0          16d
calico-node-7zvsl                          1/1     Running   0          17d
calico-node-hgfzs                          1/1     Running   0          17d
calico-node-m6zn9                          1/1     Running   0          16d
calico-node-vhqs2                          1/1     Running   0          17d
calico-node-zqztt                          1/1     Running   0          17d
calico-typha-95dddd9cf-9bm69               1/1     Running   0          17d
calico-typha-95dddd9cf-chccb               1/1     Running   0          16d
calico-typha-95dddd9cf-m5kqm               1/1     Running   0          17d
kubectl get pod -n tigera-operator
NAME                               READY   STATUS    RESTARTS   AGE
tigera-operator-5cc64b87bd-mfqhl   1/1     Running   6          17d

On EKS clusters we use the Amazon VPC CNI plugin [2] as the network plugin and Calico for network policy. The network policy uses the configurations managed by AWS; Calico is deployed using the official AWS Helm chart [3]. The only change to the chart's default configuration is enabling PodSecurityPolicy.

kubectl get pods -n kube-system
NAME                                                  READY   STATUS    RESTARTS   AGE
aws-node-bwtbc                                        1/1     Running   0          17d
aws-node-dbxb7                                        1/1     Running   0          17d
aws-node-npnqb                                        1/1     Running   0          17d
aws-node-scpx5                                        1/1     Running   0          17d
aws-node-wmpk5                                        1/1     Running   0          17d
calico-node-22scv                                     1/1     Running   0          63s
calico-node-4x2c9                                     1/1     Running   0          17d
calico-node-7pkjl                                     1/1     Running   0          63s
calico-node-c72dm                                     1/1     Running   0          63s
calico-node-nljm5                                     1/1     Running   0          63s
calico-typha-76cddff5d8-rjzkd                         1/1     Running   0          17d
calico-typha-horizontal-autoscaler-57f4c9d57d-8ptgg   1/1     Running   0          17d

We see the same warning messages on both sides, Azure and AWS.

[1] https://docs.microsoft.com/en-us/azure/aks/configure-azure-cni
[2] https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html
[3] https://github.com/aws/eks-charts/tree/master/stable/aws-calico

@pierluigilenoci
Author

If you upgrade to v15.5 and this is still occurring, let me know!

@mbolt35 I'm sorry but I have no good news.

I upgraded the AKS cluster and the logs are still there:

I0726 10:43:44.092459       1 conntrackwatcher.go:112] Initial Load: 2804 entries
I0726 10:43:47.897901       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:27.929924       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:37.944101       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:37.977465       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:44:44.422083       1 cachingnetworkmap.go:144] Removing Cached Pod: [REDACTED]/redis-cluster-5
I0726 10:44:47.965558       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:09.484462       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:27.977721       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:28.081084       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 10:45:28.684958       1 cachingnetworkmap.go:144] Removing Cached Pod: [REDACTED]/redis-cluster-4

The same in EKS:

I0726 11:07:27.329934       1 watchcontroller.go:202] Starting *v1.Service controller
I0726 11:07:27.330975       1 netroutes.go:61] +----------------------- Routing Table -----------------------------
I0726 11:07:27.330987       1 netroutes.go:63] | Destination: 10.241.0.0, Route: 0.0.0.0
I0726 11:07:27.330991       1 netroutes.go:63] | Destination: 10.241.11.226, Route: 0.0.0.0
I0726 11:07:27.330995       1 netroutes.go:63] | Destination: 10.241.13.24, Route: 0.0.0.0
I0726 11:07:27.330999       1 netroutes.go:63] | Destination: 10.241.16.22, Route: 0.0.0.0
I0726 11:07:27.331003       1 netroutes.go:63] | Destination: 10.241.31.65, Route: 0.0.0.0
I0726 11:07:27.331007       1 netroutes.go:63] | Destination: 169.254.169.254, Route: 0.0.0.0
I0726 11:07:27.331010       1 netroutes.go:63] | Destination: 0.0.0.0, Route: 10.241.0.1
I0726 11:07:27.331015       1 netroutes.go:65] +-------------------------------------------------------------------
I0726 11:07:32.354791       1 conntrackwatcher.go:112] Initial Load: 1036 entries
I0726 11:07:35.646003       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:07:47.960608       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:07:57.933567       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:08:12.544521       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:08:13.910989       1 cachingnetworkmap.go:144] Removing Cached Pod: kube-system/ebs-csi-node-j567n
I0726 11:08:20.191671       1 cachingnetworkmap.go:144] Removing Cached Pod: [REDACTED]/csi-secrets-store-provider-aws-v4g6v
I0726 11:12:05.392821       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:13.710993       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:37.114505       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:49.174769       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:12:59.214960       1 cachingnetworkmap.go:356] Warning: Failed to parse Node PodCIDR:  due to: invalid CIDR address:
I0726 11:13:20.188612       1 cachingnetworkmap.go:144] Removing Cached Pod: kube-system/ebs-csi-controller-68cf7bd986-zjp5d
I0726 11:13:21.044803       1 cachingnetworkmap.go:144] Removing Cached Pod: kube-system/ebs-csi-node-n4jsp

@mbolt35
Contributor

mbolt35 commented Jul 26, 2021

@pierluigilenoci Ok, this would take care of the pod-specific logs, but I'm curious: do your Nodes not have the PodCIDR block populated? Seems odd, as I'm pretty sure this is part of the CNI contract.

@mbolt35
Contributor

mbolt35 commented Jul 26, 2021

By the way, if it's continuing to spam logs, then that's a problem, but the mere existence of the log is OK, especially on the Node. I will reduce the severity, but I don't expect that log to be as noisy as the pod-specific log.

@pierluigilenoci
Author

@mbolt35 our AKS and EKS clusters do not have podCIDR configured, because it is supported only by kubenet and the two clouds have their own network plugins (Azure CNI [1] and Amazon VPC CNI [2]).

[1] https://docs.microsoft.com/en-us/azure/templates/microsoft.containerservice/managedclusters?tabs=json#containerservicenetworkprofile-object
[2] aws/containers-roadmap#315
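
For anyone else hitting this, a quick way to confirm that the Nodes really carry no PodCIDR is to list them and print spec.podCIDR. The client-go sketch below does that (the kubeconfig path is an assumption; kubectl describe node shows the same field):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig location; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		cidr := n.Spec.PodCIDR
		if cidr == "" {
			// Expected on AKS with Azure CNI and on EKS with the VPC CNI plugin.
			cidr = "<unset>"
		}
		fmt.Printf("%-40s podCIDR=%s\n", n.Name, cidr)
	}
}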

@mbolt35
Contributor

mbolt35 commented Aug 2, 2021

@pierluigilenoci Sigh, yeah, I've caught up a bit on those implementations - definitely an oversight on my end. This was added as just a secondary mechanism for identifying traffic, but isn't a hard requirement. Are you seeing any unusual classifications of network traffic?

I'll update the logging severity to avoid spamming. Thanks for the feedback!

@mbolt35
Contributor

mbolt35 commented Aug 3, 2021

I've cut the network-costs image v15.6 with the log severity reduced. You should no longer be seeing this Warning.

@kirbsauce
Contributor

Thanks @mbolt35! @pierluigilenoci, let us know when you've had a chance to confirm!

@mbolt35
Contributor

mbolt35 commented Aug 3, 2021

Resolved in #1011

@pierluigilenoci
Author

@kirbsauce the warning messages disappeared with version v15.6.

@kirbsauce
Contributor

Thanks @pierluigilenoci!

Closing.
