weave-net doesn't update the `NetworkUnavailable` node status on GCE #3249

mikebryant · 2018-02-28T22:04:52Z

What you expected to happen?

For weave-net networking to work on GCE

What happened?

weave-net runs on a node, but the node has a NetworkUnavailable status so no pods are scheduled

How to reproduce it?

Anything else we need to know?

GCE, custom

Versions:

weave is 2.0.5

Logs:

kubectl describe node includes:
  NetworkUnavailable   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 28 Feb 2018 21:49:54 +0000   NoRouteCreated               Node created without a route

mikebryant · 2018-02-28T22:05:37Z

Presumably weave-net or weave-kube should update the node status once weave comes up and is working?

mikebryant · 2018-02-28T22:06:13Z

(Also related #2896)

bboreham · 2018-02-28T22:17:13Z

Was that the resolved "correct" way to do it? Is it documented anywhere?
There was a lot of discussion starting from kubernetes/kubernetes#33438 but none of those PRs got merged.

I would think Weave should signal to kubelet which updates the node status. Pretty sure there isn't a way to do that.

mikebryant · 2018-02-28T22:43:21Z

I'm looking in particular at comments like kubernetes/kubernetes#33573 (comment)

It looks to me like the current behaviour is that the gce controller sets this condition up and expects any network plugin in use to clear it, which is why it doesn't work.

And currently either the network plugin needs to set it, or you disable the gce cloud provider (which then means volumes etc don't work, so not really an option)

I'm not confident what the right overall solution should be, but it does seem clear that the current behaviour doesn't work.

mikebryant · 2018-02-28T22:49:47Z

current code has a special case for gce in providerRequiresNetworkingConfiguration

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_network.go#L150-L160

This then gets used here to unconditionally mark the network as unavailable (NetworkUnavailable = True) when the node is created

https://github.com/kubernetes/kubernetes/blob/b1ef631e5971f649effd51905444c2eeee58e8b1/pkg/kubelet/kubelet_node_status.go#L257

So not sure if it's in the docs anywhere, but it does seem clear that networking plugins setting this status is expected behaviour in the current tree
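To make the behaviour described above concrete, here is a minimal Go sketch of the logic as I read it. It is not the kubelet source: `cloudProvider`, `SupportsRoutes`, `requiresNetworkingConfiguration` and `initialConditions` are hypothetical names, the types are simplified stand-ins, and the route-support check is an assumption about what the linked function does. The reason and message strings are the ones shown in the `kubectl describe node` output.

```go
package main

import "fmt"

// NodeCondition is a simplified stand-in for the Kubernetes node condition type.
type NodeCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// cloudProvider is a hypothetical, minimal view of the configured cloud provider.
type cloudProvider struct {
	Name           string
	SupportsRoutes bool // assumption: the provider can program routes itself
}

// requiresNetworkingConfiguration models the gce special case: only the
// "gce" provider is treated as needing the network to be configured.
func requiresNetworkingConfiguration(cloud *cloudProvider) bool {
	if cloud == nil || cloud.Name != "gce" {
		return false
	}
	return cloud.SupportsRoutes
}

// initialConditions returns the conditions a new node starts with. When the
// provider requires networking configuration, the node starts out with
// NetworkUnavailable=True until something else clears it.
func initialConditions(cloud *cloudProvider) []NodeCondition {
	if !requiresNetworkingConfiguration(cloud) {
		return nil
	}
	return []NodeCondition{{
		Type:    "NetworkUnavailable",
		Status:  "True",
		Reason:  "NoRouteCreated",
		Message: "Node created without a route",
	}}
}

func main() {
	// Compare a node with no cloud provider against one using gce.
	for _, c := range []*cloudProvider{nil, {Name: "gce", SupportsRoutes: true}} {
		fmt.Printf("%+v -> %v\n", c, initialConditions(c))
	}
}
```

If this reading is right, nothing in this path ever flips the condition back, so whatever provides the network has to do it.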

mikebryant · 2018-02-28T22:59:32Z

In the interest of getting my cluster working, I might hack something together that I can run as a sidecar in the weave-net DaemonSet. It would be useful to validate if that's a useful approach, anyway. (If it does some basic healthchecking on weave-net, might be able to solve #2896 too)

Not sure if this is the correct long-term solution

mikebryant · 2018-03-01T16:45:04Z

Current WIP:
Add to ClusterRole:

- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update

Add another container to DaemonSet:

      - name: network-availability
        command:
        - /bin/sh
        - -c
        - >
          set -e;
          KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token);
          /usr/bin/curl
          -sS
          --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -H "Authorization: Bearer $KUBE_TOKEN"
          -H "Accept: application/json"
          -H "Content-Type: application/strategic-merge-patch+json"
          -X PATCH
          -d '
            {
              "status":{
                "conditions": [
                  {
                    "type": "NetworkUnavailable",
                    "status": "False",
                    "reason": "WeaveIsUp",
                    "message": "Weave pod has set this"
                  }
                ]
              }
            }
          '
          https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${HOSTNAME}/status;
          while true; do sleep 3600; done;
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: hub.docker.tech.lastmile.com/weaveworks/weave-kube:2.2.0
        resources:
          requests:
            cpu: 10m
            memory: 50Mi

Obviously this isn't doing any actual healthchecking of weave etc., but I'm testing it now to see if I can get a cluster up and running
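For anyone wanting to do the same thing from Go rather than curl, here is a rough sketch that builds the same request body and URL as the sidecar. It only constructs the payload and does not talk to the API server. The function names `networkAvailablePatch` and `nodeStatusURL` are made up for illustration, and the host, port and node name in `main` are placeholder values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// condition mirrors the fields sent in the curl payload.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// statusPatch is the body of the strategic merge patch against nodes/status.
type statusPatch struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// networkAvailablePatch builds the JSON body that sets
// NetworkUnavailable to "False", as in the curl -d argument.
func networkAvailablePatch() ([]byte, error) {
	var p statusPatch
	p.Status.Conditions = []condition{{
		Type:    "NetworkUnavailable",
		Status:  "False",
		Reason:  "WeaveIsUp",
		Message: "Weave pod has set this",
	}}
	return json.Marshal(p)
}

// nodeStatusURL builds the nodes/status URL from the API server host and
// port and the node name.
func nodeStatusURL(host, port, node string) string {
	return "https://" + net.JoinHostPort(host, port) + "/api/v1/nodes/" + node + "/status"
}

func main() {
	body, err := networkAvailablePatch()
	if err != nil {
		panic(err)
	}
	// Placeholder values; in a pod these would come from
	// KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT and spec.nodeName.
	fmt.Println("PATCH", nodeStatusURL("10.0.0.1", "443", "node-1"))
	fmt.Println(string(body))
}
```

The request would still need the service account token, the CA certificate and the `application/strategic-merge-patch+json` content type, exactly as in the curl command.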

primeroz · 2018-05-16T16:30:05Z

Is there any progress on this issue? At the moment my workaround is to enable "create route" in the cloud-controller-manager, but that is ugly at best.

bboreham · 2018-05-26T14:15:41Z

Nitpick: the precondition isn't quite "on GCE" - we create Kubernetes clusters on GCE for every integration test run - but "when the GCE cloudprovider is in use".

@mikebryant I took your WIP and slapped it into the Go code at #3307

This is needed to update the node NetworkUnavailable condition; see #3249

mikebryant added bug [component/kube] labels Feb 28, 2018

justinsb mentioned this issue May 25, 2018

AWS: network providers required to update NetworkUnavailable kubernetes/kubernetes#33573

Closed

bboreham mentioned this issue May 26, 2018

Set Kubernetes NodeNetworkUnavailable to false when starting #3307

Merged

brb added this to the 2.4 milestone Jun 11, 2018

brb closed this as completed in #3307 Jun 11, 2018

brb mentioned this issue Jun 17, 2018

weave-kube: Cannot patch nodes/status at the cluster scope #3332

Closed

bboreham added a commit that referenced this issue Jun 21, 2018

Add permissions to set node status

d513e68

This is needed to update the node NetworkUnavailable condition; see #3249

bboreham mentioned this issue Jun 21, 2018

Add permissions to set node status #3334

Merged
