This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

weave-net doesn't update the NetworkUnavailable node status on GCE #3249

Closed
mikebryant opened this issue Feb 28, 2018 · 9 comments
Comments

@mikebryant
Collaborator

What you expected to happen?

For weave-net networking to work on GCE

What happened?

weave-net runs on a node, but the node has a NetworkUnavailable status so no pods are scheduled

How to reproduce it?

Anything else we need to know?

GCE, custom

See also:

Versions:

weave is 2.0.5

Logs:

kubectl describe node includes:
  NetworkUnavailable   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 28 Feb 2018 21:49:54 +0000   NoRouteCreated               Node created without a route
@mikebryant
Collaborator Author

Presumably weave-net or weave-kube should update the node status once weave comes up and is working?

@mikebryant
Collaborator Author

(Also related #2896)

@bboreham
Contributor

Was that the resolved "correct" way to do it? Is it documented anywhere?
There was a lot of discussion starting from kubernetes/kubernetes#33438 but none of those PRs got merged.

I would think Weave should signal to kubelet which updates the node status. Pretty sure there isn't a way to do that.

@mikebryant
Collaborator Author

I'm looking in particular at comments like kubernetes/kubernetes#33573 (comment)

It looks to me like the current behaviour is that the gce cloud provider sets this condition and expects whichever network plugin is in use to clear it. weave-net doesn't, which is why it doesn't work.

So currently either the network plugin needs to set it, or you disable the gce cloud provider (which then means volumes etc. don't work, so not really an option).

I'm not confident what the right overall solution should be, but it does seem clear that the current behaviour doesn't work.

@mikebryant
Collaborator Author

The current code has a special case for gce in providerRequiresNetworkingConfiguration:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_network.go#L150-L160

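This then gets used here to unconditionally mark the network as unavailable (NetworkUnavailable is set to True, with the NoRouteCreated reason seen in the log above):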

https://github.com/kubernetes/kubernetes/blob/b1ef631e5971f649effd51905444c2eeee58e8b1/pkg/kubelet/kubelet_node_status.go#L257

So I'm not sure if it's in the docs anywhere, but it does seem clear that networking plugins setting this status is the expected behaviour in the current tree.
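
The logic described in this comment can be sketched roughly as follows. This is a simplified paraphrase for illustration, not the kubelet source: the `cloudProvider` and `condition` types and the `supportsRoutes` field are invented stand-ins, and the routes check is an assumption about what the gce special case looks like. The reason and message strings are the ones from the `kubectl describe node` output earlier in this issue.

```go
package main

import "fmt"

// cloudProvider is a hypothetical stand-in for the kubelet's cloud provider handle.
type cloudProvider struct {
	name           string
	supportsRoutes bool // assumption: whether the provider can program routes
}

// condition is a hypothetical, trimmed-down node condition.
type condition struct {
	Type, Status, Reason, Message string
}

// providerRequiresNetworkingConfiguration mirrors the special case described
// above: only the gce provider is treated as needing network setup.
func providerRequiresNetworkingConfiguration(cloud *cloudProvider) bool {
	if cloud == nil || cloud.name != "gce" {
		return false
	}
	return cloud.supportsRoutes
}

// initialConditions returns the conditions a new node starts with. With the
// gce provider the network is marked unavailable until something clears it.
func initialConditions(cloud *cloudProvider) []condition {
	if !providerRequiresNetworkingConfiguration(cloud) {
		return nil
	}
	return []condition{{
		Type:    "NetworkUnavailable",
		Status:  "True",
		Reason:  "NoRouteCreated",
		Message: "Node created without a route",
	}}
}

func main() {
	gce := &cloudProvider{name: "gce", supportsRoutes: true}
	fmt.Printf("gce provider:  %+v\n", initialConditions(gce))
	fmt.Printf("no provider:   %+v\n", initialConditions(nil))
}
```

Under this reading, nothing in the kubelet ever flips the condition back, so a network plugin that does not patch the node status leaves the node unschedulable.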

@mikebryant
Collaborator Author

In the interest of getting my cluster working, I might hack something together that I can run as a sidecar in the weave-net DaemonSet. It would be useful to validate if that's a useful approach, anyway. (If it does some basic healthchecking on weave-net, might be able to solve #2896 too)

Not sure if this is the correct long-term solution
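
A sidecar along these lines could gate the condition on a basic health probe. A minimal sketch, under assumptions: `http://127.0.0.1:6784/status` is assumed to be weave's local status endpoint, and `probeFunc`, `httpProbe`, `weaveHealthy` and `networkUnavailableStatus` are hypothetical names. It only decides what the condition should be; writing it to the node is a separate step.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probeFunc returns the HTTP status code of one health check attempt.
type probeFunc func() (int, error)

// httpProbe builds a probeFunc that GETs the given URL with a short timeout.
func httpProbe(url string) probeFunc {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() (int, error) {
		resp, err := client.Get(url)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		return resp.StatusCode, nil
	}
}

// weaveHealthy treats anything other than a clean HTTP 200 as unhealthy.
func weaveHealthy(probe probeFunc) bool {
	code, err := probe()
	return err == nil && code == http.StatusOK
}

// networkUnavailableStatus maps health onto the condition's status string:
// a healthy network means NetworkUnavailable is "False".
func networkUnavailableStatus(healthy bool) string {
	if healthy {
		return "False"
	}
	return "True"
}

func main() {
	// Assumed endpoint; adjust if weave listens elsewhere.
	healthy := weaveHealthy(httpProbe("http://127.0.0.1:6784/status"))
	fmt.Println("NetworkUnavailable should be set to", networkUnavailableStatus(healthy))
}
```

Taking the probe as a function keeps the decision logic testable without a running weave instance.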

@mikebryant
Collaborator Author

Current WIP:
Add to ClusterRole:

- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update

Add another container to DaemonSet:

      - name: network-availability
        command:
        - /bin/sh
        - -c
        - >
          set -e;
          KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token);
          /usr/bin/curl
          -sS
          --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -H "Authorization: Bearer $KUBE_TOKEN"
          -H "Accept: application/json"
          -H "Content-Type: application/strategic-merge-patch+json"
          -X PATCH
          -d '
            {
              "status":{
                "conditions": [
                  {
                    "type": "NetworkUnavailable",
                    "status": "False",
                    "reason": "WeaveIsUp",
                    "message": "Weave pod has set this"
                  }
                ]
              }
            }
          '
          https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${HOSTNAME}/status;
          while true; do sleep 3600; done;
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: hub.docker.tech.lastmile.com/weaveworks/weave-kube:2.2.0
        resources:
          requests:
            cpu: 10m
            memory: 50Mi

Obviously this isn't doing any actual healthchecking of weave etc., but I'm testing it now to see if I can get a cluster up and running.
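
The same PATCH can also be expressed in Go, which is closer to how a fix inside weave-kube itself might look. A minimal sketch using only the standard library: the helper names (`buildPatch`, `nodeStatusURL`, `newPatchRequest`) are hypothetical, and the host, port, node name and token in `main` are placeholders. The request is only built, not sent; actually sending it would need an `http.Client` that trusts the service account CA certificate, and a real implementation would probably use a Kubernetes client library instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// nodeCondition holds the fields set by the curl-based WIP above.
type nodeCondition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// nodeStatusPatch is the strategic merge patch body for the node status.
type nodeStatusPatch struct {
	Status struct {
		Conditions []nodeCondition `json:"conditions"`
	} `json:"status"`
}

// buildPatch returns JSON that sets NetworkUnavailable to "False".
func buildPatch(reason, message string) ([]byte, error) {
	var p nodeStatusPatch
	p.Status.Conditions = []nodeCondition{{
		Type:    "NetworkUnavailable",
		Status:  "False",
		Reason:  reason,
		Message: message,
	}}
	return json.Marshal(p)
}

// nodeStatusURL builds the status subresource URL for a node.
func nodeStatusURL(host, port, node string) string {
	return fmt.Sprintf("https://%s:%s/api/v1/nodes/%s/status", host, port, node)
}

// newPatchRequest prepares the PATCH request with the same headers as the
// curl command, without sending it.
func newPatchRequest(url, token string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/json")
	req.Header.Set("Content-Type", "application/strategic-merge-patch+json")
	return req, nil
}

func main() {
	body, err := buildPatch("WeaveIsUp", "Weave pod has set this")
	if err != nil {
		panic(err)
	}
	// Placeholder values; in a pod these would come from the environment
	// and the mounted service account token.
	req, err := newPatchRequest(nodeStatusURL("10.96.0.1", "443", "node-1"), "placeholder-token", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(body))
}
```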

@primeroz

Is there any progress on this issue? At the moment my workaround is to enable the "create route" in the cloud-controller-manager, but it is ugly at best.

@bboreham
Contributor

Nitpick: the precondition isn't quite "on GCE" - we create Kubernetes clusters on GCE for every integration test run - but "when the GCE cloudprovider is in use".

@mikebryant I took your WIP and slapped it into the Go code at #3307

@brb brb added this to the 2.4 milestone Jun 11, 2018
@brb brb closed this as completed in #3307 Jun 11, 2018
bboreham added a commit that referenced this issue Jun 21, 2018
This is needed to update the node NetworkUnavailable condition
see #3249