This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Difficulties trying to reactivate fastdatapath when changing WEAVE_NO_FASTDP #4004

Open
arthurzenika opened this issue Apr 26, 2024 · 1 comment

Comments

@arthurzenika

What you expected to happen?

At some point in the history of our Kubernetes cluster, WEAVE_NO_FASTDP was set to true, and sleeve has been used by default ever since. We'd like to switch back to the default behaviour (fast datapath, falling back to sleeve under certain conditions).

What happened?

│ INFO: 2024/04/26 08:46:52.444799 weave  2.8.1                                                                                                              │
│ FATA: 2024/04/26 08:46:52.816690 Existing bridge type "bridge" is different than requested "bridged_fastdp". Please do 'weave reset' and try again   
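For context, the FATA line comes from weave comparing the bridge type already present on the node with the type the current settings request. A minimal sketch of that decision, assuming the documented behaviour of WEAVE_NO_FASTDP (the variable and function names here are illustrative, not the actual script's):

```shell
# Sketch of the bridge-type check weave performs at startup (simplified).
detect_bridge_type() {
  # With WEAVE_NO_FASTDP set, weave creates a plain Linux "bridge";
  # otherwise it creates "bridged_fastdp" (an Open vSwitch datapath).
  if [ -n "$WEAVE_NO_FASTDP" ]; then
    echo "bridge"
  else
    echo "bridged_fastdp"
  fi
}

existing="bridge"                      # what a node that ran with WEAVE_NO_FASTDP has
requested="$(detect_bridge_type)"      # what the new settings ask for
if [ "$existing" != "$requested" ]; then
  echo "Existing bridge type \"$existing\" is different than requested \"$requested\". Please do 'weave reset' and try again" >&2
fi
```

Because the existing bridge on the node was created as a plain bridge, merely removing the env var is not enough: weave refuses to start until the old bridge is gone.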

With a debug pod I tried to run the weave reset, but it fails:

kubectl debug -it weave-net-w6wpv -n kube-system --image=weaveworks/weave-kube:2.8.1 -- /bin/sh
[snip]
/home/weave # WEAVE_DEBUG=1 ./weave --local reset
+ SCRIPT_VERSION=2.8.1
+ IMAGE_VERSION=latest
+ '[' 2.8.1 '=' unreleased ]
+ IMAGE_VERSION=2.8.1
+ IMAGE_VERSION=2.8.1
+ MIN_DOCKER_VERSION=1.10.0
+ DOCKERHUB_USER=weaveworks
+ BASE_EXEC_IMAGE=weaveworks/weaveexec
+ EXEC_IMAGE=weaveworks/weaveexec:2.8.1
+ WEAVEDB_IMAGE=weaveworks/weavedb:latest
+ BASE_IMAGE=weaveworks/weave
+ IMAGE=weaveworks/weave:2.8.1
+ echo 
+ cut -s -d: -f1
+ PROXY_HOST=
+ PROXY_HOST=127.0.0.1
+ DOCKER_CLIENT_HOST=
+ IP_REGEXP='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
+ CIDR_REGEXP='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[0-9]{1,2}'
+ '[' --local '=' --local ]
+ shift 1
+ IS_LOCAL=1
+ '['  '=' --help ]
+ '[' -z 1 ]
+ RESTART_POLICY='--restart always'
+ CONTAINER_NAME=weave
+ PLUGIN_NAME=weaveworks/net-plugin
+ OLD_PLUGIN_CONTAINER_NAME=weaveplugin
+ CNI_PLUGIN_NAME=weave-plugin-2.8.1
+ CNI_PLUGIN_DIR=/opt/cni/bin
+ VOLUMES_LABEL=weavevolumes
+ VOLUMES_CONTAINER_NAME=weavevolumes-2.8.1
+ DB_CONTAINER_NAME=weavedb
+ DOCKER_BRIDGE=docker0
+ BRIDGE=weave
+ DATAPATH=datapath
+ CONTAINER_IFNAME=ethwe
+ BRIDGE_IFNAME=vethwe-bridge
+ DATAPATH_IFNAME=vethwe-datapath
+ PORT=6783
+ HTTP_ADDR=127.0.0.1:6784
+ STATUS_ADDR=127.0.0.1:6782
+ PROXY_PORT=12375
+ OLD_PROXY_CONTAINER_NAME=weaveproxy
+ PROC_PATH=/proc
+ COVERAGE_ARGS=
+ '[' -n  ]
+ id -u
+ '[' 0 '=' 0 ]
+ uname -s -r
+ sed -n -e 's|^\([^ ]*\) \([0-9][0-9]*\)\.\([0-9][0-9]*\).*|\1 \2 \3|p'
+ read sys maj min
+ '[' Linux '!=' Linux ]
+ '[' '(' 4 -eq 3 -a 19 -ge 8 ')' -o 4 -gt 3 ]
+ command_exists ip
+ command -v ip
+ '[' 1 -gt 0 ]
+ COMMAND=reset
+ shift 1
+ '[' 0 -eq 0 ]
+ res=0
+ '['  '=' --force ]
+ check_running weave
+ res=1
+ stop
+ util_op remove-plugin-network weave
+ command_exists weaveutil
+ command -v weaveutil
+ weaveutil remove-plugin-network weave
unable to connect to docker: Get "http://unix.sock/v1.21/version": dial unix /var/run/docker.sock: connect: no such file or directory
+ true
+ warn_if_stopping_proxy_in_env
+ proxy_addr
+ PROXY_ADDR=
+ util_op stop-container weave
+ echo 'Weave is not running (ignore on Kubernetes).'
Weave is not running (ignore on Kubernetes).
+ util_op stop-container weaveplugin
+ true
+ util_op stop-container weaveproxy
+ true
+ conntrack -D -p udp --dport 6783
+ true
+ util_op remove-container -f weave
+ true
+ util_op remove-container -f weaveproxy
+ true
+ util_op remove-container -f weaveplugin
+ true
+ protect_against_docker_hang
+ rm -f /run/docker/plugins/weave.sock /run/docker/plugins/weavemesh.sock
+ util_op list-containers weavevolumes
+ command_exists weaveutil
+ command -v weaveutil
+ weaveutil list-containers weavevolumes
unable to list containers: Get "http://unix.sock/v1.18/version": dial unix /var/run/docker.sock: connect: no such file or directory
+ VOLUME_CONTAINERS=
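The trace above shows why the reset is effectively a no-op here: the cleanup steps go through weaveutil, which talks to Docker over its Unix socket, and on a containerd-based cluster (rke2 in this case) /var/run/docker.sock does not exist, so each failure is swallowed and the script carries on. A small illustrative pre-check (the DOCKER_SOCK override exists only for this demonstration):

```shell
# 'weave reset' delegates container cleanup to weaveutil, which needs the
# Docker socket. On containerd-only nodes the socket is absent, so those
# steps all fail (note the '+ true' after each failure in the trace above).
docker_available() {
  [ -S "${DOCKER_SOCK:-/var/run/docker.sock}" ]
}

if docker_available; then
  echo "docker socket present: reset can remove weave containers"
else
  echo "no docker socket: reset can only clean host network state (bridge, veths)"
fi
```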

How to reproduce it?

  • deploy weave-net with WEAVE_NO_FASTDP=true
  • check with weave status connections that sleeve is used
  • edit the daemonset and remove the WEAVE_NO_FASTDP variable
  • observe the pod going into CrashLoopBackOff
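The steps above can be sketched with kubectl (assuming the stock weave-net daemonset in kube-system; the function name is just for illustration):

```shell
# Reproduction sketch. Defines the steps as a function; invoke 'reproduce'
# only against a real cluster with weave-net deployed.
reproduce() {
  # 1. confirm the connections are currently using sleeve
  kubectl -n kube-system exec ds/weave-net -c weave -- \
    /home/weave/weave --local status connections
  # 2. remove the WEAVE_NO_FASTDP variable (a trailing '-' deletes an env var)
  kubectl -n kube-system set env daemonset/weave-net WEAVE_NO_FASTDP-
  # 3. watch the restarted pods go into CrashLoopBackOff
  kubectl -n kube-system get pods -l name=weave-net -w
}
echo "defined 'reproduce'; run it against a cluster with weave-net deployed"
```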

Anything else we need to know?

Versions:

$ weave version


        Version: 2.8.1 (failed to check latest version - see logs; next check at 2024/04/26 11:21:25)

        Service: router
       Protocol: weave 1..2
           Name: 5a:63:d7:f6:d1:3e(k8s-mod46-worker-1)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 4
    Connections: 4 (4 established)
          Peers: 5 (with 20 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.42.0.0/16
  DefaultSubnet: 10.42.0.0/16

$ docker version

n/a containerd

$ uname -a

Linux k8s-mod46-worker-1 4.19.0-21-amd64 #1 SMP Debian 4.19.249-2 (2022-06-30) x86_64 Linux

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+rke2r1", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T20:19:26Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+rke2r1", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T20:19:26Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}


Logs:

$ docker logs weave

or, if using Kubernetes:

$ kubectl logs -n kube-system <weave-net-pod> weave

Network:

$ ip route
$ ip -4 -o addr
$ sudo iptables-save

Thanks in advance to anyone reading this; I fully understand that weavenet is no longer maintained, but I wanted to at least document this problem in case it can help others facing something similar.

@arthurzenika
Author

Some colleagues have worked out a workaround which seems to enable switching to fastdp (with some downtime):

  • modify the DaemonSet to remove WEAVE_NO_FASTDP
  • the first pod that restarts goes into an error state
  • on the node: apt install openvswitch-switch openvswitch-common
  • on the node: ip link delete vethwe-pcap
  • on the node: ip link delete weave
  • this triggers all weave pods to restart and go into an error state
  • on the node: reboot
  • when the node reboots, the pods start a new network with fastdp and networking resumes
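The node-side portion of those steps can be collected into one script. This is a sketch based purely on the list above; the DRY_RUN guard is an illustrative addition so the commands can be previewed before running them as root on the node:

```shell
# Workaround sketch for one node. DRY_RUN=1 (the default here) only prints
# the commands; set DRY_RUN= (empty) to actually execute them as root.
DRY_RUN="${DRY_RUN-1}"
run() {
  if [ -n "$DRY_RUN" ]; then echo "would run: $*"; else "$@"; fi
}

run apt install -y openvswitch-switch openvswitch-common  # fastdp needs Open vSwitch
run ip link delete vethwe-pcap                            # remove the sleeve pcap link
run ip link delete weave                                  # remove the old plain bridge
run reboot                                                # weave recreates a fastdp bridge on restart
```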
