This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

extremely slow vm networking? #827

Open
lukemarsden opened this issue Apr 27, 2021 · 19 comments

Comments

@lukemarsden

Has anyone seen VM networking being very slow and flaky? This is with ignite 0.8.0.

A speed test (Python speedtest-cli package) outside the VM shows more than 1 Gbit/s download:

luke@phat:~$ /home/ubuntu/.local/bin/speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.210.209.124)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by OVH Cloud (Gravelines) [5879.73 km]: 9.144 ms
Testing download speed................................................................................
Download: 1123.13 Mbit/s
Testing upload speed......................................................................................................
Upload: 615.87 Mbit/s

The same speed test inside the VM shows only about 14 Mbit/s:

root@baeaca45b27e2576:~# speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.210.209.124)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Eurafibre (Lille) [5956.08 km]: 34.197 ms
Testing download speed................................................................................
Download: 14.26 Mbit/s
Testing upload speed......................................................................................................
Upload: 17.47 Mbit/s

Possibly related, there are 100k+ files in /var/lib/cni. But I'm seeing networking flakiness and slowness even when I clean out /var/lib/cni. Starting VMs does speed up again when /var/lib/cni is cleared out though.

@stealthybox
Contributor

That's really peculiar.
When the VMs "speed up again", how is the performance?

With just a small number of VMs on my WSL2 ignite host, I'm seeing effectively native bandwidth on my gigabit uplink using the CNI bridge.

Maybe @bboreham or CNI bridge-plugin maintainer would know how extensive usage could cause the host kernel to slow down WRT bridge networking?

@lukemarsden
Author

lukemarsden commented May 4, 2021

Came back to this because I saw the issue again. There are a great many files in /var/lib/cni again, iptables is using a lot of CPU as the system adds new VMs, and iptables --list is slow and returns a lot of results.

There are only 15 VMs running on this system. VMs are terminating normally using ignite rm. Their IP addresses are left over in the /var/lib/cni directory structure. Why isn't their networking being cleaned up?

This issue seems to be stopping simple things like git clone https://github.com/... working inside the VMs intermittently!

root@ns1003380:/var/lib/cni# iptables --list|wc -l
30295
root@ns1003380:/var/lib/cni# find . |wc -l
26516
root@ns1003380:/var/lib/cni# find . |head -n 10
.
./networks
./networks/ignite-cni-bridge
./networks/ignite-cni-bridge/10.61.32.102
./networks/ignite-cni-bridge/10.61.1.253
./networks/ignite-cni-bridge/10.61.22.246
./networks/ignite-cni-bridge/10.61.13.29
./networks/ignite-cni-bridge/10.61.23.196
./networks/ignite-cni-bridge/10.61.26.100
./networks/ignite-cni-bridge/10.61.9.211
root@ns1003380:/var/lib/cni# iptables --list|head -n 100
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy DROP)
target     prot opt source               destination         
CNI-FORWARD  all  --  anywhere             anywhere             /* CNI firewall plugin rules */
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain CNI-ADMIN (1 references)
target     prot opt source               destination         

Chain CNI-FORWARD (1 references)
target     prot opt source               destination         
CNI-ADMIN  all  --  anywhere             anywhere             /* CNI firewall plugin rules */
ACCEPT     all  --  anywhere             10.61.225.110        ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.61.225.110        anywhere            
ACCEPT     all  --  anywhere             10.61.225.111        ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.61.225.111        anywhere            
ACCEPT     all  --  anywhere             10.61.225.112        ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.61.225.112        anywhere            
ACCEPT     all  --  anywhere             10.61.225.113        ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.61.225.113        anywhere            
ACCEPT     all  --  anywhere             10.61.225.114        ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.61.225.114        anywhere            
ACCEPT     all  --  anywhere             10.61.225.115        ctstate RELATED,ESTABLISHED
[lots more like this]
root@ns1003380:/var/lib/cni# ignite version
Ignite version: version.Info{Major:"0", Minor:"8", GitVersion:"v0.8.0", GitCommit:"77f6859fa4f059f7338738e14cf66f5b9ec9b21c", GitTreeState:"clean", BuildDate:"2020-11-09T20:50:50Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64", SandboxImage:version.Image{Name:"weaveworks/ignite", Tag:"v0.8.0", Delimeter:":"}, KernelImage:version.Image{Name:"weaveworks/ignite-kernel", Tag:"4.19.125", Delimeter:":"}}
Firecracker version: v0.21.1
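
A rough way to check whether the files under /var/lib/cni belong to VMs that no longer exist is to compare each lease file against a list of live container IDs. The sketch below assumes the host-local layout in which each lease is a file named after the IP address and its first line holds the container ID that was passed on ADD. The function name `stale_leases` and the live-ID file are hypothetical, and how the live IDs are collected depends on the runtime in use.

```shell
#!/bin/bash
# Sketch only: list host-local lease files whose owner is not in a set of
# live container IDs. Nothing is deleted.
#
# Assumption: each lease is a file named after an IPv4 address whose first
# line is the container ID (CNI_CONTAINERID) used when the IP was reserved.

# stale_leases DIR LIVE_IDS_FILE
#   DIR            e.g. /var/lib/cni/networks/ignite-cni-bridge
#   LIVE_IDS_FILE  one live container ID per line (hypothetical input file)
stale_leases() {
  local dir=$1 live=$2 f id
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "${f##*/}" in
      *.*.*.*) ;;        # three dots: looks like an IPv4 lease file
      *) continue ;;     # skip bookkeeping files such as "lock"
    esac
    id=
    IFS= read -r id < "$f" || true   # first line only; tolerate a missing newline
    id=${id%$'\r'}
    if [ -z "$id" ] || ! grep -qxF -- "$id" "$live"; then
      printf '%s\n' "${f##*/}"       # print the leaked IP address
    fi
  done
}

# Possible usage with the docker runtime (an assumption, adjust as needed):
#   docker ps --no-trunc -q > /tmp/live-ids
#   stale_leases /var/lib/cni/networks/ignite-cni-bridge /tmp/live-ids | wc -l
```

The function only reports; it leaves the decision about removing a lease to whoever runs it, since a lease that looks stale may belong to a VM that is still starting.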

@lukemarsden
Author

Inside VM:

root@7a899265f32c2013:~# speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.81.244.112)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Sherwood Broadband (Sherwood, OR) [22.55 km]: 7.635 ms
Testing download speed................................................................................
Download: 101.72 Mbit/s
Testing upload speed......................................................................................................
Upload: 60.04 Mbit/s

Outside VM:

root@ns1003380:/var/lib/cni# /usr/local/bin/speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.81.244.112)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Sherwood Broadband (Sherwood, OR) [22.55 km]: 2.823 ms
Testing download speed................................................................................
Download: 2531.19 Mbit/s
Testing upload speed......................................................................................................
Upload: 2350.14 Mbit/s

@lukemarsden
Author

Any ideas @bboreham? Hi btw :-) 👋

@lukemarsden
Author

#442 (comment) indicates that iptables rules were once cleaned up, but I'm seeing them not being cleaned up on stop or rm:

root@ns1003380:~# ignite ps                                                                                                                                                                                         
VM ID                   IMAGE                                                           KERNEL                                  SIZE    CPUS    MEMORY  CREATED STATUS  IPS             PORTS   NAME                
8e4540b9a832a296        testfaster-image:b6d693c8c85646fd0b1e45583c4a2637e1e1fb2f-final quay.io/testfaster/ignite-kernel:latest 50.0 GB 4       16.0 GB 16s ago Up 16s  10.61.36.194            tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0ba2t2jv25vc3e90                                                                                                                                       
ad091cdb9e5522b7        testfaster-image:b6d693c8c85646fd0b1e45583c4a2637e1e1fb2f-final quay.io/testfaster/ignite-kernel:latest 50.0 GB 4       16.0 GB 5s ago  Up 5s   10.61.36.195            tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0bi2t2jv25vc3eb0                                                                                                                                       
root@ns1003380:~# iptables --list |grep 10.61.36.194                                                                                                                                                                
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?                                                                                                                                                                                                                                                                                                                                                
root@ns1003380:~# iptables --list |grep 10.61.36.194                                                                                                                                                                                                                                                                                                                                                                                     
ACCEPT     all  --  anywhere             10.61.36.194         ctstate RELATED,ESTABLISHED                                                                                                                           
ACCEPT     all  --  10.61.36.194         anywhere                                                                                                                                                                   
root@ns1003380:~# iptables --list |grep 10.61.36.195                                                                                                                                                                                                                                                                                                                                                                                     
ACCEPT     all  --  anywhere             10.61.36.195         ctstate RELATED,ESTABLISHED                                                                                                                                                                                                                                                                                                                                                
ACCEPT     all  --  10.61.36.195         anywhere                                                                                                                                                                   
root@ns1003380:~# iptables --list |grep "10.61.36.194\|10.61.36.195"                                                                                                                                                
ACCEPT     all  --  anywhere             10.61.36.194         ctstate RELATED,ESTABLISHED                                                                                                                           
ACCEPT     all  --  10.61.36.194         anywhere                                                                                                                                                                   
ACCEPT     all  --  anywhere             10.61.36.195         ctstate RELATED,ESTABLISHED                                                                                                                           
ACCEPT     all  --  10.61.36.195         anywhere                                                                                                                                                                   
root@ns1003380:~# ignite stop 8e4540b9a832a296                                                                                                                                                                                                                                                                                                                                                                                           
INFO[0000] Removing the container with ID "ignite-8e4540b9a832a296" from the "cni" network                                                                                                                          
INFO[0012] Stopped VM with name "tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0ba2t2jv25vc3e90" and ID "8e4540b9a832a296"                                                      
root@ns1003380:~# iptables --list |grep "10.61.36.194\|10.61.36.195"                                                                                                                                                
ACCEPT     all  --  anywhere             10.61.36.194         ctstate RELATED,ESTABLISHED                                                                                                                           
ACCEPT     all  --  10.61.36.194         anywhere                                                                                                                                                                   
ACCEPT     all  --  anywhere             10.61.36.195         ctstate RELATED,ESTABLISHED                                                                                                                           
ACCEPT     all  --  10.61.36.195         anywhere                                                                                                                                                                   
root@ns1003380:~# ignite rm -f ad091cdb9e5522b7                                                                                                                                                                                                                                                                                                                                                                                          
INFO[0000] Removing the container with ID "ignite-ad091cdb9e5522b7" from the "cni" network                                                                                                                          
INFO[0002] Removed VM with name "tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0bi2t2jv25vc3eb0" and ID "ad091cdb9e5522b7"                                                      
root@ns1003380:~# iptables --list |grep "10.61.36.194\|10.61.36.195"                                                                                                                                                
ACCEPT     all  --  anywhere             10.61.36.194         ctstate RELATED,ESTABLISHED                                                                                                                                                                                                                                                                                                                                                
ACCEPT     all  --  10.61.36.194         anywhere                                                                                                                                                                                                                                                                                                                                                                                        
ACCEPT     all  --  anywhere             10.61.36.195         ctstate RELATED,ESTABLISHED                                                                                                                           
ACCEPT     all  --  10.61.36.195         anywhere                                                                                                                                                                   
root@ns1003380:~# iptables-save |grep "10.61.36.194\|10.61.36.195"
-A CNI-FORWARD -d 10.61.36.194/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.194/32 -j ACCEPT
-A CNI-FORWARD -d 10.61.36.195/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.195/32 -j ACCEPT
-A POSTROUTING -s 10.61.36.194/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"c6d0f77a0e01c76e8f194590ed1e435bec584d494b2c4f8cffb2e724d786537e\"" -j CNI-2cb101d210755961201c5e71
-A POSTROUTING -s 10.61.36.195/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"1a75bd1dccb3a3b8e2a81c82ebab99fa7936819ee50b56164e11fc30d04d267f\"" -j CNI-1cfc4e25d76275bf9b32e5b5
root@ns1003380:~# /opt/cni/bin/bridge
CNI bridge plugin v0.8.5

@lukemarsden
Author

same behaviour with newer CNI as well:

root@ns1003380:~# iptables-save |grep "10.61.36.231"
-A CNI-FORWARD -d 10.61.36.231/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.231/32 -j ACCEPT
-A POSTROUTING -s 10.61.36.231/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"72b2e7aaca38c4347f98bcaf9b1afbceec54d43d9c33dc21dc60030a000b132e\"" -j CNI-9933bc049210f7e454e72191
root@ns1003380:~# sudo ignite rm -f tfastpool-c7a784be464ec4544aa5501862310cca977ca1171769d535f3f364ed1fc99ead-vm-c28f19q2t2jplfecqnjg
INFO[0000] Removing the container with ID "ignite-0b65a0382f2dc7be" from the "cni" network 
INFO[0001] Removed VM with name "tfastpool-c7a784be464ec4544aa5501862310cca977ca1171769d535f3f364ed1fc99ead-vm-c28f19q2t2jplfecqnjg" and ID "0b65a0382f2dc7be" 
root@ns1003380:~# iptables-save |grep "10.61.36.231"
-A CNI-FORWARD -d 10.61.36.231/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.231/32 -j ACCEPT
-A POSTROUTING -s 10.61.36.231/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"72b2e7aaca38c4347f98bcaf9b1afbceec54d43d9c33dc21dc60030a000b132e\"" -j CNI-9933bc049210f7e454e72191
root@ns1003380:~# /opt/cni/bin/bridge
CNI bridge plugin v0.9.1

@lukemarsden
Author

lukemarsden commented May 4, 2021

to see what was going on, I moved /opt/cni/bin/bridge to /opt/cni/bin/bridge.real and dropped this debug script into /opt/cni/bin/bridge

ubuntu@ns1003380:/opt/cni/bin$ cat bridge
#!/bin/bash
myvar=`cat`
(echo "Run with $@:"
 env |grep CNI
 echo "$myvar"
) >> /tmp/log
ret=$(echo "$myvar" | /opt/cni/bin/bridge.real "$@" 2>&1)
exitcode=$?
(echo "exit $exitcode"
 echo "response: $ret"
 echo
) >> /tmp/log
echo $ret
exit $exitcode

I am seeing both ADD and DEL commands:

Run with :
CNI_CONTAINERID=325a0d5d03717212114df55719b4518961051c4faf05001ccd54c9ce1e2d7dfd
CNI_IFNAME=eth0
CNI_NETNS=/proc/601842/ns/net
CNI_COMMAND=ADD
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0
response: {
    "cniVersion": "0.4.0",
    "interfaces": [
        {
            "name": "ignite0",
            "mac": "ca:14:6d:b0:5d:1c"
        },
        {
            "name": "veth75a84de0",
            "mac": "e2:cb:04:21:a5:f4"
        },
        {
            "name": "eth0",
            "mac": "0e:ad:e8:bb:2e:77",
            "sandbox": "/proc/601842/ns/net"
        }
    ],
    "ips": [
        {
            "version": "4",
            "interface": 2,
            "address": "10.61.1.22/16",
            "gateway": "10.61.0.1"
        }
    ],
    "routes": [
        {
            "dst": "0.0.0.0/0",
            "gw": "10.61.0.1"
        }
    ],
    "dns": {}
}

Run with :
CNI_CONTAINERID=ignite-22dd96eb9003a4b1
CNI_IFNAME=eth0
CNI_NETNS=/proc/595238/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0
response: 

@lukemarsden
Author

lukemarsden commented May 4, 2021

This seems to be operating correctly, so my assumption now is that ignite is doing something with iptables rules itself that it's failing to clean up, though I'm not sure. I also don't know why the CNI bridge plugin doesn't release the IP addresses; the very large number of files in /var/lib/cni still strikes me as suspicious.

@lukemarsden
Author

possibly related, i am running ignite run --runtime docker, i.e. using the legacy docker runtime (so that i can use docker images that are built locally by docker)

@lukemarsden
Author

i guess actually we are using the firewall plugin in CNI to create the iptables rules that aren't being cleared up? and host-local plugin for IPAM?

@lukemarsden
Author

lukemarsden commented May 4, 2021

adding instrumentation to firewall and host-local, looks like they all think they are succeeding, so why are we leaking IPs and iptables rules??

firewall:                    
CNI_CONTAINERID=ignite-4b79b9c398095e46
CNI_IFNAME=eth0
CNI_NETNS=/proc/743882/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"cniVersion":"0.4.0","name":"ignite-cni-bridge","type":"firewall"}             
exit 0         
response:                    
                                                                                                          
bridge:              
CNI_CONTAINERID=ignite-4b79b9c398095e46
CNI_IFNAME=eth0
CNI_NETNS=/proc/743882/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=                 
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0   
response:                  

host-local:                                                                                                                                                                                                                                                                                                                                                                                                                              
CNI_CONTAINERID=ignite-abf40d722d788f33                                                                                                                                                                                                                                                                                                                                                                                                  
CNI_IFNAME=eth0                                                                                                                                                                                                                                                                                                                                                                                                                          
CNI_NETNS=/proc/742360/ns/net                                                                                                                                                                                                                                                                                                                                                                                                            
CNI_COMMAND=DEL                                                                                                                                                                                                                                                                                                                                                                                                                          
CNI_PATH=/opt/cni/bin                                                                                                                                                                                                                                                                                                                                                                                                                    
CNI_ARGS=                                                                                                                                                                                                                                                                                                                                                                                                                                
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}                                                                                                                                                                                                                      
exit 0                                                                                                                                                                                                                                                                                                                                                                                                                                   
response:                                                                                                                                                                                                                                                                                                                                                                                                                                

@lukemarsden
Author

for reference:

ubuntu@ns1003380:/opt/cni/bin$ cat bridge
#!/bin/bash
myvar=`cat`
me=`basename "$0"`
(echo "$me:"
 env |grep CNI
 echo "$myvar"
) >> /tmp/log
ret=$(echo "$myvar" | /opt/cni/bin/$me.real "$@" 2>&1)
exitcode=$?
(echo "exit $exitcode"
 echo "response: $ret"
 echo
) >> /tmp/log
echo $ret
exit $exitcode
ubuntu@ns1003380:/opt/cni/bin$ ls -alh
total 71M
drwxrwxr-x 2 root   root 4.0K May  4 08:14 .
drwxr-xr-x 3 root   root 4.0K Dec  9 08:00 ..
-rwxr-xr-x 1 root   root 4.0M Feb  5 15:42 bandwidth
-rwxr-xr-x 1 ubuntu root  258 May  4 08:13 bridge
-rwxr-xr-x 1 root   root 4.4M May  4 07:36 bridge.real
-rwxr-xr-x 1 root   root 9.8M Feb  5 15:42 dhcp
lrwxrwxrwx 1 root   root    6 May  4 08:14 firewall -> bridge
-rwxr-xr-x 1 root   root 4.6M May  4 08:14 firewall.real
-rwxr-xr-x 1 root   root 3.3M Feb  5 15:42 flannel
-rwxr-xr-x 1 root   root 4.0M Feb  5 15:42 host-device
lrwxrwxrwx 1 root   root    6 May  4 08:14 host-local -> bridge
-rwxr-xr-x 1 root   root 3.5M May  4 08:13 host-local.real
-rwxr-xr-x 1 root   root 4.1M Feb  5 15:42 ipvlan
-rwxr-xr-x 1 root   root 3.4M Feb  5 15:42 loopback
-rwxr-xr-x 1 root   root 4.2M Feb  5 15:42 macvlan
-rwxr-xr-x 1 root   root 3.8M Feb  5 15:42 portmap
-rwxr-xr-x 1 root   root 4.3M Feb  5 15:42 ptp
-rwxr-xr-x 1 root   root 3.6M Feb  5 15:42 sbr
-rwxr-xr-x 1 root   root 3.1M Feb  5 15:42 static
-rwxr-xr-x 1 root   root 3.5M Feb  5 15:42 tuning
-rwxr-xr-x 1 root   root 4.1M Feb  5 15:42 vlan
-rwxr-xr-x 1 root   root 3.6M Feb  5 15:42 vrf

@lukemarsden
Author

I've worked around this for now by writing my own code which interacts with iptables and /var/lib/cni to do the cleanup that ignite + docker + CNI fails to do.
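
For anyone needing a similar stopgap, here is a minimal sketch of the iptables half of such a cleanup. It is not the code referred to above. It assumes the leaked rules look like the `iptables-save` lines shown earlier in this thread, and `rules_to_delete` is a hypothetical helper name. It prints the delete commands rather than running them, so the output can be reviewed first.

```shell
#!/bin/bash
# Sketch only: turn the rules that mention one leaked IP into delete commands.
# Reads full `iptables-save` output on stdin and prints commands; it does not
# modify any rules itself.

# rules_to_delete IP
rules_to_delete() {
  awk -v ip="$1" '
    # "*filter" / "*nat" header lines: remember which table we are in
    /^\*/ { table = substr($0, 2); next }

    # appended rules that reference exactly IP/32 (surrounding spaces avoid
    # partial matches such as 10.61.36.19 against 10.61.36.194)
    /^-A / && table != "" && index($0, " " ip "/32 ") {
      sub(/^-A /, "-D ")                      # same rule spec, delete instead of append
      print "iptables -w -t " table " " $0    # -w waits for the xtables lock
    }
  '
}

# Possible usage (review the output before piping it to a shell):
#   iptables-save | rules_to_delete 10.61.36.194
```

Note that this only covers rules that name the address. The per-container `CNI-...` chains that the POSTROUTING rules jump to are left in place and would need separate handling, as would the matching file under /var/lib/cni.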

@networkop
Contributor

I'm not sure if ignite forces docker runtime to use CNI (it's not trivial) but wouldn't it make sense to use --runtime docker together with --network-plugin docker-bridge? In this case I don't see any stale entries in my iptables

@lukemarsden
Author

lukemarsden commented May 7, 2021 via email

@networkop
Contributor

not sure why you'd have a problem doing this via API, it should work the same way.

But as for the iptables leak with the docker runtime, I think I've found the issue: the rules are set up by the CNI plugin using the proper docker container ID as the "id" in the rule comments; however, when they are removed, the call to RemoveContainerNetwork is made with vm.PrefixedID(), which is the docker container name, not the ID. So you can try patching /pkg/operations/remove.go with the change below to see if it helps:

- if err = removeNetworking(vm.PrefixedID(), vm.Spec.Network.Ports...); err != nil {
+ if err = removeNetworking(vm.Status.Runtime.ID, vm.Spec.Network.Ports...); err != nil {
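
The ID mismatch described above should also be visible in the wrapper-script log posted earlier in this thread, where the ADD record carries a 64-character container ID and the DEL record carries an `ignite-` prefixed one. As a rough check, the sketch below scans such a log and prints every container ID that appears in a DEL record but never in an ADD record. It assumes the /tmp/log record layout shown earlier (CNI_* lines followed by an `exit N` line), and `unmatched_dels` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: find CNI DEL calls whose container ID never appeared in an ADD.
# Assumes each log record contains CNI_CONTAINERID= and CNI_COMMAND= lines and
# is terminated by a line starting with "exit ".

# unmatched_dels [LOGFILE...]   (reads stdin when no file is given)
unmatched_dels() {
  awk '
    { sub(/[ \t\r]+$/, "") }                              # drop trailing whitespace
    /^CNI_CONTAINERID=/ { id  = substr($0, index($0, "=") + 1) }
    /^CNI_COMMAND=/     { cmd = substr($0, index($0, "=") + 1) }
    /^exit / {                                            # end of one record
      if (cmd == "ADD") added[id] = 1
      else if (cmd == "DEL" && !(id in seen)) { seen[id] = 1; dels[++n] = id }
      id = ""; cmd = ""
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(dels[i] in added)) print dels[i]            # DEL with no matching ADD
    }
  ' "$@"
}

# Possible usage:
#   unmatched_dels /tmp/log
```

A DEL for a VM that was started before logging began will also show up as unmatched, so the output is a hint rather than proof.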

@lukemarsden
Author

thanks @networkop good spot. Any chance we could get this fix into a release please?

@lukemarsden
Author

@darkowlzz
Contributor

@lukemarsden yes, it did: 2f840ad.
