[workspacekit] Use veth instead of slirp4netns #8106

Closed
csweichel opened this issue Feb 8, 2022 · 15 comments · Fixed by #8955
Labels
aspect: performance anything related to performance component: workspacekit component: ws-daemon team: workspace Issue belongs to the Workspace team

Comments

@csweichel
Contributor

csweichel commented Feb 8, 2022

Is your feature request related to a problem? Please describe

Today we use slirp4netns to connect the network namespace of our workspaces. While convenient and easy to use, this comes at a runtime cost because we make networking a userland issue.

Describe the behaviour you'd like

Instead, we should defer to the ws-daemon IWS to create veth pairs between ring1 and the workspace pod.

Also, if there's an opportunity to improve the logging, that would be great.

Describe alternatives you've considered

Maybe IPVLAN or MACVLAN would work better because we would not have to set up routing in the workspace pod, but the interaction with CNI is unclear.

@csweichel csweichel added component: ws-daemon aspect: performance anything related to performance component: workspacekit team: workspace Issue belongs to the Workspace team labels Feb 8, 2022
@kylos101 kylos101 moved this to Scheduled in 🌌 Workspace Team Feb 14, 2022
@Furisto
Member

Furisto commented Mar 7, 2022

Did some research into how a veth pair can be provided. The assumption is that you have some shell running with

unshare -m -n -U -r --propagation unchanged bash
echo $$

The --propagation unchanged part is important: by default, unshare marks the rootfs as private in the new mount namespace, so mount events in the host mount namespace do not propagate to the new mount namespace. As a result, after moving one end of the veth pair into the net namespace, listing the network interfaces with ip a shows "peer reference not found". The reason is that attaching to the net namespace is done by creating a shared bind mount in /run/netns, and if this mount is not visible from the mount namespace of the child, the reference cannot be resolved.
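
To check which propagation mode a mount actually has, one option is to read the optional fields of /proc/<pid>/mountinfo (documented in proc(5)). The sketch below is only an illustration; the helper name mount_propagation and its second argument are assumptions made for this example and are not part of workspacekit.

```shell
# Hypothetical helper: print the propagation type of a mount point.
#   $1  mount point to look up (field 5 of mountinfo)
#   $2  mountinfo file to read (defaults to the current process)
mount_propagation() {
  awk -v target="$1" '
    $5 == target {
      prop = "private"                       # no optional fields means private
      # Optional fields start at field 7 and end at the "-" separator.
      for (i = 7; i <= NF && $i != "-"; i++) {
        if ($i ~ /^shared:/) prop = "shared"
        else if ($i ~ /^master:/ && prop != "shared") prop = "slave"
        else if ($i == "unbindable") prop = "unbindable"
      }
      print prop
    }' "${2:-/proc/self/mountinfo}"
}
```

Running mount_propagation / inside the shell started by unshare would be expected to print private with the default settings, and whatever the host uses when --propagation unchanged is passed.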

I recommend saving your iptables rules with iptables-save or using a virtual machine to avoid messing up your system.

# Create veth pair
ip link add veth-<instanceId> type veth peer name ceth-<instanceId>

# Turn anonymous network namespace into named namespace and 
# attach one end of veth pair to network namespace 
ip netns attach netns-<instanceId> <pid>
ip link set ceth-<instanceId> netns netns-<instanceId>

# Assign IP addresses to both ends of veth pair
ip addr add 10.0.3.1/24 dev veth-<instanceId>
ip netns exec netns-<instanceId> ip addr add 10.0.3.2/24 dev ceth-<instanceId>

# Bring up network interfaces
ip link set veth-<instanceId> up
ip netns exec netns-<instanceId> ip link set ceth-<instanceId> up
ip netns exec netns-<instanceId> ip link set lo up

# Set default gateway
ip netns exec netns-<instanceId> ip route add default via 10.0.3.1

# Enable forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

# Add iptables rules
iptables -A FORWARD -o eth0 -i veth-<instanceId> -j ACCEPT
iptables -A FORWARD -i eth0 -o veth-<instanceId> -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.3.0/24 -o eth0 -j MASQUERADE

# Setup DNS
mkdir -p /etc/netns/netns-<instanceId>
ln -s /etc/resolv.conf /etc/netns/netns-<instanceId>/resolv.conf
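
One thing to watch when substituting a real value for <instanceId>: Linux limits interface names to 15 characters (IFNAMSIZ is 16 including the terminating NUL), so a long instance ID would make ip link add fail. The following is a minimal sketch of a guard for that; the helper name veth_name is hypothetical, and whether real instance IDs are long enough to hit the limit is an assumption.

```shell
# Hypothetical helper: build "<prefix>-<id>" and refuse names that are too long.
# Linux interface names may be at most 15 characters (IFNAMSIZ - 1).
veth_name() {
  local prefix="$1" id="$2" name
  name="${prefix}-${id}"
  if [ "${#name}" -gt 15 ]; then
    echo "interface name '$name' is longer than 15 characters" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

For example, veth_name veth w2 prints veth-w2, while a UUID-sized ID is rejected, in which case a shortened or hashed ID would be needed.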

@kylos101 kylos101 assigned Furisto and unassigned Furisto Mar 7, 2022
@utam0k utam0k self-assigned this Mar 10, 2022
@utam0k utam0k moved this from Scheduled to In Progress in 🌌 Workspace Team Mar 11, 2022
@utam0k
Contributor

utam0k commented Mar 11, 2022

I feel that perhaps @Furisto's method would work with IWS and nsinsider. (I will make a sequence diagram at a later date for more details)

@csweichel @Furisto
Apart from this, there is bypass4netns, which is available from kernel 5.9 and is worth considering. It is a technique that uses seccomp notify to swap the fd used for network communication for one on the host. What do you think?

@csweichel
Contributor Author

Bypass4netns was a hot contender. We can't just drop it in and use it though because we already use seccomp-notify. We'd need to integrate it with workspacekit.

It does solve a different problem though. We do have a privileged process we can externalise operations to, unlike what bypass4netns was built for.

@utam0k
Contributor

utam0k commented Mar 14, 2022

I had not come up with a good way to go about this either. I was thinking that if there is a way to do it, it would be the best option.

> Bypass4netns was a hot contender. We can't just drop it in and use it though because we already use seccomp-notify. We'd need to integrate it with workspacekit.

@utam0k
Contributor

utam0k commented Mar 14, 2022

I manually created a veth pair and successfully communicated with the outside world.

##### IN HOST ######

# Enable forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

# Enter the network, mount, and PID namespaces of the target pod's process
nsenter -t 212137 -n -m -p bash

##### IN POD'S NAMESPACE ######

# Set the values
INSTANCE_ID=w2
WORKSPACE_PID=57

# Create veth pair
ip link add veth-$INSTANCE_ID type veth peer name ceth-$INSTANCE_ID

# Turn anonymous network namespace into named namespace and
# attach one end of veth pair to network namespace
# ip netns attach netns-<instanceId> <pid>
mkdir -p /var/run/netns
ln -s /proc/$WORKSPACE_PID/ns/net /var/run/netns/netns-$INSTANCE_ID
ip link set ceth-$INSTANCE_ID netns netns-$INSTANCE_ID

# Assign IP addresses to both ends of veth pair
ip addr add 10.0.5.1/24 dev veth-$INSTANCE_ID
ip netns exec netns-$INSTANCE_ID ip addr add 10.0.5.2/24 dev ceth-$INSTANCE_ID

# Bring up network interfaces
ip link set veth-$INSTANCE_ID up
ip netns exec netns-$INSTANCE_ID ip link set ceth-$INSTANCE_ID up
ip netns exec netns-$INSTANCE_ID ip link set lo up

# Add iptables rules
iptables -A FORWARD -o eth0 -i veth-$INSTANCE_ID -j ACCEPT
iptables -A FORWARD -i eth0 -o veth-$INSTANCE_ID -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.5.0/24 -o eth0 -j MASQUERADE

# Setup DNS
mkdir -p /etc/netns/netns-$INSTANCE_ID
ln -s /etc/resolv.conf /etc/netns/netns-$INSTANCE_ID/resolv.conf

# Set default gateway
ip netns exec netns-$INSTANCE_ID ip route replace default via 10.0.5.1
### in workspace's namespace ###
root@gfe3de62b9ab20671cf9d9f:/# traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  10.0.5.1 (10.0.5.1)  0.036 ms  0.005 ms  0.004 ms
 2  10-132-0-67.kubernetes.default.svc.cluster.local (10.132.0.67)  0.091 ms  0.063 ms  0.040 ms
 3  * * *
...
12  * * *
13  dns.google (8.8.8.8)  1.023 ms  0.786 ms  0.338 ms

However, when I changed the default gateway, the workspace kept reconnecting. I am not sure whether this is because I manually changed the settings of a workspace that had already finished starting, or because the original default gateway (10.0.2.2) has a special meaning.
@csweichel @Furisto Do you know anything about this?
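
When experimenting with the manual setup above, it helps to be able to undo it. The following is a rough teardown sketch, not something from the actual implementation: the run and teardown_veth helpers and the DRY_RUN variable are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. It does not restore the workspace's previous default route.

```shell
# Hypothetical helper: print commands instead of running them unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: undo the veth setup for one instance.
#   $1  instance ID used in the interface and namespace names (e.g. w2)
#   $2  subnet used in the MASQUERADE rule (e.g. 10.0.5.0/24)
teardown_veth() {
  local id="$1" subnet="$2"
  # Deleting one end of a veth pair removes its peer as well.
  run ip link del "veth-$id"
  # Remove the forwarding and NAT rules added during setup.
  run iptables -D FORWARD -o eth0 -i "veth-$id" -j ACCEPT
  run iptables -D FORWARD -i eth0 -o "veth-$id" -j ACCEPT
  run iptables -t nat -D POSTROUTING -s "$subnet" -o eth0 -j MASQUERADE
  # Remove the named-namespace symlink and the DNS symlink.
  run rm -f "/var/run/netns/netns-$id" "/etc/netns/netns-$id/resolv.conf"
}
```

Calling teardown_veth w2 10.0.5.0/24 prints the five commands for review; running it with DRY_RUN=0 inside the pod's namespaces would execute them.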

@Furisto
Member

Furisto commented Mar 14, 2022

@utam0k I can reproduce this; even switching the default route back to the tap device does not help.

@utam0k
Contributor

utam0k commented Mar 14, 2022

@Furisto Thanks for trying it out! In my environment, though, reverting to 10.0.2.2 restored it properly.
Can you tell me how you did this?
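
To compare the default route before and after such a change, the gateway can be extracted from the output of ip route. This is only a small sketch; the helper name default_gateway is hypothetical.

```shell
# Hypothetical helper: read `ip route` output on stdin and print the
# gateway address of the first default route, if there is one.
default_gateway() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++) {
      if ($i == "via") { print $(i + 1); exit }
    }
  }'
}
```

Inside the workspace's network namespace this could be used as ip route show | default_gateway, once before and once after switching the route.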

@utam0k
Contributor

utam0k commented Mar 17, 2022

@csweichel @Furisto
(edited) I figured out why. I have successfully started the workspace and the supervisor itself, but the supervisor's health check fails and the web UI does not start. If you know anything about it, I would like to hear from you 🙏
The change is this commit.
4244f49

This is the state in ring2. Preview environment is here. Logs are here.

$ ps auwxf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        1416  0.0  0.0   4240  3576 pts/0    S    10:38   0:00 bash
root        1429  0.0  0.0   5892  2832 pts/0    R+   10:44   0:00  \_ ps auwxf
gitpod         1  0.0  0.0 719908 14780 ?        Sl   10:32   0:00 supervisor init
gitpod        27  0.3  0.0 723876 24364 ?        Sl   10:32   0:02 supervisor run
133332        48  0.0  0.0   2612   604 ?        S    10:32   0:00  \_ sh /ide/bin/gitpod-code --start-server --install-builtin-extension github.vscode-pull-request-github --install-extension golang.go --port 23000 --host 0.0.0.0 --without-connection-token --server-data-dir /workspace/.vscode-remo
133332      1113  0.2  0.0 927784 65436 ?        Sl   10:32   0:02  |   \_ /ide/node /ide/out/server-main.js --start-server --install-builtin-extension github.vscode-pull-request-github --install-extension golang.go --port 23000 --host 0.0.0.0 --without-connection-token --server-data-dir /workspac
133332      1130  0.0  0.0 613672 35444 ?        Sl   10:32   0:00  |       \_ /ide/node /ide/out/bootstrap-fork --type=ptyHost
133332        56  0.0  0.0  12796  9640 pts/0    Ss+  10:32   0:00  \_ /bin/bash

$ lsof -i:23000
lsof: no pwd entry for UID 133332
COMMAND  PID     USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 133332
node    1113   133332   19u  IPv4 1271519670      0t0  TCP *:23000 (LISTEN)

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
4: ceth-aaa@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:61:e9:54:47:5d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.2.2/24 scope global ceth-aaa
       valid_lft forever preferred_lft forever
    inet6 fe80::6861:e9ff:fe54:475d/64 scope link 
       valid_lft forever preferred_lft forever

$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete 
Digest: sha256:4c5f3db4f8a54eb1e017c385f683a2de6e06f75be442dc32698c9bbe6c861edd
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

@utam0k
Contributor

utam0k commented Mar 18, 2022

Now I understand why. We needed the ability to dynamically expose ports.

@utam0k
Contributor

utam0k commented Mar 28, 2022

(Updated 2022-03-31)
The implementation is designed to produce a network of the following form. We have to implement two functions: IP masquerading and port redirection (the two nftables boxes in the diagram).

If you are not familiar with this field (neither am I), this article is very helpful.
https://iximiuz.com/en/posts/container-networking-is-simple/

          Pod Network Namespace(ring1)
+------------------------------------------------+
|                                                |
|       Workspace Network Namespace(ring2)       |
| +--------------------------------------------+ |
| |                                            | |
| |              default via veth0             | |
| |                                            | |
| |                                            | |
| |     +------+  +--------------+             | |
| |     |  lo  |  |    ceth0     | 10.0.2.2/24 | |
| |     +------+  +--^--------+--+             | |
| |                  |        |                | |
| +------------------+--------+----------------+ |
|                    |        |                  |
|                 +--+--------v--+               |
|   +-----------> |    veth0     | 10.0.2.1/24   |
|   |             +-----------+--+               |
|   |                         |                  |
|   |          +--------------v-----+            |
|   |          |                    |            |
|   |          |      nftables      |            |
|   |          |   (ip masquerade)  |            |
|   |          +--------------+-----+            |
|   |                         |                  |
|   |   +------+  +-----------v--+               |
|   |   |  lo  |  |     eth0     |               |
|   |   +------+  +--^--------+--+               |
|   |                |        |                  |
|   |          +-----+--------v-----+            |
|   |          |                    |            |
|   +----------+      nftables      |            |
| if with port | (port redirector)  |            |
|              +-----^--------+-----+            |
|                    |        |                  |
+--------------------+--------+------------------+
                     |        |
                     |        |
                     |        v
                    o u t s i d e

ASCIIFlow link
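
As an illustration of the two nftables boxes in the diagram, here is a minimal sketch that prints a ruleset with a masquerade rule and one redirect rule per exposed port. It is not the actual workspacekit implementation: the helper name render_ruleset, the table name workspace_nat, and the choice of DNAT for the port redirector are all assumptions; only the addresses and the eth0 interface come from the diagram.

```shell
# Hypothetical helper: print an nftables ruleset for one workspace.
#   $1     workspace-side address (ceth0 in the diagram, e.g. 10.0.2.2)
#   $2     workspace subnet to masquerade (e.g. 10.0.2.0/24)
#   $3...  TCP ports to redirect into the workspace
render_ruleset() {
  local ws_ip="$1" subnet="$2" port
  shift 2
  echo 'table ip workspace_nat {'
  echo '  chain prerouting {'
  echo '    type nat hook prerouting priority -100; policy accept;'
  for port in "$@"; do
    # Port redirector: traffic arriving on eth0 for an exposed port
    # is sent to the same port on the workspace side of the veth pair.
    echo "    iifname \"eth0\" tcp dport $port dnat to $ws_ip:$port"
  done
  echo '  }'
  echo '  chain postrouting {'
  echo '    type nat hook postrouting priority 100; policy accept;'
  # IP masquerade: outbound workspace traffic leaves with the pod's address.
  echo "    oifname \"eth0\" ip saddr $subnet masquerade"
  echo '  }'
  echo '}'
}
```

The output of render_ruleset 10.0.2.2 10.0.2.0/24 23000 could be reviewed and then piped to nft -f - inside the pod's network namespace. Exposing ports dynamically would mean adding or removing the per-port rules at runtime rather than rendering them once.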

previous picture
                 Pod Network Namespace(ring1)
+----------------------------------------------------------------+
|                                                                |
|              Workspace Network Namespace(ring2)                |
| +------------------------------------------------------------+ |
| |                                                            | |
| |                              default via veth0             | |
| |                                                            | |
| |                                                            | |
| |                     +------+  +--------------+             | |
| |  supervisor         |  lo  |  |    ceth0     | 10.0.2.2/24 | |
| |      |              +------+  +--^--------+--+             | |
| |      |                           |        |                | |
| +------+---------------------------+--------+----------------+ |
|        |                           |        |                  |
| a command to expose             +--+--------v--+               |
|or close a port via UDS          |    veth0     | 10.0.2.1/24   |
|        |                        +--^--------+--+               |
|        |                           |        |                  |
|        |                      +----+--------v----+             |
|        v         control      |                  |             |
|   port-manager--------------> |     iptables     |             |
|                               |                  |             |
|                               +----^--------+----+             |
|                                    |        |                  |
|                       +------+  +--+--------v--+               |
|                       |  lo  |  |     eth0     | 10.96.6.47/32 |
|                       +------+  +--^--------+--+               |
|                                    |        |                  |
+------------------------------------+--------+------------------+
                                     |        |
                                     |        |
                                     |        v
                                    o u t s i d e

ASCIIFlow link

Repository owner moved this from In Progress to Done in 🌌 Workspace Team Apr 8, 2022
@utam0k utam0k reopened this Apr 10, 2022
@utam0k utam0k moved this from Done to In Progress in 🌌 Workspace Team Apr 10, 2022
@utam0k
Contributor

utam0k commented Apr 11, 2022

Here is a deployment plan that ensures the IDE and workspace teams do not affect each other. @utam0k will create all the PRs we need. Each checklist item has a person in charge, and only that person marks the item, once it is completely finished and deployed. This avoids miscommunication.

@kylos101 kylos101 moved this from In Progress to Blocked in 🌌 Workspace Team Apr 20, 2022
@utam0k
Contributor

utam0k commented Apr 28, 2022

@iQQBot Deployment of #9212 to all workspace clusters is complete. We can take the next step.

@utam0k
Contributor

utam0k commented May 9, 2022

Hi @iQQBot, has item no. 3 been finished? Can we move on to the next step?

@iQQBot
Contributor

iQQBot commented May 9, 2022

@utam0k We've deployed the new version of supervisor and can move on to the next step.

@utam0k
Contributor

utam0k commented May 9, 2022

@iQQBot Thank you. I have opened the last PR:
#9214

@utam0k utam0k moved this from Blocked to In Progress in 🌌 Workspace Team May 9, 2022
@utam0k utam0k closed this as completed May 10, 2022
Repository owner moved this from In Progress to Done in 🌌 Workspace Team May 10, 2022