
VXLAN tests are not working on packet #68

Closed
Bolodya1997 opened this issue May 24, 2021 · 11 comments
Labels: ASAP (as soon as possible)


@Bolodya1997

https://github.com/networkservicemesh/integration-k8s-packet/actions/runs/856582486

@Bolodya1997
Author

So the cause is somehow related to hostNetwork: true, because commenting out this line makes the VXLAN tests pass.

@denis-tingaikin
Member

denis-tingaikin commented May 31, 2021

/cc @edwarnicke

@denis-tingaikin
Member

As a workaround, we could disable hostNetwork for packet clusters.

Currently, it seems to me that something is missing in the packet configuration, because packet works fine in the monorepo: https://github.com/networkservicemesh/networkservicemesh/blob/master/deployments/helm/nsm/templates/forwarding-plane.tpl#L15

@denis-tingaikin
Member

@Bolodya1997, @d-uzlov I think we also need to check whether it will work if we remove these lines in the forwarder:
https://github.com/networkservicemesh/cmd-forwarder-vpp/blob/main/internal/vppinit/vppinit.go#L205-L207

Note: we are not filtering ARPs in the monorepo.
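
For context, the lines in question appear to decide which kernel ARP (neighbor) entries get programmed into VPP. A minimal sketch of that kind of filter, assuming the github.com/vishvananda/netlink package; the skip condition below is hypothetical, the exact one lives in the linked vppinit.go lines:

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func main() {
	// Look up the host interface the forwarder attaches to (bond0 on Packet).
	link, err := netlink.LinkByName("bond0")
	if err != nil {
		panic(err)
	}
	// List the kernel IPv4 neighbor (ARP) entries for that interface.
	neighs, err := netlink.NeighList(link.Attrs().Index, netlink.FAMILY_V4)
	if err != nil {
		panic(err)
	}
	for _, n := range neighs {
		// Hypothetical filter: skip entries that are neither reachable nor permanent.
		if n.State&(netlink.NUD_REACHABLE|netlink.NUD_PERMANENT) == 0 {
			continue
		}
		fmt.Printf("would program VPP neighbor %s -> %s\n", n.IP, n.HardwareAddr)
	}
}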

@denis-tingaikin
Member

@d-uzlov Could you please attach all the logs that we captured related to this issue?

@edwarnicke
Member

@DVEfremov It would be helpful to also have a trimmed-down summary of what looks like it might be going wrong.

I could spot a lot of potential issues from such a summary.

Things like:

Are the tests failing because the ping isn't working?
or
Are the tests failing because the Request is returning an error?
If the tests are failing because the Request is returning an error, what error? Can we trace that error back to a deeper error? In what component is it originating?

Are the tests failing because Close is returning an error?

Is some component panicking?

Etc

@denis-tingaikin
Member

/cc @d-uzlov

@Bolodya1997
Author

@edwarnicke

Are the tests failing because the ping isn't working?
or
Are the tests failing because the Request is returning an error?
If the tests are failing because the Request is returning an error, what error? Can we trace that error back to a deeper error? In what component is it originating?

Are the tests failing because Close is returning an error?

Is some component panicking?

  1. The NSM chain works as expected - the NSC receives a success response with all IPs/routes set. There are no panics.
  2. Kernel interfaces and routes are also set in both the NSC and the NSE.
  3. Ping is not working.

@edwarnicke
Member

I think I've traced this back to a cause (not yet the root cause).

On Packet, interfaces have multiple IPv4 addresses (136.144.51.109/31, the Pod IP, and 10.99.35.131/31):

6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e4:43:4b:5f:6d:50 brd ff:ff:ff:ff:ff:ff
    inet 136.144.51.109/31 brd 255.255.255.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet 10.99.35.131/31 brd 255.255.255.255 scope global bond0:0
       valid_lft forever preferred_lft forever
    inet6 2604:1380:0:2c00::3/127 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::e643:4bff:fe5f:6d50/64 scope link 
       valid_lft forever preferred_lft forever

and multiple routes:

# ip route
default via 136.144.51.108 dev bond0 onlink 
10.0.0.0/8 via 10.99.35.130 dev bond0 
10.99.35.130/31 dev bond0 proto kernel scope link src 10.99.35.131 
136.144.51.108/31 dev bond0 proto kernel scope link src 136.144.51.109 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.0.0/16 dev weave proto kernel scope link src 192.168.192.0 

(ignore the last two for docker and weave)

This is different from most of our other environments, which are much simpler, having a single IPv4 address on the main interface (the hostNetwork: true Pod IP).

None of this is intrinsically a problem. It's just a difference.
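
For reference, a minimal sketch of how the host interface state above can be enumerated programmatically, assuming the github.com/vishvananda/netlink package (the interface name bond0 is taken from the output above); this is an illustration, not the forwarder's actual code:

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func main() {
	// bond0 is the Packet host interface shown above.
	link, err := netlink.LinkByName("bond0")
	if err != nil {
		panic(err)
	}
	// On Packet this returns several addresses (public IPv4 Pod IP, private IPv4, IPv6).
	addrs, err := netlink.AddrList(link, netlink.FAMILY_ALL)
	if err != nil {
		panic(err)
	}
	for _, a := range addrs {
		fmt.Println("addr:", a.IPNet)
	}
	// And several routes, including the default route.
	routes, err := netlink.RouteList(link, netlink.FAMILY_ALL)
	if err != nil {
		panic(err)
	}
	for _, r := range routes {
		fmt.Println("route:", r.Dst, "via", r.Gw)
	}
}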

VPP correctly picked up the Pod IP as the host-bond0 address:

# vppctl show int addr
host-bond0 (up):
  L3 136.144.51.109/31

and the MAC address:

# vppctl show hardware   
              Name                Idx   Link  Hardware
host-bond0                         1     up   host-bond0
  Link speed: unknown
  Ethernet address e4:43:4b:5f:6d:50
  Linux PACKET socket interface

which matches the bond0 interface above.

and also correctly picked up the neighbor for it:

# vppctl show ip neighbor
    Time                       IP                    Flags      Ethernet              Interface       
      2.2429             136.144.51.108                S    b0:33:a6:fe:79:d7 host-bond0

which matches the neighbor from the kernel:

# ip neighbor | grep 136.144.51.108
136.144.51.108 dev bond0 lladdr b0:33:a6:fe:79:d7 REACHABLE

Where things go wrong is on routes:

# vppctl show ip fib
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto ] epoch:0 flags:none locks:[adjacency:1, default-route:1, nat-hi:2, ]
0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:1 buckets:1 uRPF:12 to:[1:96]]
    [0] [@3]: arp-ipv4: via 38.4.19.128 host-bond0
0.0.0.0/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:2 buckets:1 uRPF:1 to:[0:0]]
    [0] [@0]: dpo-drop ip4
10.0.0.0/8
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:11 buckets:1 uRPF:11 to:[0:0]]
    [0] [@3]: arp-ipv4: via 10.99.35.130 host-bond0
10.99.35.130/31
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:12 buckets:1 uRPF:11 to:[0:0]]
    [0] [@3]: arp-ipv4: via 10.99.35.130 host-bond0
136.144.51.108/31
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:13 buckets:1 uRPF:11 to:[0:0]]
    [0] [@3]: arp-ipv4: via 10.99.35.130 host-bond0
136.144.51.108/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:9 buckets:1 uRPF:9 to:[0:0]]
    [0] [@5]: ipv4 via 136.144.51.108 host-bond0: mtu:1500 next:4 b033a6fe79d7e4434b5f6d500800
136.144.51.109/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:10 buckets:1 uRPF:10 to:[1760:2611356]]
    [0] [@2]: dpo-receive: 136.144.51.109 on host-bond0
147.75.199.143/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:16 buckets:1 uRPF:12 to:[0:0]]
    [0] [@3]: arp-ipv4: via 38.4.19.128 host-bond0
224.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:4 buckets:1 uRPF:3 to:[0:0]]
    [0] [@0]: dpo-drop ip4
240.0.0.0/4
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:3 buckets:1 uRPF:2 to:[0:0]]
    [0] [@0]: dpo-drop ip4
255.255.255.255/32
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:5 buckets:1 uRPF:4 to:[0:0]]
    [0] [@0]: dpo-drop ip4

Most of these simply match the routes from the kernel (as expected), but there is one that is really screwed up:

0.0.0.0/0
  unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:1 buckets:1 uRPF:12 to:[1:96]]
    [0] [@3]: arp-ipv4: via 38.4.19.128 host-bond0

It should be going via 136.144.51.108 host-bond0 but is instead going via 38.4.19.128 host-bond0.

I have no idea where via 38.4.19.128 host-bond0 came from. It's clearly wrong.

The code that sets it comes from here:
https://github.com/networkservicemesh/cmd-forwarder-vpp/blob/0eb8dcca85c0ba98beb5d8bb89c626c13fe9b5e7/internal/vppinit/vppinit.go#L160

It's correctly adding the other routes with the correct gateway addresses.

@d-uzlov

d-uzlov commented Jun 7, 2021

38.4.19.128 is what you get when you take the first 4 bytes of 2604:1380:0:2c00::2 and read them as an IPv4 address.
At line 160 we lose the metadata saying that the address is IPv6, so it gets interpreted as IPv4.
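
A small standalone Go snippet that reproduces the misinterpretation (an illustration of the byte truncation, not the forwarder code itself):

package main

import (
	"fmt"
	"net"
)

func main() {
	// The IPv6 gateway address mentioned above; net.ParseIP stores it as 16 bytes.
	gw := net.ParseIP("2604:1380:0:2c00::2")
	// Taking only the first 4 of its 16 bytes reads it as an IPv4 address.
	fmt.Println(net.IP(gw[:4])) // 38.4.19.128
	// To4() returns nil for a real IPv6 address, so it can be used as a guard
	// before programming the route into the IPv4 FIB.
	fmt.Println(gw.To4() == nil) // true
}

Checking To4() == nil (or tracking the address family explicitly) would be one way to keep the IPv6 gateway out of the IPv4 FIB.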

@d-uzlov

d-uzlov commented Jun 16, 2021

Now that the packet CI is working properly again and everything is green, we can finally close this.

@d-uzlov d-uzlov closed this as completed Jun 16, 2021