
need-fragmentation ICMPs are not propagated back to pods in OVN #1278

Closed
1 of 4 tasks
Tracked by #96
mangelajo opened this issue Apr 21, 2021 · 16 comments
Assignees
Labels
bug Something isn't working OVN priority:medium size:medium This can be implemented in a single sprint

Comments

@mangelajo
Contributor

mangelajo commented Apr 21, 2021

What happened:

A bug in core-ovn prevents ICMPs from being propagated back from the gateway nodes
to the pods.

For example, this need-to-frag ICMP:

14:47:19.896996 ovn-k8s-gw0 In  IP ip-10-132-2-85.us-east-2.compute.internal.40069 > ip-10-131-0-162.us-east-2.compute.internal.1234: Flags [P.], seq 1:3201, ack 1, win 208, options [nop,nop,TS val 2213744343 ecr 765938601,nop,nop,sack 1 {3151:3152}], length 3200
14:47:19.897058 ovn-k8s-mp0 Out IP ip-10-134-2-2.us-east-2.compute.internal > ip-10-132-2-85.us-east-2.compute.internal: ICMP ip-10-131-0-162.us-east-2.compute.internal unreachable - need to frag (mtu 1438), length 556

is not propagated back into the Geneve tunnel down to the pod.

This workaround was previously effective in avoiding the issue, but with later ovn-kubernetes or core-ovn implementations the OVN bug triggers again:

64f68da

What you expected to happen:

E2E tests passing, with the need-frag ICMPs being propagated back to the pods.

How to reproduce it (as minimally and precisely as possible):
Install with OCP 4.5 to 4.7 and run the E2E tests.

Anything else we need to know?:

This workaround (far from ideal) mitigates the issue:

iptables -I FORWARD -o ovn-k8s-gw0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1398

but it introduces performance penalties and is limited to TCP; it won't work for UDP packets.
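The 1398 value above follows from the tunnel MTU seen in the captures: 1438 bytes minus 20 for the IPv4 header and 20 for the TCP header (assuming no options). A minimal sketch of that arithmetic:

```python
def clamped_mss(tunnel_mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS to clamp to: path MTU minus IPv4 and TCP header overhead."""
    return tunnel_mtu - ip_header - tcp_header

print(clamped_mss(1438))  # 1398, matching --set-mss in the rule above
```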

We should:

  • Introduce the workaround (being more selective about IP ranges to remote clusters)
  • Handle the issue with the network kernel team
  • Handle the issue with the ovn-kubernetes and/or the core-ovn team
  • Remove this workaround.
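For the first bullet, a minimal sketch of what "more selective" could look like: generating clamp rules scoped to each remote-cluster CIDR instead of clamping everything leaving ovn-k8s-gw0. The helper name and CIDR list are illustrative, not Submariner's actual implementation:

```python
def mss_clamp_rules(remote_cidrs, iface="ovn-k8s-gw0", mss=1398):
    """Build iptables MSS-clamp rules limited to remote-cluster CIDRs."""
    return [
        f"iptables -I FORWARD -o {iface} -d {cidr} "
        f"-p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss {mss}"
        for cidr in remote_cidrs
    ]

# Hypothetical remote-cluster pod CIDRs
for rule in mss_clamp_rules(["10.132.0.0/14", "172.31.0.0/14"]):
    print(rule)
```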

This is also related (though not identical) to #1022.

Environment:

  • Submariner version (use subctl version): 0.8.1 or 0.9.0-rc0
  • Network plugin and version (if this is a network-related bug): ovn-kubernetes
@mangelajo mangelajo added bug Something isn't working OVN labels Apr 21, 2021
@mangelajo mangelajo added this to the 0.9.0 milestone Apr 21, 2021
@mangelajo mangelajo self-assigned this Apr 21, 2021
@mangelajo mangelajo changed the title need-fragmentation ICMPs are not propagated back to ports in OVN need-fragmentation ICMPs are not propagated back to pods in OVN Apr 21, 2021
@mangelajo
Contributor Author

mangelajo commented Apr 28, 2021

I have attempted forcing the return of ICMP packets related to known connections into the ovn-k8s-sub0 interface, with no luck:

iptables -A OUTPUT -t mangle --dst 10.132.0.0/14 -j MARK --set-mark 1
iptables -A OUTPUT -t mangle --dst 172.31.0.0/14 -j MARK --set-mark 1
#also..  iptables -I FORWARD -t mangle --dst 10.132.0.0/14 -j MARK --set-mark # just in case...
ip rule add from all fwmark 1 table 149



# Only when something as wide as
ip rule add ipproto 1 lookup 149
# is introduced would we then see:
12:57:43.226029 fc7f42806c279dd Out IP 10.133.2.8.1234 > 10.129.2.6.57284: Flags [.], ack 5, win 208, options [nop,nop,TS val 2927014584 ecr 4235024924], length 0                                                                            
12:58:22.613113 fc7f42806c279dd P   IP 10.129.2.6.57284 > 10.133.2.8.1234: Flags [P.], seq 5:4101, ack 1, win 208, options [nop,nop,TS val 4235064312 ecr 2927014584], length 4096                                                            
12:58:22.613839 ovn-k8s-sub0 In  IP 10.129.2.6.57284 > 10.133.2.8.1234: Flags [P.], seq 5:4101, ack 1, win 208, options [nop,nop,TS val 4235064312 ecr 2927014584], length 4096                                                               
12:58:22.613881 ovn-k8s-sub0 Out IP 169.254.254.10 > 10.129.2.6: ICMP 10.133.2.8 unreachable - need to frag (mtu 1438), length 556                                                                                                            
12:58:22.614189 fc7f42806c279dd Out IP 169.254.254.10 > 10.129.2.6: ICMP 10.133.2.8 unreachable - need to frag (mtu 1438), length 556          

# This works because the container interface fc7f42806c279dd is on the gateway;
# otherwise the destination worker node drops the packet. In any case this solution
# is not viable because it redirects all ICMP traffic to Submariner, not only the required packets.

One possibility could be creating an ICMP reflector in the submariner-gateway Go code that sees the ICMP and pushes it to the right interface. Still, we need to figure out what happens when a pod is not on the gateway and the ICMP is encapsulated via Geneve.
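Such a reflector would first need to recognize fragmentation-needed ICMPs and read the advertised next-hop MTU. A minimal sketch of that parsing step (header layout per RFC 792/RFC 1191; the sample bytes are synthetic, not a real capture):

```python
import struct

def parse_frag_needed(icmp: bytes):
    """Return the advertised next-hop MTU if this is a
    'fragmentation needed' ICMP (type 3, code 4), else None."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type == 3 and code == 4:
        return mtu
    return None

# Synthetic header advertising MTU 1438, as in the captures above
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1438)
print(parse_frag_needed(sample))  # 1438
```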

mangelajo added a commit to mangelajo/submariner that referenced this issue Apr 28, 2021
This clamps the maximum segment size on the TCP negotiation
when addressing remote clusters with OVN. With the OVN
implementation we have an issue where the need-frag ICMPs
used for PMTU discovery are sent to the wrong interface
by the kernel; this limits the scope of the issue, although
it introduces a performance penalty for jumbo-frame capable
networks.

Fixes-Issue: submariner-io#1278

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
@mangelajo mangelajo removed this from the 0.9.0 milestone Apr 28, 2021
@mangelajo
Contributor Author

Moving this to 0.10 to keep track, since #1294 only proposes a workaround.

skitt pushed a commit that referenced this issue Apr 29, 2021
This clamps the maximum segment size on the TCP negotiation
when addressing remote clusters with OVN. With the OVN
implementation we have an issue where the need-frag ICMPs
used for PMTU discovery are sent to the wrong interface
by the kernel; this limits the scope of the issue, although
it introduces a performance penalty for jumbo-frame capable
networks.

Fixes-Issue: #1278

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
mangelajo added a commit to mangelajo/submariner that referenced this issue Apr 29, 2021
Otherwise the host-network traffic from the gateway
to a remote cluster fails to work when fragmentation
is necessary.

Fixes-Issue: submariner-io#1278

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
mangelajo added a commit that referenced this issue Apr 29, 2021
Otherwise the host-network traffic from the gateway
to a remote cluster fails to work when fragmentation
is necessary.

Fixes-Issue: #1278

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
@mangelajo mangelajo added size:large This needs more than one sprint to be implemented priority:high labels May 4, 2021
@stale

stale bot commented Jul 3, 2021

This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 3, 2021
@tpantelis
Contributor

bump

@nyechiel nyechiel removed the wontfix This will not be worked on label Jul 7, 2021
@tpantelis tpantelis removed the wontfix This will not be worked on label Nov 6, 2021
@dfarrell07 dfarrell07 added the help wanted Looking for someone to work on this label Dec 21, 2021
@dfarrell07
Member

This still seems important, but we don't have anyone working on OVN right now. If someone wants to dig in, that would be much appreciated.

@dfarrell07 dfarrell07 assigned yboaron and unassigned mangelajo Mar 15, 2022
@dfarrell07
Member

@yboaron volunteered to take a look at this. @astoycos pointed out there used to be an OVN upstream bug that seemed related; maybe we should ask in their Slack if we still see issues/have questions.

@sridhargaddam sridhargaddam added next-version-candidate size:medium This can be implemented in a single sprint and removed size:large This needs more than one sprint to be implemented labels Mar 17, 2022
@nyechiel
Member

@vthapar @sridhargaddam is this bug still relevant with the new 0.13 OVN implementation?

@vthapar
Contributor

vthapar commented Jun 14, 2022

New implementation won't fix this bug, but it may no longer be an issue in latest OVN/OCP. So we need to test and confirm if this is still a valid bug.

@nyechiel
Member

New implementation won't fix this bug, but it may no longer be an issue in latest OVN/OCP. So we need to test and confirm if this is still a valid bug.

Should we close and reopen if needed, or would you prefer to keep this open for now?

@sridhargaddam
Member

sridhargaddam commented Jun 14, 2022

Should we close and reopen if needed, or would you prefer to keep this open for now?

I think @aswinsuryan wanted to try this once in his OSP/AWS Cluster setup and if there is no issue, we can close it.

@nyechiel nyechiel removed the help wanted Looking for someone to work on this label Jun 14, 2022
@nyechiel nyechiel assigned aswinsuryan and unassigned yboaron Jun 21, 2022
@nyechiel
Member

@aswinsuryan my understanding is that you are planning to verify this. Please update whether the issue is still seen with recent devel/0.13.0-rc0

@aswinsuryan
Contributor

@nyechiel yes, I am trying to verify this; the setup has an issue and the network-plugin-syncer pod is not coming up. Will keep you posted.

@nyechiel
Member

nyechiel commented Jun 21, 2022

@nyechiel yes, I am trying to verify this; the setup has an issue and the network-plugin-syncer pod is not coming up. Will keep you posted.

Thanks. Please keep us posted. If the network-plugin-syncer pod does not come up, we might have a bigger problem...

@aswinsuryan
Contributor

aswinsuryan commented Jun 21, 2022

The tests are passing in the setup below:

$ oc version
Client Version: 4.9.6
Server Version: 4.11.0-0.nightly-2022-06-15-222801
Kubernetes Version: v1.24.0+25f9057

Cluster1

$ subctl show all
Cluster "asuryanaaws"
 ✓ Detecting broker(s)
NAMESPACE                NAME                     COMPONENTS                              
submariner-k8s-broker    submariner-broker        service-discovery, connectivity         

 ✓ Showing Connections
GATEWAY                          CLUSTER       REMOTE IP       NAT  CABLE DRIVER  SUBNETS                       STATUS     RTT avg.    
asuryanarhos-2jzx5-submariner-g  asuryanarhos  66.187.232.129  yes  libreswan     172.90.0.0/16, 10.168.0.0/14  connected  18.61304ms  

 ✓ Showing Endpoints
CLUSTER ID                    ENDPOINT IP     PUBLIC IP       CABLE DRIVER        TYPE            
asuryanaaws                   10.0.28.32      3.15.188.109    libreswan           local           
asuryanarhos                  192.168.198.54  66.187.232.129  libreswan           remote          

 ✓ Showing Gateways
NODE                            HA STATUS       SUMMARY                         
ip-10-0-28-32                   active          All connections (1) are established

 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:  OVNKubernetes
        Service CIDRs:   [172.30.0.0/16]
        Cluster CIDRs:   [10.128.0.0/14]

 ✓ Showing versions
COMPONENT                       REPOSITORY                                            VERSION         
submariner                      quay.io/submariner                                    devel           
submariner-operator             quay.io/submariner                                    devel           
service-discovery               quay.io/submariner                                    devel           

Cluster2

$ subctl show all
Cluster "asuryanarhos"
 ✓ Detecting broker(s) 

 ✓ Showing Connections 
GATEWAY        CLUSTER      REMOTE IP     NAT  CABLE DRIVER  SUBNETS                       STATUS     RTT avg.     
ip-10-0-28-32  asuryanaaws  3.15.188.109  yes  libreswan     172.30.0.0/16, 10.128.0.0/14  connected  18.485885ms  

 ✓ Showing Endpoints 
CLUSTER ID                    ENDPOINT IP     PUBLIC IP       CABLE DRIVER        TYPE            
asuryanarhos                  192.168.198.54  66.187.232.129  libreswan           local           
asuryanaaws                   10.0.28.32      3.15.188.109    libreswan           remote          

 ✓ Showing Gateways 
NODE                            HA STATUS       SUMMARY                         
asuryanarhos-2jzx5-submariner-g active          All connections (1) are established

 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:  OVNKubernetes
        Service CIDRs:   [172.90.0.0/16]
        Cluster CIDRs:   [10.168.0.0/14]

 ✓ Showing versions 
COMPONENT                       REPOSITORY                                            VERSION         
submariner                      quay.io/submariner                                    devel           
submariner-operator             quay.io/submariner                                    devel           
service-discovery               quay.io/submariner                                    devel   

The tests passed with AWS as the first cluster and RHOS as the second, and vice versa.

Ran 23 of 41 Specs in 596.055 seconds
SUCCESS! -- 23 Passed | 0 Failed | 0 Pending | 18 Skipped

@aswinsuryan
Contributor

@nyechiel yes, I am trying to verify this; the setup has an issue and the network-plugin-syncer pod is not coming up. Will keep you posted.

Thanks. Please keep us posted. If the network-plugin-syncer pod does not come up, we might have a bigger problem...

@nyechiel raised this issue #1878


9 participants