
Support multi-cluster with networkPolicyOnly mode #4383

Closed
luolanzone opened this issue Nov 8, 2022 · 3 comments
Labels
area/multi-cluster Issues or PRs related to multi cluster. kind/design Categorizes issue or PR as related to design.

Comments

@luolanzone
Contributor

Describe what you are trying to solve
Currently, Antrea multi-cluster runs in encap mode by default; it cannot support the case where Antrea is running in networkPolicyOnly mode.

Describe the solution you have in mind

Make Antrea multi-cluster work when Antrea is running in a networkPolicyOnly mode.

Describe how your solution impacts user flows

In networkPolicyOnly mode, the primary CNI manages Pod IPs and routes. Antrea doesn't set up any Pod routes in OVS, and it has no tunnel interface for in-cluster or cross-cluster traffic. So we need to create the tunnel interface and a general way to route cross-cluster traffic correctly.

  1. Create the default antrea-tun0 interface when Antrea is deployed in networkPolicyOnly mode with multi-cluster enabled.
  2. Set up OpenFlow rules to allow the Gateway to learn cross-cluster traffic and forward it correctly.
  • When cross-cluster traffic is sent by Pod A to Gateway A inside Cluster A, the Gateway will save the request packet's source tunnel IP to NXM_NX_REG10.
table=Classifier, priority=210,in_port="antrea-tun0",dl_dst=aa:bb:cc:dd:ee:f0 actions=load:0x1->NXM_NX_REG0[0..3],load:0x1->NXM_NX_REG0[9],move:NXM_NX_TUN_IPV4_SRC[0..31]->NXM_NX_REG10[0..31],resubmit(,UnSNAT)

It then commits the connection and saves the source tunnel IP into ct_label[64..95]. Each general Node will have one rule like the following.

table=EgressMark, priority=210,reg10=0x6e0d34e2,dl_dst=aa:bb:cc:dd:ee:f0,ip actions=ct(commit,table=L3DecTTL,zone=65520,exec(load:0x6e0d34e2->NXM_NX_CT_LABEL[64..95]))
  • When the cross-cluster traffic from Pod A goes out of Gateway A and reaches Gateway B on Cluster B, Gateway B needs to know which general Node this traffic belongs to. antrea-agent can watch the exported Service's Pod events and add Pod IP routes to let Gateway B know how to forward this traffic. For each exported Service's backend Pod, there will be one L3Forwarding rule like the following. (110.14.0.25 and 110.14.40.30 are sample Pod IPs.)
table=L3Forwarding, priority=210,ip,dl_dst=aa:bb:cc:dd:ee:f0,nw_dst=110.14.0.25 actions=mod_dl_src:72:c4:f1:24:07:0b,load:0x6e0e072c->NXM_NX_TUN_IPV4_DST[],load:0x1->NXM_NX_REG0[4..7],resubmit(,L3DecTTL)
...
table=L3Forwarding, priority=210,ip,dl_dst=aa:bb:cc:dd:ee:f0,nw_dst=110.14.40.30 actions=mod_dl_src:72:c4:f1:24:07:0b,load:0x6e0e162e->NXM_NX_TUN_IPV4_DST[],load:0x1->NXM_NX_REG0[4..7],resubmit(,L3DecTTL)
  • When remote Pod B on Cluster B replies and the traffic comes back to the Gateway on Cluster A, the Gateway will check the source tunnel IP in ct_label[64..95] and set the tunnel destination IP correspondingly. For each general Node, the Gateway will have one rule like the following.
table=L3Forwarding, priority=210,ip,ct_state=+rpl+trk,ct_label=0x6e0d34e20000000000000000/0xffffffff0000000000000000 actions=mod_dl_src:22:65:c5:61:e5:a9,mod_dl_dst:aa:bb:cc:dd:ee:f0,load:0x6e0d34e2->NXM_NX_TUN_IPV4_DST[],load:0x1->NXM_NX_REG0[4..7],resubmit(,L3DecTTL)
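The hex literals in the flows above (reg10, ct_label, and TUN_IPV4_DST values) are simply the 32-bit big-endian encoding of an IPv4 address. A minimal sketch of that conversion, for reading the flow dumps (the helper name is mine, not from Antrea):

```python
import ipaddress

def ip_to_reg_hex(ip: str) -> str:
    """Convert a dotted-quad IPv4 address to the 32-bit hex literal
    used in OVS register, ct_label, and tunnel-destination fields."""
    return hex(int(ipaddress.IPv4Address(ip)))

# 0x6e0d34e2 in the reg10 match / ct_label above is the tunnel IP
# of one general Node:
print(ip_to_reg_hex("110.13.52.226"))  # -> 0x6e0d34e2
# 0x6e0e072c loaded into NXM_NX_TUN_IPV4_DST[] for the first Pod rule:
print(ip_to_reg_hex("110.14.7.44"))    # -> 0x6e0e072c
```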

Describe the main design/architecture of your solution

The overall Antrea multi-cluster architecture is not impacted. The main change is on the Antrea Agent side, and the flow changes are listed above.

Test plan

Manual testing in the first phase.

@luolanzone luolanzone added kind/design Categorizes issue or PR as related to design. area/multi-cluster Issues or PRs related to multi cluster. labels Nov 8, 2022
@luolanzone
Contributor Author

After a few discussions and performance tests, we chose another solution for Pod routes that is simpler and easier to implement and maintain. To let the Gateway know how to forward Pod traffic back to a general Node, antrea-agent will simply watch all Pods and set up one rule per Pod in the L3Forwarding table as below, as long as the Pod is running on a general Node instead of the Gateway itself.

table=L3Forwarding, priority=200,ip,dl_dst=aa:bb:cc:dd:ee:f0,nw_dst=110.13.37.137 actions=mod_dl_src:22:65:c5:61:e5:a9,load:0x6e0d34e2->NXM_NX_TUN_IPV4_DST[],load:0x1->NXM_NX_REG0[4..7],resubmit(,L3DecTTL)

And a regular tunnel classifier flow is also needed:

table=Classifier, priority=200,in_port="antrea-tun0" actions=load:0x1->NXM_NX_REG0[0..3],load:0x1->NXM_NX_REG0[9],resubmit(,UnSNAT)
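As an illustration of the per-Pod rule format (not actual Antrea code), the L3Forwarding flow above can be rendered from a Pod IP, the hosting Node's tunnel IP, and the Gateway's source MAC; the function name and structure are hypothetical:

```python
import ipaddress

# Global virtual MAC matched for cross-cluster traffic in the flows above.
GLOBAL_MAC = "aa:bb:cc:dd:ee:f0"

def pod_l3fwd_flow(pod_ip: str, node_tun_ip: str, gw_mac: str) -> str:
    """Render the per-Pod L3Forwarding rule that forwards traffic
    back to the general Node hosting the Pod (illustrative only)."""
    # Tunnel destination is the hosting Node's IP, as a 32-bit hex value.
    tun_hex = hex(int(ipaddress.IPv4Address(node_tun_ip)))
    return (
        f"table=L3Forwarding, priority=200,ip,dl_dst={GLOBAL_MAC},"
        f"nw_dst={pod_ip} actions=mod_dl_src:{gw_mac},"
        f"load:{tun_hex}->NXM_NX_TUN_IPV4_DST[],"
        f"load:0x1->NXM_NX_REG0[4..7],resubmit(,L3DecTTL)"
    )

# Reproduces the sample flow above for Pod 110.13.37.137 on the
# general Node whose tunnel IP is 110.13.52.226 (0x6e0d34e2):
print(pod_l3fwd_flow("110.13.37.137", "110.13.52.226", "22:65:c5:61:e5:a9"))
```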

Besides the Pod routes issue in networkPolicyOnly mode, the following issues still need to be addressed.

  1. Container Interface MTU
    The MTU is configured by the primary CNI. After antrea-agent takes over multi-cluster traffic via the tunnel, it needs to reduce all Pods' interface MTU to account for the tunnel overhead.
  • For any new Pods, the chained antrea-agent can update the MTU via the Pod configuration process.
  • For any existing Pods, we need to ask users to restart them all, or apply a file like antrea-eks-node-init.yml to restart them automatically.
  2. In-cluster traffic for multi-cluster Services
    When a client is trying to access a multi-cluster Service, the backend Service ClusterIP might be the Service in the same cluster instead of the Service from a remote member cluster. In this case, the multi-cluster Service is accessible, but the traffic does not go through the tunnel.
    Regarding this difference between the different MC Service Endpoints (remote Service ClusterIP vs. local Service ClusterIP), I feel stretched NetworkPolicy won't work as expected when the local Endpoint is chosen. @jianjuns Do you think this is acceptable, or do we need to make all traffic go through the tunnel as long as it's multi-cluster Service traffic? I am also wondering: do we need to support stretched NetworkPolicy in networkPolicyOnly mode, or do we just need to ensure multi-cluster traffic works in networkPolicyOnly mode?
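The MTU adjustment in item 1 above can be sketched as follows. The overhead values are illustrative assumptions (outer IP + UDP/GRE + tunnel header + inner Ethernet; actual figures depend on tunnel type and IP family), not numbers taken from this issue:

```python
# Assumed encapsulation overheads in bytes; illustrative only.
TUNNEL_OVERHEAD = {
    "geneve": 50,  # 20 (outer IPv4) + 8 (UDP) + 8 (Geneve) + 14 (inner Ethernet)
    "vxlan": 50,   # 20 (outer IPv4) + 8 (UDP) + 8 (VXLAN) + 14 (inner Ethernet)
    "gre": 38,     # 20 (outer IPv4) + 4 (GRE) + 14 (inner Ethernet)
}

def pod_mtu(primary_cni_mtu: int, tunnel_type: str = "geneve") -> int:
    """MTU to set on Pod interfaces so cross-cluster packets still fit
    on the wire after tunnel encapsulation."""
    return primary_cni_mtu - TUNNEL_OVERHEAD[tunnel_type]

print(pod_mtu(1500))  # -> 1450
```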

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2023
@luolanzone luolanzone removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2023
@luolanzone
Contributor Author

This feature is supported in v1.11 by PR #4407, so closing this issue.
