[Multicast] Use group as flow actions for multicast traffic #3508

wenyingd · 2022-03-23T07:32:47Z

Add local Pod receivers into an OpenFlow type "all" group for each
multicast group, and use such groups in the flow actions. Remove a Pod
from group buckets if the Pod has left the multicast group or is deleted
before leaving the multicast group.
Improve e2e tests.

Signed-off-by: wenyingd <wenyingd@vmware.com>
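
The bookkeeping described in this commit message can be pictured with a minimal, self-contained Go sketch. It is not the Antrea implementation: `groupTable`, `join`, `leave`, `removePod`, and `buckets` are hypothetical names, and a real agent would program OVS group buckets rather than an in-memory map. An OpenFlow type "all" group replicates a packet to every bucket, so each bucket here stands for one local receiver port.

```go
package main

import (
	"fmt"
	"sort"
)

// groupTable is a hypothetical in-memory stand-in for OpenFlow type "all" groups:
// it maps a multicast group address to the set of local receiver ports (buckets).
type groupTable map[string]map[uint32]struct{}

// join adds a receiver port as a bucket of the group, creating the group if needed.
func (t groupTable) join(group string, port uint32) {
	if t[group] == nil {
		t[group] = make(map[uint32]struct{})
	}
	t[group][port] = struct{}{}
}

// leave removes one bucket and drops the group once it has no receivers left.
func (t groupTable) leave(group string, port uint32) {
	delete(t[group], port)
	if len(t[group]) == 0 {
		delete(t, group)
	}
}

// removePod covers a Pod deleted before it sent an IGMP leave:
// its port is removed from every group it had joined.
func (t groupTable) removePod(port uint32) {
	for group := range t {
		t.leave(group, port)
	}
}

// buckets returns the receiver ports of a group in ascending order.
func (t groupTable) buckets(group string) []uint32 {
	ports := make([]uint32, 0, len(t[group]))
	for p := range t[group] {
		ports = append(ports, p)
	}
	sort.Slice(ports, func(i, j int) bool { return ports[i] < ports[j] })
	return ports
}

func main() {
	tbl := groupTable{}
	tbl.join("225.1.2.3", 10)
	tbl.join("225.1.2.3", 11)
	tbl.join("225.1.2.4", 10)
	tbl.removePod(10) // the Pod on port 10 was deleted without leaving its groups
	fmt.Println(tbl.buckets("225.1.2.3"), len(tbl))
}
```

In this sketch a group with no remaining buckets is dropped entirely. Whether the real code uninstalls an empty group or keeps it is not stated in this thread.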

wenyingd · 2022-03-23T07:46:34Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

codecov-commenter · 2022-03-23T08:46:29Z

Codecov Report

Merging #3508 (95758d1) into main (be1116e) will decrease coverage by 14.85%.
The diff coverage is 36.78%.

@@             Coverage Diff             @@
##             main    #3508       +/-   ##
===========================================
- Coverage   64.50%   49.65%   -14.86%     
===========================================
  Files         278      391      +113     
  Lines       39500    55277    +15777     
===========================================
+ Hits        25481    27447     +1966     
- Misses      12026    25711    +13685     
- Partials     1993     2119      +126

Flag	Coverage Δ
integration-tests	`38.18% <ø> (?)`
kind-e2e-tests	`32.19% <7.38%> (-19.96%)`	⬇️
unit-tests	`43.78% <34.71%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
pkg/agent/openflow/framework.go	`88.77% <0.00%> (-0.92%)`	⬇️
pkg/agent/openflow/multicast.go	`0.00% <0.00%> (ø)`
pkg/agent/openflow/pipeline.go	`62.47% <0.00%> (-12.45%)`	⬇️
pkg/agent/proxy/proxier.go	`57.07% <0.00%> (-5.86%)`	⬇️
pkg/agent/openflow/client.go	`57.81% <5.88%> (-14.37%)`	⬇️
pkg/agent/cniserver/pod_configuration.go	`35.37% <50.00%> (-18.39%)`	⬇️
pkg/agent/multicast/mcast_controller.go	`61.38% <50.52%> (-6.07%)`	⬇️
pkg/ovs/openflow/ofctrl_bridge.go	`56.07% <60.00%> (-0.84%)`	⬇️
pkg/agent/controller/egress/egress_controller.go	`58.60% <100.00%> (-15.08%)`	⬇️
pkg/agent/controller/networkpolicy/cache.go	`85.71% <100.00%> (-3.02%)`	⬇️
... and 237 more

wenyingd · 2022-03-28T03:08:16Z

/test-multicast-e2e

wenyingd · 2022-03-28T09:27:52Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

wenyingd · 2022-03-30T02:13:41Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

wenyingd · 2022-03-30T03:36:26Z

/test-multicast-e2e

wenyingd · 2022-03-31T08:59:44Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

jianjuns

In the commit message:

Add local pod receivers into an OpenFlow "all" type group for each
multicast group, and use such group in the flow action for multicast
traffic. Remove pod from group buckets if the pod has left the
multicast group or is deleted before leaving multicast group.

"all" type group -> type "all" group

such group in the flow action -> such groups in the flow actions

Remove pod -> Remove a Pod

pod -> Pod

leaving multicast group -> leaving the multicast group

jianjuns · 2022-03-31T18:12:28Z

pkg/agent/multicast/mcast_controller.go

+// leave event so that antrea can remove the corresponding interface from local multicast receivers on OVS. This function
+// should be called if the removed Pod receiver fails to send IGMP leave message before deletion.
+func (c *Controller) removeLocalInterface(ifConfig *interfacestore.InterfaceConfig) {
+	groupStatuses := c.getGroupMemberStatusesByPod(ifConfig.InterfaceName)


Can we have a race condition in which a Pod is recreated with the same interface name? Maybe the chance (a new interface is created and has even joined groups) is too low to consider. Just asking.

The group cache is actually written by only one worker, so here we only construct the event and let that writer worker modify the cache.

In theory, the race condition could happen when a Pod with the same name and namespace is created before the old Pod's interface is completely removed from the in-memory interface store. That requires the time to receive the event to be longer than the time to prepare a new Pod. But the event is triggered in memory, so I don't think this race condition exists.
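
The single-writer argument in the reply above can be sketched as follows. This is a simplified illustration, not the controller's code: `memberEvent`, `groupCache`, `runWorker`, and `applyAll` are hypothetical names. Producers only construct events, and one goroutine applies them in arrival order, so a leave synthesized for a deleted Pod cannot overwrite a later join from a recreated Pod.

```go
package main

import (
	"fmt"
	"sync"
)

// eventType distinguishes joins from leaves; both names are hypothetical.
type eventType int

const (
	eventJoin eventType = iota
	eventLeave
)

// memberEvent is what producers emit instead of touching the cache themselves.
type memberEvent struct {
	kind  eventType
	group string
	iface string
}

// groupCache is written by a single worker goroutine, so it needs no lock.
type groupCache map[string]map[string]bool

// runWorker is the only writer: it applies events strictly in arrival order.
func runWorker(events <-chan memberEvent, cache groupCache, done *sync.WaitGroup) {
	defer done.Done()
	for ev := range events {
		switch ev.kind {
		case eventJoin:
			if cache[ev.group] == nil {
				cache[ev.group] = make(map[string]bool)
			}
			cache[ev.group][ev.iface] = true
		case eventLeave:
			delete(cache[ev.group], ev.iface)
		}
	}
}

// applyAll feeds the events to one worker and returns the resulting cache.
func applyAll(evs []memberEvent) groupCache {
	events := make(chan memberEvent)
	cache := groupCache{}
	var done sync.WaitGroup
	done.Add(1)
	go runWorker(events, cache, &done)
	for _, ev := range evs {
		events <- ev
	}
	close(events)
	done.Wait() // after this point the caller may read the cache safely
	return cache
}

func main() {
	cache := applyAll([]memberEvent{
		{eventJoin, "225.1.2.3", "pod-a-eth0"},
		{eventLeave, "225.1.2.3", "pod-a-eth0"}, // synthesized on Pod deletion
		{eventJoin, "225.1.2.3", "pod-a-eth0"},  // a recreated Pod joins again
	})
	fmt.Println(cache["225.1.2.3"]["pod-a-eth0"])
}
```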

ceclinux · 2022-04-06T00:53:14Z

pkg/ovs/openflow/ofctrl_bridge.go

@@ -189,8 +189,16 @@ type OFBridge struct {
 	multipartReplyChs map[uint32]chan *openflow13.MultipartReply
 }

+func (b *OFBridge) CreateGroupTypeAll(id GroupIDType) Group {


CreateGroupTypeAll(id GroupIDType) and CreateGroup(id GroupIDType) confuse me. Why not CreateGroupTypeAll(id GroupIDType) and CreateGroupTypeSelect(id GroupIDType), or just CreateGroup(id GroupIDType, groupType ofctrl.GroupType)?

CreateGroup is an existing function that uses type "select" when creating an OpenFlow group, and it is called by AntreaProxy. I didn't want to change existing code unrelated to this change, which is why I didn't rename it. @hongliangl Would you share your view: do you think renaming the existing function CreateGroup to CreateGroupTypeSelect is better?

I think we can make another PR to rename CreateGroup to CreateGroupTypeSelect since we have another type of group.
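
The two naming options discussed in this thread are not mutually exclusive. A minimal sketch, assuming simplified stand-in types (`GroupType`, `Group`, and the unexported `createGroup` are hypothetical here and are not the ofctrl or OFBridge definitions), shows one parameterized constructor with thin named wrappers:

```go
package main

import "fmt"

// GroupType and its values are hypothetical stand-ins, not the ofctrl definitions.
type GroupType int

const (
	GroupSelect GroupType = iota // one bucket is chosen per packet
	GroupAll                     // the packet is replicated to every bucket
)

// String makes the group type readable in logs and test output.
func (g GroupType) String() string {
	if g == GroupAll {
		return "all"
	}
	return "select"
}

// GroupIDType matches the parameter name used in the diff; the underlying type is assumed.
type GroupIDType uint32

// Group is a simplified placeholder for the real group object.
type Group struct {
	ID   GroupIDType
	Type GroupType
}

// createGroup is the single parameterized constructor.
func createGroup(id GroupIDType, groupType GroupType) Group {
	return Group{ID: id, Type: groupType}
}

// CreateGroupTypeAll keeps call sites explicit about getting a type "all" group.
func CreateGroupTypeAll(id GroupIDType) Group { return createGroup(id, GroupAll) }

// CreateGroupTypeSelect is the renamed form of the existing type "select" constructor.
func CreateGroupTypeSelect(id GroupIDType) Group { return createGroup(id, GroupSelect) }

func main() {
	fmt.Println(CreateGroupTypeAll(1).Type, CreateGroupTypeSelect(2).Type)
}
```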

ceclinux · 2022-04-07T02:05:36Z

/test-multicast-e2e

ceclinux · 2022-04-08T03:26:25Z

After checking the test code and running several rounds on the e2e testbed, I suggest adding t.Parallel() in both testMulticastBetweenPodsInThreeNodes and testMulticastBetweenPodsInTwoNodes so that the test cases run in parallel. This further speeds up the e2e tests, cutting the total running time approximately in half.

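// Assuming mc is a range loop variable: copy it so each parallel subtest
// sees its own test case (required before Go 1.22).
mc := mc
t.Run(mc.name, func(t *testing.T) {
	t.Parallel()
	runTestMulticastBetweenPods(t, data, mc, nodeMulticastInterfaces)
})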

wenyingd · 2022-04-25T11:16:00Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

wenyingd · 2022-04-26T03:37:50Z

/test-networkpolicy
/test-e2e

wenyingd · 2022-04-26T08:32:57Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

tnqn · 2022-04-28T16:23:34Z

pkg/util/channel/channel.go

@@ -37,7 +37,20 @@ type Subscriber interface {

 type Notifier interface {
 	// Notify sends an event to the channel.
-	Notify(string) bool
+	Notify(string, EventType, ...string) bool


The channel is designed to be generic. Having EventType would limit its usage scenarios.
Since string cannot meet your requirement, it could use a more generic type: interface{}. A specific channel's consumer should convert it to a specific type, just like the generic workqueue and store interfaces.
The type could be defined in pkg/agent/types/event.go:

type CNIEvent struct {
	PodNamespace string
	PodName      string
	IsAdd        bool
	ContainerID  string
}

Makes sense, I will change it. One question: where would you suggest placing such event types, e.g., "CNIEvent", since they should be accessed by different packages? @tnqn

For now, I have put such event types in the package pkg/util/channel.

The package for the generic channel is not suitable for concrete event types. I suggested a place in the comment above:

The type could be defined in pkg/agent/types/event.go

And perhaps name it PodUpdate to avoid confusion between it and the real CNI event.
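
The suggestion in this thread can be sketched as below: the notifier carries an opaque `interface{}` and only the consumer knows the concrete type. `PodUpdate` uses the fields and name proposed above. `bufferedNotifier` and `describe` are hypothetical helpers written for this illustration, and the one-method `Notifier` is a simplification of the interface in pkg/util/channel.

```go
package main

import "fmt"

// PodUpdate mirrors the fields proposed in the review; treat it as illustrative.
type PodUpdate struct {
	PodNamespace string
	PodName      string
	IsAdd        bool
	ContainerID  string
}

// Notifier stays generic: it knows nothing about the payload type.
type Notifier interface {
	Notify(e interface{}) bool
}

// bufferedNotifier is a hypothetical implementation backed by a buffered channel.
type bufferedNotifier struct {
	ch chan interface{}
}

// Notify enqueues the event without blocking and reports whether it was accepted.
func (n *bufferedNotifier) Notify(e interface{}) bool {
	select {
	case n.ch <- e:
		return true
	default:
		return false // buffer full: the event is dropped
	}
}

// describe is the consumer side: it converts the generic event to the concrete type.
func describe(e interface{}) string {
	u, ok := e.(PodUpdate)
	if !ok {
		return "ignored: unexpected event type"
	}
	action := "deleted"
	if u.IsAdd {
		action = "added"
	}
	return fmt.Sprintf("%s/%s %s", u.PodNamespace, u.PodName, action)
}

func main() {
	n := &bufferedNotifier{ch: make(chan interface{}, 1)}
	var notifier Notifier = n
	notifier.Notify(PodUpdate{PodNamespace: "default", PodName: "receiver", IsAdd: false})
	fmt.Println(describe(<-n.ch))
}
```

The type assertion in `describe` is where a specific consumer, such as the multicast controller, would reject events it does not understand instead of forcing the shared channel package to know about Pod events.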

tnqn

Since this PR switches to use group for multicast, are the previous configurations like mcast_snooping_enable and mcast-snooping-disable-flood-unregistered still required?

tnqn · 2022-05-05T07:19:12Z

pkg/agent/multicast/mcast_controller.go

 	// installedGroups saves the groups which are configured on both OVS and the host.
 	installedGroups      sets.String
 	installedGroupsMutex sync.RWMutex
 	mRouteClient         *MRouteClient
 	ovsBridgeClient      ovsconfig.OVSBridgeClient
+	podDeletionCh        chan *interfacestore.InterfaceConfig


Forgot to remove this?

tnqn · 2022-05-05T07:38:24Z

pkg/agent/openflow/multicast.go

+		group := value.(binding.Group)
+		group.Reset()
+		if err := group.Add(); err != nil {
+			klog.Errorf("Error when replaying cached group %d: %v", id, err)


Suggested change

-			klog.Errorf("Error when replaying cached group %d: %v", id, err)
+			klog.ErrorS(err, "Error when replaying cached group", "group", id)

wenyingd · 2022-05-05T09:36:01Z

Since this PR switches to use group for multicast, are the previous configurations like mcast_snooping_enable and mcast-snooping-disable-flood-unregistered still required?

No, these configurations are not required. I removed the call in my latest update. But do you think we should also remove the settings in OVSDB for the upgrade case? Leaving them in place has no harmful effect, because we don't use "normal" actions in OpenFlow entries, so the traffic is actually forwarded by Antrea flows rather than by the OVS multicast db cache.

tnqn · 2022-05-05T09:43:23Z

Since this PR switches to use group for multicast, are the previous configurations like mcast_snooping_enable and mcast-snooping-disable-flood-unregistered still required?

No, these configurations are not required. I removed the call in my latest update. But do you think we should also remove the settings in OVSDB for the upgrade case? Leaving them in place has no harmful effect, because we don't use "normal" actions in OpenFlow entries, so the traffic is actually forwarded by Antrea flows rather than by the OVS multicast db cache.

I think there is no need to remove the setting for the upgrade case. It was alpha and harmless, as you pointed out.

wenyingd · 2022-05-05T10:45:27Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

tnqn

LGTM, a minor comment

pkg/agent/cniserver/pod_configuration.go

wenyingd · 2022-05-05T11:19:19Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

1. Add local Pod receivers into an OpenFlow type "all" group for each multicast group, and use such groups in the flow actions. Remove a Pod from group buckets if the Pod has left the multicast group or is deleted before leaving the multicast group.
2. Improve multicast e2e tests.

Signed-off-by: wenyingd <wenyingd@vmware.com>
Co-authored-by: Ruochen Shen <src655@gmail.com>

wenyingd · 2022-05-06T01:26:05Z

/test-all
/test-multicast-e2e
/skip-windows-all
/skip-ipv6-all
/skip-ipv6-only-all

tnqn

LGTM

wenyingd force-pushed the multicast_flexible_pipeline branch from 6becda3 to 995fb39 Compare March 28, 2022 09:16

wenyingd force-pushed the multicast_flexible_pipeline branch 2 times, most recently from 67ec708 to 0f7e808 Compare March 29, 2022 09:05

wenyingd force-pushed the multicast_flexible_pipeline branch from 0f7e808 to 1abbd7d Compare March 31, 2022 02:32

wenyingd requested review from ceclinux, liu4480, jianjuns and tnqn March 31, 2022 04:01

wenyingd force-pushed the multicast_flexible_pipeline branch from 1abbd7d to 6db7759 Compare March 31, 2022 08:36

jianjuns reviewed Mar 31, 2022

liu4480 reviewed Apr 1, 2022

liu4480 reviewed Apr 1, 2022

wenyingd force-pushed the multicast_flexible_pipeline branch from 6db7759 to 1f3ffd8 Compare April 1, 2022 15:33

jianjuns reviewed Apr 1, 2022

ceclinux reviewed Apr 6, 2022

ceclinux reviewed Apr 6, 2022

ceclinux reviewed Apr 6, 2022

ceclinux reviewed Apr 6, 2022

wenyingd force-pushed the multicast_flexible_pipeline branch 2 times, most recently from e52a6e0 to 0c43c42 Compare April 11, 2022 10:29

wenyingd force-pushed the multicast_flexible_pipeline branch from 98fcbe9 to bdb43b6 Compare April 25, 2022 07:11

wenyingd force-pushed the multicast_flexible_pipeline branch from bdb43b6 to 87629c0 Compare April 26, 2022 07:07

tnqn reviewed Apr 28, 2022

wenyingd force-pushed the multicast_flexible_pipeline branch 2 times, most recently from daa38db to 57a6bc5 Compare April 29, 2022 03:24

wenyingd requested review from tnqn and removed request for liu4480 April 29, 2022 14:23

tnqn reviewed May 5, 2022

wenyingd force-pushed the multicast_flexible_pipeline branch from 57a6bc5 to 0de756a Compare May 5, 2022 08:35

wenyingd requested a review from tnqn May 5, 2022 09:27

wenyingd force-pushed the multicast_flexible_pipeline branch from 0de756a to 1ca467a Compare May 5, 2022 09:41

wenyingd force-pushed the multicast_flexible_pipeline branch from 1ca467a to fa7bca7 Compare May 5, 2022 09:51

tnqn previously approved these changes May 5, 2022

wenyingd dismissed tnqn’s stale review via 95758d1 May 5, 2022 11:18

wenyingd force-pushed the multicast_flexible_pipeline branch from fa7bca7 to 95758d1 Compare May 5, 2022 11:18

tnqn reviewed May 5, 2022

wenyingd force-pushed the multicast_flexible_pipeline branch from 95758d1 to f50f7ca Compare May 6, 2022 00:21

tnqn approved these changes May 6, 2022

tnqn merged commit 50d33be into antrea-io:main May 6, 2022

wenyingd deleted the multicast_flexible_pipeline branch May 6, 2022 05:57

wenyingd mentioned this pull request Sep 14, 2022

Multicast Support in Antrea #2251

Closed

12 tasks
