Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an agent config parameter for enabling flexible IPAM (bridging mode) #3297

Merged
merged 1 commit into from
Feb 26, 2022

Conversation

jianjuns
Copy link
Contributor

@jianjuns jianjuns commented Feb 9, 2022

In the current implementation, enabling the AntreaIPAM feature gate will
enable both AntreaIPAM which allocates IPs from IPPools, and Node side
datapath configurations to support traffic forwarding of AntreaIPAM Pods
whose IPs are allocated from IPPools by AntreaIPAM.
This commit adds a separate config parameter - enableBridgingMode - to
antrea-agent, to enable/disable support for AntreaIPAM Pods on a Node.
Only when enableBridgingMode is set to true, Antrea Agent connects the
uplink interface to the OVS bridge, requests IP allocation from
AntreaIPAM and implements traffic forwarding for AntreaIPAM (bridging
mode) Pods. This change is also needed to extend AntreaIPAM for other
IPAM use cases, in addition to IPAM for bridging mode Pods.

Signed-off-by: Jianjun Shen shenj@vmware.com

@jianjuns jianjuns added the area/ipam Issues or PRs related to IP address management (IPAM). label Feb 9, 2022
@jianjuns
Copy link
Contributor Author

jianjuns commented Feb 9, 2022

@gran-vmv @annakhm @tnqn @antoninbas : my intention is to separate the Antrea IPAM and the datapath changes to support "flexible IPAM" Pods, and later we can enable Antrea IPAM for other use cases like secondary network. I do not have a good idea how to name the parameter for the latter, and now call it "enableBridgingName", given the forwarding pipeline is like VLAN bridging. But please share if you have a better name.

@codecov-commenter
Copy link

Codecov Report

Merging #3297 (35ed1a1) into main (5ccd3d1) will decrease coverage by 17.21%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #3297       +/-   ##
===========================================
- Coverage   59.04%   41.82%   -17.22%     
===========================================
  Files         331      199      -132     
  Lines       28420    24099     -4321     
===========================================
- Hits        16781    10080     -6701     
- Misses       9798    12987     +3189     
+ Partials     1841     1032      -809     
Flag Coverage Δ
kind-e2e-tests ?
unit-tests 41.82% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/agent_linux.go 0.00% <0.00%> (-3.63%) ⬇️
pkg/ovs/openflow/default.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/util/runtime/runtime.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/nodeportlocal/k8s/annotations.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/ovs/openflow/logs.go 9.52% <0.00%> (-90.48%) ⬇️
pkg/apis/controlplane/register.go 0.00% <0.00%> (-90.00%) ⬇️
pkg/agent/controller/networkpolicy/reject.go 0.00% <0.00%> (-87.91%) ⬇️
pkg/agent/client.go 0.00% <0.00%> (-77.42%) ⬇️
pkg/ovs/ovsconfig/ovs_client_linux.go 0.00% <0.00%> (-76.93%) ⬇️
pkg/flowaggregator/certificate.go 0.00% <0.00%> (-76.58%) ⬇️
... and 248 more

@@ -98,6 +99,15 @@ featureGates:
# - wireGuard: Enable WireGuard for tunnel traffic encryption.
#trafficEncryptionMode: none

# Eanble bridging mode of Pod network on Nodes, in which the Node's transport interface is connected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/Eanble/Enable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -31,8 +31,9 @@ featureGates:
# Enable controlling SNAT IPs of Pod egress traffic.
# Egress: false

# Enable flexible IPAM mode for Antrea. This mode allows to assign IP Ranges to Namespaces,
# Deployments and StatefulSets via IP Pool annotation.
# Enable AntreaIPAM. AntreaIPAM is required by the bridging mode, and can assign IPs to bridging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe:

can assign IPs to Pods connected in bridging mode

BTW, is "bridging mode" a standard term and does it mean something different from "bridge mode"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in the last comment, I do not have a good idea on how to name this mode/parameter. I call it bridging mode, just because the datapath behavior is like L2 bridging. And I welcome suggestions for a better name.

I think "bridge" vs "bridging" is like "switch" vs "switching".

return nil
}
if !features.DefaultFeatureGate.Enabled(features.AntreaIPAM) {
klog.InfoS("The enableBridgingMode option is set to true, but it will be ignored because the AntreaIPAM feature gate is disabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious about why we simply log a warning here, while we fail with an error if noSNAT is disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I returned an error originally, but then saw enableNodePortLocal just log a message and ignore the config if NodePortLocal feature gate is not enabled. Start to think it might be a rational if feature gate is for activating/deactivating a separate feature.

But let me know what you think.

Comment on lines 35 to 36
# mode Pods from the IPPools configured for Namespaces, Deployments, StatefulSets, or individual
# Pods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is the right place to mention it but as it is phrased at the moment, it makes it sound like AntreaIPAM only make sense in bridging mode. While it may be the case now, I believe that the intention of this PR is to support other use cases for AntreaIPAM, which won't require bridging mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can remove "from the IPPools configured for Namespaces, Deployments, StatefulSets, or individual Pods".

if !o.config.NoSNAT {
return fmt.Errorf("AntreaIPAM requires noSNAT")
}
if !o.config.EnableBridgingMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break existing setup when upgrading a FlexibleIPAM cluster. Is this acceptable?

Copy link
Contributor Author

@jianjuns jianjuns Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I feel it is ok, assuming no one really uses the feature in production yet, and users can set EnableBridgingMode to true in the ConfigMap when upgrading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. And we should notify users this change in Release Note.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnqn Any comments about this behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with this, this is an Alpha feature

cmd/antrea-agent/agent.go Show resolved Hide resolved
@@ -53,6 +53,9 @@ func (o *Options) checkUnsupportedFeatures() error {
if encryptionMode != config.TrafficEncryptionModeNone {
unsupported = append(unsupported, "TrafficEncryptionMode: "+encryptionMode.String())
}
if o.config.EnableBridgingMode {
unsupported = append(unsupported, "EnableBridgingMode")
Copy link
Contributor

@gran-vmv gran-vmv Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future release, FlexibleIPAM feature may support Windows. However, it is already bridging mode forced to be enabled on Windows, although some details and functions are slightly different.
Maybe we need to distinguish between basic bridging mode on Windows and advanced bridging mode with FlexibleIPAM when naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bridging mode means connecting uplink to bridge + Pod flexible IPAM. Again, if you have a better name, please share.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyingd @tnqn Do you have any idea on this topic?

@@ -102,14 +102,21 @@ type AgentConfig struct {
TrafficEncryptionMode string `yaml:"trafficEncryptionMode,omitempty"`
// WireGuard related configurations.
WireGuard WireGuardConfig `yaml:"wireGuard"`
// Eanble bridging mode of Pod network on Nodes, in which the Node's transport interface is connected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Eanble bridging mode of Pod network on Nodes, in which the Node's transport interface is connected
// Enable bridging mode of Pod network on Nodes, in which the Node's transport interface is connected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I pushed the fix, but seems not..

// to the OVS bridge, and cross-Node/VLAN traffic from AntreaIPAM Pods (Pods whose IP addresses are
// allocated by AntreaIPAM from IPPools) is sent to the underlay network via the uplink, and
// forwarded/routed by the underlay network.
// This option requires the `AntreaIPAM` feature gate to be enabled. At this moment, it supports only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this part. EnableBridgingMode requires AntreaIPAM or AntreaIPAM requires EnableBridgingMode? I think technically EnableBridgingMode can just work without AntreaIPAM enabled, but just doesn't make any difference from function's perspective?

Copy link
Contributor Author

@jianjuns jianjuns Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current situation is:

  • AntreaIPAM is a feature gate for both AntreaIPAM and bridging mode (connecting uplink to bridge, flexible IPAM for Pods, and datapath changes for such Pods).
  • EnableBridgingMode is a config option to enable bridging mode.
  • We do not have a config option to enable IPAM (probably it is not needed in my mind as IPAM itself does add much overhead to controller or agent?).

@jianjuns jianjuns force-pushed the bridging-mode branch 2 times, most recently from ad439f9 to 0421429 Compare February 11, 2022 19:15
# Enable flexible IPAM mode for Antrea. This mode allows to assign IP Ranges to Namespaces,
# Deployments and StatefulSets via IP Pool annotation.
# Enable AntreaIPAM. AntreaIPAM is required by the bridging mode and can assign IPs to Pods in
# bridging mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we still mention what is purpose for this feature (to allocate IPs via pools?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Let me add that.

@gran-vmv
Copy link
Contributor

/test-all
/test-flexible-ipam-e2e
/test-integration

@gran-vmv
Copy link
Contributor

/test-all
/test-flexible-ipam-e2e

@gran-vmv
Copy link
Contributor

NPL e2e failed.

I0221 06:03:58.925293       8 iptables.go:171] "Appended a rule" rule=[-p all -m addrtype --dst-type LOCAL -j ANTREA-NODE-PORT-LOCAL] table="nat" chain="OUTPUT" protocol="IPv4"
panic: runtime error: invalid memory address or nil pointer dereference
goroutine 86 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/confluentinc/bincover.RunTest.func1()
	/go/pkg/mod/github.com/confluentinc/bincover@v0.1.0/instrument_bin.go:84 +0x39
panic({0x1e8d0c0, 0x37571b0})
	/usr/local/go/src/runtime/panic.go:1038 +0x215
antrea.io/antrea/pkg/agent/nodeportlocal/k8s.NewNPLController({0x2548e10, 0xc000696420}, {0x0, 0x0}, {0x25213a0, 0xc0004326e0}, 0xc0007ef5c0, {0xc00005605a, 0x3c})
	/antrea/pkg/agent/nodeportlocal/k8s/npl_controller.go:76 +0x9a
antrea.io/antrea/pkg/agent/nodeportlocal.InitializeNPLAgent({0x2548e10, 0xc000696420}, {0x253c3e8, 0xc000469e00}, 0x2530510, 0xc00069ffe0, {0xc00005605a, 0x3c}, {0x0, 0x0})
	/antrea/pkg/agent/nodeportlocal/npl_agent_init.go:48 +0x126
antrea.io/antrea/cmd/antrea-agent.run(0xc000477e30)
	/antrea/cmd/antrea-agent/agent.go:477 +0x33e5
antrea.io/antrea/cmd/antrea-agent.newAgentCommand.func1(0xc00043b600, {0xc000473100, 0x0, 0x8})
	/antrea/cmd/antrea-agent/main.go:57 +0x1ef
github.com/spf13/cobra.(*Command).execute(0xc00043b600, {0xc0000e8010, 0x8, 0x8})
	/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:854 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0xc00043b600)
	/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:958 +0x3ad
github.com/spf13/cobra.(*Command).Execute(...)
	/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:895
antrea.io/antrea/cmd/antrea-agent.main()
	/antrea/cmd/antrea-agent/main.go:37 +0x4a
github.com/confluentinc/bincover.RunTest(0x22a7ea0)
	/go/pkg/mod/github.com/confluentinc/bincover@v0.1.0/instrument_bin.go:93 +0x21b
antrea.io/antrea/cmd/antrea-agent.TestBincoverRunMain(0x0)
	/antrea/cmd/antrea-agent/bincover_run_main_test.go:27 +0x25
testing.tRunner(0xc000502820, 0x22a7e98)
	/usr/local/go/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x35a
START_BINCOVER_METADATA
{"cover_mode":"count","exit_code":1}
END_BINCOVER_METADATA
PASS
I0221 06:03:58.942349       8 route_linux.go:209] Successfully synced node iptables and routes

@jianjuns
Copy link
Contributor Author

/test-e2e

In the current implementation, enabling the AntreaIPAM feature gate will
enable both AntreaIPAM which allocates IPs from IPPools, and Node side
datapath configurations to support traffic forwarding of AntreaIPAM Pods
whose IPs are allocated from IPPools by AntreaIPAM.
This commit adds a separate config parameter - enableBridgingMode - to
Antrea Agent, to enable/disable support for AntreaIPAM Pods on a Node.
Only when enableBridgingMode is set to true, Antrea Agent connects the
uplink interface to the OVS bridge, requests IP allocation from
AntreaIPAM and implements traffic forwarding for AntreaIPAM (bridging
mode) Pods. This change is also needed to extend AntreaIPAM for other
IPAM use cases, in addition to IPAM for bridging mode Pods.

Signed-off-by: Jianjun Shen <shenj@vmware.com>
@jianjuns
Copy link
Contributor Author

NPL e2e failed.

Ah, my change introduced a bug when creating localPodInformer. Already fixed it. Thanks for reporting it! @gran-vmv

@jianjuns
Copy link
Contributor Author

/test-e2e

@gran-vmv
Copy link
Contributor

/test-conformance
/test-networkpolicy
/test-ipv6-all
/test-flexible-ipam-e2e

@gran-vmv
Copy link
Contributor

/test-windows-all
/test-all-features-conformance
/test-ipv6-only-all
/test-multicluster-e2e

@gran-vmv
Copy link
Contributor

/test-hw-offload

@gran-vmv
Copy link
Contributor

/test-multicluster-e2e
/test-windows-networkpolicy

Copy link
Contributor

@gran-vmv gran-vmv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gran-vmv
Copy link
Contributor

/test-e2e

@jianjuns
Copy link
Contributor Author

@antoninbas @tnqn : are you good with the name "bridging mode"?

@antoninbas
Copy link
Contributor

@antoninbas @tnqn : are you good with the name "bridging mode"?

yes, no issue on my side

@gran-vmv
Copy link
Contributor

@tnqn @antoninbas are you good with the upgrade issue I mentioned at #3297 (comment) ?

@jianjuns jianjuns merged commit 65e3574 into antrea-io:main Feb 26, 2022
@jianjuns
Copy link
Contributor Author

I will create a new PR to update IPAM documentation and describe the new config parameter.

bangqipropel pushed a commit to bangqipropel/antrea that referenced this pull request Mar 2, 2022
…de) (antrea-io#3297)

In the current implementation, enabling the AntreaIPAM feature gate will
enable both AntreaIPAM which allocates IPs from IPPools, and Node side
datapath configurations to support traffic forwarding of AntreaIPAM Pods
whose IPs are allocated from IPPools by AntreaIPAM.
This commit adds a separate config parameter - enableBridgingMode - to
Antrea Agent, to enable/disable support for AntreaIPAM Pods on a Node.
Only when enableBridgingMode is set to true, Antrea Agent connects the
uplink interface to the OVS bridge, requests IP allocation from
AntreaIPAM and implements traffic forwarding for AntreaIPAM (bridging
mode) Pods. This change is also needed to extend AntreaIPAM for other
IPAM use cases, in addition to IPAM for bridging mode Pods.

Signed-off-by: Jianjun Shen <shenj@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ipam Issues or PRs related to IP address management (IPAM).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants