-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect Network Instance IP subnet conflict #3823
Detect Network Instance IP subnet conflict #3823
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3823 +/- ##
=======================================
Coverage 17.51% 17.51%
=======================================
Files 3 3
Lines 805 805
=======================================
Hits 141 141
Misses 629 629
Partials 35 35 ☔ View full report in Codecov by Sentry. |
1816ee2
to
46c0d0d
Compare
Note that I will be backporting this to 11.0 and 10.4 (in 9.4 the zedrouter is vastly different and would require to make the patch from the ground up). |
if config.Subnet.IP == nil || config.Subnet.IP.IsUnspecified() { | ||
return false, fmt.Errorf("subnet unspecified for %s-%s: %+v", | ||
return fmt.Errorf("subnet unspecified for %s-%s: %+v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes we never call this function on a switch network instance or on some future NI type where the IP addresses can be assigned after the NI has been created.
Why do we need to report it as a conflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doNetworkInstanceSubnetSanityCheck
is called only if config.IpType
is not AddressTypeNone
, so it will not be called for switch network instance.
Note that IP conflict detection was now moved to a separate function: checkNetworkInstanceIPConflicts
.
@@ -244,22 +244,22 @@ func (z *zedrouter) init() (err error) { | |||
|
|||
z.peUsagePersist, err = persistcache.New(types.PersistCachePatchEnvelopesUsage) | |||
if err != nil { | |||
log.Fatal(err) | |||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a separate fix (which would make sense in a separate commit)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a minor unrelated fix, I will move it to a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Is this the reason for the introduction of the BridgePort and the various BridgeIfName changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run tests
Yes, correct. Previously we didn't have the "capability" to attach VIF to a bridge and were relying on the hypervisor to do it for us. Now, to support reconnecting VIFs after the NI (and the bridge) was recreated, we have to re-attach VIFs to the bridge from the zedrouter. This is represented by |
Zedrouter is able to detect if NI IP subnet overlaps with the subnet of another NI or with a device port network. When a conflict is detected for a new NI, it is blocked from being created. If IP conflict arises for an already created NI (e.g. a device port later received IP from an overlapping subnet), the NI is unconfigured and VIFs are set DOWN (not deleted). In both cases an error is reported for the NI. This is crucial because, previously, an NI with a subnet overlapping with a device port network would result in the device permanently losing controller connectivity (due to routing conflicts), with very limited options to restore connectivity and make the device manageable again. Instead, the device should remain online, inform the user about the issue, and allow to correct the IP configuration. When IP conflict is detected for an already created NI with an app connected to it, we intentionally do not halt and undeploy the app. Instead, the app is left running, just VIFs connected to the problematic NI lose their connectivity. This allows the user to fix the network config or at least backup the application data when IP conflict arises. To enable the deletion of NI when an IP conflict is detected, followed by its later recreation when the conflict is resolved, and to reconnect already running apps, few enhancements had to be made to the NI Reconciler, particularly in the area of VIF bridging. Signed-off-by: Milan Lenco <milan@zededa.com>
zedrouter.Run() already calls log.Fatal (with os.Exit inside) when error is returned from init() or run() methods. Signed-off-by: Milan Lenco <milan@zededa.com>
46c0d0d
to
7aa32d4
Compare
Instead of detecting the subnet overlap, wouldn't it also be possible to bind the socket to the interface ( |
@christoph-zededa |
@christoph-zededa @eriknordmark Note that I plan to investigate options how to isolate network instances from uplinks and allow even overlapping subnets. In the past, I tested VRFs and network namespaces. Both work but introduce quite some complexity and data-plane overhead. Alternative solution could be to rework policy-based routing and to experiment with |
Thank you for the explanation - I see it would take some more time to implement. |
Zedrouter is able to detect if NI IP subnet overlaps with the subnet of another NI or with a device port network. When a conflict is detected for a new NI, it is blocked from being created. If IP conflict arises for an already created NI (e.g. a device port later received IP from an overlapping subnet), the NI is unconfigured and VIFs are set DOWN (not deleted). In both cases an error is reported for the NI.
This is crucial because, previously, a NI with a subnet overlapping with a device port network would result in the device permanently losing controller connectivity (due to routing conflicts), with very limited options to restore connectivity and make the device manageable again. Instead, the device should remain online, inform the user about the issue, and allow to correct the IP configuration.
When IP conflict is detected for an already created NI with an app connected to it, we intentionally do not halt and undeploy the app. Instead, the app is left running, just VIFs connected to the problematic NI lose their connectivity. This allows the user to fix the network config or at least backup the application data when IP conflict arises.
To enable the deletion of NI when an IP conflict is detected, followed by its later recreation when the conflict is resolved, and to reconnect already running apps, few enhancements had to be made to the NI Reconciler, particularly in the area of VIF bridging.