distros.networking: initial implementation of layout #391

OddBloke · 2020-05-26T14:36:41Z

This pull request introduces the initial structure for the "cloudinit.net -> cloudinit.distros.networking Hierarchy" refactor, as detailed in [0]. It also updates that section with some changes driven by this initial implementation, as well as adding a lot more specifics to it.

[0] https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#cloudinit-net-cloudinit-distros-networking-hierarchy

igalic

👀

igalic · 2020-05-26T17:11:59Z

HACKING.rst

+
+  * ``get_master``
+  * ``device_devid``
+  * ``device_driver``


most other Unices have the name of the device driver in the name of the device

So it sounds like a function which extracts that driver from the device name would still make sense?

(For my own information: what do such device names look like?)

Do we need to expose that in a public API? I've the feeling the information is just required by the backend driver itself.

Do we need to expose that in a public API?

This isn't a public API, it is only intended for internal cloud-init use.

I've the feeling the information is just required by the backend driver itself.

device_driver is called by a bunch of Linux-specific functions, but also extract_physdevs and generate_fallback_config. These latter two are expected to work on every platform, so I don't think we can only have device_driver present on LinuxNetworking in this initial layout. It's possible that as we perform the refactor we will discover that we don't need device_driver on the super-class after all, and I think we can move it at that point.

Does that sound right (and reasonable)?

igalic · 2020-05-26T17:23:55Z

HACKING.rst

+  * ``is_netfail_standby``
+
+* those that use ``/sys`` (via helpers) and have non-exhaustive BSD
+  logic:


the logic may be non-exhaustive, but it's more than enough :P

There's only a FreeBSD check in it, it falls through to the /sys-based implementation on {Open,Net,Dragonfly}BSD.

igalic · 2020-06-05T06:37:57Z

HACKING.rst

+  * ``natural_sort_key``
+
+Note that the functions in ``cloudinit.net`` use inconsistent parameter
+names for "string that cohtains a device name"; we can standardise on


raharper · 2020-06-05T14:06:28Z

HACKING.rst

-our various distros, while still allowing easy reuse of code between
+in ``cloudinit.distros.networking``, which each ``Distro`` subclass
+will reference.  These will capture the differences between networking
+on our various distros, while still allowing easy reuse of code between
 distros that share functionality (e.g. most of the Linux networking
 behaviour).  Callers will call ``distro.net.func`` instead of


distro.networking.func?

distro.net is where the instantiated Networking class is stored (on an instantiated Distro). That naming isn't introduced until my first "In more detail" bullet, however, so let me rework this to be clearer.
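To illustrate the relationship described in this reply, here is a minimal sketch of a `Distro` holding an instantiated networking class that callers reach through `distro.net`. The class bodies, the `networking_cls` attribute, `FreeBSDDistro`, and the `describe` helper are assumptions made for this example, not code from the PR, and the attribute name used in the merged code may differ.

```python
class Networking:
    """Base class; in this sketch it only documents the interface."""

    def is_physical(self, devname: str) -> bool:
        raise NotImplementedError


class LinuxNetworking(Networking):
    def is_physical(self, devname: str) -> bool:
        # Toy rule for illustration only; not the real /sys-based check.
        return devname != "lo"


class BSDNetworking(Networking):
    def is_physical(self, devname: str) -> bool:
        # Toy rule for illustration only.
        return not devname.startswith("lo")


class Distro:
    # Hypothetical class attribute selecting the networking implementation.
    networking_cls = LinuxNetworking

    def __init__(self) -> None:
        # The instantiated Networking object lives on the Distro instance.
        self.net = self.networking_cls()


class FreeBSDDistro(Distro):
    networking_cls = BSDNetworking


def describe(distro: Distro, devname: str) -> str:
    # Callers go through distro.net rather than importing cloudinit.net.
    kind = "physical" if distro.net.is_physical(devname) else "virtual"
    return f"{devname}: {kind}"
```

With this shape, a caller never needs to know which platform it is running on; it only needs the `Distro` instance it was handed.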

raharper · 2020-06-05T14:06:43Z

HACKING.rst


-* pick an unmigrated ``cloudinit.distros.Networking`` method
+* pick an unmigrated ``cloudinit.distros.networking.Networking`` method
 * refactor all of its callers to call the ``distro.net`` method on


distro.networking

Similarly, let me rework things to make it a bit clearer.

raharper · 2020-06-05T14:09:59Z

HACKING.rst

+    * this has non-distro-specific logic so should potentially be
+      refactored to use helpers on ``self`` instead of ``ip`` directly
+      (rather than being wholesale reimplemented in each of
+      ``BSDNetworking`` or ``LinuxNetworking``)


The NIC renaming bit might be Linux-specific; I'm not sure whether BSD has issues similar to Linux, where the eth* kernel device namespace has problems with being renamed by user-space.

Right, that's a good point. @igalic @goneri can you let us know what you think about network device renaming on *BSD?

the main reason we do it is because it helps us reuse the vendor's cloud-config, which usually uses eth0, even though they of all people should know what the devices will be named

I'm convinced no BSD user wants to see the interface renamed on the fly, especially to something super Linux-ish like eth0. Also, today, the feature only works on FreeBSD.

Excluding user-generated networking configs, the primary reason for renaming NICs is to bind a provided network config to a specific interface (identified by MAC).

Some virtualization platforms do not guarantee that the device name of the NIC is stable across reboots; that means eth0 with macA may be eth1 on the next boot, or in the case of Linux systemd, the "persistent name" changes as the PCI bus slot the NIC was attached to has changed (user added a virtual interface to their instance).

For containers, the Linux-specific rules generated to ensure the name-to-MAC mapping persists across reboots (udev /etc/udev/rules.d/70-persistent-net) do not operate inside containers; yet cloud-init still needs to ensure that the provided network config is on the correct interface.

@goneri any comments w.r.t. multi-NIC VMs on public clouds, and any experience with NIC names changing due to device enumeration randomness? Originally this was seen on Amazon Xen-based instances.

I'm convinced no BSD user wants to see the interface renamed on the fly

I'm not sure many Linux users particularly want this either but, as Ryan lays out, the reality of cloud networking (on Linux, at least) sometimes requires it.

(And it's worth bearing in mind that if someone considers themselves a cloud-init user, or a $cloud user, then they may want consistency across distros/OSes more than they want the most appropriate behaviour for a particular OS or distro. This isn't a trump card argument, it's just another angle to consider.)
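The MAC-based binding discussed in this thread can be sketched as follows. This is a simplified illustration rather than cloud-init's implementation: `plan_renames` and its two dictionary arguments are hypothetical, and a real implementation would also have to deal with name collisions (for example, two interfaces swapping names) and interfaces that are up.

```python
from typing import Dict, List, Tuple


def plan_renames(
    wanted_name_by_mac: Dict[str, str],
    current_mac_by_name: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (current_name, desired_name) pairs needed to honour the config.

    wanted_name_by_mac: MAC address -> name used in the network config.
    current_mac_by_name: name the OS assigned this boot -> MAC address.
    """
    # Index the interfaces present on this boot by (lower-cased) MAC.
    current_name_by_mac = {
        mac.lower(): name for name, mac in current_mac_by_name.items()
    }
    renames = []
    for mac, desired in sorted(wanted_name_by_mac.items()):
        current = current_name_by_mac.get(mac.lower())
        # Skip MACs that are absent, and interfaces already named correctly.
        if current is not None and current != desired:
            renames.append((current, desired))
    return renames
```

For instance, if the config binds `aa:bb:cc:00:00:01` to `eth0` but the kernel enumerated that NIC as `eth1` on this boot, the plan contains the single pair `("eth1", "eth0")`.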

blackboxsw

Looks really good @OddBloke. A couple of questions here, but I like the approach and I think it makes sense.

One question I have: should we prescribe that a Networking subclass raise NotImplementedError for currently unimplemented methods, so that cloud-init gets a big ugly error if those unsupported paths are called?

blackboxsw · 2020-06-08T17:42:18Z

cloudinit/distros/networking.py

+    def extract_physdevs(self, netcfg: NetworkConfig) -> list:
+        return net.extract_physdevs(netcfg)
+
+    def find_fallback_nic(self, *, blacklist_drivers=None):


Why the '*' param. What callers are we guarding against here?

IMO, generally speaking, the fewer different ways there are of calling a function to the same effect, the cleaner the API. In this case, we are collapsing the potential ways of calling this from:

    find_fallback_nic([my, blacklist, drivers])
    find_fallback_nic(blacklist_drivers=[my, blacklist, drivers])

to only find_fallback_nic(blacklist_drivers=[my, blacklist, drivers]). This makes the codebase easier to understand, and easier to navigate too (e.g. I can't currently grep blacklist_drivers= and have confidence that I'll find (a superset of) all the places this parameter is passed).

We're already refactoring all the call sites of these functions/methods; there's no reason not to clean up the API while we're at it.
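A small sketch of the effect of the bare `*` in a signature: every parameter after it is keyword-only, so a positional call fails with `TypeError`. The function body and the `_CANDIDATES` data below are invented for illustration and are unrelated to how cloud-init actually selects a fallback NIC.

```python
from typing import List, Optional

# Hypothetical interface -> driver data, for illustration only.
_CANDIDATES = {"eth0": "virtio_net", "eth1": "mlx4_core"}


def find_fallback_nic(
    *, blacklist_drivers: Optional[List[str]] = None
) -> Optional[str]:
    """Return the first interface whose driver is not blacklisted."""
    blacklist = blacklist_drivers or []
    for name, driver in sorted(_CANDIDATES.items()):
        if driver not in blacklist:
            return name
    return None


# The keyword form is the only one accepted.
chosen = find_fallback_nic(blacklist_drivers=["virtio_net"])

# The positional form is rejected by Python itself.
try:
    find_fallback_nic(["virtio_net"])
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```

Because the positional form cannot exist in the codebase, grepping for `blacklist_drivers=` finds every call site that passes the parameter.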

goneri · 2020-06-08T18:37:35Z

HACKING.rst

+  * ``is_connected``
+  * ``is_physical``
+  * ``is_present``
+  * ``is_renamed``


Some OSes cannot rename interfaces; how do you deal with those? Do you raise a NotImplementedError?

Currently, this structure PR is just passing through to the existing implementation. In this case, it will always return False if /sys is not present:

cloud-init/cloudinit/net/__init__.py

Lines 249 to 261 in e01e3ed

    def is_renamed(devname):
        """
        /* interface name assignment types (sysfs name_assign_type attribute) */
        #define NET_NAME_UNKNOWN 0 /* unknown origin (not exposed to user) */
        #define NET_NAME_ENUM 1 /* enumerated by kernel */
        #define NET_NAME_PREDICTABLE 2 /* predictably named by the kernel */
        #define NET_NAME_USER 3 /* provided by user-space */
        #define NET_NAME_RENAMED 4 /* renamed by user-space */
        """
        name_assign_type = read_sys_net_safe(devname, 'name_assign_type')
        if name_assign_type and name_assign_type in ['3', '4']:
            return True
        return False

Once the refactor is under way, we'll need to revisit whether or not that is the correct behaviour.
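The "returns False if /sys is not present" behaviour can be demonstrated with a self-contained approximation. This is a sketch, not the cloud-init code: `read_sys_net_attr` and the `sys_root` parameter are assumptions introduced so the example can run against a temporary directory.

```python
import os
from typing import Optional


def read_sys_net_attr(
    devname: str, attr: str, sys_root: str = "/sys/class/net"
) -> Optional[str]:
    """Read a sysfs attribute, returning None if it cannot be read."""
    path = os.path.join(sys_root, devname, attr)
    try:
        with open(path) as stream:
            return stream.read().strip()
    except OSError:
        # No /sys (e.g. on a BSD), no such device, or no such attribute.
        return None


def is_renamed(devname: str, sys_root: str = "/sys/class/net") -> bool:
    """True only if sysfs reports NET_NAME_USER (3) or NET_NAME_RENAMED (4)."""
    value = read_sys_net_attr(devname, "name_assign_type", sys_root)
    return value in ("3", "4")
```

On a platform without `/sys`, every read fails and the function reports `False`, which is indistinguishable from "this interface exists and was never renamed". That ambiguity is what needs revisiting during the refactor.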

goneri · 2020-06-08T18:54:26Z

cloudinit/distros/networking.py

+        return net.get_ib_interface_hwaddr(devname, ethernet_format)
+
+    def get_interface_mac(self, devname: DeviceName):
+        return net.get_interface_mac(devname)


This one can safely rely on get_interfaces_by_mac() by default.

This PR is only adding the initial structure, that's a good note for when this method is being refactored!
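As a note for that later refactor, the suggestion amounts to inverting a MAC-to-name mapping. The sketch below assumes a mapping of that shape is passed in; the helper name `mac_for_interface` is hypothetical, and the real `get_interfaces_by_mac()` is not called here.

```python
from typing import Dict, Optional


def mac_for_interface(
    devname: str, interfaces_by_mac: Dict[str, str]
) -> Optional[str]:
    """Look up a device's MAC in a {mac: devname} mapping (assumed shape)."""
    for mac, name in interfaces_by_mac.items():
        if name == devname:
            return mac
    # The device is not present in the mapping.
    return None
```

A default implementation along these lines would work on any platform that can enumerate interfaces by MAC, without touching `/sys`.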

goneri · 2020-06-08T18:57:16Z

cloudinit/distros/networking.py

+    def device_devid(self, devname: DeviceName):
+        return net.device_devid(devname)
+
+    def device_driver(self, devname: DeviceName):


Can we raise a NotImplementedError by default instead? We cannot support that on NetBSD, and possibly not on OpenBSD either.

You can't infer the device driver in use given a network interface name? (I was under the impression from another/previous comment that the driver name was encoded in the name itself, e.g. rtk0 is using the rtk driver?)

Yes, this is the way, but it's rather empirical, and AFAIR on FreeBSD the interface name can be changed, and in this case, you cannot get the original name.

you can, but it's harder
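The name-based heuristic discussed here could look roughly like the following. It is a sketch under the assumption that interface names have the form driver name plus unit number (as in `rtk0`); `driver_from_devname` is a hypothetical helper, and, as noted above, the result is only a guess once an interface has been renamed.

```python
import re
from typing import Optional

# Driver name followed by a trailing unit number, e.g. "rtk0" or "vtnet12".
_DEVNAME_RE = re.compile(r"^([a-z][a-z0-9_]*?)(\d+)$")


def driver_from_devname(devname: str) -> Optional[str]:
    """Guess the driver from a BSD-style interface name, or return None."""
    match = _DEVNAME_RE.match(devname)
    if match is None:
        # No trailing unit number: the name carries no driver information.
        return None
    return match.group(1)
```

The heuristic cannot tell a renamed interface from an original one: a device renamed to `uplink` yields `None`, and one renamed to `eth0` yields `eth`, which is not a driver at all.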

goneri · 2020-06-08T18:57:49Z

cloudinit/distros/networking.py

+        return net._rename_interfaces(renames, current_info=current_info)
+
+    def apply_network_config_names(self, netcfg: NetworkConfig) -> None:
+        return net.apply_network_config_names(netcfg)


Can we also raise a NotImplementedError by default?

When we convert this to an abstractmethod (as per https://github.com/canonical/cloud-init/pull/391/files#diff-0abd02cf07a2406439d0d263b886ebcbR434), then we won't be able to instantiate Networking subclasses without an implementation. If the appropriate default behaviour on BSD is to raise NotImplementedError, then that should be the implementation in BSDNetworking.

(Of course, any callers will also need to cope with that exception being raised, so it may be the case that a different default behaviour makes sense; it may be acceptable to make it a noop, for example.)

i think if the exception wasn't BSD specific, there might be more of an incentive to actually catch or handle it

Well, this method will have an implementation on LinuxNetworking, so regardless of whether the exception is raised in Networking.apply_network_config_names or in BSDNetworking.apply_network_config_names, it will only be raised on BSD systems.

+1 this works for me thanks for addressing the concern @OddBloke
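The outcome of this thread can be sketched with Python's `abc` module: an abstract method blocks instantiation of any subclass that lacks an implementation, while a subclass is free to implement the method by raising `NotImplementedError`. The class bodies, `IncompleteNetworking`, and the `apply_names` caller are illustrative assumptions, not code from the PR.

```python
import abc


class Networking(abc.ABC):
    @abc.abstractmethod
    def apply_network_config_names(self, netcfg: dict) -> None:
        """Rename interfaces to match the names in netcfg."""


class LinuxNetworking(Networking):
    def __init__(self) -> None:
        self.applied = []

    def apply_network_config_names(self, netcfg: dict) -> None:
        # Stand-in for the real renaming logic.
        self.applied.append(netcfg)


class BSDNetworking(Networking):
    def apply_network_config_names(self, netcfg: dict) -> None:
        # An explicit implementation whose behaviour is "unsupported".
        raise NotImplementedError("interface renaming is not supported")


class IncompleteNetworking(Networking):
    """Has no implementation, so it cannot be instantiated."""


def apply_names(networking: Networking, netcfg: dict) -> bool:
    """Example caller that copes with platforms lacking rename support."""
    try:
        networking.apply_network_config_names(netcfg)
    except NotImplementedError:
        return False
    return True
```

Whether a caller should swallow the exception as `apply_names` does, or whether a no-op implementation is more appropriate, is the open design question raised in the thread.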

OddBloke · 2020-06-16T14:37:34Z

Thanks all for the comments, they've been really valuable! This round of feedback has raised a couple of questions that I'd appreciate input on:

* How do we want to divide and track the work of the individual refactors? I don't want us to end up with multiple people working on the same one, and wasting effort. (Obviously our internal-to-Canonical tracking system isn't going to work, as we know that community folks are going to be involved in this.)
* It seems like there's a fundamental question about interface renaming to be answered: is this Linux-specific behaviour (required because of the kernel/systemd's network interface naming quirks), or is this general cloud-init behaviour which just isn't available on some of the platforms which cloud-init runs on?

igalic · 2020-06-16T16:22:29Z

How do we want to divide and track the work of the individual refactors? I don't want us to end up with multiple people working on the same one, and wasting effort. (Obviously our internal-to-Canonical tracking system isn't going to work, as we know that community folks are going to be involved in this.)

i think the simplest thing we can do is to demand some highly descriptive PRs

It seems like there's a fundamental question about interface renaming to be answered: is this Linux-specific behaviour (required because of the kernel/systemd's network interface naming quirks), or is this general cloud-init behaviour which just isn't available on some of the platforms which cloud-init runs on?

i believe you're right about the volatile nature of the cloud, and as such, this is a key feature that cloud-init is providing to handle such volatility

blackboxsw · 2020-06-19T02:52:16Z

Thanks all for the comments, they've been really valuable! This round of feedback has raised a couple of questions that I'd appreciate input on:
* How do we want to divide and track the work of the individual refactors? I don't want us to end up with multiple people working on the same one, and wasting effort. (Obviously our internal-to-Canonical tracking system isn't going to work, as we know that community folks are going to be involved in this.)

Can we create feature bugs for the individual methods being refactored, tagged as 'bitesize' or 'refactor' or both, which someone will claim as they grab a component of this refactor, to ensure folks don't collide on certain aspects?

* It seems like there's a fundamental question about interface renaming to be answered: is this Linux-specific behaviour (required because of the kernel/systemd's network interface naming quirks), or is this general cloud-init behaviour which just isn't available on some of the platforms which cloud-init runs on?

I'm not quite grokking from the responses what level of support BSD* has for this. Can we assume this is general behaviour in cloud-init, and BSD* can determine otherwise at a later point in time?

OddBloke · 2020-06-22T13:50:19Z

I think we've reached consensus that this PR is ready to land. I'm going to create the tracking bugs for each individual function/method today before landing this first thing tomorrow, so this is last call for any comments!

OddBloke · 2020-06-22T20:46:17Z

I've opened a bug per function here: https://bugs.launchpad.net/cloud-init/+bugs?field.tag=net-refactor

powersj

Approving based on prior comments after a quick read

OddBloke mentioned this pull request May 26, 2020

HACKING: add specifics of function/classes for net refactor OddBloke/cloud-init#2

Closed

igalic reviewed May 26, 2020

OddBloke force-pushed the net branch from dca6f05 to 5dc3721 Compare June 4, 2020 21:11

igalic reviewed Jun 5, 2020

raharper reviewed Jun 5, 2020

OddBloke changed the title ~~HACKING: add specifics of function/classes for net refactor~~ distros.networking: initial implementation of layout Jun 5, 2020

OddBloke closed this Jun 5, 2020

OddBloke reopened this Jun 5, 2020

blackboxsw self-assigned this Jun 8, 2020

blackboxsw reviewed Jun 8, 2020

goneri reviewed Jun 8, 2020

OddBloke and others added 12 commits June 19, 2020 09:46

HACKING: add specifics of function/classes for net refactor

ef91834

HACKING.rst: update for new location of hierarchy

d40a0fa

expand what each iterative change should include

44b38d9

flesh out details of parameters we can remove

130a2d8

add initial networking structure

a8f475f

standardise parameter names

184df82

docstrings

855543f

typo fix

1420520

clarify some wording

ccfe299

add type annotation info to refactor section of HACKING

ab8e49b

further clarification and standardisation

f5b85fc

Update cloudinit/distros/networking.py

a7d1fc7

Co-authored-by: Chad Smith <chad.smith@canonical.com>

OddBloke force-pushed the net branch from b0f2cd2 to a7d1fc7 Compare June 19, 2020 13:48

Merge branch 'master' into net

5aa85d0

OddBloke added 2 commits June 23, 2020 08:35

Merge branch 'master' into net

03ff2a9

Merge branch 'master' into net

40ce348

powersj approved these changes Jun 23, 2020

Merge branch 'master' into net

71bb825

OddBloke merged commit 9a97a3f into canonical:master Jun 23, 2020

OddBloke deleted the net branch June 23, 2020 14:23

raharper mentioned this pull request Jul 10, 2020

Ubuntu/devel #486

Merged

This was referenced May 10, 2023

cloud-init searches for ec2 mirrors regardless of what cloud its on #2549

Closed

#include fails silently. #3069

Closed

cloud-init status broken in groovy lxd containers #3758

Closed

Release 20.3 #3774

Closed
