-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distros.networking: initial implementation of layout #391
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
|
||
* ``get_master`` | ||
* ``device_devid`` | ||
* ``device_driver`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most other Unices have the name of the device driver in the name of the device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it sounds like a function which extracts that driver from the device name would still make sense?
(For my own information: what do such device names look like?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose that in a public API? I've the feeling the information is just required by the backend driver itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose that in a public API?
This isn't a public API, it is only intended for internal cloud-init use.
I've the feeling the information is just required by the backend driver itself.
device_driver
is called by a bunch of Linux-specific functions, but also extract_physdevs
and generate_fallback_config
. These latter two are expected to work on every platform, so I don't think we can only have device_driver
present on LinuxNetworking
in this initial layout. It's possible that as we perform the refactor we will discover that we don't need device_driver
on the super-class after all, and I think we can move it at that point.
Does that sound right (and reasonable)?
* ``is_netfail_standby`` | ||
|
||
* those that use ``/sys`` (via helpers) and have non-exhaustive BSD | ||
logic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic may be non-exhaustive, but it's more than enough :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only a FreeBSD check in it, it falls through to the /sys
-based implementation on {Open,Net,Dragonfly}BSD.
HACKING.rst
Outdated
* ``natural_sort_key`` | ||
|
||
Note that the functions in ``cloudinit.net`` use inconsistent parameter | ||
names for "string that cohtains a device name"; we can standardise on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contains
HACKING.rst
Outdated
our various distros, while still allowing easy reuse of code between | ||
in ``cloudinit.distros.networking``, which each ``Distro`` subclass | ||
will reference. These will capture the differences between networking | ||
on our various distros, while still allowing easy reuse of code between | ||
distros that share functionality (e.g. most of the Linux networking | ||
behaviour). Callers will call ``distro.net.func`` instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distro.networking.func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distro.net
is where the instantiated Networking
class is stored (on an instantiated Distro
). That naming isn't introduced until my first "In more detail" bullet, however, so let me rework this to be clearer.
HACKING.rst
Outdated
|
||
* pick an unmigrated ``cloudinit.distros.Networking`` method | ||
* pick an unmigrated ``cloudinit.distros.networking.Networking`` method | ||
* refactor all of its callers to call the ``distro.net`` method on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distro.networking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, let me rework things to make it a bit clearer.
* this has non-distro-specific logic so should potentially be | ||
refactored to use helpers on ``self`` instead of ``ip`` directly | ||
(rather than being wholesale reimplemented in each of | ||
``BSDNetworking`` or ``LinuxNetworking``) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nic renaming bit might be Linux specific; not sure if BSD has similar issues with Linux where the eth* kernel device namespace has issues with being renamed by user-space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main reason we do it is because it helps us reuse the vendor's cloud-config, which usually uses eth0
, even though they of all people should know what the devices will be named
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced no BSD user wants to see the interface renamed on the fly, especially to something super Linux-ish like eth0
. Also, today, the feature only works on FreeBSD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excluding user-generated networking configs; the primary reason for renaming nics is to bind a provided network config to a specific interface (identified by MAC).
Some virtualization platforms do not guarantee that the device name of the NIC is stable across reboots; that means eth0 with macA may be eth1 on the next boot, or in the case of Linux systemd, the "persistent name" changes as the PCI bus slot the NIC was attached to has changed (user added a virtual interface to their instance).
For containers, the Linux specific rules generated to ensure the name to mac mapping remains across reboots (udev /etc/udev/rules.d/70-persistent-net) does not operate inside containers; yet cloud-init still needs to ensure that the provided network config is on the correct interface.
@goneri any comments w.r.t to multi-nic VMs on public clouds and any experience with nic name changing due to device enumeration randomness? Originally this was seen on Amazon Xen-based instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced no BSD user wants to see the interface renamed on the fly
I'm not sure many Linux users particularly want this either but, as Ryan lays out, the reality of cloud networking (on Linux, at least) sometimes requires it.
(And it's worth bearing in mind that if someone considers themselves a cloud-init user, or a $cloud user, then they may want consistency across distros/OSes more than they want the most appropriate behaviour for a particular OS or distro. This isn't a trump card argument, it's just another angle to consider.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good @OddBloke. A couple of questions here, but I like the approach and I think it makes sense.
One question I have is should we prescribe that a Networking subclass raise NotImplementedErrors for current unimplemented methods so cloud-init gets a big ugly error if those unsupported paths are called?
def extract_physdevs(self, netcfg: NetworkConfig) -> list: | ||
return net.extract_physdevs(netcfg) | ||
|
||
def find_fallback_nic(self, *, blacklist_drivers=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the '*' param. What callers are we guarding against here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, generally speaking, the less different ways of calling a function in the same way, the cleaner the API. In this case, we are collapsing the potential ways of calling this from:
find_fallback_nic([my, blacklist, drivers])
find_fallback_nic(blacklist_drivers=[my, blacklist, drivers])
to only find_fallback_nic(blacklist_drivers=[my, blacklist, drivers])
. This makes the codebase easier to understand, and easier to navigate too (e.g. I can't currently grep blacklist_drivers=
and have confidence that I'll find (a superset of) all the places this parameter is passed).
We're already refactoring all the call sites of these functions/methods; there's no reason not to clean up the API while we're at it.
* ``is_connected`` | ||
* ``is_physical`` | ||
* ``is_present`` | ||
* ``is_renamed`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some OS cannot rename interfaces, how do you deal with those? Do you raise an NotImplementedError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this structure PR is just passing through to the existing implementation. In this case, it will always return False
if /sys
is not present:
cloud-init/cloudinit/net/__init__.py
Lines 249 to 261 in e01e3ed
def is_renamed(devname): | |
""" | |
/* interface name assignment types (sysfs name_assign_type attribute) */ | |
#define NET_NAME_UNKNOWN 0 /* unknown origin (not exposed to user) */ | |
#define NET_NAME_ENUM 1 /* enumerated by kernel */ | |
#define NET_NAME_PREDICTABLE 2 /* predictably named by the kernel */ | |
#define NET_NAME_USER 3 /* provided by user-space */ | |
#define NET_NAME_RENAMED 4 /* renamed by user-space */ | |
""" | |
name_assign_type = read_sys_net_safe(devname, 'name_assign_type') | |
if name_assign_type and name_assign_type in ['3', '4']: | |
return True | |
return False |
Once the refactor is under way, we'll need to revisit whether or not that is the correct behaviour.
return net.get_ib_interface_hwaddr(devname, ethernet_format) | ||
|
||
def get_interface_mac(self, devname: DeviceName): | ||
return net.get_interface_mac(devname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can safely rely on get_interfaces_by_mac()
be default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is only adding the initial structure, that's a good note for when this method is being refactored!
def device_devid(self, devname: DeviceName): | ||
return net.device_devid(devname) | ||
|
||
def device_driver(self, devname: DeviceName): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we raise an NotImplementedError
by default instead. We cannot support that on NetBSD, on possible OpenBSD too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't infer the device driver in use given a network interface name? (I was under the impression from another/previous comment that the driver name was encoded in the name itself, e.g. rtk0 is using the rtk driver?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the way, but it's rather empirical, and AFAIR on FreeBSD the interface name can be changed, and in this case, you cannot get the original name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can, but it's harder
return net._rename_interfaces(renames, current_info=current_info) | ||
|
||
def apply_network_config_names(self, netcfg: NetworkConfig) -> None: | ||
return net.apply_network_config_names(netcfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also raise an NotImplementedError
by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we convert this to an abstractmethod (as per https://github.com/canonical/cloud-init/pull/391/files#diff-0abd02cf07a2406439d0d263b886ebcbR434), then we won't be able to instantiate Networking
subclasses without an implementation. If the appropriate default behaviour on BSD is to raise NotImplementedError
, then that should be the implementation in BSDNetworking
.
(Of course, any callers will also need to cope with that exception being raised, so it may be the case that a different default behaviour makes sense; it may be acceptable to make it a noop, for example.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think if the exception wasn't BSD specific, there might be more of an incentive to actually catch or handle it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this method will have an implementation on LinuxNetworking
, so regardless of whether the exception is raised in Networking.apply_network_config_names
or in BSDNetworking.apply_network_config_names
, it will only be raised on BSD systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 this works for me thanks for addressing the concern @OddBloke
Thanks all for the comments, they've been really valuable! This round of feedback has raised a couple of questions that I'd appreciate input on:
|
i think the simplest thing we can do is to demand some highly descriptive PRs
i believe you're right about the volatile nature of the cloud, and as such, this is a key feature that cloud-init is providing to handle such volatility |
Can we create feature bugs for the refactor method parts taggeds as 'bitesize' or 'refactor' or both that someone will create as they grab a component of this refactor? To ensure folks don't collide on certain aspects?
I'm not quite groking from the responses about the level of support BSD* has for this. Can we assume this is general behavior on cloud-init and BSD* can determine otherwise at a later point in time? |
Co-authored-by: Chad Smith <chad.smith@canonical.com>
I think we've reached consensus that this PR is ready to land. I'm going to create the tracking bugs for each individual function/method today before landing this first thing tomorrow, so this is last call for any comments! |
I've opened a bug per function here: https://bugs.launchpad.net/cloud-init/+bugs?field.tag=net-refactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving based on prior comments after a quick read
This pull request introduces the initial structure for the "cloudinit.net -> cloudinit.distros.networking Hierarchy" refactor, as detailed in [0]. It also updates that section with some changes driven by this initial implementation, as well as adding a lot more specifics to it.
[0] https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#cloudinit-net-cloudinit-distros-networking-hierarchy