Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distros.networking: initial implementation of layout #391

Merged
merged 16 commits into from
Jun 23, 2020

Conversation

OddBloke
Copy link
Collaborator

@OddBloke OddBloke commented May 26, 2020

This pull request introduces the initial structure for the "cloudinit.net -> cloudinit.distros.networking Hierarchy" refactor, as detailed in [0]. It also updates that section with some changes driven by this initial implementation, as well as adding a lot more specifics to it.

[0] https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#cloudinit-net-cloudinit-distros-networking-hierarchy

Copy link
Collaborator

@igalic igalic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀


* ``get_master``
* ``device_devid``
* ``device_driver``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most other Unices have the name of the device driver in the name of the device

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it sounds like a function which extracts that driver from the device name would still make sense?

(For my own information: what do such device names look like?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose that in a public API? I've the feeling the information is just required by the backend driver itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose that in a public API?

This isn't a public API, it is only intended for internal cloud-init use.

I've the feeling the information is just required by the backend driver itself.

device_driver is called by a bunch of Linux-specific functions, but also extract_physdevs and generate_fallback_config. These latter two are expected to work on every platform, so I don't think we can only have device_driver present on LinuxNetworking in this initial layout. It's possible that as we perform the refactor we will discover that we don't need device_driver on the super-class after all, and I think we can move it at that point.

Does that sound right (and reasonable)?

* ``is_netfail_standby``

* those that use ``/sys`` (via helpers) and have non-exhaustive BSD
logic:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic may be non-exhaustive, but it's more than enough :P

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only a FreeBSD check in it, it falls through to the /sys-based implementation on {Open,Net,Dragonfly}BSD.

HACKING.rst Outdated
* ``natural_sort_key``

Note that the functions in ``cloudinit.net`` use inconsistent parameter
names for "string that cohtains a device name"; we can standardise on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contains

HACKING.rst Outdated
our various distros, while still allowing easy reuse of code between
in ``cloudinit.distros.networking``, which each ``Distro`` subclass
will reference. These will capture the differences between networking
on our various distros, while still allowing easy reuse of code between
distros that share functionality (e.g. most of the Linux networking
behaviour). Callers will call ``distro.net.func`` instead of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distro.networking.func?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distro.net is where the instantiated Networking class is stored (on an instantiated Distro). That naming isn't introduced until my first "In more detail" bullet, however, so let me rework this to be clearer.

HACKING.rst Outdated

* pick an unmigrated ``cloudinit.distros.Networking`` method
* pick an unmigrated ``cloudinit.distros.networking.Networking`` method
* refactor all of its callers to call the ``distro.net`` method on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distro.networking

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, let me rework things to make it a bit clearer.

HACKING.rst Show resolved Hide resolved
* this has non-distro-specific logic so should potentially be
refactored to use helpers on ``self`` instead of ``ip`` directly
(rather than being wholesale reimplemented in each of
``BSDNetworking`` or ``LinuxNetworking``)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nic renaming bit might be Linux specific; not sure if BSD has similar issues with Linux where the eth* kernel device namespace has issues with being renamed by user-space.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's a good point. @igalic @goneri can you let us know what you think about network device renaming on *BSD?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main reason we do it is because it helps us reuse the vendor's cloud-config, which usually uses eth0, even though they of all people should know what the devices will be named

Copy link
Contributor

@goneri goneri Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced no BSD user wants to see the interface renamed on the fly, especially to something super Linux-ish like eth0. Also, today, the feature only works on FreeBSD.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluding user-generated networking configs; the primary reason for renaming nics is to bind a provided network config to a specific interface (identified by MAC).

Some virtualization platforms do not guarantee that the device name of the NIC is stable across reboots; that means eth0 with macA may be eth1 on the next boot, or in the case of Linux systemd, the "persistent name" changes as the PCI bus slot the NIC was attached to has changed (user added a virtual interface to their instance).

For containers, the Linux specific rules generated to ensure the name to mac mapping remains across reboots (udev /etc/udev/rules.d/70-persistent-net) does not operate inside containers; yet cloud-init still needs to ensure that the provided network config is on the correct interface.

@goneri any comments w.r.t to multi-nic VMs on public clouds and any experience with nic name changing due to device enumeration randomness? Originally this was seen on Amazon Xen-based instances.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced no BSD user wants to see the interface renamed on the fly

I'm not sure many Linux users particularly want this either but, as Ryan lays out, the reality of cloud networking (on Linux, at least) sometimes requires it.

(And it's worth bearing in mind that if someone considers themselves a cloud-init user, or a $cloud user, then they may want consistency across distros/OSes more than they want the most appropriate behaviour for a particular OS or distro. This isn't a trump card argument, it's just another angle to consider.)

cloudinit/distros/networking.py Show resolved Hide resolved
@OddBloke OddBloke changed the title HACKING: add specifics of function/classes for net refactor distros.networking: initial implementation of layout Jun 5, 2020
@OddBloke OddBloke closed this Jun 5, 2020
@OddBloke OddBloke reopened this Jun 5, 2020
@blackboxsw blackboxsw self-assigned this Jun 8, 2020
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good @OddBloke. A couple of questions here, but I like the approach and I think it makes sense.

One question I have is should we prescribe that a Networking subclass raise NotImplementedErrors for current unimplemented methods so cloud-init gets a big ugly error if those unsupported paths are called?

HACKING.rst Show resolved Hide resolved
cloudinit/distros/networking.py Outdated Show resolved Hide resolved
def extract_physdevs(self, netcfg: NetworkConfig) -> list:
return net.extract_physdevs(netcfg)

def find_fallback_nic(self, *, blacklist_drivers=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the '*' param. What callers are we guarding against here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, generally speaking, the less different ways of calling a function in the same way, the cleaner the API. In this case, we are collapsing the potential ways of calling this from:

find_fallback_nic([my, blacklist, drivers])
find_fallback_nic(blacklist_drivers=[my, blacklist, drivers])

to only find_fallback_nic(blacklist_drivers=[my, blacklist, drivers]). This makes the codebase easier to understand, and easier to navigate too (e.g. I can't currently grep blacklist_drivers= and have confidence that I'll find (a superset of) all the places this parameter is passed).

We're already refactoring all the call sites of these functions/methods; there's no reason not to clean up the API while we're at it.

cloudinit/distros/networking.py Show resolved Hide resolved
HACKING.rst Show resolved Hide resolved
cloudinit/distros/networking.py Show resolved Hide resolved
* ``is_connected``
* ``is_physical``
* ``is_present``
* ``is_renamed``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some OS cannot rename interfaces, how do you deal with those? Do you raise an NotImplementedError?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this structure PR is just passing through to the existing implementation. In this case, it will always return False if /sys is not present:

def is_renamed(devname):
"""
/* interface name assignment types (sysfs name_assign_type attribute) */
#define NET_NAME_UNKNOWN 0 /* unknown origin (not exposed to user) */
#define NET_NAME_ENUM 1 /* enumerated by kernel */
#define NET_NAME_PREDICTABLE 2 /* predictably named by the kernel */
#define NET_NAME_USER 3 /* provided by user-space */
#define NET_NAME_RENAMED 4 /* renamed by user-space */
"""
name_assign_type = read_sys_net_safe(devname, 'name_assign_type')
if name_assign_type and name_assign_type in ['3', '4']:
return True
return False

Once the refactor is under way, we'll need to revisit whether or not that is the correct behaviour.

return net.get_ib_interface_hwaddr(devname, ethernet_format)

def get_interface_mac(self, devname: DeviceName):
return net.get_interface_mac(devname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can safely rely on get_interfaces_by_mac() be default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is only adding the initial structure, that's a good note for when this method is being refactored!

def device_devid(self, devname: DeviceName):
return net.device_devid(devname)

def device_driver(self, devname: DeviceName):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we raise an NotImplementedError by default instead. We cannot support that on NetBSD, on possible OpenBSD too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't infer the device driver in use given a network interface name? (I was under the impression from another/previous comment that the driver name was encoded in the name itself, e.g. rtk0 is using the rtk driver?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the way, but it's rather empirical, and AFAIR on FreeBSD the interface name can be changed, and in this case, you cannot get the original name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can, but it's harder

return net._rename_interfaces(renames, current_info=current_info)

def apply_network_config_names(self, netcfg: NetworkConfig) -> None:
return net.apply_network_config_names(netcfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also raise an NotImplementedError by default?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we convert this to an abstractmethod (as per https://github.com/canonical/cloud-init/pull/391/files#diff-0abd02cf07a2406439d0d263b886ebcbR434), then we won't be able to instantiate Networking subclasses without an implementation. If the appropriate default behaviour on BSD is to raise NotImplementedError, then that should be the implementation in BSDNetworking.

(Of course, any callers will also need to cope with that exception being raised, so it may be the case that a different default behaviour makes sense; it may be acceptable to make it a noop, for example.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think if the exception wasn't BSD specific, there might be more of an incentive to actually catch or handle it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this method will have an implementation on LinuxNetworking, so regardless of whether the exception is raised in Networking.apply_network_config_names or in BSDNetworking.apply_network_config_names, it will only be raised on BSD systems.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this works for me thanks for addressing the concern @OddBloke

@OddBloke
Copy link
Collaborator Author

Thanks all for the comments, they've been really valuable! This round of feedback has raised a couple of questions that I'd appreciate input on:

  • How do we want to divide and track the work of the individual refactors? I don't want us to end up with multiple people working on the same one, and wasting effort. (Obviously our internal-to-Canonical tracking system isn't going to work, as we know that community folks are going to be involved in this.)
  • It seems like there's a fundamental question about interface renaming to be answered: is this Linux-specific behaviour (required because of the kernel/systemd's network interface naming quirks), or is this general cloud-init behaviour which just isn't available on some of the platforms which cloud-init runs on?

@igalic
Copy link
Collaborator

igalic commented Jun 16, 2020

  • How do we want to divide and track the work of the individual refactors? I don't want us to end up with multiple people working on the same one, and wasting effort. (Obviously our internal-to-Canonical tracking system isn't going to work, as we know that community folks are going to be involved in this.)

i think the simplest thing we can do is to demand some highly descriptive PRs

  • It seems like there's a fundamental question about interface renaming to be answered: is this Linux-specific behaviour (required because of the kernel/systemd's network interface naming quirks), or is this general cloud-init behaviour which just isn't available on some of the platforms which cloud-init runs on?

i believe you're right about the volatile nature of the cloud, and as such, this is a key feature that cloud-init is providing to handle such volatility

@blackboxsw
Copy link
Collaborator

Thanks all for the comments, they've been really valuable! This round of feedback has raised a couple of questions that I'd appreciate input on:

* How do we want to divide and track the work of the individual refactors? I don't want us to end up with multiple people working on the same one, and wasting effort. (Obviously our internal-to-Canonical tracking system isn't going to work, as we know that community folks are going to be involved in this.)

Can we create feature bugs for the refactor method parts taggeds as 'bitesize' or 'refactor' or both that someone will create as they grab a component of this refactor? To ensure folks don't collide on certain aspects?

* It seems like there's a fundamental question about interface renaming to be answered: is this Linux-specific behaviour (required because of the kernel/systemd's network interface naming quirks), or is this general cloud-init behaviour which just isn't available on some of the platforms which cloud-init runs on?

I'm not quite groking from the responses about the level of support BSD* has for this. Can we assume this is general behavior on cloud-init and BSD* can determine otherwise at a later point in time?

@OddBloke
Copy link
Collaborator Author

I think we've reached consensus that this PR is ready to land. I'm going to create the tracking bugs for each individual function/method today before landing this first thing tomorrow, so this is last call for any comments!

@OddBloke
Copy link
Collaborator Author

I've opened a bug per function here: https://bugs.launchpad.net/cloud-init/+bugs?field.tag=net-refactor

Copy link
Contributor

@powersj powersj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on prior comments after a quick read

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants