Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new network activators to bring up interfaces (SC-85) #919

Merged
merged 8 commits into from
Jul 1, 2021

Conversation

TheRealFalcon
Copy link
Member

@TheRealFalcon TheRealFalcon commented Jun 14, 2021

Proposed Commit Message

Add new network activators to bring up interfaces

Currently _bring_up_interfaces() is a no-op for any distro using
renderers. We need to be able to support bringing up a single
interfaces, a list of interfaces, and all interfaces. This should be
independent of the renderers, as the network config is often
generated independent of the mechanism used to apply it.

Additionally, I included a refactor to remove
"_supported_write_network_config". We had a confusing call chain of
apply_network_config->_write_network_config->_supported_write_network_config.
The last two have been combined.

Additional Context

This was motivated by needing to support hotplug.

_bring_up_interfaces() is only ever called by arch and gentoo these days, so I removed the other implementations.

Test Steps

I added some unit tests. Integration testing is difficult. Given that _bring_up_interfaces was previously a no-op, it's hard to imagine real world situations where we can test this. The only time we exercise this code path is when we're running in network mode, and we were unable to obtain DataSource information during the local phase.

One way to simulate this would be to create a testing DataSource that returns a different instance id every time it is queried. It would be an ugly hack, but it would work. Given that we'll need to test this when hotplug lands, I'm not sure it's worth the integration testing effort at the moment.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@@ -58,38 +58,6 @@
'bridge_waitport': None}}


def parse_net_config_data(net_config, skip_broken=True):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved parse_net_config_data to the bottom of the file so I could return some type information (NetworkConfig isn't defined here yet).

I removed parse_net_config as it is dead code.

@@ -38,13 +43,13 @@ def search(priority=None, target=None, first=False):
if render_mod.available(target):
cur = (name, render_mod.Renderer)
if first:
return cur
return [cur]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much easier to reason about if we're always returning a list. The only place this function is being called is from the select function below (which I modified accordingly) and a couple unit tests.

@TheRealFalcon TheRealFalcon changed the title WIP: Add new network configurers to bring up interfaces WIP: Add new network configurers to bring up interfaces (SC-85) Jun 14, 2021
state = nsi.get_network_state()

if not state:
raise RuntimeError(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we were either returning NetworkState or None. In net_convert.py we handled the None accordingly, but the renderers don't check for None and we'll likely get an ugly AttributeError error if they're passed it. So failing early here makes more sense and makes the function easier to reason about.

@TheRealFalcon TheRealFalcon changed the title WIP: Add new network configurers to bring up interfaces (SC-85) Add new network configurers to bring up interfaces (SC-85) Jun 15, 2021
@blackboxsw
Copy link
Collaborator

Yeah I'm struggling a bit with calling it a Configurer, maybe NetworkBackpane? What stinks is that netplan already calls -NetworkManager and systemd-networkd renderers... yet cloud-init also calls netplan a renderer. "NetworkBackplaneManager/ BackplaneBringup? I'll hold on the rest of the bike-shedding and just review the code and we can discuss more broadly what this class should be called when we figure our what exactly it needs to do.

@TheRealFalcon
Copy link
Member Author

What about InterfaceBringerUpper? NetworkInterfaceInterfacer?
(these aren't serious suggestions just in case you were wondering 😉 )

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheRealFalcon, the code and refactor looks good I think the only critical thing at this point is solving the gentoo.py issues which I think lead to a break in network rendering.

The refactors and rename made it a bit time-consuming to review and understand for me, but I think I've walked through the setup path in general and the end result is a much cleaner set of method/function names.

See if any of my alternative bike-shed suggestions to "Configurer" ring as more appropriate/clear.... (I'm not sure they do).

util.logexc(LOG, "Running interface command %s failed", cmd)
return False
"""Deprecated. Remove if/when arch and gentoo support renderers."""
raise NotImplementedError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break both gentoo which directly calls Distros._bring_up_interface.

Don't we need to either:

  1. leave the _bring_up_interface stub in place still in the base class?
    OR
  2. Migrate this function declaration into gentoo.py._bring_up_interface? Or do we want to correct the gentoo reference to just call self._bring_up_interface

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't actually. The call hierarchy would work like this:
base:apply_network -> gentoo:_bring_up_interfaces->base:_bring_up_interfaces->gentoo:_bring_up_interface

The distros.Distro._bring_up_interfaces is confusing (it should just call super), but since it's passing self to the base class, the "normal" method calling rules still apply.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh right.

raise NotImplementedError()

def _supported_write_network_config(self, network_config):
def _write_network_state(self, network_state):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good naming, makes it more clear

return found


def select_configurer(priority=None, target=None) -> Type[NetworkConfigurer]:
Copy link
Collaborator

@blackboxsw blackboxsw Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're only calling this from the module, WDYT about aligning with our renderers.select|search naming?

Suggested change
def select_configurer(priority=None, target=None) -> Type[NetworkConfigurer]:
def select(priority=None, target=None) -> Type[NetworkConfigurer]:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...I named it this way because I was frustrated when grepping for usages of the renderers.select|search. Searching for 'select' or 'search' will return a million results. I figured the function call would be slightly more explicit too.

That said, I'm not super strongly opinionated about it. If you are, I don't mind changing it.

from cloudinit.net.network_state import NetworkState


class NetworkConfigurer(ABC):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetworkActivator ??? since all methods are bring_up related
NetworkActuator, NetworkUtilityController, NetworkCmdController, NetworkCmdAbstraction, NetworkCmd, NetworkTool, NetworkBackend?

]


def search_configurer(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def search_configurer(
def search(

???

@@ -426,4 +434,36 @@ def network_state_to_netplan(network_state, header=None):
contents = renderer._render_content(network_state)
return header + contents


class NetplanConfigurer(NetworkConfigurer):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit strange that we have completely new/separate modules for network_manager and ifupdown "configurers", but we still house NetplanConfigurer declaration in netplan.py. It feels a tiny bit like our naming of modules is muddy in this directory in general. Having the common module namespace for renderers vs "configurer" modules in this dir makes looking up renderers and configurer declarations a bit more costly as we have generic module naming like eni.py and ifupdown.py in the same directory.

What do you think about one of these options?

  • including these "configurer" declarations inline in net.configurers.py?
    • It doesn't seem like the netplanconfigurer needs anything specifcally from netplan.py anyway beyond the "available" function.
  • prefixing "configurer" module names with the bike-sheded configurer_(ifupdown|network_manager|netplan).py

@@ -62,9 +62,9 @@ def install_packages(self, pkglist):
self.update_package_sources()
self.package_command('', pkgs=pkglist)

def _write_network_config(self, netconfig):
def _write_network_state(self, network_state):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also drop "_bring_up_interfaces" from arch as it as a near equivalent (yet less functional) of the parent class Distro._bring_up_interfaces

@blackboxsw
Copy link
Collaborator

I was able to go through inplace upgrades on existing systems greenfield and brownfield runs without error so, as expected, we haven't broken any the networking paths in this refactor.

@TheRealFalcon
Copy link
Member Author

@blackboxsw , I think I've addressed (or responded to) each of your comments. The 'configurers-activators' was a fairly large change, so it might be easier to review the other commits separately.

@TheRealFalcon TheRealFalcon requested a review from blackboxsw June 30, 2021 18:18
@TheRealFalcon TheRealFalcon changed the title Add new network configurers to bring up interfaces (SC-85) Add new network activators to bring up interfaces (SC-85) Jun 30, 2021
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for working through this @TheRealFalcon

@blackboxsw
Copy link
Collaborator

blackboxsw commented Jul 1, 2021

One CI error is likely due to mock not operating as expected in travis build env:

E       AssertionError: assert [<class 'clou...anActivator'>] == [<class 'clou...anActivator'>]
3097E         At index 0 diff: <class 'cloudinit.net.activators.NetplanActivator'> != <class 'cloudinit.net.activators.NetworkManagerActivator'>
3098E         Right contains one more item: <class 'cloudinit.net.activators.NetplanActivator'>
3099E         Full diff:
3100E           [
3101E         -  <class 'cloudinit.net.activators.NetworkManagerActivator'>,
3102E            <class 'cloudinit.net.activators.NetplanActivator'>,
3103E           ]
3104
3105tests/unittests/test_net_activators.py:110: AssertionError

Looks like we need the following patch to actually apply mocks to avoid actually checking on NetworkManager and Netplan

diff --git a/tests/unittests/test_net_activators.py b/tests/unittests/test_net_activators.py
index 2cd37078..f11486ff 100644
--- a/tests/unittests/test_net_activators.py
+++ b/tests/unittests/test_net_activators.py
@@ -105,7 +105,7 @@ class TestSearchAndSelect:
 
     @patch('cloudinit.net.activators.IfUpDownActivator.available',
            return_value=False)
-    def test_first_not_available(self, m_available):
+    def test_first_not_available(self, m_available, available_mocks):
         resp = search_activator()
         assert resp == DEFAULT_PRIORITY[1:]
 

`

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix mocked test to add available_mocks. then +1

Currently _bring_up_interfaces() is a no-op for any distro using
renderers. We need to be able to support bringing up a single
interfaces, a list of interfaces, and all interfaces. This should be
independent of the renderers, as the network config is often
generated independent of the mechanism used to apply it.

Additionally, I included a refactor to remove
"_supported_write_network_config". We had a confusing call chain of
apply_network_config->_write_network_config->_supported_write_network_config.
The last two have been combined.
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@TheRealFalcon TheRealFalcon merged commit 81299de into canonical:main Jul 1, 2021
@TheRealFalcon TheRealFalcon deleted the bring-up branch July 1, 2021 19:43
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Oct 5, 2021
In canonical#919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Oct 5, 2021
In canonical#919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Oct 5, 2021
In canonical#919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Oct 5, 2021
In canonical#919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.
TheRealFalcon added a commit to TheRealFalcon/cloud-init that referenced this pull request Oct 7, 2021
In canonical#919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.

LP: #1938299
blackboxsw pushed a commit that referenced this pull request Oct 7, 2021
In #919 (81299de), we refactored some of the code used to bring up
networks across distros. Previously, the call to bring up network
interfaces during 'init' stage unintentionally resulted in a no-op
such that network interfaces were NEVER brought up by cloud-init, even
if new network interfaces were found after crawling the metadata.

The code was altered to bring up these discovered network interfaces.
On ubuntu, this results in a 'netplan apply' call during 'init' stage
for any ubuntu-based distro on a datasource that has a NETWORK
dependency. On GCE, this additional 'netplan apply' conflicts with the
google-guest-agent service, resulting in an instance that can no
be connected to.

This commit adds a 'disable_network_activation' option that can be
enabled in /etc/cloud.cfg to disable the activation of network
interfaces in 'init' stage.

LP: #1938299
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants