Separating provider-specific code in the installer #566

ghost · 2020-12-15T12:41:23Z

This change introduces an internal API in the installer code to move provider-specific code into a separate folder and avoid having large switch-case structures in the installer code.

openshift-ci-robot · 2020-12-15T12:41:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign danehans after the PR has been reviewed.
You can assign the PR to them by writing /assign @danehans in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mdbooth · 2020-12-15T13:26:18Z

I'm in favour of this. The details of each 'API' can be hashed out separately one by one as cloud provider code is refactored into it. I don't think it's worth getting into that detail here, except to agree it in principal.

sdodson · 2020-12-15T15:11:46Z

/uncc @stevekuznetsov
/cc @sdodson @staebler @crawford

pierreprinetti

Thank you for proposing this!

We've been working in this direction in the OpenStack provider already, and I think that it would bring many advantages: clarity, more focus on the orchestration in common parts, a better integration of optional steps (eg pre-flight validation), and moreover, well-defined expectations, written in code.

A big 👍 from me!

pierreprinetti · 2020-12-15T13:24:24Z

enhancements/separating-provider-specific-code-in-the-installer.md

+}
+```
+
+Not all platforms implement all features. Optional features should be hidden behind a feature flag and a separate interface:


Optional platform-specific features, like validations, could be detected via type assertion.

This is what the orchestrator code could resemble to:

type InstallConfig struct{} func validateCloud(ic InstallConfig, platform interface{}) { if quotaValidator, ok := platform.(interface { ValidateQuota(InstallConfig) error }); ok { if err := quotaValidator.ValidateQuota(ic); err != nil { log.Fatal(err) } } }

Each platform can them choose to implement some methods and not others, and the orchestrator code is aware what methods are available.

Actually, there is a much better way to do this as I found out. We can create a struct that has receivers for the default functionality and providers can embed this default struct. (Basically, OOP inheritance.)

I am not sure I understand. The way I'm figuring this, Installer-core defines a struct with optional methods defined as no-ops, then the Platform embeds it in a struct where it can override the optional noops with actual logic, then the Installer-core imports that... I see a circular dependency

In other places where I've created driver-like APIs of this sort, repeating some code always gave us a result that was easier to maintain over time because a developer working on one driver does not have to untangle an inheritance hierarchy or track down the real implementation that may live somewhere else. In places where we could gain from code reuse, utility functions or types that are instantiated rather than inherited are simpler to test and more flexible.

Along the same lines, no-op implementations, even if they have to be provided by every platform, are simpler to understand than features flags and multiple interfaces.

pierreprinetti · 2020-12-15T15:35:34Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+## Proposal
+
+### Step 1: Create a Go interface


Instead of defining a single all-encompassing interface, each "installer target" (install-config, manifests, machines etc) could define its own required interface, ideally with just one method each. This way, each logic part can be easily tested and we get cleanly segregated interfaces.

pierreprinetti · 2020-12-15T15:45:37Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+### Step 2: Implement registry
+
+In this step we should implement a registry pattern where providers can register with their names.


In the OpenStack provider, we have started the effort of defining a clean provider interface for Installer steps. For example, for cloudproviderconfig we have come up with this signature:

func GenerateCloudProviderConfig(installConfig types.InstallConfig) (cloudProviderConfigData, cloudProviderConfigCABundleData string, err error)

We would be happy to be guinea pigs for this RFE and help defining and testing interface signatures in-vivo for the other steps as well! We also have that in our backlog.

There ought to be a saying about two guinea pigs being better than one, we (oVirt) are also wanting to have this. Let's do it!

pierreprinetti · 2020-12-15T15:46:59Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+### Version Skew Strategy
+
+As the API develops it presents the problem of version skew. As new API calls are introduced they need to either be added to all providers. If that is not possible a new interface should be introduced and a compatibility wrapper should be added. Once all providers are migrated the temporary interface can be removed.


New function signatures could be implemented as new "optional" interfaces a provider object can implement, with the mechanism of interface assertion.

See my comment above, we don't even need to do interface assertion. (Interface assertion is not nice to read and incurs somewhat of a performance penalty.)

Why wouldn't it be nice? 😁 Conceptually, it's really what we want here: "dear Platform type, do you implement this method?"

As for the performance penalty: I tend to value readability higher, especially when the total penalty is not expected to pass the 1-second threshold in a 40-minutes process. I would even value aesthetics higher than performance here 😛

dhellmann · 2020-12-16T18:32:24Z

enhancements/separating-provider-specific-code-in-the-installer.md

+}
+```
+
+Not all platforms implement all features. Optional features should be hidden behind a feature flag and a separate interface:


In other places where I've created driver-like APIs of this sort, repeating some code always gave us a result that was easier to maintain over time because a developer working on one driver does not have to untangle an inheritance hierarchy or track down the real implementation that may live somewhere else. In places where we could gain from code reuse, utility functions or types that are instantiated rather than inherited are simpler to test and more flexible.

Along the same lines, no-op implementations, even if they have to be provided by every platform, are simpler to understand than features flags and multiple interfaces.

dhellmann · 2020-12-16T18:35:45Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+### Step 4: Migrate providers
+
+Providers need to be migrated to the new API.


One of the biggest sources of friction we have had with "everyone needs to..." projects is prioritizing the work on the teams' backlogs. How many different teams will be involved in this part of the work? Are they all aware of this plan?

@dhellmann I have sent out this plan to all teamleads and managers affected as far as I could tell. The benefit of this plan is that no team needs to do anything. In addition, I'm working on a way to make sure that API calls added in the future don't create an immediate requirement to do anything for other teams. PoC will still take a while (probably until new year). The document will need to be partially updated.

staebler

I still do not see enough value in the monolithic interface to outweigh the effort and churn required to implement it. Perhaps with a PoC where I could see the proposed structure more clearly, I could be convinced. With that said, I am whole-heartedly in favor of the owners of specific providers making an effort to better separate their provider-specific code from the core installer code.

staebler · 2020-12-21T16:55:46Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+## Motivation
+
+Several parts of the installer code have large, [unwieldy switch-case structures](https://github.com/openshift/installer/blob/254680f6268a7b8e57091d788256c67b81755b5d/pkg/asset/cluster/tfvars.go#L166) that contain provider-specific codes. This makes it hard to keep track of the code that serves a provider, and it makes it impossible to write component-level tests. Creating a separated API reduces friction between teams and makes changes easier to merge because the risk of accidental side effects between providers is reduced. 


As I see it, there are three motivations.

You want to be able to find code that serves a provider more easily.

You want to be able to write component-level tests for a specific provider.

You want to be able to merge code specific to a provider without having to get reviews from people on other teams.

Creating a separate API is a solution not a motivation. I contend that you can satisfy all three motivations without have a separate API. As a maintainer of a platform, you are at liberty to extract whatever platform-specific code there is from the unwieldy switch statement and put it into a function in a directory that you own. You can then run whatever component-level tests you like on that function.

staebler · 2020-12-21T19:26:45Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+### Step 1: Create a Go interface
+
+We should create a Go interface that describes the functionality each provider must implement. For example, from the prototype:


This interface will have 20-30 functions. Some of these functions are going to be difficult to describe if the interface is far from the asset where it is used. As an example, the MCS Certificate asset has different logic for specifying the IP addresses and DNS names for the cert based on the platform. I personally prefer to have the logic for each asset located close to the asset, particularly where the logic is more closely coupled to the asset than the provider.

The logic for some assets that are very specific to a platform would do better farther away from the asset. One example of this is the Platform Credentials Check asset. The credentials check could stand on its own.

As an example, the MCS Certificate asset has different logic for specifying the IP addresses and DNS names for the cert based on the platform. I personally prefer to have the logic for each asset located close to the asset, particularly where the logic is more closely coupled to the asset than the provider.

This makes sense for readability. A counter-argument in favor of the interface, which is not included in this enhancement, is that the current implementation creates a dependency issue. We can see in the code that creating an asset depends on all platforms, creating an oVirt MCS Certificate asset depends on the AWS platform. If we inject the dependencies through an interface, we would resolve these dependency issues.

These dependencies are "theoretically wrong," but I'm not sure if they pose a practical problem.

staebler · 2020-12-21T19:38:03Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+### Version Skew Strategy
+
+As the API develops it presents the problem of version skew. As new API calls are introduced they need to either be added to all providers. If that is not possible a new interface should be introduced and a compatibility wrapper should be added. Once all providers are migrated the temporary interface can be removed.


This point is a big concern for me. Let's say that a provider needs a new parameter. Either we update every provider at once to add the new parameter. Or we support a new function with the new parameter. In the latter case, I would expect the providers that do not need the new parameter will never update to the new version of the function. Then, later, when a new parameter is added again, we get a third function. As time goes on, I could see every provider ending up with their own function. I look at the machine assets as an example. Every provider has a MachineSets and Machines function, but each one has its own unique signature.

This issue has been addressed in the PoC (see below) I will update the document later.

As time goes on, I could see every provider ending up with their own function. I look at the machine assets as an example. Every provider has a MachineSets and Machines function, but each one has its own unique signature.

The parameters in an interface's function signature to generate an asset should just match the dependencies for that asset. Each platform/provider should have a signature that takes in all the dependencies and then returns whatever is needed to generate the asset (i.e. whatever is coming out of the switch statement). The PoC could be improved in this regard, I will leave a comment there.

The machine config is pretty easy to standardize as far as I can tell. In my previous PoC I did this:

// GetMasterMachines returns a pool of machine configurations to create for the master nodes. GetMasterMachines( ic *types.InstallConfig, pool *types.MachinePool, rhcosImage *rhcos.Image, clusterID *installconfig.ClusterID, ) ([]machineapi.Machine, error) // AddWorkerMachines adds a set of machines to the worker pool. AddWorkerMachines( ic *types.InstallConfig, pool *types.MachinePool, rhcosImage *rhcos.Image, clusterID *installconfig.ClusterID, machineSets []runtime.Object, ) error

The API extraction is a rather simple process: there are a number of variables available there the provider-specific code is located and those make it into the API.

ghost · 2021-01-04T12:26:09Z

@mdbooth @staebler @dhellmann @pierreprinetti thank you for your input. I have a PoC ready that addresses some of the issues raised here: openshift/installer#4517

patrickdillon · 2021-01-04T16:19:32Z

enhancements/separating-provider-specific-code-in-the-installer.md

+
+## Motivation
+
+Several parts of the installer code have large, [unwieldy switch-case structures](https://github.com/openshift/installer/blob/254680f6268a7b8e57091d788256c67b81755b5d/pkg/asset/cluster/tfvars.go#L166) that contain provider-specific codes. This makes it hard to keep track of the code that serves a provider, and it makes it impossible to write component-level tests. Creating a separated API reduces friction between teams and makes changes easier to merge because the risk of accidental side effects between providers is reduced. 


makes it impossible to write component-level tests

What do you mean by "component-level tests"? Many platforms have platform-specific unit tests. For example, here are the GCP tests for validating install configs.

We (oVirt) have several parts of our code where half of the functionality is in a switch-case, the other half is in a switch-case. It would be helpful to have tests that run our code only without needing to test the whole installer. That makes test runs quicker and less flaky.

You can't really call them unit tests because they still require external API's, which is a separate issue to be addressed.

If you look at the GCP example I linked above you can see how we mocked out the GCP APIs.

openshift-bot · 2021-04-05T12:28:08Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2021-05-05T16:34:00Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Initial draft

e18fd40

openshift-ci-robot requested review from abhinavdahiya and stevekuznetsov December 15, 2020 12:41

openshift-ci-robot requested review from crawford, sdodson and staebler and removed request for stevekuznetsov December 15, 2020 15:11

pierreprinetti reviewed Dec 15, 2020

View reviewed changes

dhellmann reviewed Dec 16, 2020

View reviewed changes

staebler reviewed Dec 21, 2020

View reviewed changes

ghost mentioned this pull request Jan 4, 2021

WIP: Moved oVirt installconfig to externalprovider openshift/installer#4517

Closed

patrickdillon reviewed Jan 4, 2021

View reviewed changes

openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2021

openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 5, 2021

ghost closed this May 5, 2021

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separating provider-specific code in the installer #566

Separating provider-specific code in the installer #566

ghost commented Dec 15, 2020

openshift-ci-robot commented Dec 15, 2020

mdbooth commented Dec 15, 2020

sdodson commented Dec 15, 2020

pierreprinetti left a comment

pierreprinetti Dec 15, 2020

ghost Dec 16, 2020

pierreprinetti Dec 16, 2020

dhellmann Dec 16, 2020

pierreprinetti Dec 15, 2020

pierreprinetti Dec 15, 2020

ghost Dec 16, 2020

pierreprinetti Dec 15, 2020

ghost Dec 16, 2020

pierreprinetti Dec 16, 2020 •

edited

Loading

dhellmann Dec 16, 2020

dhellmann Dec 16, 2020

ghost Dec 16, 2020 •

edited by ghost

Loading

staebler left a comment

staebler Dec 21, 2020

staebler Dec 21, 2020

patrickdillon Jan 4, 2021

staebler Dec 21, 2020

ghost Jan 4, 2021

patrickdillon Jan 4, 2021

ghost Jan 5, 2021 •

edited by ghost

Loading

ghost commented Jan 4, 2021

patrickdillon Jan 4, 2021

ghost Jan 4, 2021

ghost Jan 4, 2021

patrickdillon Jan 4, 2021

openshift-bot commented Apr 5, 2021

openshift-bot commented May 5, 2021


		### Step 2: Implement registry

		In this step we should implement a registry pattern where providers can register with their names.


		### Version Skew Strategy

		As the API develops it presents the problem of version skew. As new API calls are introduced they need to either be added to all providers. If that is not possible a new interface should be introduced and a compatibility wrapper should be added. Once all providers are migrated the temporary interface can be removed.


		### Step 4: Migrate providers

		Providers need to be migrated to the new API.


		## Motivation

		Several parts of the installer code have large, [unwieldy switch-case structures](https://github.com/openshift/installer/blob/254680f6268a7b8e57091d788256c67b81755b5d/pkg/asset/cluster/tfvars.go#L166) that contain provider-specific codes. This makes it hard to keep track of the code that serves a provider, and it makes it impossible to write component-level tests. Creating a separated API reduces friction between teams and makes changes easier to merge because the risk of accidental side effects between providers is reduced.


		### Step 1: Create a Go interface

		We should create a Go interface that describes the functionality each provider must implement. For example, from the prototype:

Separating provider-specific code in the installer #566

Separating provider-specific code in the installer #566

Conversation

ghost commented Dec 15, 2020

openshift-ci-robot commented Dec 15, 2020

mdbooth commented Dec 15, 2020

sdodson commented Dec 15, 2020

pierreprinetti left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pierreprinetti Dec 16, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost Dec 16, 2020 • edited by ghost Loading

Choose a reason for hiding this comment

staebler left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost Jan 5, 2021 • edited by ghost Loading

Choose a reason for hiding this comment

ghost commented Jan 4, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openshift-bot commented Apr 5, 2021

openshift-bot commented May 5, 2021

pierreprinetti Dec 16, 2020 •

edited

Loading

ghost Dec 16, 2020 •

edited by ghost

Loading

ghost Jan 5, 2021 •

edited by ghost

Loading