-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separating provider-specific code in the installer #566
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I'm in favour of this. The details of each 'API' can be hashed out separately one by one as cloud provider code is refactored into it. I don't think it's worth getting into that detail here, except to agree it in principal. |
/uncc @stevekuznetsov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for proposing this!
We've been working in this direction in the OpenStack provider already, and I think that it would bring many advantages: clarity, more focus on the orchestration in common parts, a better integration of optional steps (eg pre-flight validation), and moreover, well-defined expectations, written in code.
A big 👍 from me!
} | ||
``` | ||
|
||
Not all platforms implement all features. Optional features should be hidden behind a feature flag and a separate interface: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional platform-specific features, like validations, could be detected via type assertion.
This is what the orchestrator code could resemble to:
type InstallConfig struct{}
func validateCloud(ic InstallConfig, platform interface{}) {
if quotaValidator, ok := platform.(interface {
ValidateQuota(InstallConfig) error
}); ok {
if err := quotaValidator.ValidateQuota(ic); err != nil {
log.Fatal(err)
}
}
}
Each platform can them choose to implement some methods and not others, and the orchestrator code is aware what methods are available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is a much better way to do this as I found out. We can create a struct that has receivers for the default functionality and providers can embed this default struct. (Basically, OOP inheritance.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand. The way I'm figuring this, Installer-core defines a struct with optional methods defined as no-ops, then the Platform embeds it in a struct where it can override the optional noops with actual logic, then the Installer-core imports that... I see a circular dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other places where I've created driver-like APIs of this sort, repeating some code always gave us a result that was easier to maintain over time because a developer working on one driver does not have to untangle an inheritance hierarchy or track down the real implementation that may live somewhere else. In places where we could gain from code reuse, utility functions or types that are instantiated rather than inherited are simpler to test and more flexible.
Along the same lines, no-op implementations, even if they have to be provided by every platform, are simpler to understand than features flags and multiple interfaces.
|
||
## Proposal | ||
|
||
### Step 1: Create a Go interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of defining a single all-encompassing interface, each "installer target" (install-config
, manifests
, machines
etc) could define its own required interface, ideally with just one method each. This way, each logic part can be easily tested and we get cleanly segregated interfaces.
|
||
### Step 2: Implement registry | ||
|
||
In this step we should implement a registry pattern where providers can register with their names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the OpenStack provider, we have started the effort of defining a clean provider interface for Installer steps. For example, for cloudproviderconfig
we have come up with this signature:
func GenerateCloudProviderConfig(installConfig types.InstallConfig) (cloudProviderConfigData, cloudProviderConfigCABundleData string, err error)
We would be happy to be guinea pigs for this RFE and help defining and testing interface signatures in-vivo for the other steps as well! We also have that in our backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There ought to be a saying about two guinea pigs being better than one, we (oVirt) are also wanting to have this. Let's do it!
|
||
### Version Skew Strategy | ||
|
||
As the API develops it presents the problem of version skew. As new API calls are introduced they need to either be added to all providers. If that is not possible a new interface should be introduced and a compatibility wrapper should be added. Once all providers are migrated the temporary interface can be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New function signatures could be implemented as new "optional" interfaces a provider object can implement, with the mechanism of interface assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above, we don't even need to do interface assertion. (Interface assertion is not nice to read and incurs somewhat of a performance penalty.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't it be nice? 😁 Conceptually, it's really what we want here: "dear Platform type, do you implement this method?"
As for the performance penalty: I tend to value readability higher, especially when the total penalty is not expected to pass the 1-second threshold in a 40-minutes process. I would even value aesthetics higher than performance here 😛
} | ||
``` | ||
|
||
Not all platforms implement all features. Optional features should be hidden behind a feature flag and a separate interface: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other places where I've created driver-like APIs of this sort, repeating some code always gave us a result that was easier to maintain over time because a developer working on one driver does not have to untangle an inheritance hierarchy or track down the real implementation that may live somewhere else. In places where we could gain from code reuse, utility functions or types that are instantiated rather than inherited are simpler to test and more flexible.
Along the same lines, no-op implementations, even if they have to be provided by every platform, are simpler to understand than features flags and multiple interfaces.
|
||
### Step 4: Migrate providers | ||
|
||
Providers need to be migrated to the new API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the biggest sources of friction we have had with "everyone needs to..." projects is prioritizing the work on the teams' backlogs. How many different teams will be involved in this part of the work? Are they all aware of this plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhellmann I have sent out this plan to all teamleads and managers affected as far as I could tell. The benefit of this plan is that no team needs to do anything. In addition, I'm working on a way to make sure that API calls added in the future don't create an immediate requirement to do anything for other teams. PoC will still take a while (probably until new year). The document will need to be partially updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still do not see enough value in the monolithic interface to outweigh the effort and churn required to implement it. Perhaps with a PoC where I could see the proposed structure more clearly, I could be convinced. With that said, I am whole-heartedly in favor of the owners of specific providers making an effort to better separate their provider-specific code from the core installer code.
|
||
## Motivation | ||
|
||
Several parts of the installer code have large, [unwieldy switch-case structures](https://github.com/openshift/installer/blob/254680f6268a7b8e57091d788256c67b81755b5d/pkg/asset/cluster/tfvars.go#L166) that contain provider-specific codes. This makes it hard to keep track of the code that serves a provider, and it makes it impossible to write component-level tests. Creating a separated API reduces friction between teams and makes changes easier to merge because the risk of accidental side effects between providers is reduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see it, there are three motivations.
- You want to be able to find code that serves a provider more easily.
- You want to be able to write component-level tests for a specific provider.
- You want to be able to merge code specific to a provider without having to get reviews from people on other teams.
Creating a separate API is a solution not a motivation. I contend that you can satisfy all three motivations without have a separate API. As a maintainer of a platform, you are at liberty to extract whatever platform-specific code there is from the unwieldy switch statement and put it into a function in a directory that you own. You can then run whatever component-level tests you like on that function.
|
||
### Step 1: Create a Go interface | ||
|
||
We should create a Go interface that describes the functionality each provider must implement. For example, from the prototype: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface will have 20-30 functions. Some of these functions are going to be difficult to describe if the interface is far from the asset where it is used. As an example, the MCS Certificate asset has different logic for specifying the IP addresses and DNS names for the cert based on the platform. I personally prefer to have the logic for each asset located close to the asset, particularly where the logic is more closely coupled to the asset than the provider.
The logic for some assets that are very specific to a platform would do better farther away from the asset. One example of this is the Platform Credentials Check asset. The credentials check could stand on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, the MCS Certificate asset has different logic for specifying the IP addresses and DNS names for the cert based on the platform. I personally prefer to have the logic for each asset located close to the asset, particularly where the logic is more closely coupled to the asset than the provider.
This makes sense for readability. A counter-argument in favor of the interface, which is not included in this enhancement, is that the current implementation creates a dependency issue. We can see in the code that creating an asset depends on all platforms, creating an oVirt MCS Certificate asset depends on the AWS platform. If we inject the dependencies through an interface, we would resolve these dependency issues.
These dependencies are "theoretically wrong," but I'm not sure if they pose a practical problem.
|
||
### Version Skew Strategy | ||
|
||
As the API develops it presents the problem of version skew. As new API calls are introduced they need to either be added to all providers. If that is not possible a new interface should be introduced and a compatibility wrapper should be added. Once all providers are migrated the temporary interface can be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This point is a big concern for me. Let's say that a provider needs a new parameter. Either we update every provider at once to add the new parameter. Or we support a new function with the new parameter. In the latter case, I would expect the providers that do not need the new parameter will never update to the new version of the function. Then, later, when a new parameter is added again, we get a third function. As time goes on, I could see every provider ending up with their own function. I look at the machine assets as an example. Every provider has a MachineSets and Machines function, but each one has its own unique signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue has been addressed in the PoC (see below) I will update the document later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As time goes on, I could see every provider ending up with their own function. I look at the machine assets as an example. Every provider has a MachineSets and Machines function, but each one has its own unique signature.
The parameters in an interface's function signature to generate an asset should just match the dependencies for that asset. Each platform/provider should have a signature that takes in all the dependencies and then returns whatever is needed to generate the asset (i.e. whatever is coming out of the switch statement). The PoC could be improved in this regard, I will leave a comment there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The machine config is pretty easy to standardize as far as I can tell. In my previous PoC I did this:
// GetMasterMachines returns a pool of machine configurations to create for the master nodes.
GetMasterMachines(
ic *types.InstallConfig,
pool *types.MachinePool,
rhcosImage *rhcos.Image,
clusterID *installconfig.ClusterID,
) ([]machineapi.Machine, error)
// AddWorkerMachines adds a set of machines to the worker pool.
AddWorkerMachines(
ic *types.InstallConfig,
pool *types.MachinePool,
rhcosImage *rhcos.Image,
clusterID *installconfig.ClusterID,
machineSets []runtime.Object,
) error
The API extraction is a rather simple process: there are a number of variables available there the provider-specific code is located and those make it into the API.
@mdbooth @staebler @dhellmann @pierreprinetti thank you for your input. I have a PoC ready that addresses some of the issues raised here: openshift/installer#4517 |
|
||
## Motivation | ||
|
||
Several parts of the installer code have large, [unwieldy switch-case structures](https://github.com/openshift/installer/blob/254680f6268a7b8e57091d788256c67b81755b5d/pkg/asset/cluster/tfvars.go#L166) that contain provider-specific codes. This makes it hard to keep track of the code that serves a provider, and it makes it impossible to write component-level tests. Creating a separated API reduces friction between teams and makes changes easier to merge because the risk of accidental side effects between providers is reduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes it impossible to write component-level tests
What do you mean by "component-level tests"? Many platforms have platform-specific unit tests. For example, here are the GCP tests for validating install configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We (oVirt) have several parts of our code where half of the functionality is in a switch-case, the other half is in a switch-case. It would be helpful to have tests that run our code only without needing to test the whole installer. That makes test runs quicker and less flaky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't really call them unit tests because they still require external API's, which is a separate issue to be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the GCP example I linked above you can see how we mocked out the GCP APIs.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
This change introduces an internal API in the installer code to move provider-specific code into a separate folder and avoid having large switch-case structures in the installer code.