Add dedicated instances proposal
Alexander Demichev authored and alexander-demicev committed Aug 18, 2021
1 parent 494c4df commit 9da7433
Showing 1 changed file with 197 additions and 0 deletions.
197 changes: 197 additions & 0 deletions enhancements/machine-api/dedicated-instances.md
@@ -0,0 +1,197 @@
---
title: dedicated-instances
authors:
- "@alexander-demichev"
reviewers:
- "@JoelSpeed"
- "@enxebre"
approvers:
- "@JoelSpeed"
- "@enxebre"
creation-date: 2020-09-01
last-updated: 2020-09-01
status: provisional
---

# Dedicated instances

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Make it possible for users to create machines which run as dedicated instances. Dedicated instances are instances
that usually run on hardware that's dedicated to a single customer.

## Motivation

Some organizations need to make sure that their workloads are not hosted on the same physical hardware as others.

### Goals

- Provide automation similar to what Machine API supports for spot instances.

- Expose a field in the machine API that enables consumers to choose dedicated tenancy.

### Non-Goals

- TODO

## Proposal

In order to give users the ability to run their workloads on dedicated instances we should do the following things for AWS, GCP and Azure:

- Add the ability to enable dedicated instances through the Machine's provider spec.

- Validate that the provider spec doesn't combine spot instance configuration with dedicated instances when the cloud provider does not support that combination. The only provider that currently supports it is [AWS](https://aws.amazon.com/about-aws/whats-new/2017/01/amazon-ec2-spot-instances-now-support-dedicated-tenancy/#:~:text=Dedicated%20Spot%20instances%20work%20the,belong%20to%20other%20AWS%20accounts).
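
The validation described above could be sketched as follows. This is a minimal illustration only: `machineRequest`, `spotOnDedicatedSupported` and `validateTenancy` are hypothetical names, and the real check would read the provider-specific spec types rather than a neutral struct.

```go
package main

import (
	"errors"
	"fmt"
)

// spotOnDedicatedSupported is a hypothetical lookup table. Per this proposal,
// only AWS currently allows spot instances on dedicated tenancy.
var spotOnDedicatedSupported = map[string]bool{
	"aws":   true,
	"azure": false,
	"gcp":   false,
}

// machineRequest is a hypothetical, provider-neutral view of the two
// settings the validation has to inspect.
type machineRequest struct {
	Provider  string
	Spot      bool
	Dedicated bool
}

var errSpotAndDedicated = errors.New("spot and dedicated instances cannot be combined on this provider")

// validateTenancy rejects a request that asks for both spot and dedicated
// instances on a provider that does not support the combination.
func validateTenancy(r machineRequest) error {
	if r.Spot && r.Dedicated && !spotOnDedicatedSupported[r.Provider] {
		return fmt.Errorf("provider %q: %w", r.Provider, errSpotAndDedicated)
	}
	return nil
}

func main() {
	fmt.Println(validateTenancy(machineRequest{Provider: "aws", Spot: true, Dedicated: true}))
	fmt.Println(validateTenancy(machineRequest{Provider: "gcp", Spot: true, Dedicated: true}))
}
```

Keeping the supported combinations in one table means a provider that later adds support only needs a one-line change.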

### Implementation Details

For each of the cloud providers that support dedicated instances the implementation will be different.

#### AWS

"Dedicated Instances are Amazon EC2 instances that run in a virtual private cloud (VPC) on hardware that's dedicated to a single customer." ([AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/dedicated-instance.html)). Each launched instance has a tenancy attribute, which can be configured in the same way we set the availability zone.

```go
placement = &ec2.Placement{
	AvailabilityZone: aws.String(machineProviderConfig.Placement.AvailabilityZone),
	// Tenancy is passed through from the provider spec.
	Tenancy: aws.String(machineProviderConfig.Placement.Tenancy),
}
```

This change requires adding a `Tenancy` field to the provider spec.

```go
type AWSMachineProviderConfig struct {
	// existing fields
	...

	// Placement specifies where to create the instance in AWS
	Placement Placement `json:"placement"`
}

// Placement indicates where to create the instance in AWS
type Placement struct {
	// existing fields
	...

	// Tenancy indicates tenant policy for instance
	// +kubebuilder:validation:Enum:=default,dedicated,host
	// +kubebuilder:default:=default
	Tenancy string
}
```
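
For illustration, the enum and default expressed by the kubebuilder markers above could also be enforced in code. The sketch below is an assumption about how that might look; `normalizeTenancy` is a hypothetical helper and not part of the existing provider.

```go
package main

import "fmt"

// Allowed tenancy values, taken from the kubebuilder enum in this proposal.
const (
	tenancyDefault   = "default"
	tenancyDedicated = "dedicated"
	tenancyHost      = "host"
)

// normalizeTenancy is a hypothetical helper. It maps an empty value to
// "default" and rejects anything outside the allowed set.
func normalizeTenancy(t string) (string, error) {
	switch t {
	case "":
		return tenancyDefault, nil
	case tenancyDefault, tenancyDedicated, tenancyHost:
		return t, nil
	default:
		return "", fmt.Errorf("invalid tenancy %q: must be one of default, dedicated, host", t)
	}
}

func main() {
	for _, in := range []string{"", "dedicated", "shared"} {
		out, err := normalizeTenancy(in)
		fmt.Printf("in=%q out=%q err=%v\n", in, out, err)
	}
}
```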

AWS supports spot instances on dedicated tenancy, so we need to make sure that this case is also tested.
[AWS Documentation](https://aws.amazon.com/about-aws/whats-new/2017/01/amazon-ec2-spot-instances-now-support-dedicated-tenancy/#:~:text=Dedicated%20Spot%20instances%20work%20the,belong%20to%20other%20AWS%20accounts.)

#### Azure

In order to make dedicated VMs work on Azure we need to understand the concept of host groups and hosts.
[Azure documentation](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/dedicated-hosts).

```text
A host group is a resource that represents a collection of dedicated hosts. You create a host group in a region and an availability zone, and add hosts to it.
A host is a resource, mapped to a physical server in an Azure data center. The physical server is allocated when the host is created. A host is created within a host group. A host has a SKU describing which VM sizes can be created. Each host can host multiple VMs, of different sizes, as long as they are from the same size series.
When creating a VM in Azure, you can select which dedicated host to use for your VM. You have full control as to which VMs are placed on your hosts.
```

The problem here is the standard quotas: on a host of type `DSv3-Type1` we can create only 32 VMs of type `Standard_D2s_v3` (the default type for worker VMs). To request a quota increase, users must create a support request. This should be well documented.
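
To make the capacity arithmetic concrete, here is a small sketch that estimates how many dedicated hosts a given number of worker VMs would need. It assumes the figure of 32 VMs per host quoted above; `hostsNeeded` is a hypothetical helper, and real limits depend on the host SKU, VM size and subscription quota.

```go
package main

import "fmt"

// vmsPerHost is the figure quoted in this proposal for Standard_D2s_v3 VMs
// on a DSv3-Type1 host. Treat it as an illustrative assumption.
const vmsPerHost = 32

// hostsNeeded returns the number of hosts required to place vms virtual
// machines when each host fits perHost of them (ceiling division).
func hostsNeeded(vms, perHost int) (int, error) {
	if vms < 0 || perHost <= 0 {
		return 0, fmt.Errorf("invalid input: vms=%d perHost=%d", vms, perHost)
	}
	return (vms + perHost - 1) / perHost, nil
}

func main() {
	for _, vms := range []int{0, 32, 33, 100} {
		n, err := hostsNeeded(vms, vmsPerHost)
		fmt.Printf("vms=%d hosts=%d err=%v\n", vms, n, err)
	}
}
```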

The required API change is adding a host name field, `Host`, to the provider spec.

```go
type AzureMachineProviderConfig struct {
	// existing fields
	...

	// Host name of physical server that hosts the virtual machine
	// +optional
	Host string
}
```

#### GCP

GCP requires `Node Templates` and `Node Groups` to be able to create dedicated instances. [GCP documentation](https://cloud.google.com/compute/docs/nodes/sole-tenant-nodes).

```text
Node templates
A node template is a regional resource that defines the properties of each node in a node group.
Node groups and VM provisioning
Sole-tenant node templates define the properties of a node group, and you must create a node template before creating a node group in a Google Cloud zone. When you create a group, specify the maintenance policy for VM instances on the node group, and the number of nodes for the node group.
A node group can have zero or more nodes; for example, you can reduce the number of nodes in a node group to zero when you don't need to run any VM instances on nodes in the group, or you can enable the node group autoscaler to manage the size of the node group automatically.
```

To create a VM on a dedicated host, we should introduce a `NodeGroup` API field to the provider spec.

We should document that node groups have resource capacities which limit the number of VMs, unless the node group autoscaler is enabled.

```go
type GCPMachineProviderConfig struct {
	// existing fields
	...

	// NodeGroup name of node group that hosts the virtual machine
	// +optional
	NodeGroup string
}
```
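
One way the `NodeGroup` value might be turned into instance scheduling settings is through a node affinity entry. The sketch below is an assumption rather than the planned implementation: `nodeAffinity` is a local stand-in modeled on the Compute Engine node affinity settings (not the client library type), the affinity key is the one we understand GCP to document for node groups, and `affinitiesForNodeGroup` is a hypothetical helper.

```go
package main

import "fmt"

// nodeAffinity is a local stand-in for a Compute Engine node affinity entry.
// It is defined here only to keep the sketch self-contained.
type nodeAffinity struct {
	Key      string
	Operator string
	Values   []string
}

// nodeGroupAffinityKey is assumed to be the label key GCP uses to target a
// sole-tenant node group.
const nodeGroupAffinityKey = "compute.googleapis.com/node-group-name"

// affinitiesForNodeGroup returns the affinity entries for a machine. An empty
// node group means no sole-tenancy constraint is requested.
func affinitiesForNodeGroup(nodeGroup string) []nodeAffinity {
	if nodeGroup == "" {
		return nil
	}
	return []nodeAffinity{{
		Key:      nodeGroupAffinityKey,
		Operator: "IN",
		Values:   []string{nodeGroup},
	}}
}

func main() {
	fmt.Printf("%+v\n", affinitiesForNodeGroup("workers-sole-tenant"))
	fmt.Printf("%+v\n", affinitiesForNodeGroup(""))
}
```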

### Risks and Mitigations

- Quota issues in Azure. See the [Azure](#azure) section for more details.
- Azure and GCP require some [configuration](#infrastructure-needed) on the cloud provider side before a dedicated instance can be created.

#### Autoscaling

Autoscaling dedicated instances can be a problem because dedicated hosts have quotas and limits on the provider side. We should provide good documentation here.

#### Limited resources for Azure and GCP

To avoid misconfiguration, we should document that users are responsible for the capacity of their dedicated hosts on Azure and node groups on GCP.

## Design Details

### Open Questions

### Test Plan

### Graduation Criteria

#### Examples

##### Dev Preview -> Tech Preview

##### Tech Preview -> GA

##### Removing a deprecated feature

### Upgrade / Downgrade Strategy

### Version Skew Strategy

## Implementation History

## Drawbacks

## Alternatives

## Infrastructure Needed

- For GCP, our CI environment should have a suitable `Node Template` and `Node Group` created.

- For Azure, our CI environment should have a `Host Group` and `Host` created.
