proposals: new API for managed Security Groups
EmilienM committed Dec 1, 2023
1 parent 8a49b9f commit 41d77ff
Showing 1 changed file with 225 additions and 0 deletions.
225 changes: 225 additions & 0 deletions docs/proposals/20231122-security-groups.md
@@ -0,0 +1,225 @@
---
title: Flexible managed security groups
authors:
- "@emilienm"
reviewers:
- "@jichenjc"
- "@mdbooth"
creation-date: 2023-11-22
last-updated: 2023-12-01

status: provisional
---

## Title

Flexible managed security groups

## Table of Contents

- [Flexible managed security groups](#title)
- [Table of Contents](#table-of-contents)
- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
- [User Stories](#user-stories)
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
- [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives](#alternatives)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
- [Test Plan](#test-plan)
- [Graduation Criteria [optional]](#graduation-criteria-optional)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)

## Glossary

- CRD - Custom Resource Definition is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation.
It represents a new object type, and allows users to create and define custom resources.
- CNI - Container Network Interface (CNI) is a specification for configuring network interfaces for Linux containers.
CNI defines how a container runtime invokes network plugins to configure network interfaces in Linux containers.
CNI also defines the format of the configuration file that the network plugins use to plug into container runtime environments.
- Controller - A controller is a control loop that watches the shared state of the cluster through the API server and makes changes
attempting to move the current state towards the desired state. Examples of controllers in CAPO are OpenStackCluster and OpenStackMachine.
- Egress - Egress traffic is traffic that is leaving the cluster.
- Ingress - Ingress traffic is traffic that is coming into the cluster.
- Security Group - A security group is a named set of rules that control traffic to and from virtual machine instances.
The rules can specify which ports and protocols to allow, and which IP address ranges to allow traffic to or from.
Security groups can be associated with multiple instances, and multiple security groups can be associated with a single instance.
Security groups act as a firewall for associated instances, controlling both inbound and outbound traffic at the instance level.

## Summary

This proposal moves the management of security groups into a dedicated controller and introduces a new API to manage more security groups and rules.

## Motivation

By default, users should not have to think about the security groups needed by their Control Plane and Worker Nodes. At the same time, we want to give users more control over the managed security groups, so that they are not stuck with rules that their cluster does not need. Users will be able to add additional security groups and rules to the cluster, and the controller will reconcile the machines accordingly.

### Goals

1. Move the management of security groups into a dedicated controller since we will have to reconcile the assignment of security groups to the machines.
1. Create a new API to manage more security groups and rules:
1. Default rules provided by `OpenStackCluster.spec.managedSecurityGroups` will be managed by the new controller and the new API will be used to manage additional security groups and rules.
1. Provide a migration path for Calico users via a new field, `OpenStackCluster.spec.managedSecurityGroups.useLegacyCalicoRules`, which defaults to `false` and is deprecated from day 1. This parameter might be removed in the future.
1. Successfully manage new security group rules for the Bastion, the Control Plane and the Worker Nodes.
1. Successfully manage additional security groups that can later be used in the Machine Template. Their lifecycle has to be managed by the new controller as well, so that the machines are updated with the right security groups.

### Non-Goals/Future Work

1. Creating a new CRD is not a goal since ManagedSecurityGroups feels like a property of the cluster, so we'll keep `OpenStackCluster.Spec.ManagedSecurityGroups` for now.
1. Removing support for pre-existing security groups in `OpenStackMachineTemplate` (via `OpenStackMachineSpec`) and for the Bastion in `OpenStackCluster`.
1. Deprecating `OpenStackCluster.spec.allowAllInClusterTraffic`, as we'll still need it.

## Proposal

### User Stories

- When creating a cluster as an operator, I don't want to worry about the security groups that are needed for my Control Plane and Worker Nodes to be functional and
I want these security groups to be managed automatically.

- As an operator, I can already pre-create security groups and provide them to the Machine Templates. However, to remove that burden from the user,
I now want to let the controller manage the lifecycle of these security groups
and have a new API to provide the additional security groups and their rules.

- As an operator, I want to be able to manage additional security group rules for the Bastion, the Control Plane and the Worker Nodes in order to provide more
rules than the defaults, so I have more flexibility in my cluster.

- As an operator, I want to be able to manage additional security group rules for the CNI so I have more options than Calico. I need to provide the security group rules
specific to my CNI and have them applied to all the machines, so I have more flexibility in my cluster.

### Implementation Details/Notes/Constraints

The plan is to create a new controller and migrate the management of security groups and their rules into it when `OpenStackCluster.Spec.ManagedSecurityGroups` is enabled.

A cluster operator will be able to add additional security groups and rules to the cluster by adding them to the `OpenStackCluster` spec.
The controller will then reconcile the security groups and their rules by comparing what's in the spec with what's in the status.
Special care will be taken to not remove security groups that are still in use by the machines.
Also, during an upgrade, the controller will make sure to add the new security groups and their rules to the machines before reducing the default rules,
which will no longer include the Calico rules.

#### Data model changes

The type for `OpenStackCluster.Spec.ManagedSecurityGroups` was a boolean but will become a structure with new fields.

Example `ManagedSecurityGroups` yaml:

```yaml
managedSecurityGroups:
  # Enable the management of security groups with the default rules (kubelet, etcd, etc.)
  # The default stays false
  enabled: true
  # Allow to extend the default security group rules for the Bastion,
  # the Control Plane and the Worker security groups
  additionalBastionSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  additionalControlPlaneSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  additionalWorkerSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  # Allow to provide rules that will be applied to all nodes
  allNodesSecurityGroupRules:
  - direction: ingress
    ethertype: IPv4
    portRangeMin: 1234
    portRangeMax: 1234
    protocol: tcp
  # Allow to provide additional security groups and rules that can be used in the Machine Template
  additionalSecurityGroups:
  - name: my-security-group
    description: My security group
    rules:
    - direction: ingress
      ethertype: IPv4
      portRangeMin: 1234
      portRangeMax: 1234
      protocol: tcp
  # When set to `true`, the controller will add the Calico rules to the All nodes security group and update
  # the machines before the default rules are removed from the managed security groups
  useLegacyCalicoRules: false
```
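
As a rough illustration, the structure in the YAML example could map to Go types along the following lines. This is only a sketch: the type names (`SecurityGroupRuleSpec`, `SecurityGroupSpec`), the exact field types and the `validateRule` helper are assumptions inferred from the YAML keys, not the final API.

```go
package main

import (
	"errors"
	"fmt"
)

// SecurityGroupRuleSpec mirrors the rule keys used in the YAML example.
// Hypothetical type: the final API may name or shape these fields differently.
type SecurityGroupRuleSpec struct {
	Direction    string `json:"direction"`
	EtherType    string `json:"ethertype"`
	PortRangeMin int    `json:"portRangeMin"`
	PortRangeMax int    `json:"portRangeMax"`
	Protocol     string `json:"protocol"`
}

// SecurityGroupSpec describes one additional security group and its rules.
type SecurityGroupSpec struct {
	Name        string                  `json:"name"`
	Description string                  `json:"description,omitempty"`
	Rules       []SecurityGroupRuleSpec `json:"rules,omitempty"`
}

// ManagedSecurityGroups is the structure that would replace the former boolean.
type ManagedSecurityGroups struct {
	Enabled                                  bool                    `json:"enabled"`
	AdditionalBastionSecurityGroupRules      []SecurityGroupRuleSpec `json:"additionalBastionSecurityGroupRules,omitempty"`
	AdditionalControlPlaneSecurityGroupRules []SecurityGroupRuleSpec `json:"additionalControlPlaneSecurityGroupRules,omitempty"`
	AdditionalWorkerSecurityGroupRules       []SecurityGroupRuleSpec `json:"additionalWorkerSecurityGroupRules,omitempty"`
	AllNodesSecurityGroupRules               []SecurityGroupRuleSpec `json:"allNodesSecurityGroupRules,omitempty"`
	AdditionalSecurityGroups                 []SecurityGroupSpec     `json:"additionalSecurityGroups,omitempty"`
	UseLegacyCalicoRules                     bool                    `json:"useLegacyCalicoRules,omitempty"`
}

// validateRule is a hypothetical helper performing basic sanity checks on a rule.
func validateRule(r SecurityGroupRuleSpec) error {
	if r.Direction != "ingress" && r.Direction != "egress" {
		return fmt.Errorf("invalid direction %q", r.Direction)
	}
	if r.PortRangeMin > r.PortRangeMax {
		return errors.New("portRangeMin must not exceed portRangeMax")
	}
	return nil
}

func main() {
	rule := SecurityGroupRuleSpec{
		Direction: "ingress", EtherType: "IPv4",
		PortRangeMin: 1234, PortRangeMax: 1234, Protocol: "tcp",
	}
	spec := ManagedSecurityGroups{
		Enabled:                            true,
		AdditionalWorkerSecurityGroupRules: []SecurityGroupRuleSpec{rule},
	}
	for _, r := range spec.AdditionalWorkerSecurityGroupRules {
		if err := validateRule(r); err != nil {
			fmt.Println("invalid rule:", err)
			continue
		}
		fmt.Println("rule accepted for port", r.PortRangeMin)
	}
}
```
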
#### Enable or disable the management of security groups
`OpenStackCluster.Spec.ManagedSecurityGroups.enabled` will be used to enable or disable the management of security groups in general.
`OpenStackCluster.Spec.ManagedSecurityGroups` itself was a boolean and will now be a structure with new fields.

Conversion will be done automatically for the user.

#### Additional security rules

`OpenStackCluster.Spec.ManagedSecurityGroups.additionalBastionSecurityGroupRules` will be used to add additional security rules to the Bastion security group.
`OpenStackCluster.Spec.ManagedSecurityGroups.additionalControlPlaneSecurityGroupRules` will be used to add additional security rules to the Control Plane security group.
`OpenStackCluster.Spec.ManagedSecurityGroups.additionalWorkerSecurityGroupRules` will be used to add additional security rules to the Worker security group.

These rules will be managed by the new controller and added to the status of the Bastion, Control Plane and Worker security groups, respectively.
A user will be able to add or remove rules from the spec and the controller will reconcile the status accordingly and update the security groups.
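
The spec-versus-status reconciliation described here can be sketched as a set difference over rules. The `Rule` type and the `diffRules` function below are hypothetical names used for illustration; the real controller would also need to call the OpenStack API to apply the result.

```go
package main

import "fmt"

// Rule is a hypothetical, comparable representation of a security group rule.
type Rule struct {
	Direction    string
	EtherType    string
	Protocol     string
	PortRangeMin int
	PortRangeMax int
}

// diffRules compares the desired rules (spec) with the observed rules (status)
// and returns the rules to create and the rules to delete.
func diffRules(desired, observed []Rule) (toAdd, toRemove []Rule) {
	want := make(map[Rule]struct{}, len(desired))
	for _, r := range desired {
		want[r] = struct{}{}
	}
	have := make(map[Rule]struct{}, len(observed))
	for _, r := range observed {
		have[r] = struct{}{}
		if _, ok := want[r]; !ok {
			toRemove = append(toRemove, r)
		}
	}
	for _, r := range desired {
		if _, ok := have[r]; !ok {
			toAdd = append(toAdd, r)
			have[r] = struct{}{} // avoid adding a duplicated spec entry twice
		}
	}
	return toAdd, toRemove
}

func main() {
	desired := []Rule{
		{Direction: "ingress", EtherType: "IPv4", Protocol: "tcp", PortRangeMin: 1234, PortRangeMax: 1234},
	}
	observed := []Rule{
		{Direction: "ingress", EtherType: "IPv4", Protocol: "tcp", PortRangeMin: 8080, PortRangeMax: 8080},
	}
	toAdd, toRemove := diffRules(desired, observed)
	fmt.Println("rules to add:", len(toAdd))
	fmt.Println("rules to remove:", len(toRemove))
}
```

Using the whole rule as a map key keeps the comparison simple, at the cost of treating any changed field as a delete followed by a create.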

#### Additional security groups

`OpenStackCluster.Spec.ManagedSecurityGroups.additionalSecurityGroups` will be used to add additional security groups and their rules.
The security groups will be managed by the new controller and added to the status.
A user will be able to add or remove security groups from the spec and the controller will reconcile the status accordingly and update the security groups.

The controller will also ensure that the security groups are not removed if they are still in use by the machines and add them to the machines if they are not already there.
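
The in-use guard could look roughly like the sketch below. The function name `splitStaleGroups` and the `machineGroups` map (machine name to attached security group names) are assumptions made for illustration; how the controller actually discovers attachments is not specified in this proposal.

```go
package main

import "fmt"

// splitStaleGroups looks at the groups present in the status but no longer in
// the spec, and separates those that can be deleted from those that must be
// retained because at least one machine still references them.
func splitStaleGroups(desired, observed []string, machineGroups map[string][]string) (deletable, retained []string) {
	wanted := make(map[string]bool, len(desired))
	for _, name := range desired {
		wanted[name] = true
	}
	inUse := make(map[string]bool)
	for _, groups := range machineGroups {
		for _, name := range groups {
			inUse[name] = true
		}
	}
	for _, name := range observed {
		if wanted[name] {
			continue // still in the spec, nothing to clean up
		}
		if inUse[name] {
			retained = append(retained, name)
		} else {
			deletable = append(deletable, name)
		}
	}
	return deletable, retained
}

func main() {
	desired := []string{"my-security-group"}
	observed := []string{"my-security-group", "old-group", "unused-group"}
	machineGroups := map[string][]string{
		"worker-0": {"my-security-group", "old-group"},
	}
	deletable, retained := splitStaleGroups(desired, observed, machineGroups)
	fmt.Println("safe to delete:", deletable)
	fmt.Println("kept because still attached:", retained)
}
```
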

#### All nodes security group rules

`OpenStackCluster.Spec.ManagedSecurityGroups.allNodesSecurityGroupRules` will be used to add additional security rules to a new security group applied to all nodes.
The rules will be managed by the new controller and added to the All nodes security group status.
A user will be able to add or remove rules from the spec and the controller will reconcile the status accordingly and update the All nodes security group.

#### Migration path for Calico users

`OpenStackCluster.Spec.ManagedSecurityGroups.useLegacyCalicoRules` will be used to add the Calico rules to the All nodes security group and update the
machines before the default rules are removed from the managed security groups. This parameter might be removed in the future.
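
One way to picture the effect of this flag is as a function that builds the final rule list for the All nodes security group. This is a minimal sketch: the `Rule` type, the `allNodesRules` function and the sample rule values are assumptions, and the legacy rule shown in `main` is only a placeholder, not the authoritative list of Calico rules.

```go
package main

import "fmt"

// Rule is a hypothetical representation of a security group rule.
type Rule struct {
	Direction    string
	EtherType    string
	Protocol     string
	PortRangeMin int
	PortRangeMax int
}

// allNodesRules returns the rules for the All nodes security group: the rules
// supplied by the user, plus the legacy Calico rules when the flag is set.
func allNodesRules(userRules, legacyCalicoRules []Rule, useLegacyCalicoRules bool) []Rule {
	rules := make([]Rule, 0, len(userRules)+len(legacyCalicoRules))
	rules = append(rules, userRules...)
	if useLegacyCalicoRules {
		rules = append(rules, legacyCalicoRules...)
	}
	return rules
}

func main() {
	userRules := []Rule{
		{Direction: "ingress", EtherType: "IPv4", Protocol: "tcp", PortRangeMin: 1234, PortRangeMax: 1234},
	}
	// Placeholder legacy rule for illustration only (BGP uses TCP port 179).
	legacy := []Rule{
		{Direction: "ingress", EtherType: "IPv4", Protocol: "tcp", PortRangeMin: 179, PortRangeMax: 179},
	}
	fmt.Println("without legacy rules:", len(allNodesRules(userRules, legacy, false)))
	fmt.Println("with legacy rules:", len(allNodesRules(userRules, legacy, true)))
}
```
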

### Risks and Mitigations

Example risks:

- The new controller might not be able to manage the security groups and their rules correctly during an upgrade.

## Alternatives

No currently known alternatives exist which are public and have been implemented for CAPO.

## Upgrade Strategy

The upgrade strategy will be to add the new controller and migrate the management of security groups and their default rules into it when `managedSecurityGroups.enabled` is set to `true`.
For Calico users, `useLegacyCalicoRules` will be used to add the Calico rules to the All nodes security group and update the machines before the default rules are removed from the managed security groups.

Since we are changing the type of `OpenStackCluster.Spec.ManagedSecurityGroups` from a boolean to a structure, we will have to add a conversion function to convert the boolean to the structure.
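
A minimal sketch of such a conversion is shown below, assuming the old boolean maps directly onto the new `enabled` field and every new field keeps its zero value. The type and function names (`oldClusterSpec`, `newClusterSpec`, `convertToNew`, `convertToOld`) are hypothetical, and whether `useLegacyCalicoRules` should be set during conversion is left open here.

```go
package main

import "fmt"

// oldClusterSpec models the previous API, where the field was a boolean.
type oldClusterSpec struct {
	ManagedSecurityGroups bool
}

// ManagedSecurityGroups is a reduced version of the new structure.
type ManagedSecurityGroups struct {
	Enabled              bool
	UseLegacyCalicoRules bool
}

// newClusterSpec models the new API, where the field is a structure.
type newClusterSpec struct {
	ManagedSecurityGroups ManagedSecurityGroups
}

// convertToNew maps the old boolean onto the Enabled field.
// All other fields are left at their defaults (assumption).
func convertToNew(old oldClusterSpec) newClusterSpec {
	return newClusterSpec{
		ManagedSecurityGroups: ManagedSecurityGroups{Enabled: old.ManagedSecurityGroups},
	}
}

// convertToOld keeps only the Enabled flag. This direction is lossy:
// additional rules and groups cannot be expressed by a boolean.
func convertToOld(n newClusterSpec) oldClusterSpec {
	return oldClusterSpec{ManagedSecurityGroups: n.ManagedSecurityGroups.Enabled}
}

func main() {
	old := oldClusterSpec{ManagedSecurityGroups: true}
	converted := convertToNew(old)
	fmt.Printf("new spec: %+v\n", converted)
	fmt.Printf("round trip: %+v\n", convertToOld(converted))
}
```
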

## Additional Details

### Test Plan

- The new controller will be tested with unit tests.
- Unit tests will be added to the security group reconciler to make sure that the security groups and their rules are managed correctly.
- The existing e2e tests will help to make sure that the new controller is working as expected and that no regression is introduced.

### Version Skew Strategy

The feature itself should not depend significantly on the version of CAPI and will be backwards compatible with old versions of CAPO since it will be adding new options. If there is a drift in CAPI and CAPO versions, the functionality should stay the same without breaking anything.

## Implementation History

- [ ] 11/21/2023: Open WIP PR [ https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1751 ]
- [ ] 11/22/2023: Open this KEP PR
