Flatcar nodes #322

Merged: 44 commits, Jun 21, 2023
44 commits
3194607
enable-flatcar-os
calvix Mar 31, 2023
77c773c
fix-nodes
calvix Mar 31, 2023
93e4af4
enable-flatcar-os
calvix Mar 31, 2023
78f18a9
Merge branch 'master' into enable-flatcar-os
calvix Jun 8, 2023
cbc9f65
enable-flatcar-os
calvix Jun 8, 2023
28cb207
Merge branch 'enable-flatcar-os' of github.com:giantswarm/cluster-aws…
calvix Jun 8, 2023
5a6ce8d
enable-flatcar-os
calvix Jun 8, 2023
1382a58
enable-flatcar-os
calvix Jun 8, 2023
706b978
enable-flatcar-os
calvix Jun 8, 2023
b50b496
enable-flatcar-os
calvix Jun 8, 2023
c0c2045
enable-flatcar-os
calvix Jun 8, 2023
b88667e
enable-flatcar-os
calvix Jun 8, 2023
1c721da
flatcar-nodes
calvix Jun 9, 2023
e210fac
enable-flatcar-os
calvix Jun 13, 2023
1fc7849
rollback
calvix Jun 13, 2023
de90445
Merge branch 'master' into enable-flatcar-os
calvix Jun 13, 2023
49c0b8b
Merge branch 'enable-flatcar-os' into flatcar-nodes
calvix Jun 13, 2023
eb1418d
enable-flatcar-os
calvix Jun 13, 2023
c9fd313
enable-flatcar-os
calvix Jun 13, 2023
e4fdb01
enable-flatcar-os
calvix Jun 14, 2023
fe7f7a2
enable-flatcar-os
calvix Jun 14, 2023
ead0893
Merge branch 'enable-flatcar-os' into flatcar-nodes
calvix Jun 14, 2023
a92025b
flatcar-nodes
calvix Jun 14, 2023
7035bdb
flatcar-nodes
calvix Jun 14, 2023
a0f28a4
flatcar-nodes
calvix Jun 14, 2023
210db8d
enable-flatcar-os
calvix Jun 14, 2023
6f03a3c
Merge branch 'enable-flatcar-os' into flatcar-nodes
calvix Jun 14, 2023
89de7e2
enable-flatcar-os
calvix Jun 14, 2023
6a45a32
Merge branch 'enable-flatcar-os' into flatcar-nodes
calvix Jun 14, 2023
a38021f
ntp
calvix Jun 14, 2023
f5fc1d2
flatcar-nodes
calvix Jun 14, 2023
f2f75d6
flatcar-nodes
calvix Jun 14, 2023
6b54408
merge
calvix Jun 14, 2023
5f8b3b8
fix
calvix Jun 14, 2023
74dfc8c
flatcar-nodes
calvix Jun 15, 2023
9a56218
registry
calvix Jun 16, 2023
24d50b9
sandbox-image-from-values
calvix Jun 16, 2023
86c28af
normalize
calvix Jun 16, 2023
24890dc
flatcar-nodes
calvix Jun 16, 2023
bf75eb4
audit
calvix Jun 16, 2023
dc1b9b0
remove-comment
calvix Jun 20, 2023
5be86a6
comment
calvix Jun 20, 2023
c12e3d5
Merge branch 'master' into flatcar-nodes
calvix Jun 21, 2023
035f24c
changelog-breaking-change
calvix Jun 21, 2023
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- Use CAPBK to provision bastion node with Flatcar AMI.
- Use CAPBK to provision control plane nodes with Flatcar AMI.
- Use CAPBK to provision worker nodes with Flatcar AMI.
- Migrating from Ubuntu AMI to Flatcar AMI is a **breaking change** that requires manual steps.
- Apply default OS settings for Flatcar and OS hardening.
- Update CAPA CRs API version from `v1beta1` to `v1beta2`.
- Values schema: disallow additional properties on the `.nodePools` object. This is a **breaking change** where node pool names are in use that do not match the pattern `^[a-z0-9]{5,10}$`.
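
The node pool naming rule in the last changelog entry can be checked before upgrading. The following is a minimal sketch, assuming the node pools are available as a plain dictionary keyed by name; the helper `invalid_node_pool_names` and the example names other than `def00` are hypothetical and not part of the chart.

```python
import re

# Pattern quoted in the changelog entry above.
NODE_POOL_NAME = re.compile(r"^[a-z0-9]{5,10}$")


def invalid_node_pool_names(node_pools: dict) -> list[str]:
    """Return the node pool names that do not match the schema pattern."""
    return sorted(name for name in node_pools if not NODE_POOL_NAME.fullmatch(name))


# Hypothetical values; only "def00" appears in the chart defaults.
example = {"def00": {}, "Workers": {}, "np1": {}, "pool-a": {}}
print(invalid_node_pool_names(example))  # ['Workers', 'np1', 'pool-a']
```

Names flagged here would need to be renamed before the stricter schema accepts the values file.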

19 changes: 14 additions & 5 deletions helm/cluster-aws/README.md
@@ -19,7 +19,7 @@ Properties within the `.providerSpecific` top-level object
| :----------- | :-------------- | :--------------- |
| `providerSpecific.ami` | **Amazon machine image (AMI)** - If specified, this image will be used to provision EC2 instances.|**Type:** `string`<br/>|
| `providerSpecific.awsClusterRoleIdentityName` | **Cluster role identity name** - Name of an AWSClusterRoleIdentity object. This in turn refers to the IAM role used to create all AWS cloud resources when creating the cluster. The role can be in another AWS account in order to create all resources in that account. Note: This name does not refer directly to an IAM role name/ARN.|**Type:** `string`<br/>**Value pattern:** `^[-a-zA-Z0-9_\.]{1,63}$`<br/>**Default:** `"default"`|
| `providerSpecific.flatcarAwsAccount` | **AWS account owning Flatcar image** - AWS account ID owning the Flatcar Container Linux AMI.|**Type:** `string`<br/>**Default:** `"075585003325"`|
| `providerSpecific.flatcarAwsAccount` | **AWS account owning Flatcar image** - AWS account ID owning the Flatcar Container Linux AMI.|**Type:** `string`<br/>**Default:** `"706635527432"`|
| `providerSpecific.region` | **Region**|**Type:** `string`<br/>|

### Connectivity
@@ -50,8 +50,12 @@ Properties within the `.connectivity` top-level object
| `connectivity.dns.mode` | **Mode** - Whether the Route53 hosted zone of this cluster should be public or private.|**Type:** `string`<br/>**Default:** `"public"`|
| `connectivity.dns.resolverRulesOwnerAccount` | **Resolver rules owner** - ID of the AWS account that created the resolver rules to be associated with the workload cluster VPC.|**Type:** `string`<br/>|
| `connectivity.network` | **Network**|**Type:** `object`<br/>|
| `connectivity.network.podCidr` | **Pod subnet** - IPv4 address range for pods, in CIDR notation.|**Type:** `string`<br/>**Default:** `"100.64.0.0/12"`|
| `connectivity.network.serviceCidr` | **Service subnet** - IPv4 address range for services, in CIDR notation.|**Type:** `string`<br/>**Default:** `"172.31.0.0/16"`|
| `connectivity.network.pods` | **Pods**|**Type:** `object`<br/>|
| `connectivity.network.pods.cidrBlocks` | **Pod subnets**|**Type:** `array`<br/>**Default:** `["100.64.0.0/12"]`|
| `connectivity.network.pods.cidrBlocks[*]` | **Pod subnet** - IPv4 address range for pods, in CIDR notation.|**Type:** `string`<br/>**Example:** `"10.244.0.0/16"`<br/>|
| `connectivity.network.services` | **Services**|**Type:** `object`<br/>|
| `connectivity.network.services.cidrBlocks` | **K8s Service subnets**|**Type:** `array`<br/>**Default:** `["172.31.0.0/16"]`|
| `connectivity.network.services.cidrBlocks[*]` | **Service subnet** - IPv4 address range for kubernetes services, in CIDR notation.|**Type:** `string`<br/>**Example:** `"172.31.0.0/16"`<br/>|
| `connectivity.network.vpcCidr` | **VPC subnet** - IPv4 address range to assign to this cluster's VPC, in CIDR notation.|**Type:** `string`<br/>**Default:** `"10.0.0.0/16"`|
| `connectivity.proxy` | **Proxy** - Whether/how outgoing traffic is routed through proxy servers.|**Type:** `object`<br/>|
| `connectivity.proxy.enabled` | **Enable**|**Type:** `boolean`<br/>|
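
The hunk above documents `connectivity.network.pods.cidrBlocks` and `connectivity.network.services.cidrBlocks` arrays in place of the single-string `podCidr` and `serviceCidr` properties. A rough sketch of how an existing values dictionary could be adapted is shown below. The function `migrate_network_values` is a hypothetical helper written for illustration, and the assumption that an already present `cidrBlocks` array should win over the old string is mine, not the chart's.

```python
import copy


def migrate_network_values(values: dict) -> dict:
    """Move legacy podCidr/serviceCidr strings into the cidrBlocks arrays."""
    migrated = copy.deepcopy(values)  # leave the caller's dictionary untouched
    network = migrated.setdefault("connectivity", {}).setdefault("network", {})
    for old_key, new_key in (("podCidr", "pods"), ("serviceCidr", "services")):
        if old_key in network:
            cidr = network.pop(old_key)
            # Assumption: keep an existing cidrBlocks array if one is already set.
            network.setdefault(new_key, {}).setdefault("cidrBlocks", [cidr])
    return migrated


old_values = {
    "connectivity": {
        "network": {"podCidr": "100.64.0.0/12", "serviceCidr": "172.31.0.0/16"}
    }
}
print(migrate_network_values(old_values))
```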
@@ -85,7 +89,7 @@ Properties within the `.controlPlane` top-level object
| `controlPlane.apiMode` | **API mode** - Whether the Kubernetes API server load balancer should be reachable from the internet (public) or internal only (private).|**Type:** `string`<br/>**Default:** `"public"`|
| `controlPlane.containerdVolumeSizeGB` | **Containerd volume size (GB)**|**Type:** `integer`<br/>**Default:** `100`|
| `controlPlane.etcdVolumeSizeGB` | **Etcd volume size (GB)**|**Type:** `integer`<br/>**Default:** `100`|
| `controlPlane.instanceType` | **EC2 instance type**|**Type:** `string`<br/>**Default:** `"m5.xlarge"`|
| `controlPlane.instanceType` | **EC2 instance type**|**Type:** `string`<br/>**Default:** `"r6i.xlarge"`|
| `controlPlane.kubeletVolumeSizeGB` | **Kubelet volume size (GB)**|**Type:** `integer`<br/>**Default:** `100`|
| `controlPlane.machineHealthCheck` | **Machine health check**|**Type:** `object`<br/>|
| `controlPlane.machineHealthCheck.enabled` | **Enable**|**Type:** `boolean`<br/>**Default:** `true`|
@@ -112,7 +116,7 @@ For Giant Swarm internal use only, not stable, or not supported by UIs.
| :----------- | :-------------- | :--------------- |
| `internal.hashSalt` | **Hash salt** - If specified, this token is used as a salt to the hash suffix of some resource names. Can be used to force-recreate some resources.|**Type:** `string`<br/>|
| `internal.kubernetesVersion` | **Kubernetes version**|**Type:** `string`<br/>**Example:** `"1.24.7"`<br/>**Default:** `"1.24.10"`|
| `internal.nodePools` | **Default node pool**|**Type:** `object`<br/>**Default:** `{"def00":{"customNodeLabels":["label=default"],"instanceType":"m5.xlarge","minSize":3}}`|
| `internal.nodePools` | **Default node pool**|**Type:** `object`<br/>**Default:** `{"def00":{"customNodeLabels":["label=default"],"instanceType":"r6i.xlarge","minSize":3}}`|
| `internal.nodePools.PATTERN` | **Node pool**|**Type:** `object`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9]{5,10}$`<br/>|
| `internal.nodePools.PATTERN.availabilityZones` | **Availability zones**|**Type:** `array`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9]{5,10}$`<br/>|
| `internal.nodePools.PATTERN.availabilityZones[*]` | **Availability zone**|**Type:** `string`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9]{5,10}$`<br/>|
@@ -130,6 +134,10 @@ For Giant Swarm internal use only, not stable, or not supported by UIs.
| `internal.nodePools.PATTERN.subnetTags` | **Subnet tags** - Tags to filter which AWS subnets will be used for this node pool.|**Type:** `array`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9]{5,10}$`<br/>|
| `internal.nodePools.PATTERN.subnetTags[*]` | **Subnet tag**|**Type:** `object`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9]{5,10}$`<br/>|
| `internal.nodePools.PATTERN.subnetTags[*].*` | **Tag value**|**Type:** `string`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9]{5,10}$`<br/>**Value pattern:** `^[ a-zA-Z0-9\._:/=+-@]+$`<br/>|
| `internal.sandboxContainerImage` | **Kubectl image**|**Type:** `object`<br/>|
| `internal.sandboxContainerImage.name` | **Repository**|**Type:** `string`<br/>**Default:** `"giantswarm/pause"`|
| `internal.sandboxContainerImage.registry` | **Registry**|**Type:** `string`<br/>**Default:** `"quay.io"`|
| `internal.sandboxContainerImage.tag` | **Tag**|**Type:** `string`<br/>**Default:** `"3.9"`|

### Kubectl image
Properties within the `.kubectlImage` top-level object
@@ -148,6 +156,7 @@ Properties within the `.metadata` top-level object
| `metadata.description` | **Cluster description** - User-friendly description of the cluster's purpose.|**Type:** `string`<br/>|
| `metadata.name` | **Cluster name** - Unique identifier, cannot be changed after creation.|**Type:** `string`<br/>|
| `metadata.organization` | **Organization**|**Type:** `string`<br/>|
| `metadata.servicePriority` | **Service priority** - The relative importance of this cluster.|**Type:** `string`<br/>**Default:** `"highest"`|

### Node pools
Properties within the `.nodePools` top-level object
@@ -1,3 +1,28 @@
version = 2

# recommended defaults from https://github.com/containerd/containerd/blob/main/docs/ops.md#base-configuration
# set containerd as a subreaper on linux when it is not running as PID 1
subreaper = true
# set containerd's OOM score
AndiDog marked this conversation as resolved.
oom_score = -999
disabled_plugins = []
[plugins."containerd.runtime.v1.linux"]
# shim binary name/path
shim = "containerd-shim"
# runtime binary name/path
runtime = "runc"
# do not use a shim when starting containers, saves on memory but
# live restore is not supported
no_shim = false

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# setting runc.options unsets parent settings
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "{{ .Values.internal.sandboxContainerImage.registry }}/{{ .Values.internal.sandboxContainerImage.name }}:{{ .Values.internal.sandboxContainerImage.tag }}"

Contributor Author


To avoid a weird bug where containerd kills random pods (giantswarm/roadmap#1737).
Also configures the sandbox pause container to use our registry instead of the k8s gcr one.
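
As a rough illustration of how the `sandbox_image` template line resolves, the sketch below joins the three values in the same order, using the defaults listed in the README hunk (`quay.io`, `giantswarm/pause`, `3.9`). The Python function `sandbox_image` is hypothetical and only mimics the Helm template; it is not part of the chart.

```python
def sandbox_image(values: dict) -> str:
    """Build the image reference the way the template does: registry/name:tag."""
    image = values["internal"]["sandboxContainerImage"]
    return f'{image["registry"]}/{image["name"]}:{image["tag"]}'


# Defaults as documented in the README table for internal.sandboxContainerImage.
defaults = {
    "internal": {
        "sandboxContainerImage": {
            "registry": "quay.io",
            "name": "giantswarm/pause",
            "tag": "3.9",
        }
    }
}
print(sandbox_image(defaults))  # quay.io/giantswarm/pause:3.9
```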

[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
{{- range $host, $config := .Values.connectivity.containerRegistries }}
24 changes: 24 additions & 0 deletions helm/cluster-aws/files/etc/sysctl.d/hardening.conf
@@ -0,0 +1,24 @@
fs.inotify.max_user_watches = 16384
fs.inotify.max_user_instances = 8192
AndiDog marked this conversation as resolved.
kernel.kptr_restrict = 2
kernel.sysrq = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.log_martians = 1
net.ipv4.tcp_timestamps = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
# Increased mmapfs because some applications, like ES, need higher limit to store data properly
vm.max_map_count = 262144
# Reserved to avoid conflicts with kube-apiserver, which allocates within this range
net.ipv4.ip_local_reserved_ports=30000-32767
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2

# These are required for the kubelet '--protect-kernel-defaults' flag
# See https://github.com/giantswarm/giantswarm/issues/13587
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
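
The file above mixes `key = value` and `key=value` spellings, which sysctl.d treats the same way. A minimal sketch of a parser that normalizes both forms, and then checks the three settings the comment ties to `--protect-kernel-defaults`, could look like this. The names `parse_sysctl` and `KUBELET_REQUIRED` are hypothetical and chosen for illustration.

```python
def parse_sysctl(text: str) -> dict[str, str]:
    """Parse sysctl.d-style text into a mapping of setting name to value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ("#" or ";").
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Values listed under the '--protect-kernel-defaults' comment in hardening.conf.
KUBELET_REQUIRED = {
    "vm.overcommit_memory": "1",
    "kernel.panic": "10",
    "kernel.panic_on_oops": "1",
}

sample = """
kernel.sysrq = 0
# These are required for the kubelet '--protect-kernel-defaults' flag
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
"""

parsed = parse_sysctl(sample)
missing = {k: v for k, v in KUBELET_REQUIRED.items() if parsed.get(k) != v}
print(missing)  # {}
```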
2 changes: 2 additions & 0 deletions helm/cluster-aws/files/etc/systemd/timesyncd.conf
@@ -0,0 +1,2 @@
[Time]
NTP=169.254.169.123
Contributor Author


configure timesyncd instead of the ntp script

30 changes: 0 additions & 30 deletions helm/cluster-aws/files/opt/init-disks.sh

This file was deleted.

10 changes: 0 additions & 10 deletions helm/cluster-aws/files/opt/set-aws-ntp.sh

This file was deleted.

2 changes: 1 addition & 1 deletion helm/cluster-aws/templates/_bastion.tpl
@@ -41,7 +41,7 @@ ignition:
additionalConfig: |
systemd:
units:
{{- include "flatcarKubeadmService" $ | nindent 8 }}
{{- include "flatcarSystemdUnits" $ | nindent 8 }}
preKubeadmCommands:
{{ include "flatcarKubeadmPreCommands" $ }}
- systemctl restart sshd
26 changes: 18 additions & 8 deletions helm/cluster-aws/templates/_control_plane.tpl
@@ -71,6 +71,19 @@ spec:
kind: AWSMachineTemplate
name: {{ include "resource.default.name" $ }}-control-plane-{{ include "hash" (dict "data" (include "controlplane-awsmachinetemplate-spec" $) "global" .) }}
kubeadmConfigSpec:
format: ignition
ignition:
containerLinuxConfig:
additionalConfig: |
systemd:
units:
{{- include "flatcarSystemdUnits" $ | nindent 14 }}
{{- include "diskStorageSystemdUnits" $ | nindent 14 }}
storage:
filesystems:
{{- include "diskStorageConfig" $ | nindent 14 }}
directories:
{{- include "nodeDirectories" $ | nindent 14 }}
clusterConfiguration:
# Avoid accessibility issues (e.g. on private clusters) and potential future rate limits for the default `registry.k8s.io`
imageRepository: docker.io/giantswarm
@@ -148,10 +161,9 @@ spec:
files:
{{- include "oidcFiles" . | nindent 4 }}
{{- include "sshFiles" . | nindent 4 }}
{{- include "diskFiles" . | nindent 4 }}
{{- include "irsaFiles" . | nindent 4 }}
{{- include "kubeletConfigFiles" . | nindent 4 }}
{{- include "awsNtpFiles" . | nindent 4 }}
{{- include "nodeConfigFiles" . | nindent 4 }}
{{- if .Values.connectivity.proxy.enabled }}{{- include "proxyFiles" . | nindent 4 }}{{- end }}
{{- include "kubernetesFiles" . | nindent 4 }}
{{- include "registryFiles" . | nindent 4 }}
@@ -167,9 +179,9 @@ spec:
cloud-provider: external
feature-gates: CronJobTimeZone=true
healthz-bind-address: 0.0.0.0
node-ip: '{{ `{{ ds.meta_data.local_ipv4 }}` }}'
node-ip: ${COREOS_EC2_IPV4_LOCAL}
v: "2"
name: '{{ `{{ ds.meta_data.local_hostname }}` }}'
name: ${COREOS_EC2_HOSTNAME}
{{- if .Values.controlPlane.customNodeTaints }}
{{- if (gt (len .Values.controlPlane.customNodeTaints) 0) }}
taints:
@@ -186,7 +198,7 @@ spec:
kubeletExtraArgs:
cloud-provider: external
feature-gates: CronJobTimeZone=true
name: '{{ `{{ ds.meta_data.local_hostname }}` }}'
name: ${COREOS_EC2_HOSTNAME}
{{- if .Values.controlPlane.customNodeTaints }}
{{- if (gt (len .Values.controlPlane.customNodeTaints) 0) }}
taints:
@@ -198,14 +210,12 @@ spec:
{{- end }}
{{- end }}
preKubeadmCommands:
{{- include "prepare-varLibKubelet-Dir" . | nindent 4 }}
{{- include "diskPreKubeadmCommands" . | nindent 4 }}
{{- include "flatcarKubeadmPreCommands" . | nindent 4 }}
{{- include "sshPreKubeadmCommands" . | nindent 4 }}
{{- if .Values.connectivity.proxy.enabled }}{{- include "proxyCommand" $ | nindent 4 }}{{- end }}
postKubeadmCommands:
{{- include "irsaPostKubeadmCommands" . | nindent 4 }}
{{- include "kubeletConfigPostKubeadmCommands" . | nindent 4 }}
{{- include "awsNtpPostKubeadmCommands" . | nindent 4 }}
users:
{{- include "sshUsers" . | nindent 4 }}
replicas: 3