From 0698a4921ba29bd1088f89406dfc89744a47e175 Mon Sep 17 00:00:00 2001
From: Justin Garrison
Date: Fri, 26 Jul 2024 11:10:52 -0700
Subject: [PATCH] docs: aws getting started re-write

Updated with autoscaling group for workers, better copy/paste ability, and
not using default VPC

Signed-off-by: Justin Garrison
---
 .../install/cloud-platforms/aws.md | 444 ++++++++++++------
 .../install/cloud-platforms/aws.md | 442 +++++++++++------
 2 files changed, 598 insertions(+), 288 deletions(-)

diff --git a/website/content/v1.7/talos-guides/install/cloud-platforms/aws.md b/website/content/v1.7/talos-guides/install/cloud-platforms/aws.md
index 9fdbeb3f0c..35b1daaa7e 100644
--- a/website/content/v1.7/talos-guides/install/cloud-platforms/aws.md
+++ b/website/content/v1.7/talos-guides/install/cloud-platforms/aws.md
@@ -7,52 +7,105 @@ aliases:

## Creating a Cluster via the AWS CLI

-In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
-We assume an existing VPC, and some familiarity with AWS.
+In this guide we will create an HA Kubernetes cluster with 3 control plane nodes across 3 availability zones.
+You should have an existing AWS account and the AWS CLI installed and configured.
+If you need more information on AWS specifics, please see the [official AWS documentation](https://docs.aws.amazon.com).

-### Set the needed info
+To install the dependencies for this tutorial you can use Homebrew on macOS or Linux:

-Change to your desired region:
+```bash
+brew install siderolabs/tap/talosctl kubectl jq curl xz
+```
+
+If you would like to create infrastructure via `terraform` or `opentofu`, please see the example in the [contrib repository](https://github.com/siderolabs/contrib/tree/main/examples/terraform/aws).
+
+> Note: this guide is not a production setup; the steps were tested in `bash` and `zsh` shells.
+
+### Create AWS Resources
+
+We will be creating a control plane with 3 EC2 instances spread across 3 availability zones.
+It is recommended not to use the default VPC, so we will create a new one for this tutorial.
+
+Set your desired region and CIDR block, then create a VPC:
+
+> Make sure your subnet does not overlap with `10.244.0.0/16` or `10.96.0.0/12`, the [default pod and service subnets in Kubernetes]({{% relref "../../../introduction/troubleshooting.md#conflict-on-kubernetes-and-host-subnets" %}}).

```bash
-REGION="us-west-2"
-aws ec2 describe-vpcs --region $REGION
+export AWS_REGION="us-west-2"
+IPV4_CIDR="10.1.0.0/18"
+VPC_ID=$(aws ec2 create-vpc \
+  --cidr-block $IPV4_CIDR \
+  --output text --query 'Vpc.VpcId')
+```

-VPC="(the VpcId from the above command)"
+### Create the Subnets
+
+Create 3 smaller CIDRs to use for each subnet in different availability zones.
+Make sure to adjust these CIDRs if you changed the VPC CIDR in the previous command.
+
+```bash
+IPV4_CIDRS=( "10.1.0.0/22" "10.1.4.0/22" "10.1.8.0/22" )
```

-### Create the Subnet
+Next, create a subnet in each availability zone.

-Use a CIDR block that is present on the VPC specified above.
+> Note: If you're using zsh you need to run `setopt KSH_ARRAYS` so arrays are indexed the same way as in bash, as shown below.
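+
+A minimal example for zsh users (run once in the current shell; bash users can skip this):
+
+```bash
+# zsh only: make arrays 0-indexed like bash for the loops used in this guide
+setopt KSH_ARRAYS
+```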
```bash
-aws ec2 create-subnet \
-    --region $REGION \
-    --vpc-id $VPC \
-    --cidr-block ${CIDR_BLOCK}
+CIDR=0
+declare -a SUBNETS
+AZS=($(aws ec2 describe-availability-zones \
+  --query 'AvailabilityZones[].ZoneName' \
+  --filter "Name=state,Values=available" \
+  --output text | tr -s '\t' '\n' | head -n3))
+
+for AZ in ${AZS[@]}; do
+  SUBNETS[$CIDR]=$(aws ec2 create-subnet \
+    --vpc-id $VPC_ID \
+    --availability-zone $AZ \
+    --cidr-block ${IPV4_CIDRS[$CIDR]} \
+    --query 'Subnet.SubnetId' \
+    --output text)
+  aws ec2 modify-subnet-attribute \
+    --subnet-id ${SUBNETS[$CIDR]} \
+    --private-dns-hostname-type-on-launch resource-name
+  echo ${SUBNETS[$CIDR]}
+  ((CIDR++))
+done
```

-Note the subnet ID that was returned, and assign it to a variable for ease of later use:
+Create an internet gateway and attach it to the VPC:

```bash
-SUBNET="(the subnet ID of the created subnet)"
+IGW_ID=$(aws ec2 create-internet-gateway \
+  --query 'InternetGateway.InternetGatewayId' \
+  --output text)
+
+aws ec2 attach-internet-gateway \
+  --vpc-id $VPC_ID \
+  --internet-gateway-id $IGW_ID
+
+ROUTE_TABLE_ID=$(aws ec2 describe-route-tables \
+  --filters "Name=vpc-id,Values=$VPC_ID" \
+  --query 'RouteTables[].RouteTableId' \
+  --output text)
+
+aws ec2 create-route \
+  --route-table-id $ROUTE_TABLE_ID \
+  --destination-cidr-block 0.0.0.0/0 \
+  --gateway-id $IGW_ID
```

### Official AMI Images

-Official AMI image ID can be found in the `cloud-images.json` file attached to the Talos release:
+The official AMI image ID can be found in the `cloud-images.json` file attached to the [Talos release](https://github.com/siderolabs/talos/releases).

```bash
-AMI=`curl -sL https://github.com/siderolabs/talos/releases/download/{{< release >}}/cloud-images.json | \
-    jq -r '.[] | select(.region == "'$REGION'") | select (.arch == "amd64") | .id'`
+AMI=$(curl -sL https://github.com/siderolabs/talos/releases/download/{{< release >}}/cloud-images.json | \
+    jq -r '.[] | select(.region == "'$AWS_REGION'") | select (.arch == "amd64") | .id')

echo $AMI
-
```

-Replace `amd64` in the line above with the desired architecture.
-Note the AMI id that is returned is assigned to an environment variable: it will be used later when booting instances.
-
If using the official AMIs, you can skip to [Creating the Security group]({{< relref "#create-a-security-group" >}})

### Create your own AMIs

@@ -64,7 +117,7 @@ If using the official AMIs, you can skip to [Creating the Security group]({{< re

```bash
aws s3api create-bucket \
    --bucket $BUCKET \
-    --create-bucket-configuration LocationConstraint=$REGION \
+    --create-bucket-configuration LocationConstraint=$AWS_REGION \
    --acl private
```

@@ -86,18 +139,18 @@ Copy the RAW disk to S3 and import it as a snapshot:

```bash
aws s3 cp disk.raw s3://$BUCKET/talos-aws-tutorial.raw

-aws ec2 import-snapshot \
-    --region $REGION \
+IMPORT_TASK_ID=$(aws ec2 import-snapshot \
+    --region $AWS_REGION \
    --description "Talos kubernetes tutorial" \
-    --disk-container "Format=raw,UserBucket={S3Bucket=$BUCKET,S3Key=talos-aws-tutorial.raw}"
+    --disk-container "Format=raw,UserBucket={S3Bucket=$BUCKET,S3Key=talos-aws-tutorial.raw}" \
+    --query 'ImportTaskId' \
+    --output text)
```

-Save the `SnapshotId`, as we will need it once the import is done.
To check on the status of the import, run:

```bash
aws ec2 describe-import-snapshot-tasks \
-    --region $REGION \
-    --import-task-ids
+    --import-task-ids $IMPORT_TASK_ID
```

@@ -106,170 +159,208 @@ Once the `SnapshotTaskDetail.Status` indicates `completed`, we can regi

#### Register the Image

```bash
-aws ec2 register-image \
-    --region $REGION \
-    --block-device-mappings "DeviceName=/dev/xvda,VirtualName=talos,Ebs={DeleteOnTermination=true,SnapshotId=$SNAPSHOT,VolumeSize=4,VolumeType=gp2}" \
+# Retrieve the snapshot ID from the completed import task
+SNAPSHOT_ID=$(aws ec2 describe-import-snapshot-tasks \
+    --import-task-ids $IMPORT_TASK_ID \
+    --query 'ImportSnapshotTasks[].SnapshotTaskDetail.SnapshotId' \
+    --output text)
+
+AMI=$(aws ec2 register-image \
+    --block-device-mappings "DeviceName=/dev/xvda,VirtualName=talos,Ebs={DeleteOnTermination=true,SnapshotId=$SNAPSHOT_ID,VolumeSize=4,VolumeType=gp2}" \
    --root-device-name /dev/xvda \
    --virtualization-type hvm \
    --architecture x86_64 \
    --ena-support \
-    --name talos-aws-tutorial-ami
+    --name talos-aws-tutorial-ami \
+    --query 'ImageId' \
+    --output text)
```

We now have an AMI we can use to create our cluster.
-Save the AMI ID, as we will need it when we create EC2 instances.
-
-```bash
-AMI="(AMI ID of the register image command)"
-```

### Create a Security Group

```bash
-aws ec2 create-security-group \
-    --region $REGION \
+SECURITY_GROUP_ID=$(aws ec2 create-security-group \
+    --vpc-id $VPC_ID \
    --group-name talos-aws-tutorial-sg \
-    --description "Security Group for EC2 instances to allow ports required by Talos"
-
-SECURITY_GROUP="(security group id that is returned)"
+    --description "Security Group for EC2 instances to allow ports required by Talos" \
+    --query 'GroupId' \
+    --output text)
```

Using the security group from above, allow all internal traffic within the same security group:

```bash
aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
+    --group-id $SECURITY_GROUP_ID \
    --protocol all \
    --port 0 \
-    --source-group talos-aws-tutorial-sg
+    --source-group $SECURITY_GROUP_ID
```

-and expose the Talos and Kubernetes APIs:
+Expose the Talos API (50000) and the Kubernetes API (6443).

-```bash
-aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
-    --protocol tcp \
-    --port 6443 \
-    --cidr 0.0.0.0/0
+> Note: This is only required for the control plane nodes.
+> For a production environment you would want separate private subnets for worker nodes.

+```bash
aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
-    --protocol tcp \
-    --port 50000-50001 \
-    --cidr 0.0.0.0/0
+    --group-id $SECURITY_GROUP_ID \
+    --ip-permissions \
+        IpProtocol=tcp,FromPort=50000,ToPort=50000,IpRanges="[{CidrIp=0.0.0.0/0}]" \
+        IpProtocol=tcp,FromPort=6443,ToPort=6443,IpRanges="[{CidrIp=0.0.0.0/0}]" \
+    --query 'SecurityGroupRules[].SecurityGroupRuleId' \
+    --output text
```

-If you are using KubeSpan and will be adding workers outside of AWS, you need to allow inbound UDP for the Wireguard port:
+We will bootstrap Talos with a machine config passed via user data; the Talos API is never exposed to the internet without certificate authentication.
+
+We enable KubeSpan in this tutorial, so you also need to allow inbound UDP for the WireGuard port:

```bash
aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
-    --protocol udp --port 51820 --cidr 0.0.0.0/0
+    --group-id $SECURITY_GROUP_ID \
+    --ip-permissions \
+        IpProtocol=udp,FromPort=51820,ToPort=51820,IpRanges="[{CidrIp=0.0.0.0/0}]" \
+    --query 'SecurityGroupRules[].SecurityGroupRuleId' \
+    --output text
```

### Create a Load Balancer

+The load balancer is used for a stable Kubernetes API endpoint.
+
```bash
-aws elbv2 create-load-balancer \
-    --region $REGION \
+LOAD_BALANCER_ARN=$(aws elbv2 create-load-balancer \
    --name talos-aws-tutorial-lb \
-    --type network --subnets $SUBNET
+    --subnets $(echo ${SUBNETS[@]}) \
+    --type network \
+    --ip-address-type ipv4 \
+    --query 'LoadBalancers[].LoadBalancerArn' \
+    --output text)
+
+LOAD_BALANCER_DNS=$(aws elbv2 describe-load-balancers \
+    --load-balancer-arns $LOAD_BALANCER_ARN \
+    --query 'LoadBalancers[].DNSName' \
+    --output text)
```

-Take note of the DNS name and ARN.
-We will need these soon.
-
-```bash
-LOAD_BALANCER_ARN="(arn of the load balancer)"
-```
+Now create a target group and a listener for the load balancer:

```bash
-aws elbv2 create-target-group \
-    --region $REGION \
+TARGET_GROUP_ARN=$(aws elbv2 create-target-group \
    --name talos-aws-tutorial-tg \
    --protocol TCP \
    --port 6443 \
-    --target-type ip \
-    --vpc-id $VPC
-```
-
-Also note the `TargetGroupArn` that is returned.
+    --target-type instance \
+    --vpc-id $VPC_ID \
+    --query 'TargetGroups[].TargetGroupArn' \
+    --output text)

-```bash
-TARGET_GROUP_ARN="(target group arn)"
+LISTENER_ARN=$(aws elbv2 create-listener \
+    --load-balancer-arn $LOAD_BALANCER_ARN \
+    --protocol TCP \
+    --port 6443 \
+    --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN \
+    --query 'Listeners[].ListenerArn' \
+    --output text)
```

### Create the Machine Configuration Files

-Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines.
-> Note that the `port` used here is the externally accessible port configured on the load balancer - 443 - not the internal port of 6443:
+We will create a [machine config patch]({{% relref "../../../talos-guides/configuration/patching.md#rfc6902-json-patches" %}}) to use the AWS time servers.
+You can create [additional patches]({{% relref "../../../reference/configuration/v1alpha1/config.md" %}}) to customize the configuration as needed.

```bash
-$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port> --with-examples=false --with-docs=false
-created controlplane.yaml
-created worker.yaml
-created talosconfig
+cat <<EOF > time-server-patch.yaml
+machine:
+  time:
+    servers:
+      - 169.254.169.123
+EOF
```

-> Note that the generated configs are too long for AWS userdata field if the `--with-examples` and `--with-docs` flags are not passed.
-
-At this point, you can modify the generated configs to your liking.
-
-Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
-
-#### Validate the Configuration Files
+Using the DNS name of the load balancer created earlier, generate the base configuration files for the Talos machines.

```bash
-$ talosctl validate --config controlplane.yaml --mode cloud
-controlplane.yaml is valid for cloud mode
-$ talosctl validate --config worker.yaml --mode cloud
-worker.yaml is valid for cloud mode
+talosctl gen config talos-k8s-aws-tutorial https://${LOAD_BALANCER_DNS}:6443 \
+    --with-examples=false \
+    --with-docs=false \
+    --with-kubespan \
+    --install-disk /dev/xvda \
+    --config-patch '@time-server-patch.yaml'
```

+> Note: without the `--with-examples=false` and `--with-docs=false` flags, the generated configs would be too long for the AWS user data field.
+
### Create the EC2 Instances

-> change the instance type if desired.
> Note: There is a known issue that prevents Talos from running on T2 instance types.
> Please use T3 if you need burstable instance types.
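+
+Optionally, validate the generated configuration files before creating any instances:
+
+```bash
+talosctl validate --config controlplane.yaml --mode cloud
+talosctl validate --config worker.yaml --mode cloud
+```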
#### Create the Control Plane Nodes

```bash
-CP_COUNT=1
-while [[ "$CP_COUNT" -lt 4 ]]; do
-  aws ec2 run-instances \
-    --region $REGION \
-    --image-id $AMI \
-    --count 1 \
-    --instance-type t3.small \
-    --user-data file://controlplane.yaml \
-    --subnet-id $SUBNET \
-    --security-group-ids $SECURITY_GROUP \
-    --associate-public-ip-address \
-    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-$CP_COUNT}]"
-  ((CP_COUNT++))
+declare -a CP_INSTANCES
+INSTANCE_INDEX=0
+for SUBNET in ${SUBNETS[@]}; do
+  CP_INSTANCES[${INSTANCE_INDEX}]=$(aws ec2 run-instances \
+    --image-id $AMI \
+    --subnet-id $SUBNET \
+    --instance-type t3.small \
+    --user-data file://controlplane.yaml \
+    --associate-public-ip-address \
+    --security-group-ids $SECURITY_GROUP_ID \
+    --count 1 \
+    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-$INSTANCE_INDEX}]" \
+    --query 'Instances[].InstanceId' \
+    --output text)
+  echo ${CP_INSTANCES[${INSTANCE_INDEX}]}
+  ((INSTANCE_INDEX++))
done
```

-> Make a note of the resulting `PrivateIpAddress` from the controlplane nodes for later use.
-
#### Create the Worker Nodes

+For the worker nodes we will create a new launch template with the `worker.yaml` machine configuration and use it in an autoscaling group.
+
```bash
-aws ec2 run-instances \
-    --region $REGION \
-    --image-id $AMI \
-    --count 3 \
-    --instance-type t3.small \
-    --user-data file://worker.yaml \
-    --subnet-id $SUBNET \
-    --security-group-ids $SECURITY_GROUP
-    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-worker}]"
+WORKER_LAUNCH_TEMPLATE_ID=$(aws ec2 create-launch-template \
+  --launch-template-name talos-aws-tutorial-worker \
+  --launch-template-data '{
+    "ImageId":"'$AMI'",
+    "InstanceType":"t3.small",
+    "UserData":"'$(base64 -w0 worker.yaml)'",
+    "NetworkInterfaces":[{
+      "DeviceIndex":0,
+      "AssociatePublicIpAddress":true,
+      "Groups":["'$SECURITY_GROUP_ID'"],
+      "DeleteOnTermination":true
+    }],
+    "BlockDeviceMappings":[{
+      "DeviceName":"/dev/xvda",
+      "Ebs":{
+        "VolumeSize":20,
+        "VolumeType":"gp3",
+        "DeleteOnTermination":true
+      }
+    }],
+    "TagSpecifications":[{
+      "ResourceType":"instance",
+      "Tags":[{
+        "Key":"Name",
+        "Value":"talos-aws-tutorial-worker"
+      }]
+    }]}' \
+  --query 'LaunchTemplate.LaunchTemplateId' \
+  --output text)
+```
+
+> Note: on macOS the BSD `base64` does not support `-w0`; use `base64 -i worker.yaml` instead.
+
+```bash
+aws autoscaling create-auto-scaling-group \
+  --auto-scaling-group-name talos-aws-tutorial-worker \
+  --min-size 1 \
+  --max-size 3 \
+  --desired-capacity 1 \
+  --availability-zones $(echo ${AZS[@]}) \
+  --target-group-arns $TARGET_GROUP_ARN \
+  --launch-template "LaunchTemplateId=${WORKER_LAUNCH_TEMPLATE_ID}" \
+  --vpc-zone-identifier $(echo ${SUBNETS[@]} | tr ' ' ',')
```

### Configure the Load Balancer

@@ -277,57 +368,120 @@ aws ec2 run-instances \

-Now, using the load balancer target group's ARN, and the **PrivateIpAddress** from the controlplane instances that you created :
+Now, using the load balancer target group's ARN, register the control plane instances you created as targets:

```bash
-aws elbv2 register-targets \
-    --region $REGION \
+for INSTANCE in ${CP_INSTANCES[@]}; do
+  aws elbv2 register-targets \
    --target-group-arn $TARGET_GROUP_ARN \
-    --targets Id=$CP_NODE_1_IP Id=$CP_NODE_2_IP Id=$CP_NODE_3_IP
+    --targets Id=$INSTANCE
+done
```

-Using the ARNs of the load balancer and target group from previous steps, create the listener:
+### Bootstrap `etcd`
+
+Set the `TALOSCONFIG` environment variable so `talosctl` commands are authenticated, and capture the worker instance IDs for later use:
```bash
-aws elbv2 create-listener \
-    --region $REGION \
-    --load-balancer-arn $LOAD_BALANCER_ARN \
-    --protocol TCP \
-    --port 443 \
-    --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN
-```
+export TALOSCONFIG=$(pwd)/talosconfig

-### Bootstrap Etcd
+WORKER_INSTANCES=( $(aws autoscaling \
+  describe-auto-scaling-instances \
+  --query 'AutoScalingInstances[?AutoScalingGroupName==`talos-aws-tutorial-worker`].InstanceId' \
+  --output text) )
+```

Set the `endpoints` (the control plane node to which `talosctl` commands are sent) and `nodes` (the nodes that the command operates on):

```bash
-talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
-talosctl --talosconfig talosconfig config node <control plane 1 IP>
+talosctl config endpoints $(aws ec2 describe-instances \
+  --instance-ids ${CP_INSTANCES[*]} \
+  --query 'Reservations[].Instances[].PublicIpAddress' \
+  --output text)
+
+talosctl config nodes $(aws ec2 describe-instances \
+  --instance-ids ${CP_INSTANCES[1]} \
+  --query 'Reservations[].Instances[].PublicIpAddress' \
+  --output text)
```

Bootstrap `etcd`:

```bash
-talosctl --talosconfig talosconfig bootstrap
+talosctl bootstrap
```

-### Retrieve the `kubeconfig`
-
-At this point we can retrieve the admin `kubeconfig` by running:
+You can now watch your cluster bootstrap by running:

```bash
-talosctl --talosconfig talosconfig kubeconfig .
+talosctl health
```

-The different control plane nodes should sendi/receive traffic via the load balancer, notice that one of the control plane has intiated the etcd cluster, and the others should join.
-You can now watch as your cluster bootstraps, by using
+It will take a few minutes for the nodes to start etcd, reach quorum, and bring up the Kubernetes control plane.
+
+You can also watch the performance of a node via:

```bash
-talosctl --talosconfig talosconfig health
+talosctl dashboard
```

-You can also watch the performance of a node, via:
+### Retrieve the `kubeconfig`
+
+When the cluster is healthy, you can retrieve the admin `kubeconfig` by running:

```bash
-talosctl --talosconfig talosconfig dashboard
+talosctl kubeconfig .
+export KUBECONFIG=$(pwd)/kubeconfig
```

And use standard `kubectl` commands.
+
+```bash
+kubectl get nodes
+```
+
+## Cleanup Resources
+
+If you would like to delete all of the resources you created during this tutorial, run the following commands.
+
+```bash
+aws elbv2 delete-listener --listener-arn $LISTENER_ARN
+aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_ARN
+aws elbv2 delete-load-balancer --load-balancer-arn $LOAD_BALANCER_ARN
+
+aws autoscaling update-auto-scaling-group \
+  --auto-scaling-group-name talos-aws-tutorial-worker \
+  --min-size 0 \
+  --max-size 0 \
+  --desired-capacity 0
+
+aws ec2 terminate-instances --instance-ids ${CP_INSTANCES[@]} ${WORKER_INSTANCES[@]} \
+  --query 'TerminatingInstances[].InstanceId' \
+  --output text
+
+aws autoscaling delete-auto-scaling-group \
+  --auto-scaling-group-name talos-aws-tutorial-worker \
+  --force-delete
+
+aws ec2 delete-launch-template --launch-template-id $WORKER_LAUNCH_TEMPLATE_ID
+
+while aws ec2 describe-instances \
+  --instance-ids ${CP_INSTANCES[@]} ${WORKER_INSTANCES[@]} \
+  --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
+  --output text | grep -q shutting-down; do
+  echo "waiting for instances to terminate"
+  sleep 5
+done
+
+aws ec2 detach-internet-gateway --vpc-id $VPC_ID --internet-gateway-id $IGW_ID
+aws ec2 delete-internet-gateway --internet-gateway-id $IGW_ID
+
+aws ec2 delete-security-group --group-id $SECURITY_GROUP_ID
+
+for SUBNET in ${SUBNETS[@]}; do
+  aws ec2 delete-subnet --subnet-id $SUBNET
+done
+
+aws ec2 delete-vpc --vpc-id $VPC_ID
+
+rm -f controlplane.yaml worker.yaml talosconfig kubeconfig time-server-patch.yaml disk.raw
+```
diff --git a/website/content/v1.8/talos-guides/install/cloud-platforms/aws.md b/website/content/v1.8/talos-guides/install/cloud-platforms/aws.md
index b33f23a088..35b1daaa7e 100644
--- a/website/content/v1.8/talos-guides/install/cloud-platforms/aws.md
+++ b/website/content/v1.8/talos-guides/install/cloud-platforms/aws.md
@@ -7,52 +7,105 @@ aliases:

## Creating a Cluster via the AWS CLI

-In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
-We assume an existing VPC, and some familiarity with AWS.
+In this guide we will create an HA Kubernetes cluster with 3 control plane nodes across 3 availability zones.
+You should have an existing AWS account and the AWS CLI installed and configured.
+If you need more information on AWS specifics, please see the [official AWS documentation](https://docs.aws.amazon.com).

-### Set the needed info
+To install the dependencies for this tutorial you can use Homebrew on macOS or Linux:

-Change to your desired region:
+```bash
+brew install siderolabs/tap/talosctl kubectl jq curl xz
+```
+
+If you would like to create infrastructure via `terraform` or `opentofu`, please see the example in the [contrib repository](https://github.com/siderolabs/contrib/tree/main/examples/terraform/aws).
+
+> Note: this guide is not a production setup; the steps were tested in `bash` and `zsh` shells.
+
+### Create AWS Resources
+
+We will be creating a control plane with 3 EC2 instances spread across 3 availability zones.
+It is recommended not to use the default VPC, so we will create a new one for this tutorial.
+
+Set your desired region and CIDR block, then create a VPC:
+
+> Make sure your subnet does not overlap with `10.244.0.0/16` or `10.96.0.0/12`, the [default pod and service subnets in Kubernetes]({{% relref "../../../introduction/troubleshooting.md#conflict-on-kubernetes-and-host-subnets" %}}).
```bash
-REGION="us-west-2"
-aws ec2 describe-vpcs --region $REGION
+export AWS_REGION="us-west-2"
+IPV4_CIDR="10.1.0.0/18"
+VPC_ID=$(aws ec2 create-vpc \
+  --cidr-block $IPV4_CIDR \
+  --output text --query 'Vpc.VpcId')
+```

-VPC="(the VpcId from the above command)"
+### Create the Subnets
+
+Create 3 smaller CIDRs to use for each subnet in different availability zones.
+Make sure to adjust these CIDRs if you changed the VPC CIDR in the previous command.
+
+```bash
+IPV4_CIDRS=( "10.1.0.0/22" "10.1.4.0/22" "10.1.8.0/22" )
```

-### Create the Subnet
+Next, create a subnet in each availability zone.

-Use a CIDR block that is present on the VPC specified above.
+> Note: If you're using zsh you need to run `setopt KSH_ARRAYS` so arrays are indexed the same way as in bash.

```bash
-aws ec2 create-subnet \
-    --region $REGION \
-    --vpc-id $VPC \
-    --cidr-block ${CIDR_BLOCK}
+CIDR=0
+declare -a SUBNETS
+AZS=($(aws ec2 describe-availability-zones \
+  --query 'AvailabilityZones[].ZoneName' \
+  --filter "Name=state,Values=available" \
+  --output text | tr -s '\t' '\n' | head -n3))
+
+for AZ in ${AZS[@]}; do
+  SUBNETS[$CIDR]=$(aws ec2 create-subnet \
+    --vpc-id $VPC_ID \
+    --availability-zone $AZ \
+    --cidr-block ${IPV4_CIDRS[$CIDR]} \
+    --query 'Subnet.SubnetId' \
+    --output text)
+  aws ec2 modify-subnet-attribute \
+    --subnet-id ${SUBNETS[$CIDR]} \
+    --private-dns-hostname-type-on-launch resource-name
+  echo ${SUBNETS[$CIDR]}
+  ((CIDR++))
+done
```

-Note the subnet ID that was returned, and assign it to a variable for ease of later use:
+Create an internet gateway and attach it to the VPC:

```bash
-SUBNET="(the subnet ID of the created subnet)"
+IGW_ID=$(aws ec2 create-internet-gateway \
+  --query 'InternetGateway.InternetGatewayId' \
+  --output text)
+
+aws ec2 attach-internet-gateway \
+  --vpc-id $VPC_ID \
+  --internet-gateway-id $IGW_ID
+
+ROUTE_TABLE_ID=$(aws ec2 describe-route-tables \
+  --filters "Name=vpc-id,Values=$VPC_ID" \
+  --query 'RouteTables[].RouteTableId' \
+  --output text)
+
+aws ec2 create-route \
+  --route-table-id $ROUTE_TABLE_ID \
+  --destination-cidr-block 0.0.0.0/0 \
+  --gateway-id $IGW_ID
```

### Official AMI Images

-Official AMI image ID can be found in the `cloud-images.json` file attached to the Talos release:
+The official AMI image ID can be found in the `cloud-images.json` file attached to the [Talos release](https://github.com/siderolabs/talos/releases).

```bash
-AMI=`curl -sL https://github.com/siderolabs/talos/releases/download/{{< release >}}/cloud-images.json | \
-    jq -r '.[] | select(.region == "'$REGION'") | select (.arch == "amd64") | .id'`
+AMI=$(curl -sL https://github.com/siderolabs/talos/releases/download/{{< release >}}/cloud-images.json | \
+    jq -r '.[] | select(.region == "'$AWS_REGION'") | select (.arch == "amd64") | .id')

echo $AMI
-
```

-Replace `amd64` in the line above with the desired architecture.
-Note the AMI id that is returned is assigned to an environment variable: it will be used later when booting instances.
-
-
If using the official AMIs, you can skip to [Creating the Security group]({{< relref "#create-a-security-group" >}})

### Create your own AMIs

@@ -64,7 +117,7 @@ If using the official AMIs, you can skip to [Creating the Security group]({{< re

```bash
aws s3api create-bucket \
    --bucket $BUCKET \
-    --create-bucket-configuration LocationConstraint=$REGION \
+    --create-bucket-configuration LocationConstraint=$AWS_REGION \
    --acl private
```

@@ -86,18 +139,18 @@ Copy the RAW disk to S3 and import it as a snapshot:

```bash
aws s3 cp disk.raw s3://$BUCKET/talos-aws-tutorial.raw

-aws ec2 import-snapshot \
-    --region $REGION \
+IMPORT_TASK_ID=$(aws ec2 import-snapshot \
+    --region $AWS_REGION \
    --description "Talos kubernetes tutorial" \
-    --disk-container "Format=raw,UserBucket={S3Bucket=$BUCKET,S3Key=talos-aws-tutorial.raw}"
+    --disk-container "Format=raw,UserBucket={S3Bucket=$BUCKET,S3Key=talos-aws-tutorial.raw}" \
+    --query 'ImportTaskId' \
+    --output text)
```

-Save the `SnapshotId`, as we will need it once the import is done.
To check on the status of the import, run:

```bash
aws ec2 describe-import-snapshot-tasks \
-    --region $REGION \
-    --import-task-ids
+    --import-task-ids $IMPORT_TASK_ID
```

@@ -106,168 +159,208 @@ Once the `SnapshotTaskDetail.Status` indicates `completed`, we can regi

#### Register the Image

```bash
-aws ec2 register-image \
-    --region $REGION \
-    --block-device-mappings "DeviceName=/dev/xvda,VirtualName=talos,Ebs={DeleteOnTermination=true,SnapshotId=$SNAPSHOT,VolumeSize=4,VolumeType=gp2}" \
+# Retrieve the snapshot ID from the completed import task
+SNAPSHOT_ID=$(aws ec2 describe-import-snapshot-tasks \
+    --import-task-ids $IMPORT_TASK_ID \
+    --query 'ImportSnapshotTasks[].SnapshotTaskDetail.SnapshotId' \
+    --output text)
+
+AMI=$(aws ec2 register-image \
+    --block-device-mappings "DeviceName=/dev/xvda,VirtualName=talos,Ebs={DeleteOnTermination=true,SnapshotId=$SNAPSHOT_ID,VolumeSize=4,VolumeType=gp2}" \
    --root-device-name /dev/xvda \
    --virtualization-type hvm \
    --architecture x86_64 \
    --ena-support \
-    --name talos-aws-tutorial-ami
+    --name talos-aws-tutorial-ami \
+    --query 'ImageId' \
+    --output text)
```

We now have an AMI we can use to create our cluster.
-Save the AMI ID, as we will need it when we create EC2 instances.
-
-```bash
-AMI="(AMI ID of the register image command)"
-```

### Create a Security Group

```bash
-aws ec2 create-security-group \
-    --region $REGION \
+SECURITY_GROUP_ID=$(aws ec2 create-security-group \
+    --vpc-id $VPC_ID \
    --group-name talos-aws-tutorial-sg \
-    --description "Security Group for EC2 instances to allow ports required by Talos"
-
-SECURITY_GROUP="(security group id that is returned)"
+    --description "Security Group for EC2 instances to allow ports required by Talos" \
+    --query 'GroupId' \
+    --output text)
```

Using the security group from above, allow all internal traffic within the same security group:

```bash
aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
+    --group-id $SECURITY_GROUP_ID \
    --protocol all \
    --port 0 \
-    --source-group talos-aws-tutorial-sg
+    --source-group $SECURITY_GROUP_ID
```

-and expose the Talos and Kubernetes APIs:
+Expose the Talos API (50000) and the Kubernetes API (6443).

-```bash
-aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
-    --protocol tcp \
-    --port 6443 \
-    --cidr 0.0.0.0/0
+> Note: This is only required for the control plane nodes.
+> For a production environment you would want separate private subnets for worker nodes.
+```bash
aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
-    --protocol tcp \
-    --port 50000-50001 \
-    --cidr 0.0.0.0/0
+    --group-id $SECURITY_GROUP_ID \
+    --ip-permissions \
+        IpProtocol=tcp,FromPort=50000,ToPort=50000,IpRanges="[{CidrIp=0.0.0.0/0}]" \
+        IpProtocol=tcp,FromPort=6443,ToPort=6443,IpRanges="[{CidrIp=0.0.0.0/0}]" \
+    --query 'SecurityGroupRules[].SecurityGroupRuleId' \
+    --output text
```

-If you are using KubeSpan and will be adding workers outside of AWS, you need to allow inbound UDP for the Wireguard port:
+We will bootstrap Talos with a machine config passed via user data; the Talos API is never exposed to the internet without certificate authentication.
+
+We enable KubeSpan in this tutorial, so you also need to allow inbound UDP for the WireGuard port:

```bash
aws ec2 authorize-security-group-ingress \
-    --region $REGION \
-    --group-name talos-aws-tutorial-sg \
-    --protocol udp --port 51820 --cidr 0.0.0.0/0
+    --group-id $SECURITY_GROUP_ID \
+    --ip-permissions \
+        IpProtocol=udp,FromPort=51820,ToPort=51820,IpRanges="[{CidrIp=0.0.0.0/0}]" \
+    --query 'SecurityGroupRules[].SecurityGroupRuleId' \
+    --output text
```

### Create a Load Balancer

+The load balancer is used for a stable Kubernetes API endpoint.
+
```bash
-aws elbv2 create-load-balancer \
-    --region $REGION \
+LOAD_BALANCER_ARN=$(aws elbv2 create-load-balancer \
    --name talos-aws-tutorial-lb \
-    --type network --subnets $SUBNET
+    --subnets $(echo ${SUBNETS[@]}) \
+    --type network \
+    --ip-address-type ipv4 \
+    --query 'LoadBalancers[].LoadBalancerArn' \
+    --output text)
+
+LOAD_BALANCER_DNS=$(aws elbv2 describe-load-balancers \
+    --load-balancer-arns $LOAD_BALANCER_ARN \
+    --query 'LoadBalancers[].DNSName' \
+    --output text)
```

-Take note of the DNS name and ARN.
-We will need these soon.
-
-```bash
-LOAD_BALANCER_ARN="(arn of the load balancer)"
-```
+Now create a target group and a listener for the load balancer:

```bash
-aws elbv2 create-target-group \
-    --region $REGION \
+TARGET_GROUP_ARN=$(aws elbv2 create-target-group \
    --name talos-aws-tutorial-tg \
    --protocol TCP \
    --port 6443 \
-    --target-type ip \
-    --vpc-id $VPC
-```
-
-Also note the `TargetGroupArn` that is returned.
+    --target-type instance \
+    --vpc-id $VPC_ID \
+    --query 'TargetGroups[].TargetGroupArn' \
+    --output text)

-```bash
-TARGET_GROUP_ARN="(target group arn)"
+LISTENER_ARN=$(aws elbv2 create-listener \
+    --load-balancer-arn $LOAD_BALANCER_ARN \
+    --protocol TCP \
+    --port 6443 \
+    --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN \
+    --query 'Listeners[].ListenerArn' \
+    --output text)
```

### Create the Machine Configuration Files

-Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines.
-> Note that the `port` used here is the externally accessible port configured on the load balancer - 443 - not the internal port of 6443:
+We will create a [machine config patch]({{% relref "../../../talos-guides/configuration/patching.md#rfc6902-json-patches" %}}) to use the AWS time servers.
+You can create [additional patches]({{% relref "../../../reference/configuration/v1alpha1/config.md" %}}) to customize the configuration as needed.
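+
+For example, a patch file like the following (illustrative only; `example-patch.yaml` is not used elsewhere in this guide) would allow regular workloads on the control plane nodes:
+
+```bash
+# Illustrative example of an additional config patch; not applied in this tutorial
+cat <<EOF > example-patch.yaml
+cluster:
+  allowSchedulingOnControlPlanes: true
+EOF
+```
+
+Additional patches are applied by repeating the `--config-patch` flag when generating the configuration.
+
+The time server patch for this tutorial is created with: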
```bash
-$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port> --with-examples=false --with-docs=false
-created controlplane.yaml
-created worker.yaml
-created talosconfig
+cat <<EOF > time-server-patch.yaml
+machine:
+  time:
+    servers:
+      - 169.254.169.123
+EOF
```

-> Note that the generated configs are too long for AWS userdata field if the `--with-examples` and `--with-docs` flags are not passed.
-
-At this point, you can modify the generated configs to your liking.
-
-Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
-
-#### Validate the Configuration Files
+Using the DNS name of the load balancer created earlier, generate the base configuration files for the Talos machines.

```bash
-$ talosctl validate --config controlplane.yaml --mode cloud
-controlplane.yaml is valid for cloud mode
-$ talosctl validate --config worker.yaml --mode cloud
-worker.yaml is valid for cloud mode
+talosctl gen config talos-k8s-aws-tutorial https://${LOAD_BALANCER_DNS}:6443 \
+    --with-examples=false \
+    --with-docs=false \
+    --with-kubespan \
+    --install-disk /dev/xvda \
+    --config-patch '@time-server-patch.yaml'
```

+> Note: without the `--with-examples=false` and `--with-docs=false` flags, the generated configs would be too long for the AWS user data field.
+
### Create the EC2 Instances

-> change the instance type if desired.
> Note: There is a known issue that prevents Talos from running on T2 instance types.
> Please use T3 if you need burstable instance types.

#### Create the Control Plane Nodes

```bash
-for CP_COUNT in {1..3}; do
-  aws ec2 run-instances \
-    --region $REGION \
-    --image-id $AMI \
-    --count 1 \
-    --instance-type t3.small \
-    --user-data file://controlplane.yaml \
-    --subnet-id $SUBNET \
-    --security-group-ids $SECURITY_GROUP \
-    --associate-public-ip-address \
-    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-$CP_COUNT}]"
+declare -a CP_INSTANCES
+INSTANCE_INDEX=0
+for SUBNET in ${SUBNETS[@]}; do
+  CP_INSTANCES[${INSTANCE_INDEX}]=$(aws ec2 run-instances \
+    --image-id $AMI \
+    --subnet-id $SUBNET \
+    --instance-type t3.small \
+    --user-data file://controlplane.yaml \
+    --associate-public-ip-address \
+    --security-group-ids $SECURITY_GROUP_ID \
+    --count 1 \
+    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-$INSTANCE_INDEX}]" \
+    --query 'Instances[].InstanceId' \
+    --output text)
+  echo ${CP_INSTANCES[${INSTANCE_INDEX}]}
+  ((INSTANCE_INDEX++))
done
```

-> Make a note of the resulting `PrivateIpAddress` from the controlplane nodes for later use.
-
#### Create the Worker Nodes

+For the worker nodes we will create a new launch template with the `worker.yaml` machine configuration and use it in an autoscaling group.
+
```bash
-aws ec2 run-instances \
-    --region $REGION \
-    --image-id $AMI \
-    --count 3 \
-    --instance-type t3.small \
-    --user-data file://worker.yaml \
-    --subnet-id $SUBNET \
-    --security-group-ids $SECURITY_GROUP
-    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-worker}]"
+WORKER_LAUNCH_TEMPLATE_ID=$(aws ec2 create-launch-template \
+  --launch-template-name talos-aws-tutorial-worker \
+  --launch-template-data '{
+    "ImageId":"'$AMI'",
+    "InstanceType":"t3.small",
+    "UserData":"'$(base64 -w0 worker.yaml)'",
+    "NetworkInterfaces":[{
+      "DeviceIndex":0,
+      "AssociatePublicIpAddress":true,
+      "Groups":["'$SECURITY_GROUP_ID'"],
+      "DeleteOnTermination":true
+    }],
+    "BlockDeviceMappings":[{
+      "DeviceName":"/dev/xvda",
+      "Ebs":{
+        "VolumeSize":20,
+        "VolumeType":"gp3",
+        "DeleteOnTermination":true
+      }
+    }],
+    "TagSpecifications":[{
+      "ResourceType":"instance",
+      "Tags":[{
+        "Key":"Name",
+        "Value":"talos-aws-tutorial-worker"
+      }]
+    }]}' \
+  --query 'LaunchTemplate.LaunchTemplateId' \
+  --output text)
+```
+
+> Note: on macOS the BSD `base64` does not support `-w0`; use `base64 -i worker.yaml` instead.
+
+```bash
+aws autoscaling create-auto-scaling-group \
+  --auto-scaling-group-name talos-aws-tutorial-worker \
+  --min-size 1 \
+  --max-size 3 \
+  --desired-capacity 1 \
+  --availability-zones $(echo ${AZS[@]}) \
+  --target-group-arns $TARGET_GROUP_ARN \
+  --launch-template "LaunchTemplateId=${WORKER_LAUNCH_TEMPLATE_ID}" \
+  --vpc-zone-identifier $(echo ${SUBNETS[@]} | tr ' ' ',')
```

### Configure the Load Balancer

@@ -275,57 +368,120 @@ aws ec2 run-instances \

-Now, using the load balancer target group's ARN, and the **PrivateIpAddress** from the controlplane instances that you created :
+Now, using the load balancer target group's ARN, register the control plane instances you created as targets:

```bash
-aws elbv2 register-targets \
-    --region $REGION \
+for INSTANCE in ${CP_INSTANCES[@]}; do
+  aws elbv2 register-targets \
    --target-group-arn $TARGET_GROUP_ARN \
-    --targets Id=$CP_NODE_1_IP Id=$CP_NODE_2_IP Id=$CP_NODE_3_IP
+    --targets Id=$INSTANCE
+done
```

-Using the ARNs of the load balancer and target group from previous steps, create the listener:
+### Bootstrap `etcd`
+
+Set the `TALOSCONFIG` environment variable so `talosctl` commands are authenticated, and capture the worker instance IDs for later use:
```bash
-aws elbv2 create-listener \
-    --region $REGION \
-    --load-balancer-arn $LOAD_BALANCER_ARN \
-    --protocol TCP \
-    --port 443 \
-    --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN
-```
+export TALOSCONFIG=$(pwd)/talosconfig

-### Bootstrap Etcd
+WORKER_INSTANCES=( $(aws autoscaling \
+  describe-auto-scaling-instances \
+  --query 'AutoScalingInstances[?AutoScalingGroupName==`talos-aws-tutorial-worker`].InstanceId' \
+  --output text) )
+```

Set the `endpoints` (the control plane node to which `talosctl` commands are sent) and `nodes` (the nodes that the command operates on):

```bash
-talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
-talosctl --talosconfig talosconfig config node <control plane 1 IP>
+talosctl config endpoints $(aws ec2 describe-instances \
+  --instance-ids ${CP_INSTANCES[*]} \
+  --query 'Reservations[].Instances[].PublicIpAddress' \
+  --output text)
+
+talosctl config nodes $(aws ec2 describe-instances \
+  --instance-ids ${CP_INSTANCES[1]} \
+  --query 'Reservations[].Instances[].PublicIpAddress' \
+  --output text)
```

Bootstrap `etcd`:

```bash
-talosctl --talosconfig talosconfig bootstrap
+talosctl bootstrap
```

-### Retrieve the `kubeconfig`
-
-At this point we can retrieve the admin `kubeconfig` by running:
+You can now watch your cluster bootstrap by running:

```bash
-talosctl --talosconfig talosconfig kubeconfig .
+talosctl health
```

-The different control plane nodes should sendi/receive traffic via the load balancer, notice that one of the control plane has intiated the etcd cluster, and the others should join.
-You can now watch as your cluster bootstraps, by using
+It will take a few minutes for the nodes to start etcd, reach quorum, and bring up the Kubernetes control plane.
+
+You can also watch the performance of a node via:

```bash
-talosctl --talosconfig talosconfig health
+talosctl dashboard
```

-You can also watch the performance of a node, via:
+### Retrieve the `kubeconfig`
+
+When the cluster is healthy, you can retrieve the admin `kubeconfig` by running:

```bash
-talosctl --talosconfig talosconfig dashboard
+talosctl kubeconfig .
+export KUBECONFIG=$(pwd)/kubeconfig
```

And use standard `kubectl` commands.
+
+```bash
+kubectl get nodes
+```
+
+## Cleanup Resources
+
+If you would like to delete all of the resources you created during this tutorial, run the following commands.
+
+```bash
+aws elbv2 delete-listener --listener-arn $LISTENER_ARN
+aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_ARN
+aws elbv2 delete-load-balancer --load-balancer-arn $LOAD_BALANCER_ARN
+
+aws autoscaling update-auto-scaling-group \
+  --auto-scaling-group-name talos-aws-tutorial-worker \
+  --min-size 0 \
+  --max-size 0 \
+  --desired-capacity 0
+
+aws ec2 terminate-instances --instance-ids ${CP_INSTANCES[@]} ${WORKER_INSTANCES[@]} \
+  --query 'TerminatingInstances[].InstanceId' \
+  --output text
+
+aws autoscaling delete-auto-scaling-group \
+  --auto-scaling-group-name talos-aws-tutorial-worker \
+  --force-delete
+
+aws ec2 delete-launch-template --launch-template-id $WORKER_LAUNCH_TEMPLATE_ID
+
+while aws ec2 describe-instances \
+  --instance-ids ${CP_INSTANCES[@]} ${WORKER_INSTANCES[@]} \
+  --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
+  --output text | grep -q shutting-down; do
+  echo "waiting for instances to terminate"
+  sleep 5
+done
+
+aws ec2 detach-internet-gateway --vpc-id $VPC_ID --internet-gateway-id $IGW_ID
+aws ec2 delete-internet-gateway --internet-gateway-id $IGW_ID
+
+aws ec2 delete-security-group --group-id $SECURITY_GROUP_ID
+
+for SUBNET in ${SUBNETS[@]}; do
+  aws ec2 delete-subnet --subnet-id $SUBNET
+done
+
+aws ec2 delete-vpc --vpc-id $VPC_ID
+
+rm -f controlplane.yaml worker.yaml talosconfig kubeconfig time-server-patch.yaml disk.raw
+```
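+
+If you built your own AMI instead of using the official images, you may also want to remove the registered AMI, the imported snapshot, and the S3 bucket (a sketch, assuming `$AMI`, `$SNAPSHOT_ID`, and `$BUCKET` are still set from the earlier steps):
+
+```bash
+aws ec2 deregister-image --image-id $AMI
+aws ec2 delete-snapshot --snapshot-id $SNAPSHOT_ID
+aws s3 rm s3://$BUCKET/talos-aws-tutorial.raw
+aws s3api delete-bucket --bucket $BUCKET
+```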