diff --git a/deploy/aliyun/README-CN.md b/deploy/aliyun/README-CN.md deleted file mode 100644 index cb1b298dc1..0000000000 --- a/deploy/aliyun/README-CN.md +++ /dev/null @@ -1,161 +0,0 @@ -# 在阿里云上部署 TiDB Operator 和 TiDB 集群 - -## 环境需求 - -- [aliyun-cli](https://github.com/aliyun/aliyun-cli) >= 3.0.15 并且[配置 aliyun-cli](https://www.alibabacloud.com/help/doc-detail/90766.htm?spm=a2c63.l28256.a3.4.7b52a893EFVglq) -> **注意:** Access Key 需要具有操作相应资源的权限 -- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl) >= 1.12 -- [helm](https://github.com/helm/helm/blob/master/docs/install.md#installing-the-helm-client) >= 2.9.1 且 <= 2.11.0 -- [jq](https://stedolan.github.io/jq/download/) >= 1.6 -- [terraform](https://learn.hashicorp.com/terraform/getting-started/install.html) 0.11.* - -> 你可以使用阿里云的 [云命令行](https://shell.aliyun.com) 服务来进行操作,云命令行中已经预装并配置好了所有工具。 - -### 权限 - -完整部署集群需要下列权限: -- AliyunECSFullAccess -- AliyunESSFullAccess -- AliyunVPCFullAccess -- AliyunSLBFullAccess -- AliyunCSFullAccess -- AliyunEIPFullAccess -- AliyunECIFullAccess -- AliyunVPNGatewayFullAccess -- AliyunNATGatewayFullAccess - -## 概览 - -默认配置下,我们会创建: - -- 一个新的 VPC; -- 一台 ECS 实例作为堡垒机; -- 一个托管版 ACK(阿里云 Kubernetes)集群以及一系列 worker 节点: - - 属于一个伸缩组的 2 台 ECS 实例(2核2G), 托管版 Kubernetes 的默认伸缩组中必须至少有两台实例, 用于承载整个的系统服务, 比如 CoreDNS - - 属于一个伸缩组的 3 台 `ecs.i2.xlarge` 实例, 用于部署 PD - - 属于一个伸缩组的 3 台 `ecs.i2.2xlarge` 实例, 用于部署 TiKV - - 属于一个伸缩组的 2 台 ECS 实例(16核32G)用于部署 TiDB - - 属于一个伸缩组的 1 台 ECS 实例(4核8G)用于部署监控组件 - - 一块 500GB 的云盘用作监控数据存储 - -除了默认伸缩组之外的其它所有实例都是跨可用区部署的。而伸缩组(Auto-scaling Group)能够保证集群的健康实例数等于期望数值,因此,当发生节点故障甚至可用区故障时,伸缩组能够自动为我们创建新实例来确保服务可用性。 - -## 安装 - -设置目标 Region 和阿里云密钥(也可以在运行 `terraform` 命令时根据命令提示输入) -```shell -export TF_VAR_ALICLOUD_REGION= -export TF_VAR_ALICLOUD_ACCESS_KEY= -export TF_VAR_ALICLOUD_SECRET_KEY= -``` - -用于部署集群的各变量的默认值存储在 `variables.tf` 文件中,如需定制可以修改此文件或在安装时通过 `-var` 参数覆盖。 - -使用 Terraform 进行安装: - -```shell -$ git clone https://github.com/pingcap/tidb-operator -$ cd tidb-operator/deploy/aliyun -$ terraform init -$ terraform apply -``` - -假如在运行 `terraform apply` 时出现报错, 请根据报错信息(比如缺少权限)进行修复后再次运行 `terraform apply` - -整个安装过程大约需要 5 至 10 分钟,安装完成后会输出集群的关键信息(想要重新查看这些信息,可以运行 `terraform output`): - -``` -Apply complete! Resources: 3 added, 0 changed, 1 destroyed. 
- -Outputs: - -bastion_ip = 1.2.3.4 -bastion_key_file = /root/tidb-operator/deploy/aliyun/credentials/tidb-cluster-bastion-key.pem -cluster_id = ca57c6071f31f458da66965ceddd1c31b -kubeconfig_file = /root/tidb-operator/deploy/aliyun/.terraform/modules/a2078f76522ae433133fc16e24bd21ae/kubeconfig_tidb-cluster -monitor_endpoint = 1.2.3.4:3000 -region = cn-hangzhou -tidb_port = 4000 -tidb_slb_ip = 192.168.5.53 -tidb_version = v3.0.1 -vpc_id = vpc-bp16wcbu0xhbg833fymmc -worker_key_file = /root/tidb-operator/deploy/aliyun/credentials/tidb-cluster-node-key.pem -``` - -接下来可以用 `kubectl` 或 `helm` 对集群进行操作(其中 `cluster_name` 默认值为 `tidb-cluster`): - -```shell -$ export KUBECONFIG=$PWD/credentials/kubeconfig_ -$ kubectl version -$ helm ls -``` - -## 连接数据库 - -通过堡垒机可连接 TiDB 集群进行测试,相关信息在安装完成后的输出中均可找到: - -```shell -$ ssh -i credentials/-bastion-key.pem root@ -$ mysql -h -P -u root -``` - -## 监控 - -访问 `` 就可以查看相关的 Grafana 大盘。相关信息可在安装完成后的输出中找到。默认帐号密码为: - - - 用户名:admin - - 密码:admin - -> **警告:**出于安全考虑,假如你已经或将要配置 VPN 用于访问 VPC, 强烈建议将 `monitor_slb_network_type` 设置为 `intranet` 以禁止监控服务的公网访问。 - -## 升级 TiDB 集群 - -设置 `variables.tf` 中的 `tidb_version` 参数,并再次运行 `terraform apply` 即可完成升级。 - -升级操作可能会执行较长时间,可以通过以下命令来持续观察进度: - -``` -watch kubectl get pods --namespace tidb -o wide -``` - -## TiDB 集群水平伸缩 - -按需修改 `variables.tf` 中的 `tikv_count` 和 `tidb_count` 数值,再次运行 `terraform apply` 即可完成 TiDB 集群的水平伸缩。 - -## 销毁集群 - -```shell -$ terraform destroy -``` - -假如 kubernetes 集群没有创建成功,那么在 destroy 时会出现报错,无法进行正常清理。 此时需要手动将 kubernetes 资源从本地状态中移除: - -```shell -$ terraform state list -$ terraform state rm module.ack.alicloud_cs_managed_kubernetes.k8s -``` - -销毁集群操作需要执行较长时间。 - -> **注意:**监控组件挂载的云盘需要在阿里云管理控制台中手动删除。 - -## 自定义 - -默认配置下,Terraform 脚本会创建一个新的 VPC,假如要使用现有的 VPC,可以在 `variable.tf` 中设置 `vpc_id`。注意,当使用现有 VPC 时,没有设置 vswitch 的可用区将不会部署 kubernetes 节点。 - -出于安全考虑,TiDB 服务的 SLB 只对内网暴露,因此默认配置下还会创建一台堡垒机用于运维操作。堡垒机上还会安装 mysql-cli 和 sysbench 以便于使用和测试。假如不需要堡垒机,可以设置 `variables.tf` 中的 `create_bastion` 参数来关闭。 - -实例的规格可以通过两种方式进行定义: - -1. 通过声明实例规格名; -2. 通过声明实例的配置,比如 CPU 核数和内存大小。 - -由于阿里云在不同地域会提供不同的规格类型,并且部分规格有售罄的情况,我们推荐使用第二种办法来定义更通用的实例规格。你可以在 `variables.tf` 中找到相关的配置项。 - -特殊地,由于 PD 和 TiKV 节点强需求本地 SSD 存储,脚本中不允许直接声明 PD 和 TiKV 的规格名,你可以通过设置 `*_instance_type_family` 来选择 PD 或 TiKV 的规格族(只能在三个拥有本地 SSD 的规格族中选择),再通过内存大小来筛选符合需求的型号。 - -更多自定义配置相关的内容,请直接参考项目中的 `variables.tf` 文件。 - -## 限制 - -目前,pod cidr, service cid 和节点型号等配置在集群创建后均无法修改。 diff --git a/deploy/aliyun/README.md b/deploy/aliyun/README.md index 90a275a79d..be5ab01dba 100644 --- a/deploy/aliyun/README.md +++ b/deploy/aliyun/README.md @@ -1,170 +1,3 @@ # Deploy TiDB Operator and TiDB Cluster on Alibaba Cloud Kubernetes -[中文](README-CN.md) - -## Requirements - -- [aliyun-cli](https://github.com/aliyun/aliyun-cli) >= 3.0.15 and [configure aliyun-cli](https://www.alibabacloud.com/help/doc-detail/90766.htm?spm=a2c63.l28256.a3.4.7b52a893EFVglq) -> **Note:** The access key used must be granted permissions to control resources. 
-- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl) >= 1.12 -- [helm](https://github.com/helm/helm/blob/master/docs/install.md#installing-the-helm-client) >= 2.9.1 and <= 2.11.0 -- [jq](https://stedolan.github.io/jq/download/) >= 1.6 -- [terraform](https://learn.hashicorp.com/terraform/getting-started/install.html) 0.11.* - -### Permissions - -The following permissions are required: -- AliyunECSFullAccess -- AliyunESSFullAccess -- AliyunVPCFullAccess -- AliyunSLBFullAccess -- AliyunCSFullAccess -- AliyunEIPFullAccess -- AliyunECIFullAccess -- AliyunVPNGatewayFullAccess -- AliyunNATGatewayFullAccess - -## Overview - -The default setup will create: - -- A new VPC -- An ECS instance as bastion machine -- A managed ACK(Alibaba Cloud Kubernetes) cluster with the following ECS instance worker nodes: - - An auto-scaling group of 2 * instances(2c2g) as ACK mandatory workers for system service like CoreDNS - - An auto-scaling group of 3 * `ecs.i2.xlarge` instances for PD - - An auto-scaling group of 3 * `ecs.i2.2xlarge` instances for TiKV - - An auto-scaling group of 2 * instances(16c32g) for TiDB - - An auto-scaling group of 1 * instance(4c8g) for monitoring components - -In addition, the monitoring node will mount a 500GB cloud disk as data volume. All the instances except ACK mandatory workers span in multiple available zones to provide cross-AZ high availability. - -The auto-scaling group will ensure the desired number of healthy instances, so the cluster can auto recover from node failure or even available zone failure. - -## Setup - -Configure target region and credential (you can also set these variables in `terraform` command prompt): -```shell -export TF_VAR_ALICLOUD_REGION= -export TF_VAR_ALICLOUD_ACCESS_KEY= -export TF_VAR_ALICLOUD_SECRET_KEY= -``` - -The `variables.tf` file contains default settings of variables used for deploying the cluster, you can change it or use `-var` option to override a specific variable to fit your need. - -Apply the stack: - -```shell -# Get the code -$ git clone https://github.com/pingcap/tidb-operator -$ cd tidb-operator/deploy/aliyun - -# Apply the configs, note that you must answer "yes" to `terraform apply` to continue -$ terraform init -$ terraform apply -``` - -If you get an error while running `terraform apply`, fix the error(e.g. lack of permission) according to the description and run `terraform apply` again. - -`terraform apply` will take 5 to 10 minutes to create the whole stack, once complete, basic cluster information will be printed: - -> **Note:** You can use the `terraform output` command to get the output again. - -``` -Apply complete! Resources: 3 added, 0 changed, 1 destroyed. 
- -Outputs: - -bastion_ip = 1.2.3.4 -bastion_key_file = /root/tidb-operator/deploy/aliyun/credentials/tidb-cluster-bastion-key.pem -cluster_id = ca57c6071f31f458da66965ceddd1c31b -kubeconfig_file = /root/tidb-operator/deploy/aliyun/.terraform/modules/a2078f76522ae433133fc16e24bd21ae/kubeconfig_tidb-cluster -monitor_endpoint = 1.2.3.4:3000 -region = cn-hangzhou -tidb_port = 4000 -tidb_slb_ip = 192.168.5.53 -tidb_version = v3.0.1 -vpc_id = vpc-bp16wcbu0xhbg833fymmc -worker_key_file = /root/tidb-operator/deploy/aliyun/credentials/tidb-cluster-node-key.pem -``` - -You can then interact with the ACK cluster using `kubectl` and `helm` (`cluster_name` is `tidb-cluster` by default): - -```shell -$ export KUBECONFIG=$PWD/credentials/kubeconfig_ -$ kubectl version -$ helm ls -``` - -## Access the DB - -You can connect the TiDB cluster via the bastion instance, all necessary information are in the output printed after installation is finished (replace the `<>` parts with values from the output): - -```shell -$ ssh -i credentials/-bastion-key.pem root@ -$ mysql -h -P -u root -``` - -## Monitoring - -Visit `` to view the grafana dashboards. You can find this information in the output of installation. - -The initial login credentials are: - - User: admin - - Password: admin - -> **Warning:** It is strongly recommended to set `monitor_slb_network_type` to `intranet` in `variables.tf` for security if you already have a VPN connecting to your VPC or plan to setup one. - -## Upgrade TiDB cluster - -To upgrade TiDB cluster, modify `tidb_version` variable to a higher version in `variables.tf` and run `terraform apply`. - -This may take a while to complete, watch the process using command: - -``` -kubectl get pods --namespace tidb -o wide --watch -``` - -## Scale TiDB cluster - -To scale TiDB cluster, modify `tikv_count` or `tidb_count` to your desired numbers, and then run `terraform apply`. - -## Destroy - -It may take some while to finish destroying the cluster. - -```shell -$ terraform destroy -``` - -Alibaba cloud terraform provider do not handle kubernetes creation error properly, which will cause an error when destroying. In that case, you can remove the kubernetes resource from the local state manually and proceed to destroy the rest resources: - -```shell -$ terraform state list -$ terraform state rm module.ack.alicloud_cs_managed_kubernetes.k8s -``` - -> **Note:** You have to manually delete the cloud disk used by monitoring node in Aliyun's console after destroying if you don't need it anymore. - -## Customize - -By default, the terraform script will create a new VPC. You can use an existing VPC by setting `vpc_id` to use an existing VPC. Note that kubernetes node will only be created in available zones that has vswitch existed when using existing VPC. - -An ecs instance is also created by default as bastion machine to connect to the created TiDB cluster, because the TiDB service is only exposed to intranet. The bastion instance has mysql-cli and sysbench installed that helps you use and test TiDB. - -If you don't have to access TiDB from internet, you could disable the creation of bastion instance by setting `create_bastion` to false in `variables.tf` - -The worker node instance types are also configurable, there are two ways to configure that: - -1. by specifying instance type id -2. by specifying capacity like instance cpu count and memory size - -Because the Alibaba Cloud offers different instance types in different region, it is recommended to specify the capacity instead of certain type. 
You can configure these in the `variables.tf`, note that instance type will override capacity configurations. - -There's a exception for PD and TiKV instances, because PD and TiKV required local SSD, so you cannot specify instance type for them. Instead, you can choose the type family among `ecs.i1`,`ecs.i2` and `ecs.i2g`, which has one or more local NVMe SSD, and select a certain type in the type family by specifying `instance_memory_size`. - -For more customization options, please refer to `variables.tf` - -## Limitations - -You cannot change pod cidr, service cidr and worker instance types once the cluster created. +This document has been moved to [https://pingcap.com/docs/v3.0/how-to/deploy/orchestrated/tidb-in-kubernetes/alibaba-cloud/](https://pingcap.com/docs/v3.0/how-to/deploy/orchestrated/tidb-in-kubernetes/alibaba-cloud/). diff --git a/deploy/aliyun/ack/data.tf b/deploy/aliyun/ack/data.tf deleted file mode 100644 index 5d9315d773..0000000000 --- a/deploy/aliyun/ack/data.tf +++ /dev/null @@ -1,40 +0,0 @@ -data "alicloud_zones" "all" { - network_type = "Vpc" -} - -data "alicloud_vswitches" "default" { - vpc_id = "${var.vpc_id}" -} - -data "alicloud_instance_types" "default" { - availability_zone = "${lookup(data.alicloud_zones.all.zones[0], "id")}" - cpu_core_count = "${var.default_worker_cpu_core_count}" -} - -# Workaround map to list transformation, see stackoverflow.com/questions/43893295 -data "template_file" "vswitch_id" { - count = "${var.vpc_id == "" ? 0 : length(data.alicloud_vswitches.default.vswitches)}" - template = "${lookup(data.alicloud_vswitches.default.0.vswitches[count.index], "id")}" -} - -# Get cluster bootstrap token -data "external" "token" { - depends_on = ["alicloud_cs_managed_kubernetes.k8s"] - - # Terraform use map[string]string to unmarshal the result, transform the json to conform - program = ["bash", "-c", "aliyun --region ${var.region} cs POST /clusters/${alicloud_cs_managed_kubernetes.k8s.id}/token --body '{\"is_permanently\": true}' | jq \"{token: .token}\""] -} - -data "template_file" "userdata" { - template = "${file("${path.module}/templates/user_data.sh.tpl")}" - count = "${length(var.worker_groups)}" - - vars { - pre_userdata = "${lookup(var.worker_groups[count.index], "pre_userdata", var.group_default["pre_userdata"])}" - post_userdata = "${lookup(var.worker_groups[count.index], "post_userdata", var.group_default["post_userdata"])}" - open_api_token = "${lookup(data.external.token.result, "token")}" - node_taints = "${lookup(var.worker_groups[count.index], "node_taints", var.group_default["node_taints"])}" - node_labels = "${lookup(var.worker_groups[count.index], "node_labels", var.group_default["node_labels"])}" - region = "${var.region}" - } -} diff --git a/deploy/aliyun/ack/main.tf b/deploy/aliyun/ack/main.tf deleted file mode 100644 index 85c2cf52a9..0000000000 --- a/deploy/aliyun/ack/main.tf +++ /dev/null @@ -1,147 +0,0 @@ -/* - Alicloud ACK module that launches: - - - A managed kubernetes cluster; - - Several auto-scaling groups which acting as worker nodes. - - Each auto-scaling group has the same instance type and will - balance ECS instances across multiple AZ in favor of HA. - */ -provider "alicloud" {} - -resource "alicloud_key_pair" "default" { - count = "${var.key_pair_name == "" ? 1 : 0}" - key_name_prefix = "${var.cluster_name_prefix}-key" - key_file = "${var.key_file != "" ? 
var.key_file : format("%s/%s-key", path.module, var.cluster_name_prefix)}" -} - -# If there is not specifying vpc_id, create a new one -resource "alicloud_vpc" "vpc" { - count = "${var.vpc_id == "" ? 1 : 0}" - cidr_block = "${var.vpc_cidr}" - name = "${var.cluster_name_prefix}-vpc" - - lifecycle { - ignore_changes = ["cidr_block"] - } -} - -# For new vpc or existing vpc with no vswitches, create vswitch for each zone -resource "alicloud_vswitch" "all" { - count = "${var.vpc_id != "" && (length(data.alicloud_vswitches.default.vswitches) != 0) ? 0 : length(data.alicloud_zones.all.zones)}" - vpc_id = "${alicloud_vpc.vpc.0.id}" - cidr_block = "${cidrsubnet(alicloud_vpc.vpc.0.cidr_block, var.vpc_cidr_newbits, count.index)}" - availability_zone = "${lookup(data.alicloud_zones.all.zones[count.index%length(data.alicloud_zones.all.zones)], "id")}" - name = "${format("vsw-%s-%d", var.cluster_name_prefix, count.index+1)}" -} - -resource "alicloud_security_group" "group" { - count = "${var.group_id == "" ? 1 : 0}" - name = "${var.cluster_name_prefix}-sg" - vpc_id = "${var.vpc_id != "" ? var.vpc_id : alicloud_vpc.vpc.0.id}" - description = "Security group for ACK worker nodes" -} - -# Allow traffic inside VPC -resource "alicloud_security_group_rule" "cluster_worker_ingress" { - count = "${var.group_id == "" ? 1 : 0}" - security_group_id = "${alicloud_security_group.group.id}" - type = "ingress" - ip_protocol = "all" - nic_type = "intranet" - port_range = "-1/-1" - cidr_ip = "${var.vpc_id != "" ? var.vpc_cidr : alicloud_vpc.vpc.0.cidr_block}" -} - -# Create a managed Kubernetes cluster -resource "alicloud_cs_managed_kubernetes" "k8s" { - name_prefix = "${var.cluster_name_prefix}" - - // split and join: workaround for terraform's limitation of conditional list choice, similarly hereinafter - vswitch_ids = ["${element(split(",", var.vpc_id != "" && (length(data.alicloud_vswitches.default.vswitches) != 0) ? join(",", data.template_file.vswitch_id.*.rendered) : join(",", alicloud_vswitch.all.*.id)), 0)}"] - key_name = "${alicloud_key_pair.default.key_name}" - pod_cidr = "${var.k8s_pod_cidr}" - service_cidr = "${var.k8s_service_cidr}" - new_nat_gateway = "${var.create_nat_gateway}" - cluster_network_type = "${var.cluster_network_type}" - slb_internet_enabled = "${var.public_apiserver}" - kube_config = "${var.kubeconfig_file != "" ? var.kubeconfig_file : format("%s/kubeconfig", path.module)}" - worker_numbers = ["${var.default_worker_count}"] - worker_instance_types = ["${var.default_worker_type != "" ? var.default_worker_type : data.alicloud_instance_types.default.instance_types.0.id}"] - - # These varialbes are 'ForceNew' that will cause kubernetes cluster re-creation - # on variable change, so we make all these variables immutable in favor of safety. - lifecycle { - ignore_changes = [ - "vswitch_ids", - "worker_instance_types", - "key_name", - "pod_cidr", - "service_cidr", - "cluster_network_type", - ] - } - - depends_on = ["alicloud_vpc.vpc"] -} - -# Create auto-scaling groups -resource "alicloud_ess_scaling_group" "workers" { - count = "${length(var.worker_groups)}" - scaling_group_name = "${alicloud_cs_managed_kubernetes.k8s.name}-${lookup(var.worker_groups[count.index], "name", count.index)}" - vswitch_ids = ["${split(",", var.vpc_id != "" ? 
join(",", data.template_file.vswitch_id.*.rendered) : join(",", alicloud_vswitch.all.*.id))}"] - min_size = "${lookup(var.worker_groups[count.index], "min_size", var.group_default["min_size"])}" - max_size = "${lookup(var.worker_groups[count.index], "max_size", var.group_default["max_size"])}" - default_cooldown = "${lookup(var.worker_groups[count.index], "default_cooldown", var.group_default["default_cooldown"])}" - multi_az_policy = "${lookup(var.worker_groups[count.index], "multi_az_policy", var.group_default["multi_az_policy"])}" - - # Remove the newest instance in the oldest scaling configuration - removal_policies = [ - "OldestScalingConfiguration", - "NewestInstance", - ] - - lifecycle { - # FIXME: currently update vswitch_ids will force will recreate, allow updating when upstream support in-place - # vswitch id update - ignore_changes = ["vswitch_ids"] - - create_before_destroy = true - } -} - -# Create the cooresponding auto-scaling configurations -resource "alicloud_ess_scaling_configuration" "workers" { - count = "${length(var.worker_groups)}" - scaling_group_id = "${element(alicloud_ess_scaling_group.workers.*.id, count.index)}" - image_id = "${lookup(var.worker_groups[count.index], "image_id", var.group_default["image_id"])}" - instance_type = "${lookup(var.worker_groups[count.index], "instance_type", var.group_default["instance_type"])}" - security_group_id = "${var.group_id != "" ? var.group_id : alicloud_security_group.group.id}" - key_name = "${alicloud_key_pair.default.key_name}" - system_disk_category = "${lookup(var.worker_groups[count.index], "system_disk_category", var.group_default["system_disk_category"])}" - system_disk_size = "${lookup(var.worker_groups[count.index], "system_disk_size", var.group_default["system_disk_size"])}" - user_data = "${element(data.template_file.userdata.*.rendered, count.index)}" - internet_charge_type = "${lookup(var.worker_groups[count.index], "internet_charge_type", var.group_default["internet_charge_type"])}" - internet_max_bandwidth_in = "${lookup(var.worker_groups[count.index], "internet_max_bandwidth_in", var.group_default["internet_max_bandwidth_in"])}" - internet_max_bandwidth_out = "${lookup(var.worker_groups[count.index], "internet_max_bandwidth_out", var.group_default["internet_max_bandwidth_out"])}" - - enable = true - active = true - force_delete = true - - tags = "${merge(map( - "name", "${alicloud_cs_managed_kubernetes.k8s.name}-${lookup(var.worker_groups[count.index], "name", count.index)}-ack_asg", - "kubernetes.io/cluster/${alicloud_cs_managed_kubernetes.k8s.name}", "owned", - "k8s.io/cluster-autoscaler/${lookup(var.worker_groups[count.index], "autoscaling_enabled", var.group_default["autoscaling_enabled"]) == 1 ? "enabled" : "disabled"}", "true", - "k8s.io/cluster-autoscaler/${alicloud_cs_managed_kubernetes.k8s.name}", "default" - ), - var.default_group_tags, - var.worker_group_tags[count.index%length(var.worker_group_tags)] - ) - }" - - lifecycle { - ignore_changes = ["instance_type"] - create_before_destroy = true - } -} diff --git a/deploy/aliyun/ack/outputs.tf b/deploy/aliyun/ack/outputs.tf deleted file mode 100644 index 288c342855..0000000000 --- a/deploy/aliyun/ack/outputs.tf +++ /dev/null @@ -1,34 +0,0 @@ -output "cluster_id" { - description = "The id of the ACK cluster." 
- value = "${alicloud_cs_managed_kubernetes.k8s.*.id}" -} - -output "cluster_name" { - description = "The name of ACK cluster" - value = "${alicloud_cs_managed_kubernetes.k8s.*.name}" -} - -output "cluster_nodes" { - description = "The cluster worker node ids of ACK cluster" - value = "${alicloud_ess_scaling_configuration.workers.*.id}" -} - -output "vpc_id" { - description = "The vpc id of ACK cluster" - value = "${alicloud_cs_managed_kubernetes.k8s.*.vpc_id}" -} - -output "vswitch_ids" { - description = "The vswich ids of ACK cluster" - value = "${alicloud_cs_managed_kubernetes.k8s.*.vswitch_ids}" -} - -output "security_group_id" { - description = "The security_group_id of ACK cluster" - value = "${alicloud_cs_managed_kubernetes.k8s.*.security_group_id}" -} - -output "kubeconfig_filename" { - description = "The filename of the generated kubectl config." - value = "${path.module}/kubeconfig_${var.cluster_name_prefix}" -} diff --git a/deploy/aliyun/ack/variables.tf b/deploy/aliyun/ack/variables.tf deleted file mode 100644 index f36ad7b217..0000000000 --- a/deploy/aliyun/ack/variables.tf +++ /dev/null @@ -1,160 +0,0 @@ -variable "region" { - description = "Alicloud region" -} - -variable "cluster_name_prefix" { - description = "Kubernetes cluster name" - default = "ack-cluster" -} - -variable "cluster_network_type" { - description = "Kubernetes network plugin, options: [flannel, terway]. Cannot change once created." - default = "flannel" -} - -variable "span_all_zones" { - description = "Whether span worker nodes in all avaiable zones, worker_zones will be ignored if span_all_zones=true" - default = true -} - -variable "worker_zones" { - description = "Available zones of worker nodes, used when span_all_zones=false. It is highly recommended to guarantee the instance type of workers is available in at least two zones in favor of HA." - type = "list" - default = [] -} - -variable "public_apiserver" { - description = "Whether enable apiserver internet access" - default = false -} - -variable "kubeconfig_file" { - description = "The path that kubeconfig file write to, default to $${path.module}/kubeconfig if empty." - default = "" -} - -variable "k8s_pod_cidr" { - description = "The kubernetes pod cidr block. It cannot be equals to vpc's or vswitch's and cannot be in them. Cannot change once the cluster created." - default = "172.20.0.0/16" -} - -variable "k8s_service_cidr" { - description = "The kubernetes service cidr block. It cannot be equals to vpc's or vswitch's or pod's and cannot be in them. Cannot change once the cluster created." - default = "172.21.0.0/20" -} - -variable "vpc_cidr" { - description = "VPC cidr_block, options: [192.168.0.0.0/16, 172.16.0.0/16, 10.0.0.0/8], cannot collidate with kubernetes service cidr and pod cidr. Cannot change once the vpc created." - default = "192.168.0.0/16" -} - -variable "key_file" { - description = "The path that new key file write to, defaul to $${path.module}/$${cluster_name}-key.pem if empty" - default = "" -} - -variable "key_pair_name" { - description = "Key pair for worker instance, specify this variable to use an exsitng key pair. A new key pair will be created by default." - default = "" -} - -variable "vpc_id" { - description = "VPC id, specify this variable to use an exsiting VPC and the vswitches in the VPC. Note that when using existing vpc, it is recommended to use a existing security group too. Otherwise you have to set vpc_cidr according to the existing VPC settings to get correct in-cluster security rule." 
- default = "" -} - -variable "group_id" { - description = "Security group id, specify this variable to use and exising security group" - default = "" -} - -variable "vpc_cidr_newbits" { - description = "VPC cidr newbits, it's better to be set as 16 if you use 10.0.0.0/8 cidr block" - default = "8" -} - -variable "create_nat_gateway" { - description = "If create nat gateway in VPC" - default = true -} - -variable "default_worker_count" { - description = "The number of kubernetes default worker nodes, value: [2,50]. See module README for detail." - default = 2 -} - -variable "default_worker_cpu_core_count" { - description = "The instance cpu core count of kubernetes default worker nodes, this variable will be ignroed if default_worker_type set" - default = 1 -} - -variable "default_worker_type" { - description = "The instance type of kubernets default worker nodes, it is recommend to use default_worker_cpu_core_count to select flexible instance type" - default = "" -} - -variable "worker_groups" { - description = "A list of maps defining worker group configurations to be defined using alicloud ESS. See group_default for validate keys." - type = "list" - - default = [ - { - "name" = "default" - }, - ] -} - -variable "group_default" { - description = < 8, default thread pool size for coprocessors - # will be set to tikv.resources.limits.cpu * 0.8. - # readpoolCoprocessorConcurrency: 8 - - # scheduler's worker pool size, should increase it in heavy write cases, - # also should less than total cpu cores. - # storageSchedulerWorkerPoolSize: 4 - -tidb: - replicas: ${tidb_replicas} - # The secret name of root password, you can create secret with following command: - # kubectl create secret generic tidb-secret --from-literal=root= --namespace= - # If unset, the root password will be empty and you can set it after connecting - # passwordSecretName: tidb-secret - # initSql is the SQL statements executed after the TiDB cluster is bootstrapped. - # initSql: |- - # create database app; - image: "pingcap/tidb:${cluster_version}" - # Image pull policy. - imagePullPolicy: IfNotPresent - logLevel: info - preparedPlanCacheEnabled: false - preparedPlanCacheCapacity: 100 - # Enable local latches for transactions. Enable it when - # there are lots of conflicts between transactions. - txnLocalLatchesEnabled: false - txnLocalLatchesCapacity: "10240000" - # The limit of concurrent executed sessions. - tokenLimit: "1000" - # Set the memory quota for a query in bytes. Default: 32GB - memQuotaQuery: "34359738368" - # The limitation of the number for the entries in one transaction. - # If using TiKV as the storage, the entry represents a key/value pair. - # WARNING: Do not set the value too large, otherwise it will make a very large impact on the TiKV cluster. - # Please adjust this configuration carefully. - txnEntryCountLimit: "300000" - # The limitation of the size in byte for the entries in one transaction. - # If using TiKV as the storage, the entry represents a key/value pair. - # WARNING: Do not set the value too large, otherwise it will make a very large impact on the TiKV cluster. - # Please adjust this configuration carefully. - txnTotalSizeLimit: "104857600" - # enableBatchDml enables batch commit for the DMLs - enableBatchDml: false - # check mb4 value in utf8 is used to control whether to check the mb4 characters when the charset is utf8. - checkMb4ValueInUtf8: true - # treat-old-version-utf8-as-utf8mb4 use for upgrade compatibility. Set to true will treat old version table/column UTF8 charset as UTF8MB4. 
- treatOldVersionUtf8AsUtf8mb4: true - # lease is schema lease duration, very dangerous to change only if you know what you do. - lease: 45s - # Max CPUs to use, 0 use number of CPUs in the machine. - maxProcs: 0 - resources: - limits: {} - # cpu: 16000m - # memory: 16Gi - requests: {} - # cpu: 12000m - # memory: 12Gi - nodeSelector: - dedicated: tidb - # kind: tidb - # zone: cn-bj1-01,cn-bj1-02 - # region: cn-bj1 - tolerations: - - key: dedicated - operator: Equal - value: tidb - effect: "NoSchedule" - maxFailoverCount: 3 - service: - type: LoadBalancer - exposeStatus: true - annotations: - service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet - service.beta.kubernetes.io/alicloud-loadbalancer-slb-network-type: vpc - # separateSlowLog: true - slowLogTailer: - image: busybox:1.26.2 - resources: - limits: - cpu: 100m - memory: 50Mi - requests: - cpu: 20m - memory: 5Mi - - # tidb plugin configuration - plugin: - # enable plugin or not - enable: false - # the start argument to specify the folder containing - directory: /plugins - # the start argument to specify the plugin id (name "-" version) that needs to be loaded, e.g. 'conn_limit-1'. - list: ["whitelist-1"] - -# mysqlClient is used to set password for TiDB -# it must has Python MySQL client installed -mysqlClient: - image: tnir/mysqlclient - imagePullPolicy: IfNotPresent - -monitor: - create: true - # Also see rbac.create - # If you set rbac.create to false, you need to provide a value here. - # If you set rbac.create to true, you should leave this empty. - # serviceAccount: - persistent: true - storageClassName: ${monitor_storage_class} - storage: ${monitor_storage_size} - grafana: - create: true - image: grafana/grafana:6.0.1 - imagePullPolicy: IfNotPresent - logLevel: info - resources: - limits: {} - # cpu: 8000m - # memory: 8Gi - requests: {} - # cpu: 4000m - # memory: 4Gi - username: admin - password: admin - config: - # Configure Grafana using environment variables except GF_PATHS_DATA, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD - # Ref https://grafana.com/docs/installation/configuration/#using-environment-variables - GF_AUTH_ANONYMOUS_ENABLED: %{ if monitor_enable_anonymous_user }"true"%{ else }"false"%{ endif } - GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org." - GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer" - # if grafana is running behind a reverse proxy with subpath http://foo.bar/grafana - # GF_SERVER_DOMAIN: foo.bar - # GF_SERVER_ROOT_URL: "%(protocol)s://%(domain)s/grafana/" - service: - type: LoadBalancer - annotations: - service.beta.kubernetes.io/alicloud-loadbalancer-address-type: ${monitor_slb_network_type} - prometheus: - image: prom/prometheus:v2.2.1 - imagePullPolicy: IfNotPresent - logLevel: info - resources: - limits: {} - # cpu: 8000m - # memory: 8Gi - requests: {} - # cpu: 4000m - # memory: 4Gi - service: - type: NodePort - reserveDays: ${monitor_reserve_days} - # alertmanagerURL: "" - nodeSelector: {} - # kind: monitor - # zone: cn-bj1-01,cn-bj1-02 - # region: cn-bj1 - tolerations: [] - # - key: node-role - # operator: Equal - # value: tidb - # effect: "NoSchedule" - -binlog: - pump: - create: false - replicas: 1 - image: "pingcap/tidb-binlog:${cluster_version}" - imagePullPolicy: IfNotPresent - logLevel: info - # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer. - # different classes might map to quality-of-service levels, or to backup policies, - # or to arbitrary policies determined by the cluster administrators. 
- # refer to https://kubernetes.io/docs/concepts/storage/storage-classes - storageClassName: ${local_storage_class} - storage: 10Gi - syncLog: true - # a integer value to control expiry date of the binlog data, indicates for how long (in days) the binlog data would be stored. - # must bigger than 0 - gc: 7 - # number of seconds between heartbeat ticks (in 2 seconds) - heartbeatInterval: 2 - - drainer: - create: false - image: "pingcap/tidb-binlog:${cluster_version}" - imagePullPolicy: IfNotPresent - logLevel: info - # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer. - # different classes might map to quality-of-service levels, or to backup policies, - # or to arbitrary policies determined by the cluster administrators. - # refer to https://kubernetes.io/docs/concepts/storage/storage-classes - storageClassName: ${local_storage_class} - storage: 10Gi - # parallel worker count (default 16) - workerCount: 16 - # the interval time (in seconds) of detect pumps' status (default 10) - detectInterval: 10 - # disbale detect causality - disableDetect: false - # disable dispatching sqls that in one same binlog; if set true, work-count and txn-batch would be useless - disableDispatch: false - # # disable sync these schema - ignoreSchemas: "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,test" - # if drainer donesn't have checkpoint, use initial commitTS to initial checkpoint - initialCommitTs: 0 - # enable safe mode to make syncer reentrant - safeMode: false - # number of binlog events in a transaction batch (default 20) - txnBatch: 20 - # downstream storage, equal to --dest-db-type - # valid values are "mysql", "pb", "kafka" - destDBType: pb - mysql: {} - # host: "127.0.0.1" - # user: "root" - # password: "" - # port: 3306 - # # Time and size limits for flash batch write - # timeLimit: "30s" - # sizeLimit: "100000" - kafka: {} - # only need config one of zookeeper-addrs and kafka-addrs, will get kafka address if zookeeper-addrs is configed. - # zookeeperAddrs: "127.0.0.1:2181" - # kafkaAddrs: "127.0.0.1:9092" - # kafkaVersion: "0.8.2.0" - -scheduledBackup: - create: false - binlogImage: "pingcap/tidb-binlog:${cluster_version}" - binlogImagePullPolicy: IfNotPresent - # https://github.com/tennix/tidb-cloud-backup - mydumperImage: pingcap/tidb-cloud-backup:20190610 - mydumperImagePullPolicy: IfNotPresent - # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer. - # different classes might map to quality-of-service levels, or to backup policies, - # or to arbitrary policies determined by the cluster administrators. 
- # refer to https://kubernetes.io/docs/concepts/storage/storage-classes - storageClassName: ${local_storage_class} - storage: 100Gi - # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule - schedule: "0 0 * * *" - # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend - suspend: false - # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#jobs-history-limits - successfulJobsHistoryLimit: 3 - failedJobsHistoryLimit: 1 - # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline - startingDeadlineSeconds: 3600 - # https://github.com/maxbube/mydumper/blob/master/docs/mydumper_usage.rst#options - options: "--chunk-filesize=100" - # secretName is the name of the secret which stores user and password used for backup - # Note: you must give the user enough privilege to do the backup - # you can create the secret by: - # kubectl create secret generic backup-secret --from-literal=user=root --from-literal=password= - secretName: backup-secret - # backup to gcp - gcp: {} - # bucket: "" - # secretName is the name of the secret which stores the gcp service account credentials json file - # The service account must have read/write permission to the above bucket. - # Read the following document to create the service account and download the credentials file as credentials.json: - # https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually - # And then create the secret by: kubectl create secret generic gcp-backup-secret --from-file=./credentials.json - # secretName: gcp-backup-secret - - # backup to ceph object storage - ceph: {} - # endpoint: "" - # bucket: "" - # secretName is the name of the secret which stores ceph object store access key and secret key - # You can create the secret by: - # kubectl create secret generic ceph-backup-secret --from-literal=access_key= --from-literal=secret_key= - # secretName: ceph-backup-secret - - # backup to s3 - s3: {} - # region: "" - # bucket: "" - # secretName is the name of the secret which stores s3 object store access key and secret key - # You can create the secret by: - # kubectl create secret generic s3-backup-secret --from-literal=access_key= --from-literal=secret_key= - # secretName: s3-backup-secret - -metaInstance: "{{ $labels.instance }}" -metaType: "{{ $labels.type }}" -metaValue: "{{ $value }}" diff --git a/deploy/aliyun/userdata/pd-userdata.sh b/deploy/aliyun/userdata/pd-userdata.sh deleted file mode 100644 index 0740182f9e..0000000000 --- a/deploy/aliyun/userdata/pd-userdata.sh +++ /dev/null @@ -1,15 +0,0 @@ -#!/bin/sh -# set system ulimits -cat < /etc/security/limits.d/99-tidb.conf -root soft nofile 1000000 -root hard nofile 1000000 -root soft core unlimited -root soft stack 10240 -EOF -# config docker ulimits -cp /usr/lib/systemd/system/docker.service /etc/systemd/system/docker.service -sed -i 's/LimitNOFILE=infinity/LimitNOFILE=1048576/' /etc/systemd/system/docker.service -sed -i 's/LimitNPROC=infinity/LimitNPROC=1048576/' /etc/systemd/system/docker.service -systemctl daemon-reload -systemctl restart docker - diff --git a/deploy/aliyun/variables.tf b/deploy/aliyun/variables.tf index e20fc0d7a0..7e583a1131 100644 --- a/deploy/aliyun/variables.tf +++ b/deploy/aliyun/variables.tf @@ -1,26 +1,50 @@ -variable "cluster_name_prefix" { - description = "TiDB cluster name" - default = "tidb-cluster" +variable "bastion_image_name" { + description = "OS image of bastion" + default = 
"centos_7_06_64_20G_alibase_20190218.vhd" +} + +variable "bastion_cpu_core_count" { + description = "CPU core count to select bastion type" + default = 1 +} + +variable "operator_version" { + type = string + default = "v1.0.0-beta.3" +} + +variable "operator_helm_values" { + type = string + default = "" +} + +variable "bastion_ingress_cidr" { + description = "Bastion ingress security rule cidr, it is highly recommended to set this in favor of safety" + default = "0.0.0.0/0" +} + +variable "cluster_name" { + description = "Kubernetes cluster name" + default = "my-cluster" } variable "tidb_version" { description = "TiDB cluster version" default = "v3.0.1" } +variable "tidb_cluster_chart_version" { + description = "tidb-cluster chart version" + default = "v1.0.0-beta.3" +} variable "pd_count" { description = "PD instance count, the recommend value is 3" default = 3 } -variable "pd_instance_type_family" { - description = "PD instance type family, values: [ecs.i2, ecs.i1, ecs.i2g]" - default = "ecs.i2" -} - -variable "pd_instance_memory_size" { - description = "PD instance memory size in GB, must available in the type famliy" - default = 32 +variable "pd_instance_type" { + description = "PD instance type" + default = "ecs.g5.large" } variable "tikv_count" { @@ -28,14 +52,9 @@ variable "tikv_count" { default = 3 } -variable "tikv_instance_type_family" { +variable "tikv_instance_type" { description = "TiKV instance memory in GB, must available in type family" - default = "ecs.i2" -} - -variable "tikv_memory_size" { - description = "TiKV instance memory in GB, must available in type family" - default = 64 + default = "ecs.i2.2xlarge" } variable "tidb_count" { @@ -44,46 +63,13 @@ variable "tidb_count" { } variable "tidb_instance_type" { - description = "TiDB instance type, this variable override tidb_instance_core_count and tidb_instance_memory_size, is recommended to use the tidb_instance_core_count and tidb_instance_memory_size to select instance type in favor of flexibility" - - default = "" -} - -variable "tidb_instance_core_count" { - default = 16 -} - -variable "tidb_instance_memory_size" { - default = 32 + description = "TiDB instance type" + default = "ecs.c5.4xlarge" } -variable "monitor_intance_type" { - description = "Monitor instance type, this variable override tidb_instance_core_count and tidb_instance_memory_size, is recommended to use the tidb_instance_core_count and tidb_instance_memory_size to select instance type in favor of flexibility" - - default = "" -} - -variable "monitor_instance_core_count" { - default = 4 -} - -variable "monitor_instance_memory_size" { - default = 8 -} - -variable "monitor_storage_class" { - description = "Monitor PV storageClass, values: [alicloud-disk-commo, alicloud-disk-efficiency, alicloud-disk-ssd, alicloud-disk-available]" - default = "alicloud-disk-available" -} - -variable "monitor_storage_size" { - description = "Monitor storage size in Gi" - default = 500 -} - -variable "monitor_reserve_days" { - description = "Monitor data reserve days" - default = 14 +variable "monitor_instance_type" { + description = "Monitor instance type" + default = "ecs.c5.xlarge" } variable "default_worker_core_count" { @@ -91,43 +77,9 @@ variable "default_worker_core_count" { default = 2 } -variable "create_bastion" { - description = "Whether create bastion server" - default = true -} - -variable "bastion_image_name" { - description = "OS image of bastion" - default = "centos_7_06_64_20G_alibase_20190218.vhd" -} - -variable "bastion_key_prefix" { - default = 
"bastion-key" -} - -variable "bastion_cpu_core_count" { - description = "CPU core count to select bastion type" - default = 1 -} - -variable "bastion_ingress_cidr" { - description = "Bastion ingress security rule cidr, it is highly recommended to set this in favor of safety" - default = "0.0.0.0/0" -} - -variable "monitor_slb_network_type" { - description = "The monitor slb network type, values: [internet, intranet]. It is recommended to set it as intranet and access via VPN in favor of safety" - default = "internet" -} - -variable "monitor_enable_anonymous_user" { - description = "Whether enabling anonymous user visiting for monitoring" - default = false -} - variable "vpc_id" { - description = "VPC id, specify this variable to use an exsiting VPC and the vswitches in the VPC. Note that when using existing vpc, it is recommended to use a existing security group too. Otherwise you have to set vpc_cidr according to the existing VPC settings to get correct in-cluster security rule." - default = "" + description = "VPC id" + default = "" } variable "group_id" { diff --git a/deploy/aliyun/versions.tf b/deploy/aliyun/versions.tf new file mode 100644 index 0000000000..ac97c6ac8e --- /dev/null +++ b/deploy/aliyun/versions.tf @@ -0,0 +1,4 @@ + +terraform { + required_version = ">= 0.12" +} diff --git a/deploy/aws/clusters.tf b/deploy/aws/clusters.tf index fcf0ac462b..011279289d 100644 --- a/deploy/aws/clusters.tf +++ b/deploy/aws/clusters.tf @@ -57,3 +57,4 @@ module "default-cluster" { monitor_instance_type = var.default_cluster_monitor_instance_type override_values = file("default-cluster.yaml") } + diff --git a/deploy/aliyun/userdata/bastion-userdata b/deploy/modules/aliyun/bastion/bastion-userdata similarity index 100% rename from deploy/aliyun/userdata/bastion-userdata rename to deploy/modules/aliyun/bastion/bastion-userdata diff --git a/deploy/modules/aliyun/bastion/bastion.tf b/deploy/modules/aliyun/bastion/bastion.tf new file mode 100644 index 0000000000..cc60320d95 --- /dev/null +++ b/deploy/modules/aliyun/bastion/bastion.tf @@ -0,0 +1,43 @@ +data "alicloud_instance_types" "bastion" { + cpu_core_count = var.bastion_cpu_core_count +} + +resource "alicloud_security_group" "bastion-group" { + name = var.bastion_name + vpc_id = var.vpc_id + description = "Allow internet SSH connections to bastion node" +} + +resource "alicloud_security_group_rule" "allow_ssh_from_local" { + type = "ingress" + ip_protocol = "tcp" + nic_type = "intranet" + port_range = "22/22" + security_group_id = alicloud_security_group.bastion-group.id + cidr_ip = var.bastion_ingress_cidr +} + +resource "alicloud_security_group_rule" "allow_ssh_to_worker" { + count = var.enable_ssh_to_worker ? 
1 : 0 + type = "ingress" + ip_protocol = "tcp" + nic_type = "intranet" + policy = "accept" + port_range = "22/22" + priority = 1 + security_group_id = var.worker_security_group_id + source_security_group_id = alicloud_security_group.bastion-group.id +} + +resource "alicloud_instance" "bastion" { + instance_name = var.bastion_name + image_id = var.bastion_image_name + instance_type = data.alicloud_instance_types.bastion.instance_types[0].id + security_groups = [alicloud_security_group.bastion-group.id] + vswitch_id = var.vswitch_id + key_name = var.key_name + internet_charge_type = "PayByTraffic" + internet_max_bandwidth_in = 10 + internet_max_bandwidth_out = 10 + user_data = file("${path.module}/bastion-userdata") +} \ No newline at end of file diff --git a/deploy/modules/aliyun/bastion/outputs.tf b/deploy/modules/aliyun/bastion/outputs.tf new file mode 100644 index 0000000000..74a1157c67 --- /dev/null +++ b/deploy/modules/aliyun/bastion/outputs.tf @@ -0,0 +1,3 @@ +output "bastion_ip" { + value = join(",", alicloud_instance.bastion.*.public_ip) +} diff --git a/deploy/modules/aliyun/bastion/variables.tf b/deploy/modules/aliyun/bastion/variables.tf new file mode 100644 index 0000000000..c3683ba2c7 --- /dev/null +++ b/deploy/modules/aliyun/bastion/variables.tf @@ -0,0 +1,40 @@ +variable "bastion_image_name" { + description = "OS image of bastion" + default = "centos_7_06_64_20G_alibase_20190218.vhd" +} + +variable "bastion_cpu_core_count" { + description = "CPU core count to select bastion type" + default = 1 +} + +variable "bastion_ingress_cidr" { + description = "Bastion ingress security rule cidr, it is highly recommended to set this in favor of safety" + default = "0.0.0.0/0" +} + +variable "key_name" { + description = "bastion key name" +} + +variable "vpc_id" { + description = "VPC id" +} + +variable "bastion_name" { + description = "bastion name" +} + +variable "vswitch_id" { + description = "vswitch id" +} + +variable "worker_security_group_id" { + description = "The security group id of worker nodes, must be provided if enable_ssh_to_worker set to true" + default = "" +} + +variable "enable_ssh_to_worker" { + description = "Whether enable ssh connection from bastion to ACK workers" + default = false +} diff --git a/deploy/modules/aliyun/tidb-cluster/local.tf b/deploy/modules/aliyun/tidb-cluster/local.tf new file mode 100644 index 0000000000..a41ab70faa --- /dev/null +++ b/deploy/modules/aliyun/tidb-cluster/local.tf @@ -0,0 +1,54 @@ +locals { + + group_default = { + min_size = 0 + max_size = 100 + default_cooldown = 300 + image_id = var.image_id + instance_type = "ecs.g5.large" + system_disk_category = "cloud_efficiency" + system_disk_size = 50 + pre_userdata = "" + post_userdata = "" + internet_charge_type = "PayByTraffic" + internet_max_bandwidth_in = 10 + internet_max_bandwidth_out = 10 + node_taints = "" + node_labels = "" + } + + tidb_cluster_worker_groups = [ + { + name = "${var.cluster_name}-pd" + instance_type = var.pd_instance_type + min_size = var.pd_count + max_size = var.pd_count + node_taints = "dedicated=${var.cluster_name}-pd:NoSchedule" + node_labels = "dedicated=${var.cluster_name}-pd" + post_userdata = file("${path.module}/userdata.sh") + }, + { + name = "${var.cluster_name}-tikv" + instance_type = var.tikv_instance_type + min_size = var.tikv_count + max_size = var.tikv_count + node_taints = "dedicated=${var.cluster_name}-tikv:NoSchedule" + node_labels = "dedicated=${var.cluster_name}-tikv,pingcap.com/aliyun-local-ssd=true" + post_userdata = 
file("${path.module}/userdata.sh") + }, + { + name = "${var.cluster_name}-tidb" + instance_type = var.tidb_instance_type + min_size = var.tidb_count + max_size = var.tidb_count + node_taints = "dedicated=${var.cluster_name}-tidb:NoSchedule" + node_labels = "dedicated=${var.cluster_name}-tidb" + }, + { + name = "${var.cluster_name}-monitor" + instance_type = var.monitor_instance_type + min_size = 1 + max_size = 1 + } + ] +} \ No newline at end of file diff --git a/deploy/modules/aliyun/tidb-cluster/main.tf b/deploy/modules/aliyun/tidb-cluster/main.tf new file mode 100644 index 0000000000..b935db97aa --- /dev/null +++ b/deploy/modules/aliyun/tidb-cluster/main.tf @@ -0,0 +1,15 @@ +module "tidb-cluster" { + source = "../../share/tidb-cluster-release" + + cluster_name = var.cluster_name + cluster_version = var.tidb_version + pd_count = var.pd_count + tikv_count = var.tikv_count + tidb_count = var.tidb_count + tidb_cluster_chart_version = var.tidb_cluster_chart_version + override_values = var.override_values + local_exec_interpreter = var.local_exec_interpreter + base_values = file("${path.module}/values/default.yaml") + kubeconfig_filename = var.ack.kubeconfig_filename + service_ingress_key = "ip" +} diff --git a/deploy/modules/aliyun/tidb-cluster/outputs.tf b/deploy/modules/aliyun/tidb-cluster/outputs.tf new file mode 100644 index 0000000000..54257185b4 --- /dev/null +++ b/deploy/modules/aliyun/tidb-cluster/outputs.tf @@ -0,0 +1,15 @@ +output "tidb_hostname" { + value = module.tidb-cluster.tidb_hostname +} + +output "monitor_hostname" { + value = module.tidb-cluster.monitor_hostname +} + +output "tidb_endpoint" { + value = module.tidb-cluster.tidb_endpoint +} + +output "monitor_endpoint" { + value = module.tidb-cluster.monitor_endpoint +} \ No newline at end of file diff --git a/deploy/aliyun/ack/templates/user_data.sh.tpl b/deploy/modules/aliyun/tidb-cluster/templates/user_data.sh.tpl similarity index 100% rename from deploy/aliyun/ack/templates/user_data.sh.tpl rename to deploy/modules/aliyun/tidb-cluster/templates/user_data.sh.tpl diff --git a/deploy/aliyun/userdata/tikv-userdata.sh b/deploy/modules/aliyun/tidb-cluster/userdata.sh similarity index 100% rename from deploy/aliyun/userdata/tikv-userdata.sh rename to deploy/modules/aliyun/tidb-cluster/userdata.sh diff --git a/deploy/modules/aliyun/tidb-cluster/values/default.yaml b/deploy/modules/aliyun/tidb-cluster/values/default.yaml new file mode 100644 index 0000000000..193fe52884 --- /dev/null +++ b/deploy/modules/aliyun/tidb-cluster/values/default.yaml @@ -0,0 +1,33 @@ +# Basic customization for tidb-cluster chart that suits Alibaba Cloud environment +timezone: UTC + +pd: + logLevel: info + resources: + requests: + storage: 20Gi + storageClassName: alicloud-disk +tikv: + logLevel: info + storageClassName: local-volume + syncLog: true +tidb: + logLevel: info + service: + type: LoadBalancer + exposeStatus: true + annotations: + service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet + service.beta.kubernetes.io/alicloud-loadbalancer-slb-network-type: vpc + +monitor: + storage: 100Gi + storageClassName: alicloud-disk-available + persistent: true + grafana: + config: + GF_AUTH_ANONYMOUS_ENABLED: "true" + service: + type: LoadBalancer + annotations: + service.beta.kubernetes.io/alicloud-loadbalancer-address-type: internet \ No newline at end of file diff --git a/deploy/modules/aliyun/tidb-cluster/variable.tf b/deploy/modules/aliyun/tidb-cluster/variable.tf new file mode 100644 index 0000000000..fb53b40a9b --- /dev/null +++ 
b/deploy/modules/aliyun/tidb-cluster/variable.tf @@ -0,0 +1,67 @@ +variable "ack" { + description = "The reference of the target ACK cluster" +} + +variable "cluster_name" { + description = "The TiDB cluster name" +} + +variable "image_id" { + default = "centos_7_06_64_20G_alibase_20190218.vhd" +} + +variable "tidb_version" { + description = "TiDB cluster version" + default = "v3.0.0" +} + +variable "tidb_cluster_chart_version" { + description = "tidb-cluster chart version" + default = "v1.0.0-beta.3" +} + +variable "pd_count" { + description = "PD instance count, the recommend value is 3" + default = 3 +} + +variable "pd_instance_type" { + description = "PD instance type" + default = "ecs.g5.large" +} + +variable "tikv_count" { + description = "TiKV instance count, ranges: [3, 100]" + default = 3 +} + +variable "tikv_instance_type" { + description = "TiKV instance memory in GB, must available in type family" + default = "ecs.i2.2xlarge" +} + +variable "tidb_count" { + description = "TiDB instance count, ranges: [1, 100]" + default = 2 +} + +variable "tidb_instance_type" { + description = "TiDB instance type" + default = "ecs.c5.4xlarge" +} + +variable "monitor_instance_type" { + description = "Monitor instance type" + default = "ecs.c5.xlarge" +} + +variable "override_values" { + type = string + default = "" +} + +variable "local_exec_interpreter" { + description = "Command to run for local-exec resources. Must be a shell-style interpreter. If you are on Windows Git Bash is a good choice." + type = list(string) + default = ["/bin/sh", "-c"] +} diff --git a/deploy/modules/aliyun/tidb-cluster/workers.tf b/deploy/modules/aliyun/tidb-cluster/workers.tf new file mode 100644 index 0000000000..304e817e9d --- /dev/null +++ b/deploy/modules/aliyun/tidb-cluster/workers.tf @@ -0,0 +1,91 @@ +data "template_file" "userdata" { + template = file("${path.module}/templates/user_data.sh.tpl") + count = length(local.tidb_cluster_worker_groups) + + vars = { + pre_userdata = lookup( + local.tidb_cluster_worker_groups[count.index], + "pre_userdata", + local.group_default.pre_userdata + ) + post_userdata = lookup( + local.tidb_cluster_worker_groups[count.index], + "post_userdata", + local.group_default.post_userdata + ) + open_api_token = var.ack.bootstrap_token + node_taints = lookup( + local.tidb_cluster_worker_groups[count.index], + "node_taints", + local.group_default.node_taints + ) + node_labels = lookup( + local.tidb_cluster_worker_groups[count.index], + "node_labels", + local.group_default.node_labels + ) + region = var.ack.region + } +} + +resource "alicloud_ess_scaling_group" "workers" { + count = length(local.tidb_cluster_worker_groups) + scaling_group_name = "${var.ack.cluster_name}-${lookup(local.tidb_cluster_worker_groups[count.index], "name", count.index)}" + vswitch_ids = var.ack.vswitch_ids + min_size = lookup( + local.tidb_cluster_worker_groups[count.index], + "min_size", + local.group_default.min_size, + ) + max_size = lookup( + local.tidb_cluster_worker_groups[count.index], + "max_size", + local.group_default.max_size, + ) + default_cooldown = lookup( + local.tidb_cluster_worker_groups[count.index], + "default_cooldown", + local.group_default["default_cooldown"] + ) + multi_az_policy = "BALANCE" + + removal_policies = [ + "OldestScalingConfiguration", + "NewestInstance", + ] + + lifecycle { + ignore_changes = [vswitch_ids] + } +} + +# Create the cooresponding auto-scaling configurations +resource "alicloud_ess_scaling_configuration" "workers" { + count = 
length(local.tidb_cluster_worker_groups) + scaling_group_id = element(alicloud_ess_scaling_group.workers.*.id, count.index) + instance_type = local.tidb_cluster_worker_groups[count.index].instance_type + image_id = var.image_id + security_group_id = var.ack.security_group_id + key_name = var.ack.key_name + instance_name = local.tidb_cluster_worker_groups[count.index].name + user_data = element(data.template_file.userdata.*.rendered, count.index) + system_disk_category = lookup(local.tidb_cluster_worker_groups[count.index], "system_disk_category", local.group_default["system_disk_category"]) + system_disk_size = lookup(local.tidb_cluster_worker_groups[count.index], "system_disk_size", local.group_default["system_disk_size"]) + internet_charge_type = lookup(local.tidb_cluster_worker_groups[count.index], "internet_charge_type", local.group_default["internet_charge_type"]) + internet_max_bandwidth_in = lookup(local.tidb_cluster_worker_groups[count.index], "internet_max_bandwidth_in", local.group_default["internet_max_bandwidth_in"]) + internet_max_bandwidth_out = lookup(local.tidb_cluster_worker_groups[count.index], "internet_max_bandwidth_out", local.group_default["internet_max_bandwidth_out"]) + + enable = true + active = true + force_delete = true + + tags = { + name = "${var.ack.cluster_name}-${lookup(local.tidb_cluster_worker_groups[count.index], "name", count.index)}-ack_asg" + "kubernetes.io/cluster/${var.ack.cluster_name}" = "owned" + "k8s.io/cluster-autoscaler/${var.ack.cluster_name}" = "default" + } + + lifecycle { + ignore_changes = [instance_type] + } +} \ No newline at end of file diff --git a/deploy/modules/aliyun/tidb-operator/data.tf b/deploy/modules/aliyun/tidb-operator/data.tf new file mode 100644 index 0000000000..ea9041ed24 --- /dev/null +++ b/deploy/modules/aliyun/tidb-operator/data.tf @@ -0,0 +1,35 @@ +data "alicloud_zones" "all" { + network_type = "Vpc" +} + +data "alicloud_vswitches" "default" { + vpc_id = var.vpc_id +} + +data "alicloud_instance_types" "default" { + availability_zone = data.alicloud_zones.all.zones[0]["id"] + cpu_core_count = var.default_worker_cpu_core_count +} + +# Workaround map to list transformation, see stackoverflow.com/questions/43893295 +data "template_file" "vswitch_id" { + count = var.vpc_id == "" ? 0 : length(data.alicloud_vswitches.default.vswitches) + template = data.alicloud_vswitches.default.vswitches[count.index]["id"] +} + +# Get cluster bootstrap token +data "external" "token" { + depends_on = [alicloud_cs_managed_kubernetes.k8s] + + # Terraform use map[string]string to unmarshal the result, transform the json to conform + program = ["bash", "-c", "aliyun --region ${var.region} cs POST /clusters/${alicloud_cs_managed_kubernetes.k8s.id}/token --body '{\"is_permanently\": true}' | jq \"{token: .token}\""] +} + +data "template_file" "local-volume-provisioner" { + template = file("${path.module}/templates/local-volume-provisioner.yaml.tpl") + + vars = { + access_key_id = var.access_key + access_key_secret = var.secret_key + } +} diff --git a/deploy/modules/aliyun/tidb-operator/main.tf b/deploy/modules/aliyun/tidb-operator/main.tf new file mode 100644 index 0000000000..51e85e6b61 --- /dev/null +++ b/deploy/modules/aliyun/tidb-operator/main.tf @@ -0,0 +1,85 @@ +# Alicloud ACK module launches a managed kubernetes cluster +resource "alicloud_key_pair" "default" { + count = var.key_pair_name == "" ? 1 : 0 + key_name_prefix = "${var.cluster_name}-key" + key_file = var.key_file != "" ? 
var.key_file : format("%s/%s-key", path.module, var.cluster_name)
+}
+
+# If vpc_id is not specified, create a new one
+resource "alicloud_vpc" "vpc" {
+  count      = var.vpc_id == "" ? 1 : 0
+  cidr_block = var.vpc_cidr
+  name       = "${var.cluster_name}-vpc"
+
+  lifecycle {
+    ignore_changes = [cidr_block]
+  }
+}
+
+# For new vpc or existing vpc with no vswitches, create vswitch for each zone
+resource "alicloud_vswitch" "all" {
+  count      = var.vpc_id != "" && length(data.alicloud_vswitches.default.vswitches) != 0 ? 0 : length(data.alicloud_zones.all.zones)
+  vpc_id     = alicloud_vpc.vpc[0].id
+  cidr_block = cidrsubnet(
+    alicloud_vpc.vpc[0].cidr_block,
+    var.vpc_cidr_newbits,
+    count.index,
+  )
+  availability_zone = data.alicloud_zones.all.zones[count.index % length(data.alicloud_zones.all.zones)]["id"]
+  name              = format("vsw-%s-%d", var.cluster_name, count.index + 1)
+}
+
+resource "alicloud_security_group" "group" {
+  count       = var.group_id == "" ? 1 : 0
+  name        = "${var.cluster_name}-sg"
+  vpc_id      = var.vpc_id != "" ? var.vpc_id : alicloud_vpc.vpc[0].id
+  description = "Security group for ACK worker nodes"
+}
+
+# Allow traffic inside VPC
+resource "alicloud_security_group_rule" "cluster_worker_ingress" {
+  count             = var.group_id == "" ? 1 : 0
+  security_group_id = alicloud_security_group.group[0].id
+  type              = "ingress"
+  ip_protocol       = "all"
+  nic_type          = "intranet"
+  port_range        = "-1/-1"
+  cidr_ip           = var.vpc_id != "" ? var.vpc_cidr : alicloud_vpc.vpc[0].cidr_block
+}
+
+# Create a managed Kubernetes cluster
+resource "alicloud_cs_managed_kubernetes" "k8s" {
+  name = var.cluster_name
+
+  // split and join: workaround for terraform's limitation of conditional list choice, similarly hereinafter
+  vswitch_ids = [
+    element(
+      split(
+        ",",
+        var.vpc_id != "" && length(data.alicloud_vswitches.default.vswitches) != 0 ? join(",", data.template_file.vswitch_id.*.rendered) : join(",", alicloud_vswitch.all.*.id),
+      ),
+      0,
+    )]
+  key_name              = alicloud_key_pair.default[0].key_name
+  pod_cidr              = var.k8s_pod_cidr
+  service_cidr          = var.k8s_service_cidr
+  new_nat_gateway       = var.create_nat_gateway
+  cluster_network_type  = var.cluster_network_type
+  slb_internet_enabled  = var.public_apiserver
+  kube_config           = var.kubeconfig_file != "" ? var.kubeconfig_file : format("%s/kubeconfig", path.cwd)
+  worker_numbers        = [var.default_worker_count]
+  worker_instance_types = [var.default_worker_type != "" ? var.default_worker_type : data.alicloud_instance_types.default.instance_types[0].id]
+
+  # These variables are 'ForceNew': changing any of them would cause kubernetes cluster
+  # re-creation, so we make all of them immutable here in favor of safety.
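+  # As the old README's "Limitations" section also notes, these settings cannot be changed
+  # once the cluster has been created: the ignore_changes list below makes later edits to
+  # them no-ops for an existing cluster, so changing the pod cidr, service cidr or worker
+  # instance types means destroying and re-creating the cluster.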
+  lifecycle {
+    ignore_changes = [
+      vswitch_ids,
+      worker_instance_types,
+      key_name,
+      pod_cidr,
+      service_cidr,
+      cluster_network_type,
+    ]
+  }
+}
diff --git a/deploy/modules/aliyun/tidb-operator/manifest/alicloud-disk-storageclass.yaml b/deploy/modules/aliyun/tidb-operator/manifest/alicloud-disk-storageclass.yaml
new file mode 100644
index 0000000000..efd262544e
--- /dev/null
+++ b/deploy/modules/aliyun/tidb-operator/manifest/alicloud-disk-storageclass.yaml
@@ -0,0 +1,9 @@
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: alicloud-disk
+parameters:
+  type: available
+provisioner: alicloud/disk
+reclaimPolicy: Retain
+volumeBindingMode: WaitForFirstConsumer
diff --git a/deploy/modules/aliyun/tidb-operator/operator.tf b/deploy/modules/aliyun/tidb-operator/operator.tf
new file mode 100644
index 0000000000..670e83b109
--- /dev/null
+++ b/deploy/modules/aliyun/tidb-operator/operator.tf
@@ -0,0 +1,64 @@
+# Hack: instruct terraform that the kubeconfig_filename is only available once the k8s cluster has been created
+data "template_file" "kubeconfig_filename" {
+  template = var.kubeconfig_file
+  vars = {
+    kubernetes_depedency = alicloud_cs_managed_kubernetes.k8s.client_cert
+  }
+}
+
+provider "helm" {
+  alias          = "initial"
+  insecure       = true
+  install_tiller = false
+  kubernetes {
+    config_path = data.template_file.kubeconfig_filename.rendered
+  }
+}
+
+
+resource "null_resource" "setup-env" {
+  depends_on = [data.template_file.kubeconfig_filename]
+
+  provisioner "local-exec" {
+    working_dir = path.cwd
+    # Note for the patch command: ACK has a toleration issue with the pre-deployed flexvolume daemonset; we have to patch
+    # it manually, and the resource namespace & name are hard-coded by convention
+    command = <