
add prow build clusters #830

Merged · 19 commits · May 26, 2020

Changes from 3 commits
6 changes: 3 additions & 3 deletions infra/gcp/clusters/README.md
@@ -3,9 +3,9 @@
This directory contains Terraform cluster configurations for the various GCP
projects that the Kubernetes project maintains.

-Each directory represents a GCP project. Each sub-directory of those represents
-a GKE cluster configuration. We may template these into modules at some point,
-but for now they are designed to be straight forward and verbose.
+Each directory except `modules` represents a GCP project. Each sub-directory of
+those represents a GKE cluster configuration. Not everything is able to use the
+modules yet due to differences in google provider version.

Prerequisites:
- Be a member of the k8s-infra-cluster-admins@kubernetes.io group.
4 changes: 2 additions & 2 deletions infra/gcp/clusters/kubernetes-public/prow-build-test/main.tf
@@ -83,15 +83,15 @@ resource "google_service_account_iam_policy" "boskos_janitor_sa_iam" {
}

module "prow_build_test_cluster" {
-  source            = "./k8s-infra-gke-cluster"
+  source            = "../../modules/k8s-infra-gke-cluster"
  project_name      = data.google_project.project.name
  cluster_name      = local.cluster_name
  cluster_location  = local.cluster_location
  bigquery_location = local.bigquery_location
}

module "prow_build_test_nodepool" {
-  source       = "./k8s-infra-gke-nodepool"
+  source       = "../../modules/k8s-infra-gke-nodepool"
  project_name = data.google_project.project.name
  cluster_name = module.prow_build_test_cluster.cluster.name
  location     = module.prow_build_test_cluster.cluster.location
@@ -39,7 +39,11 @@ resource "google_project_iam_member" "cluster_node_sa_monitoring_metricwriter" {
}

// BigQuery dataset for usage data
-resource "google_bigquery_dataset" "usage_metering" {
+// Workaround from https://github.com/hashicorp/terraform/issues/22544#issuecomment-582974372
+// to set delete_contents_on_destroy to false if is_prod_cluster
+// keep prod_ and test_ identical except for "unique to " comments
+resource "google_bigquery_dataset" "prod_usage_metering" {
+  count       = var.is_prod_cluster == "true" ? 1 : 0
  dataset_id  = replace("usage_metering_${var.cluster_name}", "-", "_")
  project     = var.project_name
  description = "GKE Usage Metering for cluster '${var.cluster_name}'"
@@ -55,20 +59,141 @@
}

  // This restricts deletion of this dataset if there is data in it
-  // IMPORTANT: Should be true on test clusters
+  // unique to prod_usage_metering
  delete_contents_on_destroy = false
}
resource "google_bigquery_dataset" "test_usage_metering" {
  count       = var.is_prod_cluster == "true" ? 0 : 1
  dataset_id  = replace("usage_metering_${var.cluster_name}", "-", "_")
  project     = var.project_name
  description = "GKE Usage Metering for cluster '${var.cluster_name}'"
  location    = var.bigquery_location

  access {
    role          = "OWNER"
    special_group = "projectOwners"
  }
  access {
    role          = "WRITER"
    user_by_email = google_service_account.cluster_node_sa.email
  }

  // This restricts deletion of this dataset if there is data in it
  // unique to test_usage_metering
  delete_contents_on_destroy = true
}
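The duplicated `prod_`/`test_` resources above follow a common Terraform 0.12 pattern: `count` is used as a boolean toggle, because settings like `delete_contents_on_destroy` and `lifecycle.prevent_destroy` cannot be derived from variables. A minimal sketch of the pattern, with hypothetical resource and variable names (`null_resource` requires the null provider):

```hcl
variable "is_prod" {
  type    = string
  default = "false"
}

// Exactly one of these two resources is created, depending on the flag;
// each copy hard-codes the setting that cannot come from a variable.
resource "null_resource" "prod" {
  count = var.is_prod == "true" ? 1 : 0
  // prod-only, hard-coded settings would go here
}

resource "null_resource" "test" {
  count = var.is_prod == "true" ? 0 : 1
  // test-only, hard-coded settings would go here
}
```

The cost of the pattern, as noted in the review discussion, is that the two copies must be kept in sync by hand.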

// Create GKE cluster, but with no node pools. Node pools can be provisioned below
-resource "google_container_cluster" "cluster" {
+// Workaround from https://github.com/hashicorp/terraform/issues/22544#issuecomment-582974372
+// to set lifecycle.prevent_destroy to false if is_prod_cluster
+// keep prod_ and test_ identical except for "unique to " comments
+resource "google_container_cluster" "prod_cluster" {
+  count = var.is_prod_cluster == "true" ? 1 : 0
[Review thread]
Member: What happens if you toggle this on the same object?

Member (Author): I think going from not->prod with a bare `terraform apply` would nuke the test resources and create the prod resources.

I think going from prod->not would leave you with two copies of the resources actuated, even if Terraform "forgot" about the prod resources.

I suspect you could get out of this, and have Terraform start treating a resource differently, by using `terraform state mv` and then `terraform plan` to verify there were no changes to actuate.
  name     = var.cluster_name
  location = var.cluster_location

  provider = google-beta
  project  = var.project_name

  // GKE clusters are critical objects and should not be destroyed
  // IMPORTANT: should be false on test clusters
  // unique to prod_cluster
  lifecycle {
    prevent_destroy = true
  }

  // Network config
  network = "default"

  // Start with a single node, because we're going to delete the default pool
  initial_node_count = 1

  // Removes the default node pool, so we can custom create them as separate
  // objects
  remove_default_node_pool = true

  // Disable local and certificate auth
  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }

  // Enable google-groups for RBAC
  authenticator_groups_config {
    security_group = "gke-security-groups@kubernetes.io"
  }

  // Enable workload identity for GCP IAM
  workload_identity_config {
    identity_namespace = "${var.project_name}.svc.id.goog"
  }

  // Enable Stackdriver Kubernetes Monitoring
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  // Set maintenance time
  maintenance_policy {
    daily_maintenance_window {
      start_time = "11:00" // (in UTC), 03:00 PST
    }
  }

  // Restrict master to Google IP space; use Cloud Shell to access
  master_authorized_networks_config {
  }

  // Enable GKE Usage Metering
  resource_usage_export_config {
    enable_network_egress_metering = true
    bigquery_destination {
      dataset_id = google_bigquery_dataset.prod_usage_metering[0].dataset_id
    }
  }

  // Enable GKE Network Policy
  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  // Configure cluster addons
  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }
    http_load_balancing {
      disabled = false
    }
    network_policy_config {
      disabled = false
    }
  }

  // Enable PodSecurityPolicy enforcement
  pod_security_policy_config {
    enabled = false // TODO: we should turn this on
  }

  // Enable VPA
  vertical_pod_autoscaling {
    enabled = true
  }
}
resource "google_container_cluster" "test_cluster" {

[Review thread]
Member: I'm not sure I understand this resource. Why not define a module test-k8s-infra-gke-cluster?

Member (Author): I felt like copy-pasting resource definitions between modules was more likely to fall out of sync than copy-pasting within the same module. Copy-paste is the only approach I can use for any resource whose lifecycle depends on a flag/environment, since Terraform doesn't allow these values to be derived from variables.
  count = var.is_prod_cluster == "true" ? 0 : 1

  name     = var.cluster_name
  location = var.cluster_location

  provider = google-beta
  project  = var.project_name

  // unique to test_cluster
  lifecycle {
    prevent_destroy = false
  }
@@ -122,7 +247,7 @@ resource "google_container_cluster" "cluster" {
  resource_usage_export_config {
    enable_network_egress_metering = true
    bigquery_destination {
-      dataset_id = google_bigquery_dataset.usage_metering.dataset_id
+      dataset_id = google_bigquery_dataset.test_usage_metering[0].dataset_id
    }
  }

@@ -16,7 +16,12 @@

output "cluster" {
  description = "The cluster"
-  value       = google_container_cluster.cluster
+  // Workaround from https://github.com/hashicorp/terraform/issues/22544#issuecomment-582974372
+  // This should be either test_cluster or prod_cluster
+  value = coalescelist(
+    google_container_cluster.test_cluster.*,
+    google_container_cluster.prod_cluster.*
+  )[0]
}
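The `coalescelist` workaround relies on the fact that a splat over a resource with `count = 0` yields an empty list, so exactly one of the two lists is non-empty; `coalescelist` returns the first non-empty list, and `[0]` unwraps the single instance in it. A minimal illustration of the same mechanics, with hypothetical `null_resource` names:

```hcl
// One resource exists, the other does not.
resource "null_resource" "a" {
  count = 1
}

resource "null_resource" "b" {
  count = 0
}

output "picked" {
  // null_resource.b.* is [], so coalescelist returns the list
  // containing null_resource.a[0], and [0] selects that instance.
  value = coalescelist(null_resource.a.*, null_resource.b.*)[0]
}
```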

output "cluster_node_sa" {
@@ -33,3 +33,9 @@ variable "bigquery_location" {
  description = "The bigquery specific location where the dataset should be created"
  type        = string
}

variable "is_prod_cluster" {
  description = "If this is not a prod cluster it's safe to delete resources on destroy"
  type        = string
  default     = "false"
}
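With the new variable, a caller opts in to the protective prod behavior explicitly, and the destroyable test behavior is the default. A hypothetical call site, reusing the module inputs from the main.tf diff above (the project, cluster, and location values here are illustrative, not from the PR):

```hcl
module "prow_build_cluster" {
  source            = "../../modules/k8s-infra-gke-cluster"
  project_name      = "my-gcp-project" // hypothetical
  cluster_name      = "prow-build"     // hypothetical
  cluster_location  = "us-central1"    // hypothetical
  bigquery_location = "US"             // hypothetical
  is_prod_cluster   = "true"           // omit to get the default, "false"
}
```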
@@ -15,7 +15,7 @@
*/

terraform {
-  required_version = ">= 0.12.8"
+  required_version = "~> 0.12.20"
  required_providers {
    google      = "~> 3.19.0"
    google-beta = "~> 3.19.0"
@@ -15,7 +15,7 @@
*/

terraform {
-  required_version = ">= 0.12.8"
+  required_version = "~> 0.12.20"
  required_providers {
    google      = "~> 3.19.0"
    google-beta = "~> 3.19.0"