Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularization for AWS terraform scripts #650

Merged
merged 12 commits into from
Jul 17, 2019
Merged

Conversation

aylei
Copy link
Contributor

@aylei aylei commented Jul 10, 2019

What problem does this PR solve?

close #628

What is changed and how does it work?

As the issue stated, this PR separate the modules of AWS terraform scripts more clearly and suggest the best practice to manage multiple environments.

The directory name and UX of aws is not changed in favor of consistence, but it should be easy for uses to create a new directory and write some terraform scripts from scratch following the README.

This PR also introduce two minor features based on the community feedbacks:

  • Now bastion machine can be configured to ssh to worker nodes, and this is enabled by default;
  • tidb-cluster can be configured to not create the real helm release (means only provision worker nodes), which is useful if users want to manage helm on their own;

Check List

Tests

Not yet

Side effects

  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
    • docs-cn
    • docs

Does this PR introduce a user-facing change?:

No

- fix hardcode chart version
- fix unreachable bastion
- adjust tidb-cluster no-wait to avoid scheduled backup pvc blocking
- add version check
- disable pd node public_ip

Signed-off-by: Aylei <rayingecho@gmail.com>
Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei aylei requested a review from gregwebs July 10, 2019 17:14
@@ -1,5 +1,5 @@
resource "null_resource" "wait-tiller-ready" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is separated in favor of reusing across cloud providers

Signed-off-by: Aylei <rayingecho@gmail.com>
@gregwebs
Copy link
Contributor

Looks like there are some conflicts.

@gregwebs
Copy link
Contributor

This looks great! However, I think putting this in a separate advanced section will end up with poor results. Once someone edits the module directly they will have multiple problems

  • It will be difficult to start a new environment because direct changes have already been made to the module.
  • It is very difficult (I don't feel comfortable doing this, but in theory it is possible with terraform import) to transition that existing setup into an environment setup

Consider direct edits to be like distributing a python library and telling users to edit constants in the source code rather than using it as a library and making function calls.

Its a lot better to make a clean break from the past since we just created this and only have users using it in a testing phase right now.

@gregwebs
Copy link
Contributor

Following on the above comment, the way to make it easy to use it to start with using as modules is to make an existing environment called "default" that is focused on ease of use rather than flexibility.

@gregwebs
Copy link
Contributor

The default environment shouldn't need to be much more than

module "default" {
  source = "../"
}

@aylei
Copy link
Contributor Author

aylei commented Jul 11, 2019

@gregwebs Thanks for your thoughtful suggestions!

The reason why I keep the deploy/aws directory "as is" is to not break users.

And I think we have to reach a consensus about the purpose of the deploy/aws directory.

In this PR, I consider it as an example of how to compose the reusable modules located in the deploy/modules directory rather than a reusable module itself. Because It is hard to write a single module that provisions multiple tidb-clusters.

It is possible to write a reusable module that provisions VPC, bastion, key-pair and EKS, which effectively composes these four modules. This will reduce the code length to start a new environment, but I suppose composing these modules should require little effort just as the README stated.

So, the first problem you mentioned is now addressed: users don't call the deploy/aws module, they call the modules in the deploy/modules dir instead.

Another problem you mentioned is transition, actually they don't need, because the deploy/aws is already an individual environment. It is parallel to the dirs created by users, e.g. deploy/aws-staging, deploy/aws-production.

@aylei
Copy link
Contributor Author

aylei commented Jul 11, 2019

The default environment shouldn't need to be much more than

module "default" {
  source = "../"
}

IIUC, this example is a different UX for users. There are two UXs that may make sense:

  1. provide a "big", highly customizable module, users call this module like the code example above;
  2. provide several customizable small modules as shown in the PR, users call these modules like the code example in the README.md

I suppose both of them has pros and cons.

Comparing to UX-2, UX-1 has:

  • Pros:
    • Easy to use
    • Less error-prone because of proper encapsulation
  • Cons:
    • Inflexible, all the customization possibilities should be exposed via module variables
    • (things I concern) requires too much developing efforts than we can afford (especially, provisioning multiple TiDB clusters in a single module)

Actually I don't bias on either method technically, but I do prefer the UX-2 in favor of the overall ROI.

@aylei
Copy link
Contributor Author

aylei commented Jul 11, 2019

@tennix Could you please take a look?

@aylei aylei requested a review from tennix July 11, 2019 03:59
@aylei
Copy link
Contributor Author

aylei commented Jul 11, 2019

However, I think putting this in a separate advanced section will end up with poor results. Once someone edits the module directly they will have multiple problems.

Yes, I agree we should emphasize this section for the users who need multiple environment management. I will try to refine this README

Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei
Copy link
Contributor Author

aylei commented Jul 11, 2019

Having said that, I will try to investigate whether I can build a monolithic reusable module in an acceptable amount of development

@tennix
Copy link
Member

tennix commented Jul 11, 2019

The quick-start guide should be easy enough, so users can really quick start. In most cases, this setup would be for testing. After they get a glimpse of what it looks like, they will try to deploy a production environment. And the quick-start setup may be wiped out. So I agree with @aylei to keep the quick-start as it is and document the best practice guide of how to manage production environment in a separate section.

@gregwebs
Copy link
Contributor

okay, I get your point, and I agree with it. Things are well organized now.
But I would still move what is in deploy/aws into an environment directory. I think this update will already break existing users.

gregwebs
gregwebs previously approved these changes Jul 12, 2019
Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei
Copy link
Contributor Author

aylei commented Jul 14, 2019

@gregwebs I've addressed some merge conflicts, could you please take a look again?

And, I've investigate if we can approach to the structure you suggested, and I find that it is possible in the future! Once the hashicorp/terraform#17179 addressed, we can define multiple tidb clusters by a single map variable and things won't break when some of the clusters are removed (which is impossible now because currently the loop of resources is index based, which means we cannot remove element in the middle of the list).

@aylei aylei requested a review from gregwebs July 14, 2019 15:39
@aylei
Copy link
Contributor Author

aylei commented Jul 14, 2019

@tennix PTAL

gregwebs
gregwebs previously approved these changes Jul 14, 2019
@aylei
Copy link
Contributor Author

aylei commented Jul 15, 2019

/run-e2e-tests

@aylei
Copy link
Contributor Author

aylei commented Jul 16, 2019

/run-e2e-tests

deploy/modules/aws/tidb-operator/README.md Outdated Show resolved Hide resolved
deploy/modules/aws/tidb-operator/README.md Outdated Show resolved Hide resolved
default = "c5.2xlarge"
}

variable "override_values" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the path to values.yaml file? It's better add a description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this is the content of values.yaml, it is recommend to provide the content by the file() function call from a local file

data "external" "tidb_hostname" {
depends_on = [helm_release.tidb-cluster]
working_dir = path.cwd
program = ["bash", "-c", "kubectl --kubeconfig ${var.kubeconfig_filename} get svc -n ${var.cluster_name} ${var.cluster_name}-tidb -o json | jq '.status.loadBalancer.ingress[0]'"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this read the right kubeconfig file when in a separated module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the kubeconfig_filename should be the relative path to path.cwd by convention, and this is now guaranteed by all the modules we provide

deploy/modules/aws/tidb-cluster/cluster.tf Outdated Show resolved Hide resolved
tidb_count = var.tidb_count
tidb_cluster_chart_version = var.tidb_cluster_chart_version
override_values = var.override_values
local_exec_interpreter = var.local_exec_interpreter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this work if a user does not provide local_exec_interpreter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a reasonable default ["bash", "-c"]


variable "enable_ssh_to_workers" {
description = "Whether enable ssh from bastion to workers, if true, the worker_security_group_id must be provided"
default = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enable this by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the worker_security_group_id must be provided with this enabled, which can not have a default value. So user will be forced to provide the id or disable this explicitly if we enable this by default.

deploy/modules/aws/bastion/bastion.tf Outdated Show resolved Hide resolved
deploy/.gitignore Outdated Show resolved Hide resolved
Co-Authored-By: Tennix <tennix@users.noreply.github.com>
Signed-off-by: Aylei <rayingecho@gmail.com>
Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei
Copy link
Contributor Author

aylei commented Jul 16, 2019

@tennix I've addressed some of the review comments and add comments for others, PTAL again

@aylei
Copy link
Contributor Author

aylei commented Jul 16, 2019

/run-e2e-tests

Copy link
Member

@tennix tennix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aylei
Copy link
Contributor Author

aylei commented Jul 16, 2019

@gregwebs I've addressed some comments from tennix, could you please take a look again (again🤣)? Thanks!

Copy link
Contributor

@onlymellb onlymellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tennix tennix merged commit aa62176 into pingcap:master Jul 17, 2019
weekface pushed a commit that referenced this pull request Jul 29, 2019
* Small fixes for terraform aws
- fix hardcode chart version
- fix unreachable bastion
- adjust tidb-cluster no-wait to avoid scheduled backup pvc blocking
- add version check
- disable pd node public_ip

Signed-off-by: Aylei <rayingecho@gmail.com>

* Modularization for AWS terraform scripts

Signed-off-by: Aylei <rayingecho@gmail.com>

* Refine README of AWS terraform

Signed-off-by: Aylei <rayingecho@gmail.com>

* Fix bastion ssh

Signed-off-by: Aylei <rayingecho@gmail.com>

* Rephrase section title

Signed-off-by: Aylei <rayingecho@gmail.com>

* Update deploy/modules/aws/tidb-operator/README.md

Co-Authored-By: Tennix <tennix@users.noreply.github.com>

* Address review comments

Signed-off-by: Aylei <rayingecho@gmail.com>
(cherry picked from commit aa62176)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS terraform default to multiple environments
4 participants