Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait cluster responsive #639

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ project adheres to [Semantic Versioning](http://semver.org/).

## [[v8.?.?](https://github.com/terraform-aws-modules/terraform-aws-eks/compare/v7.0.0...HEAD)] - 2019-??-??]

- Wait for cluster to respond to kubectl before applying auth map_config (@shaunc)
- Added flag `create_eks` to conditionally create resources (by @syst0m / @tbeijen)
- Support for AWS EKS Managed Node Groups. (by @wmorgan6796)
- Added a if check on `aws-auth` configmap when `map_roles` is empty (by @shanmugakarna)
Expand Down
3 changes: 2 additions & 1 deletion aws_auth.tf
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,8 @@ data "template_file" "node_group_arns" {
}

resource "kubernetes_config_map" "aws_auth" {
count = var.create_eks && var.manage_aws_auth ? 1 : 0
depends_on = [aws_eks_cluster.this]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this explicit depends_on required? The provider doesn't work before the cluster's creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how the kubernetes provider is set up(?) I wasn't sure it wasn't impossible make dependence on cluster opaque to terraform, depending on how the provider is setup, so thought explicit dependency would be useful to avoid bug in edge case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true. It's not necessarily implicit from other variables.

count = var.create_eks && var.manage_aws_auth ? 1 : 0

metadata {
name = "aws-auth"
Expand Down
5 changes: 5 additions & 0 deletions cluster.tf
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,11 @@ resource "aws_eks_cluster" "this" {
aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
aws_cloudwatch_log_group.this
]
provisioner "local-exec" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance to avoid local-exec ? The http provider doesn't work ? We wanted to be agnostic from the OS. Using the kubernetes provider was a huge improvement in that way and would like to avoid going back to local-exec with curl if possible.

Copy link
Contributor Author

@shaunc shaunc Dec 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ... missed that part of your comment; sorry! Hmm... I'd worry about three things: (1) That the http provider would sometimes timeout, and it doesn't provide any control over timeouts, (2) when not combined with the cluster resource itself, terraform could schedule other resources that depend on the cluster before its ready, and outside the module, its a bit opaque to know what to do (though it could be documented). (3) Doc says they verify chain of trust for https, and certificate is likely to be unchained.

I guess (2) could be possibly solved by having (say) cluster_id depend on an expression involving some output from http resource. (3) just testing will confirm/deny :)....

vs (1) -- I'm a terraform newbie ... do you know anything about http data source timeout behavior? Would seem to be a risk to me, but I'll go ahead and try it if you guys want to go that way.

UPDATE: Unfortunately, test gives:

Error: Error during making a request: https://....sk1.us-west-2.eks.amazonaws.com/healthz

  on ../../cluster.tf line 40, in data "http" "wait_for_cluster":
  40: data "http" "wait_for_cluster" {

Doesn't say if failed because not up yet, or if failed because insecure.

Hmm... could the local-exec provisioner branch by OS and appropriately install curl? Or perhaps it should use a tool ("curl" or alternative appropriate for the OS) passed in in a variable (including no-op)? Your call -- I can give it a whirl if not too complex.

Copy link
Contributor Author

@shaunc shaunc Dec 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most general would be to have variable for command be a template required to embed (say) ${URL}. I guess the shell varies by platform, but I guess this type of variable substitution should be broadly compatible? I'm not sure if the "until" loop is available "everywhere"? In any case, AFAIK the options are to code up a custom provider (or patch the http provider, to accept options for retry on error, and for insecure https), or to try to make local-exec be as cross-platform as possible.

UPDATE ... If you are going to patch/create providers, then perhaps changing the kubernetes provider to wait until endpoint is up would also be an option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hen perhaps changing the kubernetes provider to wait until endpoint is up would also be an option.

hashicorp/terraform-provider-kubernetes#96

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what we should do now but open to opinions. Either merge with a new local-exec or think of something else 🙂

Copy link
Contributor Author

@shaunc shaunc Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think easiest way to fix something without upstream changes would be to use local-exec, but make it configurable for cross-platform support. Do you have an idea how you'd like to configure it? Do you want me to propose something? (Adding variables for configuration -- what program to use for "curl", for instance, or possibly whether to install curl appropriate for platform.)?

Together with this you could open ticket(s) for upstream change(s) -- for instance if the kubernetes or the http provider had a "wait for cluster"/"wait for url" configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I'm reading from stackoverflow, it sounds like since powershell 3.O, wget and curl are aliases for Invoke-WebRequest http://support.moonpoint.com/os/windows/PowerShell/wget-curl.php. It means, we can probably use curl in both OS, but i'm not sure for arguments.

Any windows guy around ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we forget about Windows. It's proved to be too much effort to support both platforms natively in the past. Windows users can use Docker I think.

command = <<EOT
until curl -k ${aws_eks_cluster.this[0].endpoint}/healthz >/dev/null; do sleep 4; done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This with no limit is OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I thought that overall terraform timeout would be by default how user would control)

EOT
}
}

resource "aws_security_group" "cluster" {
Expand Down
11 changes: 6 additions & 5 deletions versions.tf
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,11 @@ terraform {
required_version = ">= 0.12.9"

required_providers {
aws = ">= 2.38.0"
local = ">= 1.2"
null = ">= 2.1"
template = ">= 2.1"
random = ">= 2.1"
aws = ">= 2.38.0"
local = ">= 1.2"
null = ">= 2.1"
template = ">= 2.1"
random = ">= 2.1"
kubernetes = ">= 1.6.2"
}
}