Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update hashicorp/terraform-provider-aws version #319

Closed
11 tasks done
ialidzhikov opened this issue Apr 19, 2021 · 6 comments
Closed
11 tasks done

Update hashicorp/terraform-provider-aws version #319

ialidzhikov opened this issue Apr 19, 2021 · 6 comments
Labels
kind/bug Bug platform/aws Amazon web services platform/infrastructure

Comments

@ialidzhikov
Copy link
Member

ialidzhikov commented Apr 19, 2021

/platform aws
/kind bug

We have the following issues reported in the upstream terraform-provider-aws:

They continuously reproduce on our environments and require manual ops effort to be fixed.
It seems that now both of the issues are fixed (hopefully) in 3.34.0 (see CHANGELOG). So we should update to terraform-provider-aws version >= 3.34.0 to adopt these fixes.

However terraform-provider-aws >= 3.29.1 requires additional permission that is currently not part of our recommended AWS IAM policy - see https://github.com/gardener/gardener-extension-provider-aws/blob/v1.22.2/docs/usage-as-end-user.md#permissions. The corresponding upstream change is hashicorp/terraform-provider-aws#5904 - it requires additional permission iam:ListRolePolicies to be able to successfully apply an aws_iam_role resource.

The required change to the AWS IAM policy:

diff --git a/docs/usage-as-end-user.md b/docs/usage-as-end-user.md
index 8e2bec76..eb94fe26 100644
--- a/docs/usage-as-end-user.md
+++ b/docs/usage-as-end-user.md
@@ -61,6 +61,7 @@ Please make sure that the provided credentials have the correct privileges. You
           "iam:GetRole",
           "iam:GetRolePolicy",
           "iam:ListPolicyVersions",
+          "iam:ListRolePolicies",
           "iam:ListAttachedRolePolicies",
           "iam:ListInstanceProfilesForRole",
           "iam:CreateInstanceProfile",

So before update to terraform-provider-aws >= 3.29.1 we need to update our recommended AWS IAM policy that is currently maintained in few separate places:

We should also treat this AWS IAM policy change as breaking one and we need to document in properly in the release notes.

As hashicorp/terraform-provider-aws is under active development, terraform-provider-aws@latest can potentially have other breaking changes. So that's why I propose to first update to 3.32.0 to tackle the breaking change in the AWS IAM policy. Then we can work on updating to terraform-provider-aws@latest which will include the above bug fixes.

Tasks related to g/terraformer (update to terraform-provider-aws@3.32.0):

Tasks related to g/terraformer (update to terraform-provider-aws version >= 3.34.0):

@gardener-robot gardener-robot added kind/bug Bug platform/aws Amazon web services platform/infrastructure labels Apr 19, 2021
@rfranzke
Copy link
Member

Thanks for opening the issue @ialidzhikov. As this is a breaking change for end-users, I'm thinking about having some fallback mechanism to the current provider plugin version in case the user's account does not include the required new permission. Do you think something like would be reasonable to prevent too many clusters from becoming stuck in a failed state, or would you rather go the "standard" way (operators announcing the change so that end-users can adapt until a certain time, and only afterwards the version is updated)?

@ialidzhikov
Copy link
Member Author

ialidzhikov commented Apr 19, 2021

Such fallback mechanism can be implemented using the operation in question - try to ListRolePolicies, if the operation fails assume that the current user does not have the new permission -> fallback to the old terraformer version.
My initial idea was to give end-user enough period of time after announcing the upcoming breaking change (at least 2-3 months). And proactively check that most of them already adapted their policies before we roll-out the new terraformer.

To be honest I am not sure. What would we gain from such fallback mechanism? I assume that there will be always folks that will miss the action required and will potentially update after the Infrastructure reconciliation is failing. I rather see complexity that will be added on top:

  • We will have to "maintain" 2 terraform-provider-aws versions
  • We have to make sure that the update from terraformer@legacy to terraformer@latest is always possible and works without any issues. You known that there are many restrictions coming into the game - don't skip minor version when you update the terraform version; update to the latest patch release of the current minor version, before you update to the next minor version, etc.

So I would rather vote for "announce the upcoming change; give enough period for adaptation; actively monitor the adaptation and roll-out when we are sure".

@rfranzke
Copy link
Member

Yeah, I agree, let's do it as usual. I was just brain-storming about potential other ways how to deal with this situation and wanted to get some feedback. The points you raised are very valid and there is no good reason to not follow the usual approach.

@vpnachev
Copy link
Member

Should we add an empty commit with a breaking release note that we are planning to upgrade the terraform-provider-aws in June/July/August and it requires additional permissions in the AWS IAM? Or/and we should send a notification to https://groups.google.com/g/gardener and https://kubernetes.slack.com/archives/CB57N0BFG?

@ialidzhikov
Copy link
Member Author

@ialidzhikov
Copy link
Member Author

I am closing this issue as all subtasks are completed (11/11).

/close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Bug platform/aws Amazon web services platform/infrastructure
Projects
None yet
Development

No branches or pull requests

4 participants