Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper management of aws auth config map #157

Conversation

sebastianmacarescu
Copy link

what

  • Use newer kubernetes_config_map_v1_data to force management of config map from a single place and have field ownership
  • Implement self reference as described here: Unable to add additional iam roles to cluster #155 (comment) in order to detect if any iam roles were removed from terraform config
  • Preserve any existing config map setting that was added outside terraform; basically terraform will manage only what is passed in variables

why

Mainly the reasons from #155.

references

*closes #155

@sebastianmacarescu sebastianmacarescu requested review from a team as code owners July 20, 2022 12:36
@Gowiem Gowiem requested a review from Nuru July 20, 2022 18:06
@Gowiem
Copy link
Member

Gowiem commented Jul 20, 2022

Pulling in @Nuru since he's got strong opinions in this area of the code (and rightfully so 👍)

Copy link
Contributor

@Nuru Nuru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebastianmacarescu Thank you very much for this PR.

My apologies if I said something misleading, but I really want to avoid using remote state in this module. My hope was that the kubernetes_config_map_v1_data resource would remove the need for referencing remote state by merging the EKS created roles with the customer roles.

Let's discuss this in Slack.

auth.tf Outdated
Comment on lines 195 to 241
force = true

# This fails because the eks managed node group changes the ownership of the mapRoles in configMap if deployed separately
# vpcLambda is the sole owner of it so apply again resets it
# Force true will overwrite the node group role
//force = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain this further.

@mergify
Copy link

mergify bot commented Jul 25, 2022

This pull request is now in conflict. Could you fix it @sebastianmacarescu? 🙏

@sebastianmacarescu sebastianmacarescu force-pushed the feature/manage-aws-auth-config-map branch 2 times, most recently from 69ad4a0 to 084ffbb Compare August 26, 2022 13:20
@sebastianmacarescu
Copy link
Author

@Nuru I have refactored this:

  • removed kubernetes_config_map_v1_data
  • remove the state data source - this failed on first terraform apply as the tf state didn't exist yet and the data source must reference existing stuff
  • save the previous terraform input in AWS SSM Parameter and read it on apply to detect if any roles must be removed

This seems to work a lot better and is a lot cleaner.

As for this issue: hashicorp/terraform-provider-kubernetes#1671 (comment)
"A aidanmelen/terraform-aws-eks-auth#14 will not automatically create an aws-auth configmap." I didn't manage to replicate it, my data source successfully reads the aws config map. So I suspect AWS automatically creates one.

@stecullum
Copy link

Are there plans to merge this?

@Nuru
Copy link
Contributor

Nuru commented Oct 15, 2022

@stevec-aztech This was completely on hold pending resolution of hashicorp/terraform-provider-aws#25335

@sebastianmacarescu I really do appreciate the effort you have put into this. How much and what kind of testing have you done on it? Just looking at it makes me scared, particularly with the regular expressions, as those have a way of causing hard-to-find bugs and security vulnerabilities.

What size are the strings you have been storing in parameter store? I would like to know where the breaking point is, as some people have added a lot of roles to their auth map.

Plus I'm always worried about the destroy behavior, as problems in destroying the cluster keep cropping up.

I really am looking forward to AWS giving us a proper API for the auth map and we could end all these workarounds.

@sebastianmacarescu
Copy link
Author

So I have spent some time to realistically generate some big map_aditional_iam_roles for the config map as this is the only stuff I store in AWS Paramter Store.

The maximum size of the parameter value for Advanced Tier in AWS SSM Parameter Store is 8KB as can be seen HERE.

The current implementation allows for around 30 roles. But we can use base64gzip on the json data and we will get to around 60 roles. Plus we can get rid of those regexes and be a lot safer.

For who wants to play the python code to generate a random dataset (really long values for the iam role name, group names and usernames that most won't have) is:

import string
import random
import json
import gzip
import base64
from sys import getsizeof

def generate_role():
	# Role name between 30 and 64 chars
	account_id = ''.join(random.choices(string.digits, k=12))
	role_name = ''.join(random.choices(string.ascii_lowercase, k=random.randint(30, 64)))
	arn = f"arn:aws:iam::{account_id}/role/{role_name}"
	return arn

def generate_user():
	return ''.join(random.choices(string.ascii_lowercase, k=random.randint(10, 36)))

def generate_groups():
	return random.choices([''.join(random.choices(string.ascii_lowercase, k=random.randint(4, 20))) for i in range(10)], k=random.randint(1, 8))

data = [
	{
		"groups": generate_groups(),
		"rolearn": generate_role(),
		"username": generate_user()
	}
	for i in range(60)
]
json_str = json.dumps(data)

print("\nInitial data\n---------------")
# print(f"Data {json_str}")
print(f"Number of chars {len(json_str)}")
print(f"Size in bytes 	{getsizeof(json_str)}")
# print(f"Number KB (from chars) {len(json_str) / 1000}")

compressed = gzip.compress(json_str.encode('utf-8'))
encoded = base64.b64encode(compressed).decode('utf-8')

print("\nGzip compression\n---------------")
# print(f"Gzip {compressed}")
print(f"Number of chars {len(compressed)}")
print(f"Size in bytes 	{getsizeof(compressed)}")

# Emulate https://www.terraform.io/language/functions/base64gzip
print("\nB64 encoding of gzip\n---------------")
# print(f"B64 encoded of gzip {encoded}")
print(f"Number of chars {len(encoded)}")
print(f"Size in bytes 	{getsizeof(encoded)}")

Some output:

Initial data
---------------
Number of chars 12881
Size in bytes 	12930

Gzip compression
---------------
Number of chars 5772
Size in bytes 	5805

B64 encoding of gzip
---------------
Number of chars 7696
Size in bytes 	7745
Initial data
---------------
Number of chars 12515
Size in bytes 	12564

Gzip compression
---------------
Number of chars 5651
Size in bytes 	5684

B64 encoding of gzip
---------------
Number of chars 7536
Size in bytes 	7585

Also I'm running the current code in production EKS cluster and works just fine as it is, with around 8 aditional iam roles in the config map.

If you think this can be merged I can simplify the code and use base64gzip to eliminate those hard to understand regexes, plus increasing the number of extra iam roles. Also I can add a note in the doumentation and possibly terraform parameter validation.

@Nuru
Copy link
Contributor

Nuru commented Oct 15, 2022

@sebastianmacarescu Thank you for this extra effort.

If you think this can be merged I can simplify the code and use base64gzip to eliminate those hard to understand regexes, plus increasing the number of extra iam roles. Also I can add a note in the doumentation and possibly terraform parameter validation.

We have other issues with using remote state triggering Terraform bugs, so yes, if you can use base64gzip and simplify the code, I think I can work it in with some other stuff I'm doing this weekend. I may need to make some changes to make the SSM parameter an optional solution (allow people to keep using the current "remote state" solution). You certainly deserve credit and we can give you credit in the release notes, but if it turns out to be easier, would it be OK with you if I copy your work into a new PR I am working on and close this as superseded by that? If that is not OK, I totally understand, and it would not be much extra work for me, just checking on how you feel about it.

Also, we want to change the version pinning to be >= 3.38, !=4.18.0, !=4.19.0 ..., !=4.32.0

@sebastianmacarescu
Copy link
Author

Hi @Nuru ,
I will update the code to remove the regexes together with the new version constraint.
I would keep the work in this PR and separate release because to me it seems like a medium change that should not be mixed with other stuff features. Moreover it will be easier to understand the new changes when browsing the comments on the original issue here.
But if it's extra hard for you to integrate in a separate release feel free to integrate in your new PR.
Thank you :)

@sebastianmacarescu sebastianmacarescu force-pushed the feature/manage-aws-auth-config-map branch from 9e6e5e1 to cf8a404 Compare November 3, 2022 14:24
@sebastianmacarescu
Copy link
Author

sebastianmacarescu commented Nov 3, 2022

Hi @Nuru. Can you review the new changes?

  • removed those regexes
  • replaced parameter store with secrets manager due to bigger data size (8KB vs 64KB).
  • updated readme with some more edge cases on delete failures together with links to terraform eks provider bugs
  • rebased the work on the latest upstream release

This new implementation will allow us to store around 180 IAM roles specified in the terraform configuration. I have tested with both terraform create, update, add and remove iam roles from the config map and it works great.

Also there is no gunzip in terraform so storing the json string as gzip compressed base64 would not work: hashicorp/terraform#22568

sebastianmacarescu and others added 3 commits November 4, 2022 15:03
…tianmacarescu/terraform-aws-eks-cluster into feature/manage-aws-auth-config-map
@salemgolemugoo
Copy link

Guys, any progress on this one?

@Gowiem
Copy link
Member

Gowiem commented Oct 19, 2023

Since this went far on feedback and there is interest, friendly ping of @Nuru + @aknysh

@hans-d hans-d added the stale This PR has gone stale label Mar 5, 2024
@hans-d
Copy link

hans-d commented Mar 5, 2024

final ping for @Nuru , @aknysh and @Gowiem

Copy link

mergify bot commented Mar 11, 2024

This pull request now has conflicts. Could you fix it @sebastianmacarescu? 🙏

@mergify mergify bot added the conflict This PR has conflicts label Mar 11, 2024
@mergify mergify bot closed this Mar 11, 2024
Copy link

mergify bot commented Mar 11, 2024

This PR has been closed due to inactivity and merge conflicts.
Please resolve the conflicts and reopen if necessary.

Copy link

mergify bot commented Mar 11, 2024

Thanks @sebastianmacarescu for creating this pull request!

A maintainer will review your changes shortly. Please don't be discouraged if it takes a while.

While you wait, make sure to review our contributor guidelines.

Tip

Need help or want to ask for a PR review to be expedited?

Join us on Slack in the #pr-reviews channel.

@mergify mergify bot removed the conflict This PR has conflicts label Mar 11, 2024
@mergify mergify bot removed the stale This PR has gone stale label Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to add additional iam roles to cluster
7 participants