feat: native Terraform test framework #28
base: main
Conversation
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (9)
tests/test-harness/variable.tf (1)

Lines 1-11: Add descriptions and validation blocks to improve maintainability.

While the variable declarations are structurally correct, consider enhancing them with descriptions and validation blocks to improve maintainability and prevent invalid inputs.
Apply this diff to add descriptions and validation:
variable "ssm_document_name_from_test" { type = string + description = "Name of the SSM document to fetch for testing" + validation { + condition = length(var.ssm_document_name_from_test) > 0 + error_message = "SSM document name cannot be empty" + } } variable "iam_role_name_from_test" { type = string + description = "Name of the IAM role to fetch for testing" + validation { + condition = length(var.iam_role_name_from_test) > 0 + error_message = "IAM role name cannot be empty" + } } variable "instance_name" { type = string + description = "Name tag value of the EC2 instance to fetch for testing" + validation { + condition = length(var.instance_name) > 0 + error_message = "Instance name cannot be empty" + } }tests/setup/main.tf (4)
Lines 1-8: Consider using a more flexible version constraint for the random provider.

The current constraint ~> 3.0 only allows updates within the 3.x range. For test infrastructure, spelling the bounds out as >= 3.0, < 4.0 makes the intended range explicit and helps catch compatibility issues early while still maintaining major version compatibility.

required_providers {
  random = {
    source  = "hashicorp/random"
-   version = "~> 3.0"
+   version = ">= 3.0, < 4.0"
  }
}
Lines 10-13: Add documentation to explain the purpose of this random number.

The random number's purpose and its specific range (1-9999) should be documented. Consider adding a descriptive comment explaining how this value is used in the testing framework.

+# Generates a random suffix for resource naming in tests to prevent
+# naming conflicts when running tests concurrently
resource "random_integer" "random_number" {
  min = 1
  max = 9999
}
Lines 15-18: Consider using a more specific output name.

Since this random number appears to be used for test resource naming, consider renaming the output to something more specific like resource_name_suffix to better indicate its purpose.

-output "random_number" {
+output "resource_name_suffix" {
   value       = random_integer.random_number.result
-  description = "Random number between 1 and 9999"
+  description = "Random suffix used to prevent naming conflicts in test resources"
 }
Lines 1-18: Consider adding a README to document the test infrastructure.

Since this setup module is part of a larger testing framework that includes integration tests and test harnesses, consider adding a README.md to the tests/setup directory. This would help other contributors understand:
- The role of this setup module in the testing framework
- How it interacts with other test components
- How to use it when writing new tests
Would you like me to help draft a README template for the test infrastructure?
tests/test-harness/main.tf (2)
Lines 1-10: Consider adding upper version constraints for better predictability.

While the lower bounds are well-defined, adding upper version constraints would help prevent unexpected breaking changes in future provider releases.

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
-     version = ">= 5.0"
+     version = ">= 5.0, < 6.0"
    }
  }
}
Lines 1-25: LGTM! The test harness structure is well-organized.

The overall structure of the test harness is clean and follows Terraform best practices. It provides the necessary data sources for integration testing of the SSM agent module.
Consider documenting the expected test setup and prerequisites in a README.md file within the test-harness directory to help other contributors understand how to use this test framework.
tests/cpu-compatibility.tftest.hcl (1)
Lines 13-16: Enhance test documentation.

While the AWS documentation links are helpful, consider adding:
- Brief description of what aspects of instance types are being tested
- Test prerequisites or setup requirements
- Expected test outcomes
tests/main.tftest.hcl (1)
Line 72: Simplify the root block device encryption check.

At line 72, you're accessing the encrypted attribute of the root block device using tolist(data.aws_instance.from_test.root_block_device)[0].encrypted == true. If root_block_device is already a list, you may not need tolist. You can simplify the condition as:

- condition = tolist(data.aws_instance.from_test.root_block_device)[0].encrypted == true
+ condition = data.aws_instance.from_test.root_block_device[0].encrypted == true

Or, if you want to ensure all root block devices are encrypted:

- condition = tolist(data.aws_instance.from_test.root_block_device)[0].encrypted == true
+ condition = alltrue([for device in data.aws_instance.from_test.root_block_device : device.encrypted])

This makes the assertion clearer and ensures compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
tests/terratest/Gopkg.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- tests/cpu-compatibility.tftest.hcl (1 hunks)
- tests/main.tftest.hcl (1 hunks)
- tests/setup/main.tf (1 hunks)
- tests/test-harness/main.tf (1 hunks)
- tests/test-harness/variable.tf (1 hunks)
🧰 Additional context used
🔇 Additional comments (5)
tests/test-harness/variable.tf (1)
Lines 1-11: LGTM! Variables align with test framework requirements.

The variables are correctly structured to support the integration tests by enabling dynamic data source lookups for SSM documents, IAM roles, and EC2 instances. This aligns well with the PR's objective of implementing a native Terraform test framework to verify component existence and connectivity.
Let's verify these variables are used correctly in the test harness:
✅ Verification successful
Variables are correctly utilized in corresponding data source blocks
The verification confirms that all three variables are properly used in their respective data source blocks in tests/test-harness/main.tf:

- ssm_document_name_from_test is used to look up an SSM document
- iam_role_name_from_test is used to look up an IAM role
- instance_name is used in a filter to look up an EC2 instance by its Name tag

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the variables are used in data source blocks
# Test: Search for data source blocks using these variables
rg -A 5 'data\s+"aws_(ssm_document|iam_role|instance)"\s+"from_test"'

Length of output: 713
tests/test-harness/main.tf (1)
Lines 20-25: 🛠️ Refactor suggestion

Verify instance uniqueness in the test environment.
The current instance lookup only filters by the "Name" tag, which could potentially return multiple instances if the tag value is not unique. Consider adding additional filters (like instance state) or verify that your test environment ensures tag uniqueness.
Consider adding additional filters for more precise instance selection:
data "aws_instance" "from_test" { filter { name = "tag:Name" values = [var.instance_name] } + filter { + name = "instance-state-name" + values = ["running"] + } }tests/cpu-compatibility.tftest.hcl (2)
Lines 100-103: ⚠️ Potential issue

Fix incorrect error message in graphics instance test.

The error message references g3s.xlarge but the test is using g4dn.xlarge.

assert {
  condition     = local.is_instance_compatible
- error_message = "Expected instance type g3s.xlarge to be compatible with x86_64 architecture"
+ error_message = "Expected instance type g4dn.xlarge to be compatible with x86_64 architecture"
}

Likely invalid or redundant comment.
Lines 27-28: Verify test dependencies implementation.

The tests rely on local.is_instance_compatible and null_resource.validate_instance_type. Let's verify their implementation:

Also applies to: 56-58
✅ Verification successful
Based on the results, I can now generate the final response since I have found the implementation of both dependencies.
Dependencies are properly implemented and serve their purpose
The test dependencies are correctly implemented in main.tf:

- local.is_instance_compatible is a boolean expression that validates architecture compatibility:
  - For x86_64: ensures the instance type doesn't have 'g' in the third position
  - For arm64: ensures the instance type has 'g' in the third position
- null_resource.validate_instance_type uses this condition in its lifecycle precondition block

The test assertions align perfectly with the implementation logic, validating both valid and invalid scenarios for x86_64 and arm64 architectures.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the implementation of test dependencies
echo "Searching for is_instance_compatible implementation..."
rg "is_instance_compatible" -A 5

echo "Searching for validate_instance_type resource..."
ast-grep --pattern 'resource "null_resource" "validate_instance_type" { $$$ }'

Length of output: 2365
tests/main.tftest.hcl (1)
Lines 19-19: ⚠️ Potential issue

Verify run.setup.random_number is properly defined and accessible.

At lines 19 and 55, you're referencing run.setup.random_number to generate unique identifiers. Ensure that the setup run block provides random_number as an output that can be accessed in subsequent runs. If random_number is not outputted or properly exposed, this could lead to errors when trying to use it.

To confirm that random_number is correctly outputted from the setup module, you can use the following shell script. It searches for the 'run "setup"' block in tests/main.tftest.hcl and checks for the presence of random_number in the outputs.

Also applies to: 55-55
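For reference, the pattern in question is roughly the following. This is a minimal sketch only; the second run's label and the way the suffix is consumed are illustrative, not copied from the actual test file:

run "setup" {
  # Applies ./tests/setup so its outputs become available as run.setup.*
  module {
    source = "./tests/setup"
  }
}

run "create_ssm_agent" {
  command = apply

  variables {
    # Illustrative: use the random suffix to avoid naming collisions between test runs
    instance_name = "ssm-agent-test-${run.setup.random_number}"
  }
}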
✅ Verification successful
Let me gather more information about the setup module and its outputs.
Let me check the setup module content specifically since it wasn't shown in the previous output.
random_number is properly defined and accessible

The setup module correctly defines and exposes the random_number output:

- It creates a random_integer resource named random_number
- The value is properly exposed through an output variable named random_number
- The output can be accessed in subsequent runs using run.setup.random_number, as done in the test file

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that 'random_number' is outputted from 'run.setup'
# Test: Check if 'random_number' is available as an output in 'run.setup'
ast-grep --pattern $'run "setup" {\n module {\n source = "./tests/setup"\n }\n}' -A5 tests/main.tftest.hcl | grep 'random_number'

Length of output: 129
Script:
#!/bin/bash
# Let's first check the content of the test setup module
fd "setup" tests/ --type f --exec cat {}

# Also check the main test file for the complete context
cat tests/main.tftest.hcl

Length of output: 3292
Script:
#!/bin/bash
# Check if setup.tf exists in tests/setup directory
cat tests/setup/main.tf 2>/dev/null || echo "File not found"

# Also check for any .tf files in the setup directory
fd ".tf$" tests/setup/ --type f --exec cat {}

Length of output: 721
@oycyc looks good, I left a couple of comments/thoughts.
I agree it's a simple module, and I don't think we should overload it with tests. I guess our main goal here is to establish a standardized testing framework and pipeline so it can be spread across other modules.
Also, I see that terratest was added in 2020, which definitely requires revisiting. Do you know if there was any discussion around it? Are we going to support both? Do we expect to replace it?
run "valid_x86_64_instance" { | ||
command = plan | ||
|
||
variables { |
I think we should improve the readability by moving this into a helper module where we can use loops and locals like this:
locals {
instances = [
{
instance_type = "t3.micro"
architecture = "x86_64"
expected = true
},
{
instance_type = "t4g.micro"
architecture = "arm64"
expected = true
},
{
instance_type = "t4g.micro"
architecture = "x86_64"
expected = false
},
{
instance_type = "t3.micro"
architecture = "arm64"
expected = false
},
{
instance_type = "g3s.xlarge"
architecture = "arm64"
expected = false
},
{
instance_type = "g4dn.xlarge"
architecture = "x86_64"
expected = true
}
]
}
Unfortunately, for_each is not supported yet for tests, but we can work around that with a module block inside a run block.
How does that sound?
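For illustration, a rough sketch of that workaround is below. The ./tests/cpu-cases helper module and its results output are hypothetical names, assuming the helper loops over the locals above and exposes each case's computed compatibility next to its expected value:

run "cpu_compatibility_matrix" {
  command = plan

  # Point this run at the helper module instead of the root module under test
  module {
    source = "./tests/cpu-cases"
  }

  assert {
    condition     = alltrue([for case in output.results : case.actual == case.expected])
    error_message = "At least one instance type / architecture combination did not match its expected compatibility"
  }
}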
Hmm, this becomes interesting, because if we have it in a module called via the run block's module block, that module is actually going to be applied, whereas in the .tftest file we can specify planning and assert against the plan. This file is more meant for trying out the unit tests, since we don't need to apply first to validate.
Maybe we can use the setup module just to build this map of instance types, architectures, and expected results, similar to how you generate a random ID for the test? In this case it's fine to apply, since it won't create any AWS resources.

TBH, I need to dig deeper into test best practices and patterns. My only concern right now is that we have tons of repeated pieces of code, and I wonder what the best ways are to handle that for the native test framework.
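Roughly, the setup module could expose the matrix as plain data, something like the following sketch (the output name and case keys are hypothetical):

# Hypothetical output in tests/setup: no AWS resources, just the test matrix
output "cpu_cases" {
  value = {
    x86_valid   = { instance_type = "t3.micro",  architecture = "x86_64", expected = true }
    arm_valid   = { instance_type = "t4g.micro", architecture = "arm64",  expected = true }
    x86_invalid = { instance_type = "t4g.micro", architecture = "x86_64", expected = false }
    arm_invalid = { instance_type = "t3.micro",  architecture = "arm64",  expected = false }
  }
}

The .tftest file would then reference individual cases, e.g. run.setup.cpu_cases["x86_valid"].instance_type.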
I tried looking around too to see how others do it and/or best practices, but since it's a relatively new thing, there's not much out there on it haha.

Maybe we can use the setup module just to build this map of instance types, architecture, and results,

Even if we build this map, we still can't use any loops in the .tftest file, so we'd still have to reference it like run.setup.cpuCase1.instance_type, and at that point it might just be an additional unnecessary layer.
@oycyc @gberenice -- I think this is fine for now. @gberenice I get your point to make this more DRY, but sometimes test code just needs to be that way? And considering there is obviously lacking functionality within TF test... I'm of the mind that we just accept this.
This is exactly the type of thing that we want to be testing btw: We have non-trivial logic in locals
in the child module that we want to assert matches what we expect for various inputs. Well done 👍
Terratest right now is pretty empty: it just runs an apply and confirms there's output. I think we can remove it since we're not going to support it. We're not supporting both.
Even for unit tests (plans), it requires AWS credentials to call the API and do a plan. What do you think would be the best way to go about this? I'm thinking we set up an OIDC connection in our AWS account (testing account maybe?) and use it with GitHub Actions, or should we just use a static access key?
Definitely OIDC. Yeah, we can utilize
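For context, a minimal Terraform sketch of the GitHub OIDC trust setup might look like the following. The role name, repo path, and thumbprint are placeholders to verify, not values from this repo:

# Hypothetical sketch: GitHub OIDC provider plus a role that GitHub Actions can assume
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # placeholder; verify the current thumbprint
}

data "aws_iam_policy_document" "github_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/example-module:*"] # placeholder org/repo
    }
  }
}

resource "aws_iam_role" "terraform_tests" {
  name               = "terraform-tests-github-oidc" # placeholder name
  assume_role_policy = data.aws_iam_policy_document.github_assume.json
}

The GitHub Actions workflow would then assume this role (for example via aws-actions/configure-aws-credentials) before running terraform test.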
@oycyc @Gowiem I was thinking about what we want to achieve here, and I believe that the main value would be establishing a testing process, not just adding integration/unit tests. This includes:
What are your thoughts?
@gberenice @oycyc -- First off, well done on both this initial implementation and the back+forth discussion above. Love to see it! @gberenice I like your last comment (right above this one) and I agree. We just want structure + process to come out of this PR. These tests might not be the most useful due to the nature of the PR, but in the long run we need to start building up this practice and ensuring we're testing in ALL of our TF code where it makes sense. Let me also address each of your "includes":
Yes, we should understand this and it should be repeatable + leave us open to future expansion. It seems like this is relatively straightforward, so I think we're all good here.
@oycyc is going to work on this. Instead of doing that for this PR, I am going to say that we ship this and then we follow up with updating it with a Pipeline + OIDC usage. That way we can ship a smaller PR and follow up as we add on the additional, useful automation.
Same as above -- I wanna add this as a follow up PR.
There is 100% a lack of this in the community and I think this is something I'm very interested in. This is something for both of you to think on: what blog posts, best practice guides, or OSS projects might we be able to put together to help out the community + ourselves? This is a topic I want us to get GREAT at, have strong opinions on, and be thought leaders in, so I am glad we're finally moving this forward. Let's be sure to discuss this live at some point soon -- maybe we schedule a separate meeting once we get these tests + the corresponding automation in place? @oycyc I will read over your actual code now and give this a proper review, but I say we push this forward and get it merged ASAP so we have something to start with, and then we can iterate on the automation next, which I know you're already working on.
@oycyc few points of feedback, but I think overall this looks great. We just need to have a discussion as a team surrounding "What makes sense to test?". That feels critical for TF tests -- I think it's really easy to write tests that are just filler.
I am going to get a discussion on the calendar for us for next week or the following and we'll circle back soon. For now... let's ship this PR since it has been open for a while and things are working. We can update these tests according to my feedback as we discuss and we iterate on our process + pipelines.
Thanks for putting this together @oycyc 👍
run "valid_x86_64_instance" { | ||
command = plan | ||
|
||
variables { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oycyc @gberenice -- I think this is fine for now. @gberenice I get your point to make this more DRY, but sometimes test code just needs to be that way? And considering there is obviously lacking functionality within TF test... I'm of the mind that we just accept this.
This is exactly the type of thing that we want to be testing btw: We have non-trivial logic in locals
in the child module that we want to assert matches what we expect for various inputs. Well done 👍
assert {
  condition     = module.ssm_agent.autoscaling_group_id != ""
  error_message = "The ID of the SSM Agent Autoscaling Group is empty, possibly not created."
}
I question whether we should test this sort of thing... If the ASG didn't have an ID then it would have failed during the apply, correct?
I feel like we should only be testing that X === Y. e.g. We might add -asg
onto the label that we provide to the ASG module / resource... then we can test that the ASG name equals what we expect.
If we do tests like this, we could just be creating unnecessary noise.
This is my first reaction at least.
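Something along those lines could be asserted directly; a sketch, where the -asg suffix and the autoscaling_group_name output are hypothetical:

assert {
  # endswith() requires Terraform >= 1.3
  condition     = endswith(module.ssm_agent.autoscaling_group_name, "-asg")
  error_message = "Expected the Autoscaling Group name to end with the \"-asg\" label suffix"
}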
assert {
  condition     = aws_launch_template.default.monitoring[0].enabled == true
  error_message = "Instance monitoring not enabled"
}
This is something I question whether we should test. The simplicity of "passed in this bool, let's confirm the bool is passed down correctly"... seems unnecessary to test, maybe? Maybe we could argue that it is confirming that passing in monitoring_enabled in the future will not break, but I'm still a bit iffy on it.

I also almost always want to test things that revolve around locals (which you've done well elsewhere) OR that could revolve around locals, e.g. http_tokens == "required" is a good example, as we translate a bool variable into a string that is required by the provider.
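For example, an assertion over that bool-to-string translation could look like this sketch (assuming the launch template exposes metadata_options the same way it exposes monitoring above):

assert {
  condition     = aws_launch_template.default.metadata_options[0].http_tokens == "required"
  error_message = "Expected http_tokens to be \"required\" so IMDSv2 is enforced"
}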
I think this requires some discussion. We should save topics from this PR to discuss in an upcoming sync on TF testing. @oycyc can I ask you to start notes around a meeting like that and we'll use that as a jumping off point for some of the important discussions that need to come out of this PR?
assert {
  condition     = aws_launch_template.default.iam_instance_profile[0].name == aws_iam_instance_profile.default.name
  error_message = "Launch template IAM instance profile name does not match the created instance profile"
}
This is another one that I question (I am questioning "is this useful?" for a lot of these). Since there is no translation or logic surrounding this name... I'm 99.9% sure that this test would never fail. It would have to be the downstream provider breaking for it to fail. Considering that... do we need this test? Is it useful to us? I think we need to continually ask that question of ourselves in the context of TF tests to ensure that we're not writing tests that are purely filler.
what

- main.tftest.hcl for the built-in Terraform test framework
- main.tftest contains the integration tests, whereas the other test I added, cpu-compatibility, is a unit test (planning)
- /setup and /test-harness modules for setup and further validation using data sources

Test output
I could do more unit tests, but this module is pretty straightforward and simple, and I got the gist of the different scenarios. Would appreciate a second pair of eyes to see if there are other things worth testing here.

Most of the tests just check that things exist / are connected together, not any specific configuration, because there's not much in this module.
references
Summary by CodeRabbit
New Features
Bug Fixes
Documentation