
feat: native Terraform test framework #28

Open: wants to merge 4 commits into main

Conversation

@oycyc (Contributor) commented Oct 22, 2024

what

  • Removed the old default Terratest boilerplate
  • Added a new main.tftest.hcl for the built-in Terraform test framework
  • main.tftest.hcl contains the integration tests, whereas the other test, cpu-compatibility, is a unit test (plan only)
  • The integration test needs the /setup and /test-harness modules for setup and further validation using data sources (see the sketch below)
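
A rough sketch of how those pieces fit together in main.tftest.hcl (the run labels, variable names, and module outputs referenced below are illustrative assumptions, not the exact file contents):

run "setup" {
  module {
    source = "./tests/setup"
  }
}

run "apply_ssm_agent" {
  command = apply

  variables {
    # hypothetical variable name; suffixes test resources with the random number from setup
    name = "ssm-agent-test-${run.setup.random_number}"
  }
}

run "verify_with_data_sources" {
  command = apply

  module {
    source = "./tests/test-harness"
  }

  variables {
    # hypothetical output names on the module under test
    ssm_document_name_from_test = run.apply_ssm_agent.ssm_document_name
    iam_role_name_from_test     = run.apply_ssm_agent.role_name
    instance_name               = run.apply_ssm_agent.instance_name
  }

  assert {
    condition     = data.aws_iam_role.from_test.name == run.apply_ssm_agent.role_name
    error_message = "Expected the IAM role created by the module to be found by the test-harness data source"
  }
}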

Test output
(screenshot of terraform test output)


I could do more unit tests, but this module is pretty straightforward and simple, and I covered the gist of the different scenarios. I'd appreciate a second pair of eyes to see if there's anything else worth testing here.

Most of the tests just check that things exist and are connected together, not any specific configuration, because there's not much in this module.

references

Summary by CodeRabbit

  • New Features

    • Introduced comprehensive tests for validating EC2 instance compatibility with specified architectures.
    • Added integration tests for the SSM Agent module, focusing on resource validation.
    • Implemented Terraform configurations for random number generation and AWS data retrieval.
    • Established structured tests for verifying various AWS resources and configurations.
  • Bug Fixes

    • Enhanced error handling and validation in existing tests.
  • Documentation

    • Updated variable declarations for improved configuration management.

@oycyc oycyc requested a review from a team as a code owner October 22, 2024 23:51

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (9)
tests/test-harness/variable.tf (1)

1-11: Add descriptions and validation blocks to improve maintainability.

While the variable declarations are structurally correct, consider enhancing them with descriptions and validation blocks to improve maintainability and prevent invalid inputs.

Apply this diff to add descriptions and validation:

 variable "ssm_document_name_from_test" {
   type = string
+  description = "Name of the SSM document to fetch for testing"
+  validation {
+    condition     = length(var.ssm_document_name_from_test) > 0
+    error_message = "SSM document name cannot be empty"
+  }
 }

 variable "iam_role_name_from_test" {
   type = string
+  description = "Name of the IAM role to fetch for testing"
+  validation {
+    condition     = length(var.iam_role_name_from_test) > 0
+    error_message = "IAM role name cannot be empty"
+  }
 }

 variable "instance_name" {
   type = string
+  description = "Name tag value of the EC2 instance to fetch for testing"
+  validation {
+    condition     = length(var.instance_name) > 0
+    error_message = "Instance name cannot be empty"
+  }
 }
tests/setup/main.tf (4)

1-8: Consider using a more flexible version constraint for the random provider.

The current constraint ~> 3.0 is equivalent to >= 3.0, < 4.0 (it allows minor and patch updates within the 3.x series). For test infrastructure, spelling the range out explicitly makes the intent clearer while still maintaining major version compatibility.

   required_providers {
     random = {
       source  = "hashicorp/random"
-      version = "~> 3.0"
+      version = ">= 3.0, < 4.0"
     }
   }

10-13: Add documentation to explain the purpose of this random number.

The random number's purpose and its specific range (1-9999) should be documented. Consider adding a descriptive comment explaining how this value is used in the testing framework.

+# Generates a random suffix for resource naming in tests to prevent
+# naming conflicts when running tests concurrently
 resource "random_integer" "random_number" {
   min = 1
   max = 9999
 }

15-18: Consider using a more specific output name.

Since this random number appears to be used for test resource naming, consider renaming the output to something more specific like resource_name_suffix to better indicate its purpose.

-output "random_number" {
+output "resource_name_suffix" {
   value       = random_integer.random_number.result
-  description = "Random number between 1 and 9999"
+  description = "Random suffix used to prevent naming conflicts in test resources"
 }

1-18: Consider adding a README to document the test infrastructure.

Since this setup module is part of a larger testing framework that includes integration tests and test harnesses, consider adding a README.md to the tests/setup directory. This would help other contributors understand:

  • The role of this setup module in the testing framework
  • How it interacts with other test components
  • How to use it when writing new tests

Would you like me to help draft a README template for the test infrastructure?

tests/test-harness/main.tf (2)

1-10: Consider adding upper version constraints for better predictability.

While the lower bounds are well-defined, adding upper version constraints would help prevent unexpected breaking changes in future provider releases.

 terraform {
   required_version = ">= 1.0"

   required_providers {
     aws = {
       source  = "hashicorp/aws"
-      version = ">= 5.0"
+      version = ">= 5.0, < 6.0"
     }
   }
 }

1-25: LGTM! The test harness structure is well-organized.

The overall structure of the test harness is clean and follows Terraform best practices. It provides the necessary data sources for integration testing of the SSM agent module.

Consider documenting the expected test setup and prerequisites in a README.md file within the test-harness directory to help other contributors understand how to use this test framework.

tests/cpu-compatibility.tftest.hcl (1)

13-16: Enhance test documentation.

While the AWS documentation links are helpful, consider adding:

  1. Brief description of what aspects of instance types are being tested
  2. Test prerequisites or setup requirements
  3. Expected test outcomes
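
For example, a short file header comment along these lines would cover all three points (the wording is illustrative only, not a required format):

# cpu-compatibility.tftest.hcl
# Unit tests (plan only) asserting that instance_type / architecture combinations
# are accepted or rejected by the module's compatibility check.
# Prerequisites: AWS credentials able to run a plan; no resources are created.
# Expected outcome: compatible combinations plan cleanly, incompatible ones fail
# the validation with a descriptive error.
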
tests/main.tftest.hcl (1)

72-72: Simplify root block device encryption check

At line 72, you're accessing the encrypted attribute of the root block device using tolist(data.aws_instance.from_test.root_block_device)[0].encrypted == true. If root_block_device is already a list, you may not need to use tolist. Also, consider adding a splat operator for better compatibility.

You can simplify the condition as:

-    condition     = tolist(data.aws_instance.from_test.root_block_device)[0].encrypted == true
+    condition     = data.aws_instance.from_test.root_block_device[0].encrypted == true

Or, if you want to ensure all root block devices are encrypted:

-    condition     = tolist(data.aws_instance.from_test.root_block_device)[0].encrypted == true
+    condition     = alltrue([for device in data.aws_instance.from_test.root_block_device : device.encrypted])

This makes the assertion clearer and ensures compatibility.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c7953f1 and 04f7d2f.

⛔ Files ignored due to path filters (1)
  • tests/terratest/Gopkg.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • tests/cpu-compatibility.tftest.hcl (1 hunks)
  • tests/main.tftest.hcl (1 hunks)
  • tests/setup/main.tf (1 hunks)
  • tests/test-harness/main.tf (1 hunks)
  • tests/test-harness/variable.tf (1 hunks)
🧰 Additional context used
🔇 Additional comments (5)
tests/test-harness/variable.tf (1)

1-11: LGTM! Variables align with test framework requirements.

The variables are correctly structured to support the integration tests by enabling dynamic data source lookups for SSM documents, IAM roles, and EC2 instances. This aligns well with the PR's objective of implementing a native Terraform test framework to verify component existence and connectivity.

Let's verify these variables are used correctly in the test harness:

✅ Verification successful

Variables are correctly utilized in corresponding data source blocks

The verification confirms that all three variables are properly used in their respective data source blocks in tests/test-harness/main.tf:

  • ssm_document_name_from_test is used to look up an SSM document
  • iam_role_name_from_test is used to look up an IAM role
  • instance_name is used in a filter to look up an EC2 instance by its Name tag
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the variables are used in data source blocks

# Test: Search for data source blocks using these variables
rg -A 5 'data\s+"aws_(ssm_document|iam_role|instance)"\s+"from_test"'

Length of output: 713

tests/test-harness/main.tf (1)

20-25: 🛠️ Refactor suggestion

Verify instance uniqueness in the test environment.

The current instance lookup only filters by the "Name" tag, which could potentially return multiple instances if the tag value is not unique. Consider adding additional filters (like instance state) or verify that your test environment ensures tag uniqueness.

Consider adding additional filters for more precise instance selection:

 data "aws_instance" "from_test" {
   filter {
     name   = "tag:Name"
     values = [var.instance_name]
   }
+  filter {
+    name   = "instance-state-name"
+    values = ["running"]
+  }
 }
tests/cpu-compatibility.tftest.hcl (2)

100-103: ⚠️ Potential issue

Fix incorrect error message in graphics instance test.

The error message references g3s.xlarge but the test is using g4dn.xlarge.

   assert {
     condition     = local.is_instance_compatible
-    error_message = "Expected instance type g3s.xlarge to be compatible with x86_64 architecture"
+    error_message = "Expected instance type g4dn.xlarge to be compatible with x86_64 architecture"
   }

Likely invalid or redundant comment.


27-28: Verify test dependencies implementation.

The tests rely on local.is_instance_compatible and null_resource.validate_instance_type. Let's verify their implementation:

Also applies to: 56-58

✅ Verification successful

Based on the results, I can now generate the final response since I have found the implementation of both dependencies.

Dependencies are properly implemented and serve their purpose

The test dependencies are correctly implemented in main.tf:

  • local.is_instance_compatible is a boolean expression that validates architecture compatibility:
    • For x86_64: Ensures instance type doesn't have 'g' in third position
    • For arm64: Ensures instance type has 'g' in third position
  • null_resource.validate_instance_type uses this condition in its lifecycle precondition block

The test assertions align perfectly with the implementation logic, validating both valid and invalid scenarios for x86_64 and arm64 architectures.
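
A minimal reconstruction of what that logic might look like, for illustration only (the variable names and exact expression are assumptions based on the description above, not the module's actual code):

locals {
  # Graviton (arm64) instance families carry a "g" as the third character of the
  # type name (e.g. t4g, m7g), while x86_64 families do not (e.g. t3, m5).
  is_instance_compatible = (
    var.architecture == "arm64"
    ? substr(var.instance_type, 2, 1) == "g"
    : substr(var.instance_type, 2, 1) != "g"
  )
}

resource "null_resource" "validate_instance_type" {
  lifecycle {
    precondition {
      condition     = local.is_instance_compatible
      error_message = "Instance type ${var.instance_type} is not compatible with the ${var.architecture} architecture."
    }
  }
}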

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the implementation of test dependencies
echo "Searching for is_instance_compatible implementation..."
rg "is_instance_compatible" -A 5

echo "Searching for validate_instance_type resource..."
ast-grep --pattern 'resource "null_resource" "validate_instance_type" {
  $$$
}'

Length of output: 2365

tests/main.tftest.hcl (1)

19-19: ⚠️ Potential issue

Verify run.setup.random_number is properly defined and accessible

At lines 19 and 55, you're referencing run.setup.random_number to generate unique identifiers. Ensure that the setup run block provides random_number as an output that can be accessed in subsequent runs. If random_number is not outputted or properly exposed, this could lead to errors when trying to use it.

To confirm that random_number is correctly outputted from the setup module, you can use the following shell script:

This script searches for the 'run "setup"' block in tests/main.tftest.hcl and checks for the presence of random_number in the outputs.

Also applies to: 55-55

✅ Verification successful

Let me gather more information about the setup module and its outputs.


Let me check the setup module content specifically since it wasn't shown in the previous output.


random_number is properly defined and accessible

The setup module correctly defines and exposes the random_number output:

  • It creates a random_integer resource named random_number
  • The value is properly exposed through an output variable named random_number
  • The output can be accessed in subsequent runs using run.setup.random_number as done in the test file
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that 'random_number' is outputted from 'run.setup'

# Test: Check if 'random_number' is available as an output in 'run.setup'
ast-grep --pattern $'run "setup" {\n  module {\n    source = "./tests/setup"\n  }\n}' -A5 tests/main.tftest.hcl | grep 'random_number'

Length of output: 129


Script:

#!/bin/bash
# Let's first check the content of the test setup module
fd "setup" tests/ --type f --exec cat {}

# Also check the main test file for the complete context
cat tests/main.tftest.hcl

Length of output: 3292


Script:

#!/bin/bash
# Check if setup.tf exists in tests/setup directory
cat tests/setup/main.tf 2>/dev/null || echo "File not found"

# Also check for any .tf files in the setup directory
fd ".tf$" tests/setup/ --type f --exec cat {}

Length of output: 721

@gberenice (Member) left a comment

@oycyc looks good, I left a couple of comments/thoughts.
I agree it's a simple module, and I don't think we should overload it with tests. I guess our main goal here is to establish a standardized testing framework and pipeline so it can be spread across other modules.

Also, I see that terratest was added in 2020, which definitely requires revisiting. Do you know if there was any discussion around it? Are we going to support both? Do we expect to replace it?

run "valid_x86_64_instance" {
command = plan

variables {
@gberenice (Member) commented:

I think we should improve the readability by moving this into a helper module where we can use loops and locals like this:

locals {
  instances = [
    {
      instance_type = "t3.micro"
      architecture  = "x86_64"
      expected      = true
    },
    {
      instance_type = "t4g.micro"
      architecture  = "arm64"
      expected      = true
    },
    {
      instance_type = "t4g.micro"
      architecture  = "x86_64"
      expected      = false
    },
    {
      instance_type = "t3.micro"
      architecture  = "arm64"
      expected      = false
    },
    {
      instance_type = "g3s.xlarge"
      architecture  = "arm64"
      expected      = false
    },
    {
      instance_type = "g4dn.xlarge"
      architecture  = "x86_64"
      expected      = true
    }
  ]
}

Unfortunately, for_each is not supported yet for tests, but we can work around that with the run block's module block.
How does that sound?
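
For reference, the run-with-module workaround could look roughly like this; the helper module path, its internals, and the all_cases_pass output are hypothetical:

run "cpu_compatibility_matrix" {
  command = plan

  module {
    # hypothetical helper module that iterates over local.instances with for_each
    # and compares the module's compatibility result against each case's `expected`
    source = "./tests/cpu-compatibility-harness"
  }

  assert {
    condition     = output.all_cases_pass
    error_message = "At least one instance_type / architecture case did not match its expected compatibility result"
  }
}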

@oycyc (Contributor, Author) replied:

Hmm, this becomes interesting, because if we have it in a module called via the run block's module block, that module is actually going to be applied, whereas in the .tftest file we can specify a plan and assert against the plan. This file is more meant for trying out the unit tests, since we don't need to apply first to validate.

@gberenice (Member):

Maybe we can use the setup module just to build this map of instance types, architecture, and results, similar to how you generate a random ID for the test? In this case it's fine to apply, it won't create any AWS resources.

TBH, I need to dig deeper into test best practices and patterns. My only concern right now is that we have tons of repeated pieces of code. I wonder what the best ways are to handle that in the native test framework.

@oycyc (Contributor, Author):

I tried looking around too, to see how others do it and what the best practices are, but since it's a relatively new thing, there's not much out there on it haha.

Maybe we can use the setup module just to build this map of instance types, architecture, and results,

If we build this map, we still can't use any loops in the .tftest file, so we'd still have to reference each case like run.setup.cpuCase1.instance_type, and at that point it might just be an additional unnecessary layer.

@Gowiem (Member):

@oycyc @gberenice -- I think this is fine for now. @gberenice I get your point to make this more DRY, but sometimes test code just needs to be that way? And considering there is obviously lacking functionality within TF test... I'm of the mind that we just accept this.

This is exactly the type of thing that we want to be testing btw: We have non-trivial logic in locals in the child module that we want to assert matches what we expect for various inputs. Well done 👍

@oycyc (Contributor, Author) commented Oct 23, 2024

@oycyc looks good, I left a couple of comments/thoughts. I agree it's a simple module, and I don't think we should overload it with tests. I guess our main goal here is to establish a standardized testing framework and pipeline so it can be spread across other modules.

Also, I see that terratest was added in 2020, which definitely requires revisiting. Do you know if there was any discussion around it? Are we going to support both? Do we expect to replace it?

Terratest right now is pretty empty: it just runs an apply and confirms there's output. I think we can remove it since we're not going to support it and it's pretty empty. We're not supporting both.

@oycyc (Contributor, Author) commented Oct 23, 2024

@oycyc looks good, I left a couple of comments/thoughts. I agree it's a simple module, and I don't think we should overload it with tests. I guess our main goal here is to establish a standardized testing framework and pipeline so it can be spread across other modules.

Even for unit tests (plans), it requires AWS credentials to call the API and do a plan. What do you think would be the best way to go about this? I'm thinking we set up an OIDC connection in our AWS account (a testing account maybe?) and use it with GitHub Actions, or should we just use a static access key?


@gberenice (Member):

Even for unit tests (plans), it requires AWS credentials to call the API and do a plan. What do you think would be the best way to go about this? I'm thinking we set up an OIDC connection in our AWS account (a testing account maybe?) and use it with GitHub Actions, or should we just use a static access key?

Definitely OIDC. Yeah, we can utilize the testing account for these purposes.
The question is how we'd like to manage the resources required for test setup as code. We have the mp-infra repo, but it's quite messy. The components directory is organized for use by Terraform Controller, and unfortunately we're not going to proceed with that project. However, I believe you can add the OIDC IaC there as vanilla Terraform code, and we can discuss further repo reorganization on the next internal call.
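
For the OIDC piece, the vanilla Terraform would roughly be the standard GitHub OIDC provider plus an assumable role; everything below (resource names, the repo filter) is a sketch rather than something defined in this PR:

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # GitHub's published OIDC thumbprint
}

data "aws_iam_policy_document" "github_actions_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:YOUR_ORG/*:*"] # placeholder: restrict to the org/repos that run the tests
    }
  }
}

resource "aws_iam_role" "terraform_tests" {
  name               = "github-actions-terraform-tests" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.github_actions_assume.json
}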

@gberenice (Member):

@oycyc @Gowiem I was thinking about what we want to achieve here, and I believe that the main value would be establishing a testing process, not just adding integration/unit tests. This includes:

  • git structure for test files/configs
  • configuring access to AWS (and Spacelift in the nearest future)
  • adding pipeline and configuring it to run for each PR
  • start writing our own best practices if we see a lack of them in the community

What are your thoughts?

@Gowiem (Member) commented Nov 7, 2024

@gberenice @oycyc -- First off, well done on both this initial implementation and the back+forth discussion above. Love to see it!

@gberenice I like your last comment (right above this one) and I agree. We just want structure + process to come out of this PR. These tests might not be the most useful due to the nature of the PR, but in the long run we need to start building up this practice and ensuring we're testing in ALL of our TF code where it makes sense.

Let me also address each of your "includes":

git structure for test files/configs

Yes, we should understand this, and it should be repeatable + leave us open to future expansion. It seems like this is relatively straightforward, so I think we're all good here.

configuring access to AWS (and Spacelift in the nearest future)

@oycyc is going to work on this. Instead of doing that for this PR, I am going to say that we ship this and then follow up by updating it with a pipeline + OIDC usage. That way we can ship a smaller PR and follow up as we add the additional, useful automation.

adding pipeline and configuring it to run for each PR

Same as above -- I wanna add this as a follow up PR.

start writing our own best practices if we see the lack of those among the community

There is 100% a lack of this in the community, and I think this is something I'm very interested in. This is something for both of you to think on: what blog posts, best practice guides, or OSS projects might we be able to put together to help out the community + ourselves? This is a topic I want us to get GREAT at, have strong opinions on, and be thought leaders in, so I am glad we're finally moving this forward. Let's be sure to discuss this live at some point soon -- maybe we schedule a separate meeting once we get these tests + the corresponding automation in place?

@oycyc I will read over your actual code now and give this a proper review, but I say we push this forward and get it merged ASAP so we have something to start with, and then we can iterate on the automation next, which I know you're already working on.

@Gowiem (Member) left a comment

@oycyc few points of feedback, but I think overall this looks great. We just need to have a discussion as a team surrounding "What makes sense to test?". That feels critical for TF tests -- I think it's really easy to write tests that are just filler.

I am going to get a discussion on the calendar for us for next week or the following and we'll circle back soon. For now... let's ship this PR since it has been open for a while and things are working. We can update these tests according to my feedback as we discuss and we iterate on our process + pipelines.

Thanks for putting this together @oycyc 👍

run "valid_x86_64_instance" {
command = plan

variables {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oycyc @gberenice -- I think this is fine for now. @gberenice I get your point to make this more DRY, but sometimes test code just needs to be that way? And considering there is obviously lacking functionality within TF test... I'm of the mind that we just accept this.

This is exactly the type of thing that we want to be testing btw: We have non-trivial logic in locals in the child module that we want to assert matches what we expect for various inputs. Well done 👍

tests/main.tftest.hcl Show resolved Hide resolved
Comment on lines +36 to +39
assert {
  condition     = module.ssm_agent.autoscaling_group_id != ""
  error_message = "The ID of the SSM Agent Autoscaling Group is empty, possibly not created."
}
@Gowiem (Member) commented:

I question whether we should test this sort of thing... If the ASG didn't have an ID then it would have failed during the apply, correct?

I feel like we should only be testing that X === Y. e.g. We might add -asg onto the label that we provide to the ASG module / resource... then we can test that the ASG name equals what we expect.

If we do tests like this, we could just be creating unnecessary noise.

This is my first reaction at least.
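
A sketch of the exact-match style of assertion described above (the -asg suffix and the autoscaling_group_name output are assumptions for illustration):

assert {
  condition     = endswith(module.ssm_agent.autoscaling_group_name, "-asg")
  error_message = "Expected the Autoscaling Group name to carry the -asg suffix added via the label"
}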

Comment on lines +52 to +55
assert {
  condition     = aws_launch_template.default.monitoring[0].enabled == true
  error_message = "Instance monitoring not enabled"
}
@Gowiem (Member) commented:

This is something I question whether we should test. The simplicity of "passed in this bool, let's confirm the bool is passed down correctly"... seems unnecessary to test, maybe? Maybe we could argue that it is confirming that passing in monitoring_enabled won't break in the future, but I'm still a bit iffy on it.

I also almost always want to test things that revolve around locals (which you've done well elsewhere) OR that could revolve around locals, e.g. http_tokens == "required" is a good example, as we translate a bool variable into a string that is required by the provider.

I think this requires some discussion. We should save topics from this PR to discuss in an upcoming sync on TF testing. @oycyc can I ask you to start notes for a meeting like that, and we'll use them as a jumping-off point for some of the important discussions that need to come out of this PR?
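
As a sketch of the locals-driven kind of check described here, reusing the aws_launch_template.default address from this test file (the variable wiring is assumed):

assert {
  # assumes the module translates a bool variable into the metadata_options http_tokens string
  condition     = aws_launch_template.default.metadata_options[0].http_tokens == "required"
  error_message = "Expected http_tokens to be \"required\" when IMDSv2 enforcement is enabled"
}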

Comment on lines +67 to +70
assert {
  condition     = aws_launch_template.default.iam_instance_profile[0].name == aws_iam_instance_profile.default.name
  error_message = "Launch template IAM instance profile name does not match the created instance profile"
}
@Gowiem (Member) commented:

This is another one that I question (I am questioning "is this useful?" for a lot of these). Since there is no translation or logic surrounding this name... I'm 99.9% sure that this test would never fail. It would have to be the downstream provider that's breaking for it to fail. Considering that... do we need this test? Is it useful to us? I think we need to be continually asking that question of ourselves in the context of TF tests to ensure that we're not writing tests that are purely filler.
