Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terraform test performs progressively slower as more test runs execute #34009

Closed
GSDragoon opened this issue Oct 6, 2023 · 6 comments · Fixed by #34026
Closed

Terraform test performs progressively slower as more test runs execute #34009

GSDragoon opened this issue Oct 6, 2023 · 6 comments · Fixed by #34026
Assignees
Labels
bug confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code terraform test

Comments

@GSDragoon
Copy link

GSDragoon commented Oct 6, 2023

Terraform Version

Terraform v1.6.0
on windows_amd64

Terraform Configuration Files

This is the most simple example of recreating the issue that I could come up with. Remote-exe provision is what I found to be the biggest offender and is really easy to recreate with. I have also recreated this issue with more tests and not using a provisioner, but that terraform is a lot more complex and not something I can easily share.

main.tf

resource "terraform_data" "provision" {
  connection {
    host = "host"
  }
  provisioner "remote-exec" {
    inline = ["echo test"]
  }
}

./tests/main.tftest.hcl

run "test 02" { command = plan }
run "test 03" { command = plan }
run "test 04" { command = plan }
run "test 05" { command = plan }
run "test 06" { command = plan }
run "test 07" { command = plan }
run "test 08" { command = plan }
run "test 09" { command = plan }
run "test 10" { command = plan }
run "test 11" { command = plan }
run "test 12" { command = plan }
run "test 13" { command = plan }
run "test 14" { command = plan }
run "test 15" { command = plan }
run "test 16" { command = plan }
run "test 17" { command = plan }
run "test 18" { command = plan }
run "test 19" { command = plan }
run "test 20" { command = plan }
run "test 21" { command = plan }
run "test 22" { command = plan }
run "test 23" { command = plan }
run "test 24" { command = plan }
run "test 25" { command = plan }
run "test 26" { command = plan }
run "test 27" { command = plan }
run "test 28" { command = plan }
run "test 29" { command = plan }
run "test 30" { command = plan }

Debug Output

Debug output from 23 test runs from the example configuration shared here: https://gist.github.com/GSDragoon/3b8b661a377623d269a55233afdb8242

At the end of the file, when things are the slowest, these sections seem to be noteworthy as the time jumps.

2023-10-06T11:48:38.617-0400 [TRACE] Executing graph transform *terraform.ReferenceTransformer
2023-10-06T11:49:13.518-0400 [DEBUG] ReferenceTransformer: "terraform_data.provision (expand)" references: []
2023-10-06T11:49:48.500-0400 [DEBUG] ReferenceTransformer: "terraform_data.provision" references: []

Meanwhile, the first time this runs (doing the same exact test), it completes instantly.

2023-10-06T11:43:17.218-0400 [TRACE] Executing graph transform *terraform.ReferenceTransformer
2023-10-06T11:43:17.218-0400 [DEBUG] ReferenceTransformer: "terraform_data.provision (expand)" references: []
2023-10-06T11:43:17.218-0400 [DEBUG] ReferenceTransformer: "terraform_data.provision" references: []

Expected Behavior

I would expect each test run to execute within a similar time span.

Actual Behavior

The more test runs that execute, the slower it gets. To the point of taking so long it's unusable. The terraform process memory will also keep increasing. I had it hit 9GB before killing it once. Doing some very rough stopwatching (once things start slowing down, ~15 or so), each test run takes almost twice as long as the previous (12 sec, 27 sec, 56 sec, 1 min 52 sec, 3 min 45 sec).

Steps to Reproduce

Using the provided terraform configuration and terraform 1.6.0:

  1. terraform init
  2. terraform test
  3. Notice as the test runs execute, they get slower and slower. You should also be able to see the terraform process memory continuously climb up as well.

Additional Context

Love the new testing features in 1.6. ❤️ The eventual goal is to run this in a CI process and to have a much larger test suite, but right now I've only just begun creating some tests using the new features in 1.6.0 and have been running things locally. It's been recreated on 3 different machines.

References

No response

@GSDragoon GSDragoon added bug new new issue not yet triaged labels Oct 6, 2023
@crw
Copy link
Contributor

crw commented Oct 6, 2023

Thanks for this report!

@liamcervante liamcervante added confirmed a Terraform Core team member has reproduced this issue and removed new new issue not yet triaged labels Oct 9, 2023
@liamcervante
Copy link
Member

Hi @GSDragoon, I can replicate this and I think I've pinpointed the issue. I just wanted to check whether you've seen this with any tests that don't use the provisioner block in any resources?

The issue is arising from this line:

p.Connection.Config = configs.MergeBodies(n.Config.Managed.Connection.Config, p.Connection.Config)

Basically, the Terraform internal graph is modifying the config by merging the two locations of config together. It then does this on every iteration, and the testing framework is sharing the same config object between all the runs. Every run is therefore basically doubling the size of the provisioner block, and then we get the behaviour that you described.

If you change the definition of the resource to:

resource "tfcoremock_simple_resource" "one" {
  provisioner "remote-exec" {
    inline = ["echo test"]

    connection {
      host = "host"
    }
  }
}

We don't see this behaviour as that particular line of code is not being triggered.

This is definitely a bug, and we'll try and fix as soon as possible - I just need to look into the best way to do this. I'm not sure if we want to modify the testing framework so it reloads the config for every run block, or maybe we can change Terraform itself so it's not modifying the configuration during the graph execution.

Thanks for finding this and reporting it!

@liamcervante liamcervante added the explained a Terraform Core team member has described the root cause of this issue in code label Oct 9, 2023
@liamcervante liamcervante self-assigned this Oct 9, 2023
@liamcervante liamcervante added this to the v1.6.1 milestone Oct 9, 2023
@GSDragoon
Copy link
Author

Hi @GSDragoon, I can replicate this and I think I've pinpointed the issue. I just wanted to check whether you've seen this with any tests that don't use the provisioner block in any resources?

The issue is arising from this line:

p.Connection.Config = configs.MergeBodies(n.Config.Managed.Connection.Config, p.Connection.Config)

Basically, the Terraform internal graph is modifying the config by merging the two locations of config together. It then does this on every iteration, and the testing framework is sharing the same config object between all the runs. Every run is therefore basically doubling the size of the provisioner block, and then we get the behaviour that you described.

If you change the definition of the resource to:

resource "tfcoremock_simple_resource" "one" {
  provisioner "remote-exec" {
    inline = ["echo test"]

    connection {
      host = "host"
    }
  }
}

We don't see this behaviour as that particular line of code is not being triggered.

This is definitely a bug, and we'll try and fix as soon as possible - I just need to look into the best way to do this. I'm not sure if we want to modify the testing framework so it reloads the config for every run block, or maybe we can change Terraform itself so it's not modifying the configuration during the graph execution.

Thanks for finding this and reporting it!

Hi @liamcervante Thanks for looking into this. I have only been able to recreate the issue using a provisioner block. It's even slower if I use more than one (such as a file and a remote-exe type). If either the the provisioner or connection blocks are removed from the resource, the tests run okay. They still seem a little slow (about 16 seconds for 78 test runs), but totally usable and consistent performance across each run.

I was having a hard time working around the issue using tfcoremock_simple_resource instead of terraform_data and completely missed that you also moved the connection block inside the provisioner block instead of having the connection block directly inside the resource. Just to confirm (and to make clear in case anyone else is following along), this is the intent of the change you posted?

resource "terraform_data" "slow" {
  provisioner "remote-exec" {
    inline = ["echo test"]
  }
  connection {
    host = "host"
  }
}

resource "terraform_data" "even_slower" {
  provisioner "remote-exec" {
    inline = ["echo test"]
  }
  provisioner "remote-exec" {
    inline = ["echo test2"]
  }
  connection {
    host = "host"
  }
}

resource "terraform_data" "fast" {
  provisioner "remote-exec" {
    inline = ["echo test"]
    connection {
      host = "host"
    }
  }
}

resource "terraform_data" "still_fast" {
  provisioner "remote-exec" {
    inline = ["echo test"]
    connection {
      host = "host"
    }
  }
  provisioner "remote-exec" {
    inline = ["echo test2"]
    connection {
      host = "host"
    }
  }
}

@liamcervante
Copy link
Member

liamcervante commented Oct 9, 2023

Hi @GSDragoon, yes - you're exactly right! Sorry, I missed that I'd changed the resource type in my response. Yeah, if there is no root connection block within the resource I think the issue is avoided. We should have this completely fixed for v1.6.1 either way, so both approaches will work in the end.

@apparentlymart apparentlymart removed this from the v1.6.1 milestone Oct 9, 2023
@liamcervante
Copy link
Member

Just a quick update here, this will be fixed 1.6.2 rather than 1.6.1. Apologies for the delay!

Copy link
Contributor

github-actions bot commented Dec 8, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code terraform test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants