Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue when run Terraform Plan with many Block Sets or NestedAttribute Sets #775

Open
jeremmfr opened this issue Jun 16, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@jeremmfr
Copy link

jeremmfr commented Jun 16, 2023

Module version

v1.2.0 & v1.3.1

Description

After migrating resources of my provider jeremmfr/junos from SDK plugin to this framework plugin, the time to run a terraform plan with a config that have many block sets has been increased very significantly.

For 1000 items of block set, a terraform plan (with empty state) with provider including module sdk/v2 take ~1.3 seconds and take ~ 70 seconds with provider including plugin-framework.
A user of my provider has a time of 19 minutes to run terraform plan with the new version of the provider including plugin-framework instead of 52 seconds with the previous version of the provider including module sdk/v2

Tests

I tested to remove all PlanModifiers and Validators fields on each attribute, and ValidateConfig function for resources, using the latest version of this module (v1.3.1) but there's no change.
The problem disappears if replace schema.SetNestedBlock with schema.ListNestedBlock: 70 seconds to 1 second.

I reproduce the problem with provider hashicups and hashicups_order resources.
I tested 1000 blocksitems on a resource hashicups_order with different modifications of items block.

hashicorp/terraform-provider-hashicups schema.TypeSet : 3.8 - 4 seconds
hashicorp/terraform-provider-hashicups schema.TypeList : 0.5 - 0.8 seconds
hashicorp/terraform-provider-hashicups-pf schema.SetNestedBlock : 49.1 - 49.7 seconds
hashicorp/terraform-provider-hashicups-pf schema.ListNestedBlock : 0.8 - 1.1 seconds
hashicorp/terraform-provider-hashicups-pf schema.SetNestedAttribute : 49.0 - 49.4 seconds
hashicorp/terraform-provider-hashicups-pf schema.ListNestedAttribute : 0.7 - 0.9 seconds

Relevant provider source code

To test with block sets, clone hashicorp/terraform-provider-hashicups-pf and apply this patch :

diff --git a/internal/provider/order_resource.go b/internal/provider/order_resource.go
index 4fe16cc..67203cd 100644
--- a/internal/provider/order_resource.go
+++ b/internal/provider/order_resource.go
@@ -76,10 +76,12 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
 				Description: "Timestamp of the last Terraform update of the order.",
 				Computed:    true,
 			},
-			"items": schema.ListNestedAttribute{
+		},
+		Blocks: map[string]schema.Block{
+			"items": schema.SetNestedBlock{
 				Description: "List of items in the order.",
-				Required:    true,
-				NestedObject: schema.NestedAttributeObject{
+				//	Required:    true,
+				NestedObject: schema.NestedBlockObject{
 					Attributes: map[string]schema.Attribute{
 						"quantity": schema.Int64Attribute{
 							Description: "Count of this item in the order.",
@@ -87,7 +89,7 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
 						},
 						"coffee": schema.SingleNestedAttribute{
 							Description: "Coffee item in the order.",
-							Required:    true,
+							Optional:    true,
 							Attributes: map[string]schema.Attribute{
 								"id": schema.Int64Attribute{
 									Description: "Numeric identifier of the coffee.",

& disable hashicups.NewClient code in provider configure function

To test with nested atttribute sets, clone hashicorp/terraform-provider-hashicups-pf and apply this patch :

diff --git a/internal/provider/order_resource.go b/internal/provider/order_resource.go
index 4fe16cc..ea229a7 100644
--- a/internal/provider/order_resource.go
+++ b/internal/provider/order_resource.go
@@ -76,7 +76,7 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
 				Description: "Timestamp of the last Terraform update of the order.",
 				Computed:    true,
 			},
-			"items": schema.ListNestedAttribute{
+			"items": schema.SetNestedAttribute{
 				Description: "List of items in the order.",
 				Required:    true,
 				NestedObject: schema.NestedAttributeObject{
@@ -87,7 +87,7 @@ func (r *orderResource) Schema(_ context.Context, _ resource.SchemaRequest, resp
 						},
 						"coffee": schema.SingleNestedAttribute{
 							Description: "Coffee item in the order.",
-							Required:    true,
+							Optional:    true,
 							Attributes: map[string]schema.Attribute{
 								"id": schema.Int64Attribute{
 									Description: "Numeric identifier of the coffee.",

& disable hashicups.NewClient code in provider configure function

Terraform Configuration Files

To test with block sets:

resource "hashicups_order" "example" {
  dynamic "items" {
    for_each = range(1, 1000)

    content {
      quantity = items.value
    }
  }
}

To test with nested attribute sets:

resource "hashicups_order" "example2" {
  items = [
    for e in range(1, 1000) : {
      quantity = e
    }
  ]
}

Expected Behavior

The terraform plan command takes a reasonable time to run when there are many block sets.

Actual Behavior

The terraform plan command takes a very long time to run when there are many block sets.

Steps to Reproduce

  1. build hashicups provider
  2. terraform init
  3. terraform plan

References

jeremmfr/terraform-provider-junos#498

@jeremmfr jeremmfr added the bug Something isn't working label Jun 16, 2023
@bflad
Copy link
Contributor

bflad commented Jun 16, 2023

Hi @jeremmfr 👋 Thank you for reporting this issue and sorry for the trouble here. I just wanted to acknowledge that the maintainers have noticed this bug report and do plan on taking a further look into this, but that effort will probably happen early next week. In the meantime, if you have any familiarity with pprof and flamegraph profiling, that might be a good methodology for starting to narrow down where the unexpected extra time is being spent.

@jeremmfr
Copy link
Author

jeremmfr commented Jun 17, 2023

Hi,

I've run profiling with pprof, the provider hashicups, attribute items with type schema.SetNestedBlock, and a resource hashicups_order with 1000 elements of dynamic items.

I've also updated all the dependencies for the provider hashicups :

 require (
 	github.com/hashicorp-demoapp/hashicups-client-go v0.1.0
-	github.com/hashicorp/terraform-plugin-docs v0.14.1
-	github.com/hashicorp/terraform-plugin-framework v1.2.0
-	github.com/hashicorp/terraform-plugin-go v0.15.0
-	github.com/hashicorp/terraform-plugin-log v0.8.0
-	github.com/hashicorp/terraform-plugin-testing v1.2.0
+	github.com/hashicorp/terraform-plugin-docs v0.15.0
+	github.com/hashicorp/terraform-plugin-framework v1.3.1
+	github.com/hashicorp/terraform-plugin-go v0.16.0
+	github.com/hashicorp/terraform-plugin-log v0.9.0
+	github.com/hashicorp/terraform-plugin-testing v1.3.0
 )

I attach a png and a flamegrah (flamegraph in gist cpu.svg)

cpu

@jeremmfr
Copy link
Author

jeremmfr commented Jun 17, 2023

A profiling result for a complete call of PlanResourceChange and ValidateResourceConfig instead of 30 seconds dump.
(flamegraph in gist cpu2.svg)

cpu2

@jeremmfr
Copy link
Author

jeremmfr commented Jun 17, 2023

A profiling result for a complete call of ValidateResourceConfig (~7 seconds)
(flamegraph in gist cpu3.svg)

cpu3

@bflad
Copy link
Contributor

bflad commented Jun 30, 2023

Hi @jeremmfr 👋 Would you be able to try out hashicorp/terraform-plugin-go#308? From my local testing with the hashicups-pf example, it reduced the plan time by over half. I'm curious what level of real world benefit you might see as well.

@jeremmfr
Copy link
Author

jeremmfr commented Jul 1, 2023

Hi @bflad 👋

We launch multiple scenarios to compare potential benefit.

The PR reduce time but there is always a gap between SDKv2 and framework version of provider.

The result:

in seconds SDKv2 Framework Framework with
terraform-plugin-go#308
hashicups_order
with 1000 items block
~ 3.8 ~ 48 ~ 14
junos_security_address_book
with 1000 network_address block
~ 1.3 ~ 78 ~ 17
junos_security_address_book
with 1000 address_set block
~ 3.2 ~ 198 ~ 45

The difference between network_address block and address_set block is that network_address block has only primitive attributes (String) and address_set block has primitive attributes (String) and primitive set attributes (String sets)

Terraform code
  • hashicups_order with items
resource "hashicups_order" "example" {
  dynamic "items" {
    for_each = range(1, 1000)
    content {
      quantity = items.value
    }
  }
}
  • junos_security_address_book with network_address
resource "junos_security_address_book" "global" {
  name = "global"

  dynamic "network_address" {
    for_each = range(1, 1000)
    content {
      name  = "test${network_address.value}"
      value = "1.1.1.1/32"
    }
  }
}
  • junos_security_address_book with address_set
resource "junos_security_address_book" "global" {
  name = "global"

  dynamic "address_set" {
    for_each = range(1, 1000)
    content {
      name    = "test${address_set.value}"
      address = ["baz"]
    }
  }
}
Go replace

in go.mod to use PR terraform-plugin-go#308

replace github.com/hashicorp/terraform-plugin-go => github.com/hashicorp/terraform-plugin-go v0.17.1-0.20230630191346-2155f0b1810f

I run new profiling with pprof in last scenario (junos_security_address_book with 1000 address_set block and framework with terraform-plugin-go#308)

The result in png and flamegraph (in gist)

cpu4

@bflad
Copy link
Contributor

bflad commented Jul 3, 2023

Thanks for the confirmation and the additional context, @jeremmfr. 😄 Given the overall very positive effects, I've pulled in the upstream changes and will cut a release of those shortly.

I'm guessing there are two threads to pull on next with the framework side of things:

  • Why logging via terraform-plugin-log and go-hclog is so expensive with syscalls. Maybe something can be done to alleviate that compute pressure, such as ensuring that each logging call is not resulting in a flush (or seeing if that behavior can be configured).
  • Investigating the reduction of allocations for path information. One challenge there is that paths are implemented as a slice of path steps, which in Go means that even when paths are passed by reference that the underlying steps data is mutable unless the implementation is very careful to try and prevent that possibility.

One other reality is that the framework went GA with its data handling half in terraform-plugin-go and half in terraform-plugin-framework. There is a lot of passing back and forth between the type systems in the PlanResourceChange logic. Ideally, everything would have been done in terraform-plugin-framework and only using terraform-plugin-go's type system for the request/response handling. It's unfortunately very difficult to change that until a major version of the framework to remove the directly exposed terraform-plugin-go types in certain places.

bflad added a commit that referenced this issue Jul 3, 2023
…rmValue()

Reference: #775

The prior implementation used the exported `AttributeTypes()` method, which in this case was unnecessarily creating a copy of the map when it was only be used for walking the existing data. At scale, such as handling 1000+ objects, this creates unnecessary pressure on the Go garbage collector.

`benchstat` comparison with new benchmark test:

```text
goos: darwin
goarch: arm64
pkg: github.com/hashicorp/terraform-plugin-framework/types/basetypes
                                   │  original   │              proposed               │
                                   │   sec/op    │   sec/op     vs base                │
ObjectValueToTerraformValue1000-10   408.4µ ± 1%   318.4µ ± 1%  -22.02% (p=0.000 n=10)

                                   │   original   │               proposed               │
                                   │     B/op     │     B/op      vs base                │
ObjectValueToTerraformValue1000-10   449.2Ki ± 0%   286.6Ki ± 0%  -36.19% (p=0.000 n=10)

                                   │  original   │              proposed              │
                                   │  allocs/op  │  allocs/op   vs base               │
ObjectValueToTerraformValue1000-10   2.051k ± 0%   2.020k ± 0%  -1.51% (p=0.000 n=10)
```
bflad added a commit that referenced this issue Jul 5, 2023
…rmValue() (#791)

Reference: #775

The prior implementation used the exported `AttributeTypes()` method, which in this case was unnecessarily creating a copy of the map when it was only be used for walking the existing data. At scale, such as handling 1000+ objects, this creates unnecessary pressure on the Go garbage collector.

`benchstat` comparison with new benchmark test:

```text
goos: darwin
goarch: arm64
pkg: github.com/hashicorp/terraform-plugin-framework/types/basetypes
                                   │  original   │              proposed               │
                                   │   sec/op    │   sec/op     vs base                │
ObjectValueToTerraformValue1000-10   408.4µ ± 1%   318.4µ ± 1%  -22.02% (p=0.000 n=10)

                                   │   original   │               proposed               │
                                   │     B/op     │     B/op      vs base                │
ObjectValueToTerraformValue1000-10   449.2Ki ± 0%   286.6Ki ± 0%  -36.19% (p=0.000 n=10)

                                   │  original   │              proposed              │
                                   │  allocs/op  │  allocs/op   vs base               │
ObjectValueToTerraformValue1000-10   2.051k ± 0%   2.020k ± 0%  -1.51% (p=0.000 n=10)
```
@bflad
Copy link
Contributor

bflad commented Jul 5, 2023

@jeremmfr if you have a moment, I'm curious if 9fe2ca5 (which should include terraform-plugin-go#308) offers some slight benefits with your provider. I'm guessing it'll be fairly minimal, but it was low hanging fruit to prevent unnecessary memory allocation and cleanup work for the runtime.

@jeremmfr
Copy link
Author

jeremmfr commented Jul 6, 2023

Hi @bflad , I ran the same previous scenarios with the PR #791, but the difference is not visible on the Plan time.

@alexhung
Copy link

@bflad Any update on this issue? Or any suggestion on workaround?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants