The terraform-provider-google_v6.7.0_x5 plugin crashed #19895

Labels: bug, crash, forward/review, service/container

arueth commented Oct 17, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to a user, that user is claiming responsibility for the issue.
  • Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version & Provider Version(s)

Terraform v1.9.7
on x86_64/AMD64

  • provider registry.terraform.io/hashicorp/google v6.7.0
  • provider registry.terraform.io/hashicorp/google-beta v6.7.0

Affected Resource(s)

google_container_node_pool
google_compute_managed_ssl_certificate
google_compute_global_address

Terraform Configuration

I am using the Terraform here

Debug Output

No response

Expected Behavior

No response

Actual Behavior

No response

Steps to reproduce

  1. terraform apply

Important Factoids

I haven't been able to reproduce this outside of some automation we are using in Qwiklabs. It does not happen every time and can be quite sporadic.

Stack trace

│ Stack trace from the terraform-provider-google_v6.7.0_x5 plugin:
│ 
│ panic: runtime error: invalid memory address or nil pointer dereference
│ [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x36eca79]
│ 
│ goroutine 2306 [running]:
│ github.com/hashicorp/terraform-provider-google/google/services/container.flattenNodePool(0xc001650b50?, 0xc00032f500, 0xc002628840, {0x0, 0x0})
│ 	github.com/hashicorp/terraform-provider-google/google/services/container/resource_container_node_pool.go:1200 +0x12d9
│ github.com/hashicorp/terraform-provider-google/google/services/container.resourceContainerNodePoolRead(0xc001a7db80, {0x4213ba0?, 0xc00032f500})
│ 	github.com/hashicorp/terraform-provider-google/google/services/container/resource_container_node_pool.go:712 +0x6d8
│ github.com/hashicorp/terraform-provider-google/google/services/container.resourceContainerNodePoolCreate(0xc001a7db80, {0x4213ba0?, 0xc00032f500?})
│ 	github.com/hashicorp/terraform-provider-google/google/services/container/resource_container_node_pool.go:658 +0xfd7
│ github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).create(0x4a35198?, {0x4a35198?, 0xc0008a5d40?}, 0xd?, {0x4213ba0?, 0xc00032f500?})
│ 	github.com/hashicorp/terraform-plugin-sdk/v2@v2.33.0/helper/schema/resource.go:766 +0x163
│ github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).Apply(0xc000aa8460, {0x4a35198, 0xc0008a5d40}, 0xc001b508f0, 0xc001a7da00, {0x4213ba0, 0xc00032f500})
│ 	github.com/hashicorp/terraform-plugin-sdk/v2@v2.33.0/helper/schema/resource.go:909 +0xa89
│ github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ApplyResourceChange(0xc000a99938, {0x4a35198?, 0xc0008a5c50?}, 0xc0015b9040)
│ 	github.com/hashicorp/terraform-plugin-sdk/v2@v2.33.0/helper/schema/grpc_provider.go:1078 +0xdbc
│ github.com/hashicorp/terraform-plugin-mux/tf5muxserver.(*muxServer).ApplyResourceChange(0x4a351d0?, {0x4a35198?, 0xc0008a56e0?}, 0xc0015b9040)
│ 	github.com/hashicorp/terraform-plugin-mux@v0.15.0/tf5muxserver/mux_server_ApplyResourceChange.go:36 +0x193
│ github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ApplyResourceChange(0xc0004fa780, {0x4a35198?, 0xc000b7bb00?}, 0xc0021da460)
│ 	github.com/hashicorp/terraform-plugin-go@v0.23.0/tfprotov5/tf5server/server.go:865 +0x3d0
│ github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ApplyResourceChange_Handler({0x41b15e0?, 0xc0004fa780}, {0x4a35198, 0xc000b7bb00}, 0xc001a7d000, 0x0)
│ 	github.com/hashicorp/terraform-plugin-go@v0.23.0/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:518 +0x169
│ google.golang.org/grpc.(*Server).processUnaryRPC(0xc0010a4400, {0x4a35198, 0xc000b7b9e0}, {0x4a405c8, 0xc0001b0600}, 0xc001fc9680, 0xc000fdd620, 0x681f2b8, 0x0)
│ 	google.golang.org/grpc@v1.65.0/server.go:1379 +0xe23
│ google.golang.org/grpc.(*Server).handleStream(0xc0010a4400, {0x4a405c8, 0xc0001b0600}, 0xc001fc9680)
│ 	google.golang.org/grpc@v1.65.0/server.go:1790 +0x1016
│ google.golang.org/grpc.(*Server).serveStreams.func2.1()
│ 	google.golang.org/grpc@v1.65.0/server.go:1029 +0x8b
│ created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 23
│ 	google.golang.org/grpc@v1.65.0/server.go:1040 +0x135
│ 
│ Error: The terraform-provider-google_v6.7.0_x5 plugin crashed!
│ 
│ This is always indicative of a bug within the plugin. It would be immensely
│ helpful if you could report the crash with the plugin's maintainers so that it
│ can be fixed. The output above should help diagnose the issue.

References

No response

Author

arueth commented Oct 17, 2024

The errors and stack trace are pretty consistent

│ Error: Plugin did not respond
│ 
│   with google_container_node_pool.gpu_h100x8_a3h8_spot,
│   on container_node_pool.tf line 790, in resource "google_container_node_pool" "gpu_h100x8_a3h8_spot":
│  790: resource "google_container_node_pool" "gpu_h100x8_a3h8_spot" {
│ 
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may
│ contain more details.
╵
╷
│ Error: Request cancelled
│ 
│   with google_container_node_pool.gpu_l4x2_g2s24,
│   on container_node_pool.tf line 884, in resource "google_container_node_pool" "gpu_l4x2_g2s24":
│  884: resource "google_container_node_pool" "gpu_l4x2_g2s24" {
│ 
│ The plugin.(*GRPCProvider).ValidateResourceConfig request was cancelled.
╵
╷
│ Error: Plugin did not respond
│ 
│   with google_container_node_pool.gpu_l4x2_g2s24_res,
│   on container_node_pool.tf line 1058, in resource "google_container_node_pool" "gpu_l4x2_g2s24_res":
│ 1058: resource "google_container_node_pool" "gpu_l4x2_g2s24_res" {
│ 
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may
│ contain more details.
╵
╷
│ Error: Plugin did not respond
│ 
│   with google_container_node_pool.gpu_l4x2_g2s24_spot,
│   on container_node_pool.tf line 1143, in resource "google_container_node_pool" "gpu_l4x2_g2s24_spot":
│ 1143: resource "google_container_node_pool" "gpu_l4x2_g2s24_spot" {
│ 
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may
│ contain more details.
╵
╷
│ Error: Request cancelled
│ 
│   with google_compute_managed_ssl_certificate.external_gateway,
│   on gateway.tf line 39, in resource "google_compute_managed_ssl_certificate" "external_gateway":
│   39: resource "google_compute_managed_ssl_certificate" "external_gateway" {
│ 
│ The plugin.(*GRPCProvider).ValidateResourceConfig request was cancelled.
╵
╷
│ Error: Request cancelled
│ 
│   with google_compute_global_address.external_gateway_https,
│   on gateway.tf line 52, in resource "google_compute_global_address" "external_gateway_https":
│   52: resource "google_compute_global_address" "external_gateway_https" {
│ 
│ The plugin.(*GRPCProvider).ValidateResourceConfig request was cancelled.
╵
╷
│ Error: Request cancelled
│ 
│   with google_iap_client.ray_head_client,
│   on gateway.tf line 136, in resource "google_iap_client" "ray_head_client":
│  136: resource "google_iap_client" "ray_head_client" {
│ 
│ The plugin.(*GRPCProvider).ValidateResourceConfig request was cancelled.

Collaborator

c2thorn commented Oct 17, 2024

Hi @arueth,

I'm going to add a nil check to that logic to prevent this crash with GoogleCloudPlatform/magic-modules#12037

However, looking into it, it seems that not many users are running into this, and we've accessed the np.Management object without nil checking it for several years. I'm curious if you are running into a rare case. Can you supply a debug log that shows the HTTP request? Either sanitized and posted here, or in a gpaste link since you are a Googler.
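
A minimal sketch of the kind of nil guard described above, assuming simplified stand-in types. `NodePool`, `NodeManagement`, and `flattenManagement` are hypothetical names for illustration and are not the provider's actual code or the contents of the linked change:

```go
package main

import "fmt"

// NodeManagement and NodePool are simplified, hypothetical stand-ins for the
// API response types; only the fields needed for the illustration are kept.
type NodeManagement struct {
	AutoRepair  bool
	AutoUpgrade bool
}

type NodePool struct {
	Name       string
	Management *NodeManagement // nil when the API response omits the block
}

// flattenManagement converts the optional Management block into the
// list-of-maps shape commonly used for nested blocks. It returns nil
// instead of dereferencing a nil pointer when the block is absent.
func flattenManagement(np *NodePool) []map[string]interface{} {
	if np == nil || np.Management == nil {
		return nil
	}
	return []map[string]interface{}{
		{
			"auto_repair":  np.Management.AutoRepair,
			"auto_upgrade": np.Management.AutoUpgrade,
		},
	}
}

func main() {
	withMgmt := &NodePool{Name: "a", Management: &NodeManagement{AutoRepair: true}}
	withoutMgmt := &NodePool{Name: "b"} // Management left nil

	fmt.Println(flattenManagement(withMgmt))
	fmt.Println(flattenManagement(withoutMgmt))
}
```

Without the guard, reading `np.Management.AutoRepair` on the second pool would panic with the same "invalid memory address or nil pointer dereference" message seen in the stack trace.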

Collaborator

c2thorn commented Oct 17, 2024

The configuration you've linked should have set the management field: https://github.com/GoogleCloudPlatform/accelerated-platforms/blob/4849714113ffdb56597c4d130769eacc4a9d3a0a/platforms/gke-aiml/playground/container_cluster.tf#L83

Possibly there could be non-Terraform interference in your environment?

Author

arueth commented Oct 18, 2024

> Hi @arueth,
>
> I'm going to add a nil check to that logic to prevent this crash with GoogleCloudPlatform/magic-modules#12037
>
> However, looking into it, it seems that not many users are running into this, and we've accessed the np.Management object without nil checking it for several years. I'm curious if you are running into a rare case. Can you supply a debug log that shows the HTTP request? Either sanitized and posted here, or in a gpaste link since you are a Googler.

I'm using this repo to spin up environments in Qwiklabs. I had never run into this issue until I started doing the Qwiklabs, where I'm spinning up hundreds of projects at a time. I would say that I get this failure maybe 20% of the time or less.

I will try to get the debug logs early next week; today I'm running a workshop and more than likely won't have a chance.

Collaborator

c2thorn commented Oct 18, 2024

> I'm using this repo to spin up environments in Qwiklabs. I had never run into this issue until I started doing the Qwiklabs, where I'm spinning up hundreds of projects at a time. I would say that I get this failure maybe 20% of the time or less.
>
> I will try to get the debug logs early next week; today I'm running a workshop and more than likely won't have a chance.

Gotcha. The nil check has been merged so it should be resolved altogether. Good luck on your workshop!

Author

arueth commented Oct 24, 2024

Do you know what version this will make it into?

Collaborator

c2thorn commented Oct 24, 2024

> Do you know what version this will make it into?

It is part of the 6.9.0 branch, which is scheduled to be released Monday.

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2024