Failed ES domain upgrade error isn't helpful #11061

Open
tomelliff opened this issue Nov 28, 2019 · 10 comments
Labels
service/elasticsearch Issues and PRs that pertain to the elasticsearch service.

Comments

@tomelliff
Contributor

tomelliff commented Nov 28, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.10

Affected Resource(s)

  • aws_elasticsearch_domain

Terraform Configuration Files

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key: https://keybase.io/hashicorp

Debug Output

The relevant part of the debug log is small so posting it directly here:

2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: 2019/11/28 22:56:01 [DEBUG] [aws-sdk-go] DEBUG: Response es/GetUpgradeStatus Details:
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: ---[ RESPONSE ]--------------------------------------
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: HTTP/1.1 200 OK
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: Connection: close
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: Content-Length: 97
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: Content-Type: application/json
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: Date: Thu, 28 Nov 2019 22:56:00 GMT
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: X-Amzn-Requestid: 3f850bbc-1232-11ea-bc06-1fdf099cbf0b
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: 
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: 
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: -----------------------------------------------------
2019-11-28T22:56:01.273Z [DEBUG] plugin.terraform-provider-aws_v2.33.99_x4: 2019/11/28 22:56:01 [DEBUG] [aws-sdk-go] {"StepStatus":"FAILED","UpgradeName":"Upgrade from 6.8 to 7.1","UpgradeStep":"PRE_UPGRADE_CHECK"}
2019/11/28 22:56:01 [DEBUG] module.elasticsearch.aws_elasticsearch_domain.elasticsearch: apply errored, but we're indicating that via the Error pointer rather than returning it: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: %!s(<nil>)

Expected Behavior

My cluster is failing the upgrade eligibility checks but I'd expect to see the error correctly reported by Terraform with something like the following:

Cluster has 1160.0 shards per node which exceeds the setting cluster.max_shards_per_node value 1000

Actual Behavior

Error: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: %!s(<nil>)

Steps to Reproduce

  1. Get an ES cluster into a state where it can't be upgraded, for whatever reason
  2. Set the Terraform config to upgrade the version via a valid in-place upgrade path
  3. Run terraform apply

Important Factoids

I moved a 2 AZ ES cluster to 3 AZs in place, immediately upgraded it to 6.8, and then attempted a further upgrade to 7.2, which causes the above error on the AWS side. That part is fine, but I'd expect Terraform to show the actual error instead of %!s(<nil>).

I wrote the in-place upgrade code but didn't have a good way of inducing an upgrade failure, so I couldn't really test what happens in that case. It looks like AWS's API doesn't return an error, just the FAILED StepStatus field. The GetUpgradeHistory API endpoint returns the results of any attempted upgrades in reverse chronological order, so we could retrieve the first failed result for the domain from it and return that result's list of UpgradeStepItem.Issues.
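A rough sketch of that lookup, assuming the history arrives newest first. The `upgradeStep` and `upgradeHistory` types are simplified, hypothetical stand-ins for the SDK's response shapes, not the real AWS SDK structs, and the sample data is made up from the messages in this issue:

```go
package main

import (
	"fmt"
	"strings"
)

// upgradeStep and upgradeHistory are simplified, hypothetical stand-ins for
// the UpgradeStepItem and upgrade history entries returned by GetUpgradeHistory.
type upgradeStep struct {
	Step   string   // e.g. "PRE_UPGRADE_CHECK"
	Status string   // e.g. "FAILED"
	Issues []string // human-readable reasons reported for the step
}

type upgradeHistory struct {
	Name   string // e.g. "Upgrade from 6.8 to 7.1"
	Status string
	Steps  []upgradeStep
}

// firstFailedIssues walks the histories in the order given (assumed newest
// first) and returns the issues of the first failed step it finds.
func firstFailedIssues(histories []upgradeHistory) (name, step string, issues []string, found bool) {
	for _, h := range histories {
		for _, s := range h.Steps {
			if s.Status == "FAILED" {
				return h.Name, s.Step, s.Issues, true
			}
		}
	}
	return "", "", nil, false
}

func main() {
	// Sample data only; a real caller would fill this from the API response.
	histories := []upgradeHistory{{
		Name:   "Upgrade from 6.8 to 7.1",
		Status: "FAILED",
		Steps: []upgradeStep{{
			Step:   "PRE_UPGRADE_CHECK",
			Status: "FAILED",
			Issues: []string{"Cluster has 1160.0 shards per node which exceeds the setting cluster.max_shards_per_node value 1000"},
		}},
	}}
	if name, step, issues, ok := firstFailedIssues(histories); ok {
		fmt.Printf("%s failed at %s: %s\n", name, step, strings.Join(issues, "; "))
	}
}
```

Returning a `found` flag keeps the "no failed upgrade in the history" case separate from "failed, but with no issues listed".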

I'm wary that I don't know a good way to force an ES cluster into a bad state, though, so this might be tricky to test once my ES cluster is back in a good place.

References

@ghost ghost added the service/elasticsearch Issues and PRs that pertain to the elasticsearch service. label Nov 28, 2019
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Nov 28, 2019
@justinretzolk
Member

Hey @tomelliff 👋 Thank you for taking the time to file this issue! Given that there's been a number of AWS provider releases since you initially filed it, can you confirm whether you're still experiencing this behavior?

@justinretzolk justinretzolk added waiting-response Maintainers are waiting on response from community or contributor. and removed needs-triage Waiting for first response or review from a maintainer. labels Dec 9, 2021
@obourdon
Contributor

@justinretzolk I just reproduced it with Terraform AWS provider 3.75.1 (the latest pre-4.0 version).

My branch fixes this; the error now reads as follows:

Error: error waiting for Elasticsearch Domain Upgrade (arn:aws:es:eu-west-1:614455314739:domain/logs) to succeed: Upgrade from 6.8 to 7.10 FAILED: PRE_UPGRADE_CHECK

I'm still working on adding appropriate tests and more insights, as well as running the regression tests.

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Mar 27, 2022
@obourdon
Contributor

With a new commit in my branch above I was also able to retrieve more detailed information as follows:

Error: error waiting for Elasticsearch Domain Upgrade (arn:aws:es:eu-west-1:614455314739:domain/logs) to succeed: Upgrade from 6.8 to 7.10 FAILED: PRE_UPGRADE_CHECK

	Cluster has 1491 shards per node which exceeds the setting cluster.max_shards_per_node value 1000
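For readers following along, a message of that shape could be assembled roughly like this. This is a sketch, not the code in my branch: `upgradeError` and its parameters are hypothetical names, and the ARN in the example is a placeholder:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// upgradeError builds an error in the same layout as the message above:
// the upgrade name, status, and failed step on the first line, followed by
// each reported issue on its own tab-indented line.
// Hypothetical helper for illustration only.
func upgradeError(domainARN, upgradeName, status, step string, issues []string) error {
	var b strings.Builder
	fmt.Fprintf(&b, "error waiting for Elasticsearch Domain Upgrade (%s) to succeed: %s %s: %s",
		domainARN, upgradeName, status, step)
	for _, issue := range issues {
		b.WriteString("\n\n\t")
		b.WriteString(issue)
	}
	return errors.New(b.String())
}

func main() {
	err := upgradeError(
		"arn:aws:es:eu-west-1:123456789012:domain/example", // placeholder ARN
		"Upgrade from 6.8 to 7.10",
		"FAILED",
		"PRE_UPGRADE_CHECK",
		[]string{"Cluster has 1491 shards per node which exceeds the setting cluster.max_shards_per_node value 1000"},
	)
	fmt.Println("Error:", err)
}
```

With an empty issues list the helper falls back to the shorter one-line message.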

obourdon added a commit to obourdon/terraform-provider-aws that referenced this issue Mar 30, 2022
Failed ES domain upgrade error isn't helpful
@obourdon
Contributor

Hi HashiCorp / AWS TF provider core team,

In the past I have submitted some patches against the master repo, but my fix branch is currently based on tag 3.75.1.

What would be the appropriate way to submit my fix for this issue?

Should I try to cherry-pick the changes onto master?
Many thanks for any insight.

@obourdon
Contributor

So far I have not been able to successfully run the regression tests against the us-west-1 region:

=== CONT  TestAccElasticsearchDomainDataSource_Data_basic
=== CONT  TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB
--- PASS: TestAccElasticsearchDomainDataSource_Data_basic (1524.16s)
=== CONT  TestAccElasticsearchDomain_policyIgnoreEquivalent
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB (1542.69s)
=== CONT  TestAccElasticsearchDomain_disappears
--- PASS: TestAccElasticsearchDomain_policyIgnoreEquivalent (1450.18s)
=== CONT  TestAccElasticsearchDomain_Update_version
--- PASS: TestAccElasticsearchDomain_disappears (1515.88s)
=== CONT  TestAccElasticsearchDomain_WithVolumeType_missing
--- PASS: TestAccElasticsearchDomain_WithVolumeType_missing (1192.05s)
=== CONT  TestAccElasticsearchDomain_UpdateVolume_type
--- PASS: TestAccElasticsearchDomain_Update_version (4165.17s)
=== CONT  TestAccElasticsearchDomain_update
--- PASS: TestAccElasticsearchDomain_UpdateVolume_type (3254.57s)
=== CONT  TestAccElasticsearchDomain_tags
--- PASS: TestAccElasticsearchDomain_tags (2097.99s)
=== CONT  TestAccElasticsearchDomain_nodeToNodeEncryption
--- PASS: TestAccElasticsearchDomain_update (2706.42s)
=== CONT  TestAccElasticsearchDomain_EncryptAtRestSpecify_key
--- PASS: TestAccElasticsearchDomain_nodeToNodeEncryption (1289.54s)
=== CONT  TestAccElasticsearchDomain_EncryptAtRestDefault_key
--- PASS: TestAccElasticsearchDomain_EncryptAtRestSpecify_key (1253.09s)
=== CONT  TestAccElasticsearchDomain_Cluster_zoneAwareness
    domain_test.go:146: Step 1/5 error: Error running apply: exit status 1
        2022/03/30 14:35:29 [DEBUG] Using modified User-Agent: Terraform/0.12.31 HashiCorp-terraform-exec/0.15.0

        Error: Error creating Elasticsearch domain: DisabledOperationException: You don't have permission to select three availability zones

          on terraform_plugin_test.tf line 2, in resource "aws_elasticsearch_domain" "test":
           2: resource "aws_elasticsearch_domain" "test" {


--- FAIL: TestAccElasticsearchDomain_Cluster_zoneAwareness (9.07s)
=== CONT  TestAccElasticsearchDomain_AutoTuneOptions
--- PASS: TestAccElasticsearchDomain_EncryptAtRestDefault_key (1299.12s)
=== CONT  TestAccElasticsearchDomain_internetToVPCEndpoint
--- PASS: TestAccElasticsearchDomain_AutoTuneOptions (1623.00s)
=== CONT  TestAccElasticsearchDomain_VPC_update
panic: test timed out after 4h0m0s

I raised the test timeout from 3h to 4h without more success (with parallelism set to 2 because of my laptop's constraints).
I will increase it again and give it another try.

@obourdon
Contributor

For the 3 zones error, I just found out that us-west-1 only has 2 zones, so I will switch to us-west-2 (4 zones).

@obourdon
Contributor

:-( Only a little more luck after 8h on us-west-2.

At least the previously failing test passed this time:

=== CONT  TestAccElasticsearchDomainDataSource_Data_basic
=== CONT  TestAccElasticsearchDomain_Update_version
--- PASS: TestAccElasticsearchDomainDataSource_Data_basic (1741.86s)
=== CONT  TestAccElasticsearchDomain_AutoTuneOptions
--- PASS: TestAccElasticsearchDomain_AutoTuneOptions (1723.72s)
=== CONT  TestAccElasticsearchDomain_WithVolumeType_missing
--- PASS: TestAccElasticsearchDomain_Update_version (4243.93s)
=== CONT  TestAccElasticsearchDomain_UpdateVolume_type
--- PASS: TestAccElasticsearchDomain_WithVolumeType_missing (1181.96s)
=== CONT  TestAccElasticsearchDomain_update
--- PASS: TestAccElasticsearchDomain_update (2638.81s)
=== CONT  TestAccElasticsearchDomain_tags
--- PASS: TestAccElasticsearchDomain_UpdateVolume_type (3716.47s)
=== CONT  TestAccElasticsearchDomain_nodeToNodeEncryption
--- PASS: TestAccElasticsearchDomain_tags (1403.58s)
=== CONT  TestAccElasticsearchDomain_EncryptAtRestSpecify_key
--- PASS: TestAccElasticsearchDomain_EncryptAtRestSpecify_key (1374.62s)
=== CONT  TestAccElasticsearchDomain_EncryptAtRestDefault_key
--- PASS: TestAccElasticsearchDomain_nodeToNodeEncryption (2155.89s)
=== CONT  TestAccElasticsearchDomain_policyIgnoreEquivalent
--- PASS: TestAccElasticsearchDomain_policyIgnoreEquivalent (1289.11s)
=== CONT  TestAccElasticsearchDomain_policy
--- PASS: TestAccElasticsearchDomain_EncryptAtRestDefault_key (1399.39s)
=== CONT  TestAccElasticsearchDomain_cognitoOptionsUpdate
--- PASS: TestAccElasticsearchDomain_policy (1249.13s)
=== CONT  TestAccElasticsearchDomain_cognitoOptionsCreateAndRemove
--- PASS: TestAccElasticsearchDomain_cognitoOptionsUpdate (2470.77s)
=== CONT  TestAccElasticsearchDomain_LogPublishingOptions_auditLogs
--- PASS: TestAccElasticsearchDomain_cognitoOptionsCreateAndRemove (2913.92s)
=== CONT  TestAccElasticsearchDomain_LogPublishingOptions_esApplicationLogs
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_auditLogs (1943.01s)
=== CONT  TestAccElasticsearchDomain_LogPublishingOptions_searchSlowLogs
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_esApplicationLogs (1630.03s)
=== CONT  TestAccElasticsearchDomain_disappears
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_searchSlowLogs (1641.53s)
=== CONT  TestAccElasticsearchDomain_LogPublishingOptions_indexSlowLogs
--- PASS: TestAccElasticsearchDomain_disappears (1311.56s)
=== CONT  TestAccElasticsearchDomain_AdvancedSecurityOptions_disabled
--- PASS: TestAccElasticsearchDomain_LogPublishingOptions_indexSlowLogs (1597.73s)
=== CONT  TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_disabled (1828.39s)
=== CONT  TestAccElasticsearchDomain_customEndpoint
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_userDB (1547.48s)
=== CONT  TestAccElasticsearchDomain_internetToVPCEndpoint
--- PASS: TestAccElasticsearchDomain_customEndpoint (3026.41s)
=== CONT  TestAccElasticsearchDomain_AdvancedSecurityOptions_iam
--- PASS: TestAccElasticsearchDomain_internetToVPCEndpoint (3265.56s)
=== CONT  TestAccElasticsearchDomainSamlOptions_disappears_Domain
--- PASS: TestAccElasticsearchDomain_AdvancedSecurityOptions_iam (1680.43s)
=== CONT  TestAccElasticsearchDomain_requireHTTPS
--- PASS: TestAccElasticsearchDomainSamlOptions_disappears_Domain (1436.80s)
=== CONT  TestAccElasticsearchDomain_basic
--- PASS: TestAccElasticsearchDomain_basic (1493.76s)
=== CONT  TestAccElasticsearchDomainSamlOptions_Disabled
--- PASS: TestAccElasticsearchDomain_requireHTTPS (2650.19s)
=== CONT  TestAccElasticsearchDomainSamlOptions_Update
--- PASS: TestAccElasticsearchDomainSamlOptions_Disabled (1682.28s)
=== CONT  TestAccElasticsearchDomain_VPC_update
panic: test timed out after 8h0m0s

@obourdon
Contributor

Any insights on this, please?

obourdon added a commit to squarescale/terraform-provider-aws that referenced this issue May 20, 2022
Failed ES domain upgrade error isn't helpful
@obourdon
Contributor

obourdon commented Jun 9, 2022

Can someone help with this, please?

@obourdon
Contributor

Anyone?

obourdon added a commit to squarescale/terraform-provider-aws that referenced this issue Dec 20, 2022
Failed ES domain upgrade error isn't helpful
obourdon added a commit to squarescale/terraform-provider-aws that referenced this issue Jan 6, 2023
Failed ES domain upgrade error isn't helpful