Upgrading ECK to 2.6.0 and ES to 8.6.0 causes ES to fail to bootstrap/form a cluster #6303
Comments
This seems to be related to the file-based settings feature (#6148): if I manually disable it in the code, the upgrade runs fine. Log from one of the nodes that cannot join the cluster: {
"@timestamp": "2023-01-11T07:53:06.100Z",
"log.level": "WARN",
"message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [YpB5rhdaSuOEMb_fG3npaA, ROL_DH0rQ4W_6oxuL-an8Q, Hgd4D5KsT8SgE5jRF0rlbg], have discovered possible quorum [{elasticsearch-sample-es-default-0}{ROL_DH0rQ4W_6oxuL-an8Q}{CXv0SmFDSNeUjeXS2lwqQw}{elasticsearch-sample-es-default-0}{10.92.146.16}{10.92.146.16:9300}{dilm}, {elasticsearch-sample-es-default-2}{YpB5rhdaSuOEMb_fG3npaA}{6n90WvzbRriBvtG6Ne8s7Q}{elasticsearch-sample-es-default-2}{10.92.144.22}{10.92.144.22:9300}{dilm}, {elasticsearch-sample-es-default-1}{Hgd4D5KsT8SgE5jRF0rlbg}{Rxoftt8STh-veVmP7VKTxQ}{elasticsearch-sample-es-default-1}{10.92.145.15}{10.92.145.15:9300}{dilm}]; discovery will continue using [10.92.144.22:9300, 10.92.145.15:9300] from hosts providers and [{elasticsearch-sample-es-default-0}{ROL_DH0rQ4W_6oxuL-an8Q}{CXv0SmFDSNeUjeXS2lwqQw}{elasticsearch-sample-es-default-0}{10.92.146.16}{10.92.146.16:9300}{dilm}] from last-known cluster state; node term 14, last-accepted version 168 in term 13; joining [{elasticsearch-sample-es-default-2}{YpB5rhdaSuOEMb_fG3npaA}{6n90WvzbRriBvtG6Ne8s7Q}{elasticsearch-sample-es-default-2}{10.92.144.22}{10.92.144.22:9300}{dilm}] in term [14] has status [waiting for response] after [6.6m/398570ms]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "elasticsearch[elasticsearch-sample-es-default-0][cluster_coordination][T#1]",
"log.logger": "org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper",
"elasticsearch.node.name": "elasticsearch-sample-es-default-0",
"elasticsearch.cluster.name": "elasticsearch-sample"
} The above IP addresses are correct:
On the other hand {
"@timestamp": "2023-01-11T07:59:15.273Z",
"log.level": "WARN",
"message": "failed to retrieve stats for node [ROL_DH0rQ4W_6oxuL-an8Q]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "elasticsearch[elasticsearch-sample-es-default-2][generic][T#4]",
"log.logger": "org.elasticsearch.cluster.InternalClusterInfoService",
"elasticsearch.cluster.uuid": "WluJfwzIRnex8xc5DV8yKQ",
"elasticsearch.node.id": "YpB5rhdaSuOEMb_fG3npaA",
"elasticsearch.node.name": "elasticsearch-sample-es-default-2",
"elasticsearch.cluster.name": "elasticsearch-sample",
"error.type": "org.elasticsearch.transport.NodeNotConnectedException",
"error.message": "[elasticsearch-sample-es-default-0][10.92.146.15:9300] Node not connected",
"error.stack_trace": "org.elasticsearch.transport.NodeNotConnectedException: [elasticsearch-sample-es-default-0][10.92.146.15:9300] Node not connected"
}
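Comparing the two log entries above by hand is error-prone, so here is a minimal sketch that does it mechanically. It assumes the message formats are exactly as shown in these two logs; the function names and regular expressions are hypothetical helpers, not part of ECK or Elasticsearch. It only reports where a `Node not connected` error names a different transport address than the one the node advertises in the discovery warning; it does not diagnose why.

```python
import re

# Node descriptor as printed in the cluster-formation warning:
# {name}{node-id}{ephemeral-id}{host-name}{ip}{ip:port}
# Only the node name and the transport address are captured.
DISCOVERED = re.compile(
    r"\{(?P<name>[^{}]+)\}\{[^{}]+\}\{[^{}]+\}\{[^{}]+\}\{[^{}]+\}"
    r"\{(?P<addr>[\d.]+:\d+)\}"
)

# "[name][ip:port] Node not connected" as printed by NodeNotConnectedException.
NOT_CONNECTED = re.compile(
    r"\[(?P<name>[^\[\]]+)\]\[(?P<addr>[\d.]+:\d+)\] Node not connected"
)


def discovered_addresses(message):
    """Map node name -> transport address found in a discovery warning."""
    return {m["name"]: m["addr"] for m in DISCOVERED.finditer(message)}


def stale_addresses(discovery_message, error_message):
    """Return (node, address in error, address in discovery) for each mismatch."""
    known = discovered_addresses(discovery_message)
    stale = []
    for m in NOT_CONNECTED.finditer(error_message):
        expected = known.get(m["name"])
        if expected is not None and expected != m["addr"]:
            stale.append((m["name"], m["addr"], expected))
    return stale
```

Feeding it the `message` field of the first log and the `error.message` field of the second should list `elasticsearch-sample-es-default-0`, since the two entries show different addresses for that node (10.92.146.16 versus 10.92.146.15).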
The root cause of the issue is a bug in Elasticsearch; see elastic/elasticsearch#92812 for more details.
I'm closing this issue as ECK
If some Elasticsearch control plane nodes are not joining the cluster with a message similar to
Thanks @barkbay!
Bug Report
What did you do?
Upgraded ECK to 2.6.0 and ES to 8.6.0
What did you expect to see?
Elasticsearch to perform a rolling upgrade and continue functioning.
What did you see instead? Under which circumstances?
The master nodes failed to successfully form a cluster after ECK performed the rolling upgrade.
Environment
ECK version: 2.5.0 > 2.6.0
Kubernetes: GKE
Upgrading to 8.6.0 without upgrading ECK to 2.6.0 does not seem to cause the same problem.