I have also experienced unstable cluster bootstrap. I have fully recreated a cluster multiple times, and periodically the cluster got stuck bootstrapping the second node.
Eventually, I found a correlation between this issue and a recreation of the bootstrap pod.
We are using Karpenter, and sometimes during the bootstrap process it decides to move the bootstrap pod to another node. When that happens, cluster creation gets stuck with this error:
opensearch [2024-10-29T06:38:10,310][WARN ][o.o.c.c.Coordinator ] [opensearch-primary-bootstrap-0] failed to validate incoming join request from node [{opensearch-primary-nodes-0}{9zZmg5EGRpidHf_0OwLUyA}{kV9e6qUTSsmvj1lUP-2QjA}{opensearch-primary-nodes-0}{10.152.42.19:9300}{dm}{shard_indexing_pressure_enabled=true}]
opensearch org.opensearch.transport.RemoteTransportException: [opensearch-primary-nodes-0][10.152.42.19:9300][internal:cluster/coordination/join/validate_compressed]
opensearch Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid 7QJiU55FRcWvBidZD_MF6A than local cluster uuid U4a0ix4hTwCvij0JF9qoEw, rejecting
I believe it is caused by the fact that the bootstrap pod does not use a persistent disk, so if it is restarted it gets a new cluster UUID, which is different from the UUID on node-0.
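As a stop-gap on our side (not operator code, and the exact annotation key depends on the Karpenter version), the bootstrap pod can be marked so Karpenter avoids disrupting it while the cluster is still forming. A minimal sketch of such a hypothetical mitigation:

```go
package bootstrap

import corev1 "k8s.io/api/core/v1"

// markDoNotDisrupt is a hypothetical mitigation, not existing operator code: it
// annotates the bootstrap pod so Karpenter does not consolidate it away while
// the cluster is still forming. Older Karpenter releases used the
// "karpenter.sh/do-not-evict" annotation instead.
func markDoNotDisrupt(pod *corev1.Pod) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["karpenter.sh/do-not-disrupt"] = "true"
}
```

That only avoids the disruption, though; it does not fix the underlying lack of persistence.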
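For illustration, a rough sketch of what that could look like inside the operator, using plain core/v1 types. All names, sizes, and paths here are assumptions for the sketch, not existing operator fields:

```go
package bootstrap

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// bootstrapPersistence is a hypothetical helper: it builds a PVC for the
// bootstrap pod plus the volume and mount that put the OpenSearch data path on
// it, so the pod keeps its cluster UUID if it is rescheduled to another node.
func bootstrapPersistence(clusterName, namespace string) (corev1.PersistentVolumeClaim, corev1.Volume, corev1.VolumeMount) {
	claimName := clusterName + "-bootstrap-data" // assumed naming convention

	pvc := corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: claimName, Namespace: namespace},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			// On k8s.io/api >= v0.29 this field is VolumeResourceRequirements;
			// older versions use ResourceRequirements.
			Resources: corev1.VolumeResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("1Gi"),
				},
			},
		},
	}

	volume := corev1.Volume{
		Name: "data",
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: claimName},
		},
	}

	mount := corev1.VolumeMount{
		Name:      "data",
		MountPath: "/usr/share/opensearch/data", // default data path in the OpenSearch image
	}

	return pvc, volume, mount
}
```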
@evheniyt Fine with me. I think the bootstrap pod being restarted was never a scenario we considered, as the pod only runs for a few minutes. IMO there is no reason against having a PV for the pod, but it should be cleaned up afterwards.
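For illustration only, a minimal sketch of what that cleanup could look like, assuming a controller-runtime client and the claim naming from the sketch above (none of this is existing operator code):

```go
package bootstrap

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteBootstrapPVC is a hypothetical cleanup step: once the cluster has
// formed and the bootstrap pod is removed, delete its claim so the disk is not
// left behind. The claim name matches the assumed naming used above.
func deleteBootstrapPVC(ctx context.Context, c client.Client, clusterName, namespace string) error {
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name:      clusterName + "-bootstrap-data",
			Namespace: namespace,
		},
	}
	return client.IgnoreNotFound(c.Delete(ctx, pvc))
}
```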
swoehrl-mw changed the title from "[BUG] Missing PersistenceVolume settings for bootstrap pod" to "Missing PersistenceVolume settings for bootstrap pod" on Nov 12, 2024
Originally posted by @evheniyt in #811 (comment)