[Remote State] NullPointerException when remote publication enabled with remote state disabled #15182

Closed
shiv0408 opened this issue Aug 9, 2024 · 0 comments · Fixed by #15219
Labels: bug, Cluster Manager

shiv0408 commented Aug 9, 2024

Describe the bug

When remote state is disabled on a node (cluster.remote_store.state.enabled: false in opensearch.yml) but remote publication is enabled with the setting opensearch.experimental.feature.remote_store.publication.enabled=true, the cluster manager still tries to publish the state through the remote path and encounters a NullPointerException because remote state is not enabled.

Related component

Cluster Manager

To Reproduce

  1. Add the following settings to the opensearch.yml file:
node.attr.remote_store.segment.repository: my-fs-repository
node.attr.remote_store.translog.repository: my-fs-repository
node.attr.remote_store.routing_table.repository: my-fs-repository
node.attr.remote_store.repository.my-fs-repository.type: fs
node.attr.remote_store.repository.my-fs-repository.settings.location: ~/os_data/repos/repo-1
cluster.remote_store.state.enabled: true
node.attr.remote_store.state.repository: my-fs-repository
  2. Run the OpenSearch process with the following settings:
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.remote_store.enabled=true -Dopensearch.experimental.feature.replication_type.enabled=true -Dopensearch.experimental.feature.remote_store.routing.enabled=true -Dopensearch.experimental.feature.remote_store.publication.enabled=true -Daws.region=us-east-1" ./build/distribution/local/opensearch-3.0.0-SNAPSHOT/bin/opensearch -E cluster.name=hishiv-cluster -E path.data=~/os_data/master1 -E path.repo=~/os_data/repos -E node.name=master1 -E node.master=true -E node.data=false -E node.ingest=false -E cluster.initial_master_nodes=master1
  3. Publication fails with the following error:
[2024-08-09T15:44:24,322][WARN ][o.o.c.c.PublicationTransportHandler] [master1] error sending remote cluster state to {master1}{ldGqZarCS3K9MBuR7idMlQ}{U9B1LEitTMKYs2hpP8S2vA}{127.0.0.1}{127.0.0.1:9300}{mr}{shard_indexing_pressure_enabled=true}
java.lang.NullPointerException: Cannot invoke "org.opensearch.gateway.GatewayMetaState$RemotePersistedState.getLastUploadedManifestFile()" because the return value of "org.opensearch.cluster.coordination.PersistedStateRegistry.getPersistedState(org.opensearch.cluster.coordination.PersistedStateRegistry$PersistedStateType)" is null
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendRemoteClusterState(PublicationTransportHandler.java:527) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendPublishRequest(PublicationTransportHandler.java:474) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator$CoordinatorPublication.sendPublishRequest(Coordinator.java:1843) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Publication$PublicationTarget.sendPublishRequest(Publication.java:287) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) [?:?]
	at org.opensearch.cluster.coordination.Publication.start(Publication.java:94) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1356) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:385) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:367) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:229) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-08-09T15:44:24,326][WARN ][o.o.c.s.MasterService    ] [master1] failing [Tasks batched with key: org.opensearch.cluster.coordination.JoinHelper and count: 3]: failed to commit cluster state version [1]
org.opensearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed
	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1360) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:385) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:367) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:229) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.ClassCastException: class java.lang.NullPointerException cannot be cast to class org.opensearch.transport.TransportException (java.lang.NullPointerException is in module java.base of loader 'bootstrap'; org.opensearch.transport.TransportException is in unnamed module of loader 'app')
	at org.opensearch.cluster.coordination.Publication$PublicationTarget$PublishResponseHandler.onFailure(Publication.java:410) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator$5.onFailure(Coordinator.java:1403) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext$1.onFailure(PublicationTransportHandler.java:465) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendRemoteClusterState(PublicationTransportHandler.java:571) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendPublishRequest(PublicationTransportHandler.java:474) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator$CoordinatorPublication.sendPublishRequest(Coordinator.java:1843) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Publication$PublicationTarget.sendPublishRequest(Publication.java:287) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) ~[?:?]
	at org.opensearch.cluster.coordination.Publication.start(Publication.java:94) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1356) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	... 11 more
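
The second trace shows that the failure handler casts the incoming exception to TransportException, so the original NullPointerException surfaces as a ClassCastException. A minimal sketch of one way to avoid that, assuming a check-then-wrap approach: the TransportException class and the asTransportException helper below are hypothetical stand-ins written for this illustration, not the OpenSearch classes or the change made in the linked fix.

```java
public class FailureWrappingSketch {

    // Hypothetical stand-in for org.opensearch.transport.TransportException.
    static class TransportException extends RuntimeException {
        TransportException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Returns the exception unchanged when it already has the expected type,
    // otherwise wraps it so the original cause is preserved instead of being
    // hidden behind a ClassCastException.
    static TransportException asTransportException(Exception e) {
        if (e instanceof TransportException) {
            return (TransportException) e;
        }
        return new TransportException("publication failed", e);
    }

    public static void main(String[] args) {
        Exception npe = new NullPointerException("remote persisted state is null");
        TransportException wrapped = asTransportException(npe);
        // The original NullPointerException stays reachable as the cause.
        System.out.println(wrapped.getCause().getClass().getSimpleName());
    }
}
```

With a wrapper like this, the log would report the underlying NullPointerException as the cause rather than an unrelated cast failure.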

Expected behavior

If the node is in an inconsistent state where remote publication is enabled without remote state being enabled, we should fall back to publication over the transport call.
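
The fallback described above can be sketched as a null-guarded choice between the two publication paths. This is only an illustration under stated assumptions: Registry, PersistedState, PublishPath, and choosePublishPath are hypothetical names modeled loosely on the types in the stack trace, and the sketch does not represent the actual fix.

```java
import java.util.EnumMap;
import java.util.Map;

public class PublicationFallbackSketch {

    // Hypothetical stand-ins for the persisted-state types named in the stack trace.
    enum PersistedStateType { LOCAL, REMOTE }

    enum PublishPath { REMOTE, TRANSPORT }

    static final class PersistedState {
        final String lastUploadedManifestFile;

        PersistedState(String lastUploadedManifestFile) {
            this.lastUploadedManifestFile = lastUploadedManifestFile;
        }
    }

    static final class Registry {
        private final Map<PersistedStateType, PersistedState> states =
            new EnumMap<>(PersistedStateType.class);

        void add(PersistedStateType type, PersistedState state) {
            states.put(type, state);
        }

        // Returns null when no state of this type was registered,
        // which mirrors the situation reported in the NullPointerException.
        PersistedState get(PersistedStateType type) {
            return states.get(type);
        }
    }

    // Chooses the remote path only when the feature flag is on AND a remote
    // persisted state with an uploaded manifest exists; otherwise falls back
    // to publication over the transport call.
    static PublishPath choosePublishPath(boolean remotePublicationEnabled, Registry registry) {
        if (!remotePublicationEnabled) {
            return PublishPath.TRANSPORT;
        }
        PersistedState remote = registry.get(PersistedStateType.REMOTE);
        if (remote == null || remote.lastUploadedManifestFile == null) {
            return PublishPath.TRANSPORT;
        }
        return PublishPath.REMOTE;
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Flag enabled but no remote state registered: falls back.
        System.out.println(choosePublishPath(true, registry)); // prints TRANSPORT
        registry.add(PersistedStateType.REMOTE, new PersistedState("manifest-1"));
        System.out.println(choosePublishPath(true, registry)); // prints REMOTE
    }
}
```

Checking for the remote persisted state before dereferencing it keeps the feature flag from being the only gate, which is the inconsistency this issue describes.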

Additional Details

No response
