The GetSegmentFiles transport request times out under the current timeout of 1 minute, which is taken from the recovery setting indices.recovery.internal_action_retry_timeout.
A better option is to set the timeout dynamically, based on the total size of the segment files (available from StoreFileMetadata) and the cluster's network bandwidth.
Because the cluster's network bandwidth is not known, we can experiment to find a timeout value that takes the size of the segment files into account.
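One way to picture the size-based timeout is the sketch below. It is not OpenSearch code: the class `SegmentFilesTimeout`, the method `compute`, and all of its parameters are hypothetical names chosen for illustration. The throughput figure is an assumed constant that stands in for the unknown network bandwidth, and the floor and ceiling are assumed bounds that would need to be tuned by experiment.

```java
import java.time.Duration;

/** Hypothetical helper for illustration only; not part of OpenSearch. */
public final class SegmentFilesTimeout {

    private SegmentFilesTimeout() {}

    /**
     * Estimates a timeout from the total bytes to copy.
     *
     * @param totalBytes            combined length of the segment files to fetch
     * @param assumedBytesPerSecond assumed sustained throughput (a guess, not a measurement)
     * @param floor                 smallest timeout to return, e.g. the current static value
     * @param ceiling               largest timeout to return, so huge copies still fail eventually
     */
    public static Duration compute(long totalBytes, long assumedBytesPerSecond,
                                   Duration floor, Duration ceiling) {
        if (totalBytes < 0 || assumedBytesPerSecond <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and throughput positive");
        }
        if (floor.compareTo(ceiling) > 0) {
            throw new IllegalArgumentException("floor must not exceed ceiling");
        }
        // Round the transfer time up to whole seconds so a partial second is not lost.
        long transferSeconds = totalBytes / assumedBytesPerSecond
            + (totalBytes % assumedBytesPerSecond == 0 ? 0 : 1);
        Duration estimate = Duration.ofSeconds(transferSeconds);
        // Clamp the estimate into [floor, ceiling].
        if (estimate.compareTo(floor) < 0) {
            return floor;
        }
        if (estimate.compareTo(ceiling) > 0) {
            return ceiling;
        }
        return estimate;
    }
}
```

In this sketch, a caller would sum the lengths of the files it is about to request and pass the result as `totalBytes`. Choosing a conservative (low) assumed throughput yields longer timeouts, which is the safer direction when the real bandwidth is unknown.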
Caused by: org.opensearch.transport.ReceiveTimeoutTransportException: [seed][10.9.0.166:9300][internal:index/shard/replication/get_segment_files] request_id [552738] timed out after [599988ms]
Failure stack trace from benchmarking
[2022-09-02T09:34:08,220][ERROR][o.o.i.r.SegmentReplicationTargetService] [data-e20223d0] replication failure
org.opensearch.OpenSearchException: Segment Replication failed
at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onFailure(SegmentReplicationTargetService.java:293) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) [opensearch-2.2.0.jar:2.2.0]
at java.util.ArrayList.forEach(ArrayList.java:1511) [?:?]
at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:178) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:149) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.StepListener.innerOnFailure(StepListener.java:82) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.NotifyOnceListener.onFailure(NotifyOnceListener.java:62) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener$4.onFailure(ActionListener.java:190) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener$6.onFailure(ActionListener.java:309) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFinalFailure(RetryableAction.java:201) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.support.RetryableAction$RetryingListener.onFailure(RetryableAction.java:193) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1379) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1270) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.transport.ReceiveTimeoutTransportException: [seed][10.9.0.166:9300][internal:index/shard/replication/get_segment_files] request_id [552738] timed out after [599988ms]
at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1273) ~[opensearch-2.2.0.jar:2.2.0]
... 4 more
dreamer-89 changed the title from "Adjust dynamic timeout for get_files operation to prevent request timeouts" to "Adjust dynamic timeout for get_segment_files operation to prevent request timeouts" on Sep 2, 2022