CcrRepositoryIT.testIndividualActionsTimeout failing #38027

Closed
pgomulka opened this issue Jan 30, 2019 · 5 comments · Fixed by #38035 or #38758
Labels
:Core/Infra/Core Core issues without another label :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >test-failure Triaged test failures from CI

Comments

@pgomulka
Contributor

Failing on intake-master
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1648/console

Unfortunately, this does not reproduce locally for me:

./gradlew :x-pack:plugin:ccr:internalClusterTest \
  -Dtests.seed=75F3403255B8F3AF \
  -Dtests.class=org.elasticsearch.xpack.ccr.CcrRepositoryIT \
  -Dtests.method="testIndividualActionsTimeout" \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-AE \
  -Dtests.timezone=Europe/Saratov \
  -Dcompiler.java=11 \
  -Druntime.java=8

relates #37887

@pgomulka pgomulka added :Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Jan 30, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@pgomulka
Contributor Author

assertion failure log

java.lang.AssertionError: expected:<0> but was:<1>
	at __randomizedtesting.SeedInfo.seed([75F3403255B8F3AF:11EEACEEE6CAC19C]:0)
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.elasticsearch.xpack.ccr.CcrRepositoryIT.testIndividualActionsTimeout(CcrRepositoryIT.java:370)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue Jan 30, 2019
This fixes elastic#38027. Currently we assert that all shards have failed.
However, it is possible that some shards do not have segment files
created yet. The action that we block is fetching these segment files,
so it is possible that some shards successfully recover.

This commit changes the assertion to ensure that at least some of the
shards have failed.
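
The change described in this commit message can be illustrated with a minimal sketch. It is not the code from `CcrRepositoryIT`; the `RestoreResult` class and both check methods are hypothetical stand-ins, assumed only for illustration. It shows why a strict "all shards failed" check breaks when one shard recovers without needing the blocked action, while an "at least one shard failed" check still passes.

```java
public class ShardAssertionSketch {
    /** Hypothetical stand-in for a restore result; not an Elasticsearch class. */
    static final class RestoreResult {
        final int totalShards;
        final int failedShards;

        RestoreResult(int totalShards, int failedShards) {
            this.totalShards = totalShards;
            this.failedShards = failedShards;
        }

        int successfulShards() {
            return totalShards - failedShards;
        }
    }

    /** Strict check: passes only if no shard recovered successfully. */
    static boolean allShardsFailed(RestoreResult result) {
        return result.successfulShards() == 0;
    }

    /** Relaxed check: passes if at least one shard failed. */
    static boolean someShardsFailed(RestoreResult result) {
        return result.failedShards > 0;
    }

    public static void main(String[] args) {
        // Assumed scenario: one of three shards had no segment files yet,
        // so it recovered without ever issuing the blocked fetch.
        RestoreResult partial = new RestoreResult(3, 2);
        System.out.println("strict check passes:  " + allShardsFailed(partial));
        System.out.println("relaxed check passes: " + someShardsFailed(partial));
    }
}
```

Note that the relaxed check still fails when no shard fails at all, which matches the shape of the later `Expected: a value greater than <0>` failure reported in this thread.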
Tim-Brooks added a commit that referenced this issue Jan 30, 2019
Tim-Brooks added a commit that referenced this issue Jan 31, 2019
@alpar-t
Contributor

alpar-t commented Jan 31, 2019

It seems this fails less often now, but there are still failures.

Example build failure

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=ubuntu-14.04&&virtual/215/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1684/console

Reproduction line

does not reproduce locally

./gradlew :x-pack:plugin:ccr:internalClusterTest -Dtests.seed=36B1DA4D705F0B8F -Dtests.class=org.elasticsearch.xpack.ccr.CcrRepositoryIT -Dtests.method="testIndividualActionsTimeout" -Dtests.security.manager=true -Dtests.locale=mt -Dtests.timezone=Atlantic/Madeira -Dcompiler.java=11 -Druntime.java=8

Example relevant log:

11:26:45   1> org.apache.lucene.util.SetOnce$AlreadySetException: The object cannot be set twice!
11:26:45   1> 	at org.apache.lucene.util.SetOnce.set(SetOnce.java:69) ~[lucene-core-8.0.0-snapshot-83f9835.jar:8.0.0-snapshot-83f9835 83f9835a47a00a2ec58a4cf5fc0d492497cf7898 - jpountz - 2019-01-21 13:06:00]
11:26:45   1> 	at org.elasticsearch.common.logging.NodeAndClusterIdConverter.setNodeIdAndClusterId(NodeAndClusterIdConverter.java:59) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.common.logging.NodeAndClusterIdStateListener.onNewClusterState(NodeAndClusterIdStateListener.java:69) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:308) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:193) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateListeners$6(ClusterApplierService.java:481) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3527) [?:1.8.0_202]
11:26:45   1> 	at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743) [?:1.8.0_202]
11:26:45   1> 	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) [?:1.8.0_202]
11:26:45   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:478) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:467) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:414) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:165) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
11:26:45   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
11:26:45   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
11:26:45   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]


11:26:45 FAILURE 14.9s J3 | CcrRepositoryIT.testIndividualActionsTimeout <<< FAILURES!
11:26:45    > Throwable #1: java.lang.AssertionError: 
11:26:45    > Expected: a value greater than <0>
11:26:45    >      but: <0> was equal to <0>
11:26:45    > 	at __randomizedtesting.SeedInfo.seed([36B1DA4D705F0B8F:52AC3691C32D39BC]:0)
11:26:45    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
11:26:45    > 	at org.elasticsearch.xpack.ccr.CcrRepositoryIT.testIndividualActionsTimeout(CcrRepositoryIT.java:372)
11:26:45    > 	at java.lang.Thread.run(Thread.java:748)

Frequency

6 times today

@alpar-t alpar-t reopened this Jan 31, 2019
alpar-t added a commit that referenced this issue Jan 31, 2019
alpar-t added a commit that referenced this issue Jan 31, 2019
@pgomulka
Contributor Author

pgomulka commented Jan 31, 2019

The exceptions from NodeAndClusterIdStateListener should be fixed by PR #38110.
However, they were not the cause of the test failure: the exception is caught in ClusterApplierService, which only logs "failed to notify ClusterStateListener" along with the stack trace.
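
To make that reasoning concrete, here is a small sketch of a notification loop that catches and logs listener exceptions. It is not the `ClusterApplierService` implementation; the class, the `notifyListeners` method, and the use of `Consumer<String>` as a listener type are assumptions made for illustration. The point is that a throwing listener does not stop later listeners and does not propagate to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ListenerNotificationSketch {
    /**
     * Notifies every listener and returns the messages that would be logged.
     * An exception from one listener is recorded and then swallowed.
     */
    static List<String> notifyListeners(List<Consumer<String>> listeners, String state) {
        List<String> logged = new ArrayList<>();
        for (Consumer<String> listener : listeners) {
            try {
                listener.accept(state);
            } catch (Exception e) {
                // Logged only; the remaining listeners still run.
                logged.add("failed to notify ClusterStateListener: " + e.getMessage());
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        List<Consumer<String>> listeners = Arrays.<Consumer<String>>asList(
            state -> { throw new IllegalStateException("The object cannot be set twice!"); },
            seen::add
        );
        List<String> logged = notifyListeners(listeners, "state-1");
        System.out.println("logged: " + logged);
        System.out.println("second listener saw: " + seen);
    }
}
```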

Tim-Brooks added a commit that referenced this issue Feb 13, 2019
This commit adds a `ListenerTimeouts` class that will wrap an
`ActionListener` in a listener with a timeout scheduled on the generic
thread pool. If the timeout expires before the listener is completed,
`onFailure` will be called with an `ElasticsearchTimeoutException`.

Timeouts for the get ccr file chunk action are implemented using this
functionality. Additionally, this commit attempts to fix #38027 by also
blocking proxied get ccr file chunk actions. This test being un-muted is
useful to verify the timeout functionality.
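
The wrapping idea in this commit message can be sketched as follows. This is not the `ListenerTimeouts` class itself: the `Listener` interface, `wrapWithTimeout`, and `run` are hypothetical names, a `ScheduledExecutorService` stands in for the generic thread pool, and `java.util.concurrent.TimeoutException` stands in for `ElasticsearchTimeoutException`. The sketch assumes the delegate must be completed exactly once, either by the caller or by the timer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class TimeoutListenerSketch {
    /** Hypothetical minimal listener; stands in for ActionListener. */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    /** Wraps the delegate so it completes once: by the caller or by the timeout. */
    static <T> Listener<T> wrapWithTimeout(ScheduledExecutorService scheduler,
                                           long timeoutMillis,
                                           Listener<T> delegate) {
        AtomicBoolean done = new AtomicBoolean(false);
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            // Only fire the timeout if nobody completed the listener first.
            if (done.compareAndSet(false, true)) {
                delegate.onFailure(new TimeoutException("timed out after " + timeoutMillis + "ms"));
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        return new Listener<T>() {
            @Override
            public void onResponse(T value) {
                if (done.compareAndSet(false, true)) {
                    timer.cancel(false);
                    delegate.onResponse(value);
                }
            }

            @Override
            public void onFailure(Exception e) {
                if (done.compareAndSet(false, true)) {
                    timer.cancel(false);
                    delegate.onFailure(e);
                }
            }
        };
    }

    /** Runs one scenario and returns what the delegate observed. */
    static String run(boolean respondInTime) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicReference<String> outcome = new AtomicReference<>();
        CountDownLatch completed = new CountDownLatch(1);
        Listener<String> wrapped = wrapWithTimeout(scheduler, 50, new Listener<String>() {
            @Override
            public void onResponse(String value) {
                outcome.set("response");
                completed.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                outcome.set(e.getClass().getSimpleName());
                completed.countDown();
            }
        });
        if (respondInTime) {
            wrapped.onResponse("chunk");
        }
        try {
            completed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // A late response after completion is ignored by the wrapper.
        wrapped.onResponse("late");
        scheduler.shutdownNow();
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println("responds in time: " + run(true));
        System.out.println("never responds:   " + run(false));
    }
}
```

The `AtomicBoolean` guard is one possible design choice: it lets the timer and the caller race safely, so a response arriving after the timeout cannot complete the delegate a second time.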
Tim-Brooks added a commit that referenced this issue Feb 16, 2019
Tim-Brooks added a commit that referenced this issue Feb 16, 2019
Tim-Brooks added a commit that referenced this issue Feb 16, 2019