[CI] CloseWhileRelocatingShardsIT.testCloseWhileRelocatingShards failures #38090

romseygeek · 2019-01-31T13:31:21Z

See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+internalClusterTest/173/console for an example failure

It looks like a timing issue; it doesn't reproduce for me with:

./gradlew :server:integTest \
  -Dtests.seed=B04B0599D97CB9C5 \
  -Dtests.class=org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT \
  -Dtests.method="testCloseWhileRelocatingShards" \
  -Dtests.security.manager=true \
  -Dtests.locale=ro \
  -Dtests.timezone=Europe/London \
  -Dcompiler.java=11 \
  -Druntime.java=8

2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_sd2][generic][T#4],5,TGRP-CloseWhileRelocatingShardsIT]
  2> java.lang.AssertionError: java.lang.InterruptedException
  2> 	at __randomizedtesting.SeedInfo.seed([B04B0599D97CB9C5]:0)
  2> 	at org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT.lambda$testCloseWhileRelocatingShards$1(CloseWhileRelocatingShardsIT.java:159)
  2> 	at org.elasticsearch.test.transport.StubbableTransport$WrappedConnection.sendRequest(StubbableTransport.java:223)
  2> 	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:626)
  2> 	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:541)
  2> 	at org.elasticsearch.transport.TransportService.submitRequest(TransportService.java:503)
  2> 	at org.elasticsearch.transport.TransportService.submitRequest(TransportService.java:494)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$3(PeerRecoveryTargetService.java:263)
  2> 	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:108)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:257)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$500(PeerRecoveryTargetService.java:84)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:658)
  2> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
  2> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
  2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2> 	at java.lang.Thread.run(Thread.java:748)
  2> Caused by: java.lang.InterruptedException
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
  2> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
  2> 	at org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT.lambda$testCloseWhileRelocatingShards$1(CloseWhileRelocatingShardsIT.java:156)
  2> 	... 15 more
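For anyone reading the trace: the shape of the failure can be reproduced outside Elasticsearch. The following is only a minimal sketch, assuming the test's send-behavior lambda blocks on a `CountDownLatch` and rethrows `InterruptedException` as an `AssertionError` (which is what the two `lambda$testCloseWhileRelocatingShards$1` frames suggest). The class and method names are hypothetical, and what actually interrupts the thread in the CI run is not established here; the trace passes through `CancellableThreads.executeIO`, so a cancelled recovery is one plausible source.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptedLatchSketch {

    /** Hypothetical stand-in for the test lambda: block on a latch until released. */
    static void blockUntilReleased(CountDownLatch release) {
        try {
            release.await();
        } catch (InterruptedException e) {
            // Same shape as the trace: the interrupt surfaces as an AssertionError
            // whose cause is the InterruptedException.
            throw new AssertionError(e);
        }
    }

    /** Runs the blocking call on a worker, interrupts it, and returns what escaped the worker. */
    static Throwable interruptBlockedWorker() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1); // never counted down in this sketch
        AtomicReference<Throwable> uncaught = new AtomicReference<>();

        Thread worker = new Thread(() -> blockUntilReleased(release), "generic-worker");
        worker.setUncaughtExceptionHandler((thread, error) -> uncaught.set(error));
        worker.start();

        // Assumption: something outside the test interrupts the blocked thread.
        worker.interrupt();
        worker.join();
        return uncaught.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = interruptBlockedWorker();
        System.out.println(t + " / cause: " + t.getCause());
    }
}
```

Because nothing on the worker thread catches the `AssertionError`, it ends up in the uncaught exception handler, which matches the "Uncaught exception in thread" warning at the top of the trace.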

alpar-t · 2019-01-31T15:07:16Z

I don't see any failures on master; maybe the fix in #37274 didn't work for 6.x?

alpar-t · 2019-01-31T15:10:01Z

This was muted in fe0a52e

tlrx · 2019-01-31T15:12:24Z

@atorok The issue here is a bit different so I told Alan to create a new issue.

The current CloseWhileRelocatingShardsIT test adds "send behavior" rules to a target node's mocked transport service in order to detect when shard relocations have started. These rules are never cleared, which prevents the test from completing normally after rebalancing is re-enabled. This commit changes the test so that the rules are cleared and most verifications are done before rebalancing is re-enabled. Closes #38090
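To illustrate why leftover rules matter, here is a toy model of the idea. It is a sketch only: `ToyTransport`, `addSendRule`, and the `"start_recovery"` action name are hypothetical stand-ins and not the Elasticsearch mock transport API, and the real rules block on a latch rather than simply refusing the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class SendRuleSketch {

    /** Hypothetical stand-in for a mocked transport; not the Elasticsearch class. */
    static class ToyTransport {
        private final List<Predicate<String>> interceptRules = new ArrayList<>();
        final List<String> delivered = new ArrayList<>();

        /** Registers a rule; any request whose action matches is intercepted. */
        void addSendRule(Predicate<String> interceptsAction) {
            interceptRules.add(interceptsAction);
        }

        /** Removes every rule so later requests are delivered normally. */
        void clearAllRules() {
            interceptRules.clear();
        }

        /** Delivers the request unless a rule intercepts it; returns true if delivered. */
        boolean send(String action) {
            for (Predicate<String> rule : interceptRules) {
                if (rule.test(action)) {
                    return false; // intercepted, never reaches the other node
                }
            }
            delivered.add(action);
            return true;
        }
    }

    /** Returns {delivered while the rule is installed, delivered after clearing}. */
    static boolean[] demo() {
        ToyTransport transport = new ToyTransport();
        transport.addSendRule(action -> action.equals("start_recovery"));

        boolean whileRuleInstalled = transport.send("start_recovery");
        transport.clearAllRules();
        boolean afterClear = transport.send("start_recovery");

        return new boolean[] { whileRuleInstalled, afterClear };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("delivered with rule: " + result[0] + ", after clear: " + result[1]);
    }
}
```

In this model, a recovery request sent while the rule is still installed never gets through. That is the analogue of relocations being unable to finish once rebalancing is re-enabled with the rules still in place.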

tlrx · 2019-02-01T15:58:09Z

Reopened:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+internalClusterTest/283/consoleFull

elasticmachine · 2019-02-05T11:58:47Z

Pinging @elastic/es-distributed

jtibshirani · 2020-02-26T00:25:23Z

I just saw another failure pop up on master intake: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part1/3897/

@tlrx I'm having trouble telling whether I should re-open this issue or file a new one, so I figured I'd just link to the failure so you're aware.

romseygeek added the >test-failure Triaged test failures from CI label Jan 31, 2019

romseygeek assigned tlrx Jan 31, 2019

tlrx added a commit to tlrx/elasticsearch that referenced this issue Feb 1, 2019

Clear send behavior rule in CloseWhileRelocatingShardsIT

05fff89

Closes elastic#38090

tlrx mentioned this issue Feb 1, 2019

Clear send behavior rule in CloseWhileRelocatingShardsIT #38159

Merged

tlrx closed this as completed in #38159 Feb 1, 2019

tlrx reopened this Feb 1, 2019

colings86 added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Feb 5, 2019

tlrx mentioned this issue Feb 11, 2019

Fix CloseWhileRelocatingShardsIT #38728

Merged

tlrx closed this as completed in #38728 Feb 12, 2019

tlrx added a commit that referenced this issue Feb 12, 2019

Fix CloseWhileRelocatingShardsIT (#38728)

f12ac5e

Closes #38090
