[CI] CloseWhileRelocatingShardsIT.testCloseWhileRelocatingShards failures #38090

Closed
romseygeek opened this issue Jan 31, 2019 · 6 comments · Fixed by #38728
Labels
:Distributed Coordination/Allocation: all issues relating to the decision making around placing a shard (both master logic & on the nodes)
>test-failure: triaged test failures from CI

Comments

@romseygeek
Contributor

See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+internalClusterTest/173/console for an example failure

It looks like a timing issue; it doesn't reproduce for me with:

./gradlew :server:integTest \
  -Dtests.seed=B04B0599D97CB9C5 \
  -Dtests.class=org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT \
  -Dtests.method="testCloseWhileRelocatingShards" \
  -Dtests.security.manager=true \
  -Dtests.locale=ro \
  -Dtests.timezone=Europe/London \
  -Dcompiler.java=11 \
  -Druntime.java=8

  2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_sd2][generic][T#4],5,TGRP-CloseWhileRelocatingShardsIT]
  2> java.lang.AssertionError: java.lang.InterruptedException
  2> 	at __randomizedtesting.SeedInfo.seed([B04B0599D97CB9C5]:0)
  2> 	at org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT.lambda$testCloseWhileRelocatingShards$1(CloseWhileRelocatingShardsIT.java:159)
  2> 	at org.elasticsearch.test.transport.StubbableTransport$WrappedConnection.sendRequest(StubbableTransport.java:223)
  2> 	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:626)
  2> 	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:541)
  2> 	at org.elasticsearch.transport.TransportService.submitRequest(TransportService.java:503)
  2> 	at org.elasticsearch.transport.TransportService.submitRequest(TransportService.java:494)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$3(PeerRecoveryTargetService.java:263)
  2> 	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:108)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:257)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$500(PeerRecoveryTargetService.java:84)
  2> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:658)
  2> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
  2> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
  2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2> 	at java.lang.Thread.run(Thread.java:748)
  2> Caused by: java.lang.InterruptedException
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
  2> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
  2> 	at org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT.lambda$testCloseWhileRelocatingShards$1(CloseWhileRelocatingShardsIT.java:156)
  2> 	... 15 more
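The trace shows a test-installed send behavior parking a recovery thread on a `CountDownLatch`; when that generic-pool thread is interrupted (presumably while the test cluster is torn down), the `InterruptedException` is rethrown as an `AssertionError`. A minimal plain-Java sketch of that pattern, assuming a hypothetical `BlockingSendHook`-style method in place of the real mocked-transport rule and a plain `ExecutorService` in place of the node's generic pool:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class BlockedSendSketch {

    // Hypothetical stand-in for the test's send behavior: it parks the
    // sending thread on a latch, like the lambda in the trace.
    static void blockingSendHook(CountDownLatch release) {
        try {
            release.await();
        } catch (InterruptedException e) {
            // Same shape as the CI failure: AssertionError wrapping InterruptedException.
            throw new AssertionError(e);
        }
    }

    static Throwable runScenario() throws InterruptedException {
        CountDownLatch neverReleased = new CountDownLatch(1); // nothing counts this down
        CountDownLatch hookEntered = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        ExecutorService generic = Executors.newSingleThreadExecutor();
        generic.execute(() -> {
            hookEntered.countDown();
            try {
                blockingSendHook(neverReleased);
            } catch (Throwable t) {
                failure.set(t); // in the CI run this surfaced as an uncaught exception
            }
        });
        hookEntered.await();
        // Tearing the pool down interrupts the worker that is still waiting.
        generic.shutdownNow();
        generic.awaitTermination(5, TimeUnit.SECONDS);
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = runScenario();
        System.out.println(t + " / cause: " + t.getCause());
    }
}
```

The point is only that a latch which is never released turns any later interrupt into a test failure; the real test's rule and thread pool differ.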
romseygeek added the >test-failure label Jan 31, 2019
@alpar-t
Contributor

alpar-t commented Jan 31, 2019

I don't see any failures on master; maybe the fix in #37274 didn't work for 6.x?

@alpar-t
Contributor

alpar-t commented Jan 31, 2019

This was muted in fe0a52e

@tlrx
Copy link
Member

tlrx commented Jan 31, 2019

@atorok The issue here is a bit different, so I told Alan to create a new issue.

tlrx added a commit to tlrx/elasticsearch that referenced this issue Feb 1, 2019
tlrx added a commit that referenced this issue Feb 1, 2019
The current CloseWhileRelocatingShardsIT test adds some "send behavior"
rules to a target node's mocked transport service in order to detect when shard
relocations are started. These rules are never cleared and prevent the test from
completing normally after rebalancing is re-enabled.

This commit changes the test so that the rules are cleared and most verifications
are done before rebalancing is re-enabled.

Closes #38090
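The fix described in this commit comes down to a try/finally around the instrumented part of the test: verify while the rule is installed, then clear every rule before rebalancing resumes. A minimal sketch in plain Java, assuming a hypothetical `MockTransport` class and a placeholder action name `"start_recovery"`; neither is the real mocked-transport API, and only the ordering is meant to carry over:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class SendRuleSketch {

    // Hypothetical mocked transport holding test-installed "send behavior" rules.
    static final class MockTransport {
        private final List<Consumer<String>> rules = new CopyOnWriteArrayList<>();
        final AtomicInteger delivered = new AtomicInteger();

        void addSendBehavior(Consumer<String> rule) { rules.add(rule); }

        void clearAllRules() { rules.clear(); }

        void send(String action) {
            for (Consumer<String> rule : rules) {
                rule.accept(action); // each rule observes (or could block) the request
            }
            delivered.incrementAndGet();
        }
    }

    // Returns how many requests the rule intercepted.
    static int runTest(MockTransport transport) {
        AtomicInteger intercepted = new AtomicInteger();
        CountDownLatch relocationStarted = new CountDownLatch(1);
        transport.addSendBehavior(action -> {
            intercepted.incrementAndGet();
            if (action.equals("start_recovery")) { // placeholder action name
                relocationStarted.countDown();     // detect that a relocation began
            }
        });
        try {
            transport.send("start_recovery");
            // Verifications happen while the rule is still installed.
            if (relocationStarted.getCount() != 0) {
                throw new AssertionError("relocation was not detected");
            }
        } finally {
            // Always remove the rules before rebalancing is re-enabled.
            transport.clearAllRules();
        }
        // Simulated traffic after rebalancing resumes is no longer intercepted.
        transport.send("start_recovery");
        return intercepted.get();
    }
}
```

Putting `clearAllRules()` in `finally` means the rules are removed even when a verification fails, so a failing assertion cannot leave later recoveries stuck behind a stale rule.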
tlrx added a commit that referenced this issue Feb 1, 2019
@tlrx
Member

tlrx commented Feb 1, 2019

tlrx reopened this Feb 1, 2019
colings86 added the :Distributed Coordination/Allocation label Feb 5, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@jtibshirani
Contributor

I just saw another failure pop up on master intake: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part1/3897/

@tlrx I'm having trouble telling whether I should re-open this issue or file a new one, so I figured I'd just link to the failure so you're aware.


6 participants