
[ci] IndicesClusterStateServiceRandomUpdatesTests.testRandomClusterStateUpdates #32308

Closed
andyb-elastic opened this issue Jul 24, 2018 · 3 comments · Fixed by #32374
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >test-failure Triaged test failures from CI

Comments

@andyb-elastic (Contributor) commented Jul 24, 2018

Doesn't reproduce. Has occurred 6 times in the last 90 days

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos/2576/console

build-2576.txt

REPRODUCE WITH: ./gradlew :server:test \
  -Dtests.seed=BDB828B6AFCB1CB3 \
  -Dtests.class=org.elasticsearch.indices.cluster.IndicesClusterStateServiceRandomUpdatesTests \
  -Dtests.method="testRandomClusterStateUpdates" \
  -Dtests.security.manager=true \
  -Dtests.locale=hi-IN \
  -Dtests.timezone=Canada/Mountain
FAILURE 0.16s J0 | IndicesClusterStateServiceRandomUpdatesTests.testRandomClusterStateUpdates <<< FAILURES!                                                                                  
   > Throwable #1: java.lang.AssertionError: a replica can only be promoted when active. current: [index_ytqkcgwvfrlcilx][0], node[node_002], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=96qpiMtlQ_ODQQrpk67g6g], unassigned_info[[reason=INDEX_CREATED], at[2018-07-23T23:45:43.608Z], delayed=false, allocation_status[no_attempt]] new: [index_ytqkcgwvfrlcilx][0], node[node_002], [P], recovery_source[existing recovery], s[INITIALIZING], a[id=96qpiMtlQ_ODQQrpk67g6g], unassigned_info[[reason=ALLOCATION_FAILED], at[2018-07-23T23:45:43.639Z], failed_attempts[1], delayed=false, details[failed shard on node [node_003]: fake shard failure, failure Exception[null]], allocation_status[no_attempt]]                                                   
   >    at __randomizedtesting.SeedInfo.seed([BDB828B6AFCB1CB3:C53F2165F0F0CAFA]:0)           
   >    at org.elasticsearch.indices.cluster.AbstractIndicesClusterStateServiceTestCase$MockIndexShard.updateShardState(AbstractIndicesClusterStateServiceTestCase.java:365)                 
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateShard(IndicesClusterStateService.java:582)                                                                     
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)                                                            
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:230)                                                               
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateServiceRandomUpdatesTests.testRandomClusterStateUpdates(IndicesClusterStateServiceRandomUpdatesTests.java:127)               
   >    at java.lang.Thread.run(Thread.java:748) 
andyb-elastic added the >test-failure and :Distributed Coordination/Allocation labels Jul 24, 2018
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

@dnhatn (Member) commented Jul 24, 2018

@bleskes is working on the fix.

bleskes added a commit to bleskes/elasticsearch that referenced this issue Jul 25, 2018
…it. primary with the same aId

In rare cases it is possible that a node gets an instruction to replace a replica
shard that's in POST_RECOVERY with a new initializing primary with the same allocation id.
This can happen when batched cluster states include starting the replica, closing the
index, opening it again, and allocating the primary shard to the node in question. The
node should then clean up its initializing replica and replace it with a new
initializing primary.

Closes elastic#32308
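Per the commit message above, the shape of the fix is that the node must recognize this case and recreate the shard rather than update it in place. Below is a hedged, self-contained sketch of that decision logic, with hypothetical names (applyRouting, isIllegalInPlaceUpdate); the real code lives in IndicesClusterStateService.createOrUpdateShards and is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix's shape: a routing with the SAME allocation id
// that cannot legally be reached by an in-place update (initializing replica ->
// initializing primary) leads to tear-down and re-creation, not an update.
public class CreateOrUpdateSketch {

    record Routing(String allocationId, boolean primary, boolean initializing) {}

    private final Map<String, Routing> shardsByAllocationId = new HashMap<>();

    void applyRouting(Routing newRouting) {
        Routing current = shardsByAllocationId.get(newRouting.allocationId());
        if (current != null && isIllegalInPlaceUpdate(current, newRouting)) {
            // Clean up the initializing replica and replace it with a brand-new
            // initializing primary, rather than "promoting" it in place.
            removeShard(current);
            createShard(newRouting);
        } else if (current == null) {
            createShard(newRouting);
        } else {
            updateShard(newRouting);
        }
    }

    private static boolean isIllegalInPlaceUpdate(Routing current, Routing next) {
        // An INITIALIZING replica can never become a primary via an in-place update.
        return next.primary() && !current.primary() && current.initializing();
    }

    private void createShard(Routing r) { shardsByAllocationId.put(r.allocationId(), r); }
    private void updateShard(Routing r) { shardsByAllocationId.put(r.allocationId(), r); }
    private void removeShard(Routing r) { shardsByAllocationId.remove(r.allocationId()); }
}
```

The key point of the sketch is keying on the allocation id: sharing an id normally means "same shard copy, update in place," and the bug was treating this rare batched-state case the same way.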
bleskes added a commit that referenced this issue Jul 30, 2018
…it. primary with the same aId (#32374)

In rare cases it is possible that a node gets an instruction to replace a replica
shard that's in `POST_RECOVERY` with a new initializing primary with the same allocation id.
This can happen when batched cluster states include starting the replica, closing the
index, opening it again, and allocating the primary shard to the node in question. The
node should then clean up its initializing replica and replace it with a new
initializing primary.

I'm not sure whether the test I added really adds enough value, since existing tests already found this. The main reason I added it was to allow simpler reproduction and to double-check the fix. I'm open to discussing whether we should keep it.

Closes #32308
bleskes added three more commits referencing this issue on Jul 30 and Jul 31, 2018, each with the same message as #32374 above.