[CI] InternalTestClusterTests.testDifferentRolesMaintainPathOnRestart failure in master #37462

Closed
matriv opened this issue Jan 15, 2019 · 9 comments
Labels: ":Distributed Coordination/Cluster Coordination" (Cluster formation and cluster state publication, including cluster membership and fault detection), ">test-failure" (Triaged test failures from CI)

Comments

@matriv
Contributor

matriv commented Jan 15, 2019

Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1273/consoleFull
Reproduced locally with:

./gradlew :test:framework:unitTest -Dtests.seed=431709D857335600 \
-Dtests.class=org.elasticsearch.test.test.InternalTestClusterTests \
-Dtests.method="testDifferentRolesMaintainPathOnRestart" \
-Dtests.security.manager=true \
-Dtests.locale=ja-JP \
-Dtests.timezone=Europe/Zagreb \
-Dcompiler.java=11 -Druntime.java=8

(it may need a few iterations before it fails...)
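
Because the failure is intermittent, one way to run "a few iterations" is a small retry loop around the reproduction command. The following is only a sketch: `run_until_failure` is a hypothetical helper, not part of the Elasticsearch build, and it simply reruns whatever command it is given until that command exits non-zero.

```shell
# Hypothetical helper: rerun a command until it fails or MAX_RUNS attempts pass.
# Usage: run_until_failure MAX_RUNS COMMAND [ARGS...]
run_until_failure() {
  max_runs=$1
  shift
  i=1
  while [ "$i" -le "$max_runs" ]; do
    echo "run $i of $max_runs"
    # "$@" is the command under test; a non-zero exit status counts as a failure
    if ! "$@"; then
      echo "failed on run $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max_runs runs"
  return 0
}
```

It could be called with a run count followed by the full ./gradlew command above. If Gradle skips the test task as up to date on later runs, the standard --rerun-tasks flag forces it to execute, at the cost of rerunning every other task as well.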

From local console:

ERROR   36.3s | InternalTestClusterTests.testDifferentRolesMaintainPathOnRestart <<< FAILURES!
   > Throwable #1: java.lang.IllegalStateException: cluster failed to form with expected nodes [{d1}{vQj9SDp4SSytaQKFOBgpuw}{UheL3S_gQqmtJDn_4vJygw}{127.0.0.1}{127.0.0.1:64519}, {m3}{OAdzjyj2QX2TRScnKvSgGg}{W1HerG1cSMGFpO9z5zL0nA}{127.0.0.1}{127.0.0.1:64522}, {c0}{HFBEKTvPTuivc6kj6p9jWA}{T7o5UO1bSbi8DNLHUfpXKA}{127.0.0.1}{127.0.0.1:64516}, {d2}{HRwBysDhSWC9JeljyWIP8Q}{Aa5z2JHlQ9uVyhhMgQW4DQ}{127.0.0.1}{127.0.0.1:64517}, {m4}{KAt4eG1ORqaOH46A5Ptn0A}{I-deMwVNSCGbxZhyS4LbCA}{127.0.0.1}{127.0.0.1:64523}] and actual nodes nodes:
   >    {d2}{HRwBysDhSWC9JeljyWIP8Q}{Aa5z2JHlQ9uVyhhMgQW4DQ}{127.0.0.1}{127.0.0.1:64517}
   >    {m4}{KAt4eG1ORqaOH46A5Ptn0A}{I-deMwVNSCGbxZhyS4LbCA}{127.0.0.1}{127.0.0.1:64523}, local, master
   >    at __randomizedtesting.SeedInfo.seed([431709D857335600:685839B7C408A5F5]:0)
   >    at org.elasticsearch.test.InternalTestCluster.validateClusterFormed(InternalTestCluster.java:1181)
   >    at org.elasticsearch.test.InternalTestCluster.validateClusterFormed(InternalTestCluster.java:1156)
   >    at org.elasticsearch.test.InternalTestCluster.fullRestart(InternalTestCluster.java:1846)
   >    at org.elasticsearch.test.InternalTestCluster.fullRestart(InternalTestCluster.java:1705)
   >    at org.elasticsearch.test.test.InternalTestClusterTests.testDifferentRolesMaintainPathOnRestart(InternalTestClusterTests.java:478)
   >    at java.lang.Thread.run(Thread.java:748)
  2> NOTE: leaving temporary files on disk at: /Users/matriv/elastic/elasticsearch/test/framework/build/testrun/unitTest/J0/temp/org.elasticsearch.test.test.InternalTestClusterTests_431709D857335600-001
  2> NOTE: test params are: codec=Asserting(Lucene80): {}, docValues:{}, maxPointsInLeafNode=1204, maxMBSortInHeap=6.077152004808969, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@3a327e1c), locale=ja-JP, timezone=Europe/Zagreb
  2> NOTE: Mac OS X 10.14.2 x86_64/Oracle Corporation 1.8.0_181 (64-bit)/cpus=12,threads=1,free=409719016,total=514850816
  2> NOTE: All tests run in this JVM: [InternalTestClusterTests]
Completed [1/1] in 38.21s, 1 test, 1 error <<< FAILURES!
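
The exception message is easier to read once the expected and actual node lists are reduced to node names. A minimal sketch, assuming each node descriptor has the shape {name}{id}{ephemeralId}{host}{address} as in the log above; `node_names` and `missing_nodes` are hypothetical helpers written for this illustration and rely on `grep -o`, which GNU and BSD grep both provide.

```shell
# Hypothetical helper: print the name of every node descriptor found on stdin.
# A descriptor is assumed to be five consecutive brace groups:
# {name}{id}{ephemeralId}{host}{address}
node_names() {
  grep -oE '\{[^{}]*\}\{[^{}]*\}\{[^{}]*\}\{[^{}]*\}\{[^{}]*\}' |
    sed -E 's/^\{([^{}]*)\}.*/\1/' |
    sort -u
}

# Hypothetical helper: print nodes that were expected but are not in the cluster.
# $1 = the "expected nodes" part of the message, $2 = the "actual nodes" part
missing_nodes() {
  expected=$(printf '%s\n' "$1" | node_names)
  actual=$(printf '%s\n' "$2" | node_names)
  # keep the expected names that do not match any actual name exactly
  printf '%s\n' "$expected" | grep -vxF -e "$actual"
}
```

Applied to the message above, the expected list contains d1, m3, c0, d2 and m4 while the actual list contains only d2 and m4, so the helper would report c0, d1 and m3 as missing.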

Maybe relates to this change: #36977 ?

@matriv matriv added >test-failure Triaged test failures from CI :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jan 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

matriv added a commit to matriv/elasticsearch that referenced this issue Jan 15, 2019
@ywelsch ywelsch self-assigned this Jan 15, 2019
@romseygeek
Contributor

A different test entirely failed with the same error:

java.lang.IllegalStateException: cluster failed to form with expected nodes [{node_t3}{oBjDlzCCRWyVfZ-P_eRUHQ}{i0jt9hlDTNOsBLqxguBkmA}{127.0.0.1}{127.0.0.1:42657}{ml.machine_memory=63315337216, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, {node_t0}{DlUmj6TbRSiaq7G2tRgZVw}{dXStLUMXTvS1aYnecwto7g}{127.0.0.1}{127.0.0.1:41161}{ml.machine_memory=63315337216, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, {node_t4}{zgB7AcZ_RfG2_dlOhmpaSg}{oZTkIXK7RYKQLPwu49eAkQ}{127.0.0.1}{127.0.0.1:43415}{ml.machine_memory=63315337216, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, {node_t1}{lM32iLA9R2e3DNRYJyUPvQ}{SeYr2NeATHyeZY4ywLyW_w}{127.0.0.1}{127.0.0.1:40785}{ml.machine_memory=63315337216, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, {node_t2}{6bePpYnETHSKhA4qYCennQ}{w14d_pENQO6Vj9ZSWGsD9Q}{127.0.0.1}{127.0.0.1:46133}{ml.machine_memory=63315337216, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}] and actual nodes nodes: 
   {node_t3}{oBjDlzCCRWyVfZ-P_eRUHQ}{i0jt9hlDTNOsBLqxguBkmA}{127.0.0.1}{127.0.0.1:42657}{ml.machine_memory=63315337216, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   {node_t1}{lM32iLA9R2e3DNRYJyUPvQ}{SeYr2NeATHyeZY4ywLyW_w}{127.0.0.1}{127.0.0.1:40785}{ml.machine_memory=63315337216, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   {node_t2}{6bePpYnETHSKhA4qYCennQ}{w14d_pENQO6Vj9ZSWGsD9Q}{127.0.0.1}{127.0.0.1:46133}{ml.machine_memory=63315337216, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local, master

Log here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+internalClusterTest/1123/console

Reproduction line is:

REPRODUCE WITH: ./gradlew :x-pack:plugin:ml:internalClusterTest \
  -Dtests.seed=AA25CE803F029E81 \
  -Dtests.class=org.elasticsearch.xpack.ml.integration.NetworkDisruptionIT \
  -Dtests.method="testJobRelocation" \
  -Dtests.security.manager=true \
  -Dtests.locale=und \
  -Dtests.timezone=Asia/Phnom_Penh \
  -Dcompiler.java=11 \
  -Druntime.java=8

The failure is at the start of the test, in internalCluster().ensureAtLeastNumDataNodes(5), so I don't think it's specific to the test itself.

@hub-cap
Contributor

hub-cap commented Mar 8, 2019

Failed again. Exactly the same as the one reported in the comment above.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+internalClusterTest/2034/console

@ywelsch
Contributor

ywelsch commented Mar 8, 2019

@hub-cap your link shows testJobRelocation failing, but for a different reason? In particular, it does not appear to have anything to do with the original test failure here.

@hub-cap
Contributor

hub-cap commented Mar 8, 2019

@ywelsch it does not appear to be the same as the initial failure, but the comment here, #37462 (comment), mentions

-Dtests.method="testJobRelocation" \

So I put it here. It's possible that both of these need to go in an entirely different issue?

@ywelsch
Contributor

ywelsch commented Mar 8, 2019

Yeah, I think neither of them belongs here.

@ywelsch
Contributor

ywelsch commented Mar 8, 2019

@DaveCTurner you've removed the muting of the test that @matriv had originally done here by #37868. Can we close this issue?

@hub-cap
Contributor

hub-cap commented Mar 8, 2019

Cool, I'll open a new issue for them :) Sorry for the noise!

moved to #39858

@ywelsch
Contributor

ywelsch commented May 7, 2019

No recent failure with this error.

@ywelsch ywelsch closed this as completed May 7, 2019

5 participants