[CI] RecoveryTests#testRetentionPolicyChangeDuringRecovery failures #32089

Closed
jimczi opened this issue Jul 16, 2018 · 7 comments
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >test-failure Triaged test failures from CI

Comments

@jimczi
Contributor

jimczi commented Jul 16, 2018

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=oraclelinux/2561

Does not reproduce locally with:

./gradlew -x :server:testScriptDocValuesMissingV6Behaviour \
  -Dtests.seed=7EC49556174B0437 \
  -Dtests.class=org.elasticsearch.indices.recovery.RecoveryTests \
  -Dtests.method="testRetentionPolicyChangeDuringRecovery" \
  -Dtests.security.manager=true \
  -Dtests.locale=pt-PT \
  -Dtests.timezone=Australia/Eucla

but it failed 4 times last month in CI with the same exception:

Expected: <0>
     but: was <20>
		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
		at org.junit.Assert.assertThat(Assert.java:956)
		at org.junit.Assert.assertThat(Assert.java:923)
		at org.elasticsearch.indices.recovery.RecoveryTests.lambda$testRetentionPolicyChangeDuringRecovery$1(RecoveryTests.java:102)
		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:755)
		... 39 more
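
For readers unfamiliar with the pattern in the trace above: the assertion runs inside a retry helper, so "Expected: <0> but: was <20>" means the value never reached 0 before the helper gave up. The following is a minimal sketch of that idea, not the actual ESTestCase.assertBusy code; the class name BusyAssertSketch, the method retryUntilPasses, and the back-off numbers are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BusyAssertSketch {

    /** A check that throws AssertionError while the condition is not yet met. */
    @FunctionalInterface
    interface Check {
        void run() throws Exception;
    }

    /**
     * Hypothetical helper: re-runs the check with a growing sleep until it
     * passes or the timeout elapses. If it never passes, the last
     * AssertionError is rethrown to the caller.
     */
    static void retryUntilPasses(Check check, long maxWait, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        AssertionError last = null;
        while (true) {
            try {
                check.run();
                return; // the condition finally holds
            } catch (AssertionError e) {
                last = e; // remember the most recent failure
            }
            if (System.nanoTime() >= deadline) {
                throw last;
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped back-off (assumed values)
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a counter that drains over time, so the check eventually passes.
        AtomicInteger remainingOps = new AtomicInteger(3);
        retryUntilPasses(() -> {
            int ops = remainingOps.getAndUpdate(n -> Math.max(0, n - 1));
            if (ops != 0) {
                throw new AssertionError("Expected: <0> but: was <" + ops + ">");
            }
        }, 1, TimeUnit.SECONDS);
        System.out.println("condition eventually held");

        // A value that never changes fails once the timeout expires.
        try {
            retryUntilPasses(() -> {
                throw new AssertionError("Expected: <0> but: was <20>");
            }, 50, TimeUnit.MILLISECONDS);
        } catch (AssertionError e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Under this reading, a failure that does not reproduce locally is consistent with a timing-dependent condition rather than a deterministic one, although the trace alone does not prove that.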
@jimczi jimczi added >test-failure Triaged test failures from CI :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jul 16, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor

bleskes commented Jul 16, 2018

@dnhatn do you mind taking a look?

@dnhatn dnhatn self-assigned this Jul 16, 2018
@droberts195
Contributor

The same error occurred today in a 6.3 build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.3+multijob-unix-compatibility/os=oraclelinux/213/console

FAILURE 10.3s J2 | RecoveryTests.testRetentionPolicyChangeDuringRecovery <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: <0>
   >      but: was <20>
   > 	at __randomizedtesting.SeedInfo.seed([709FE84094F79193:EAA4166573FF31D7]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.indices.recovery.RecoveryTests.lambda$testRetentionPolicyChangeDuringRecovery$1(RecoveryTests.java:102)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:767)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:741)
   > 	at org.elasticsearch.indices.recovery.RecoveryTests.testRetentionPolicyChangeDuringRecovery(RecoveryTests.java:102)
   > 	at java.lang.Thread.run(Thread.java:748)

It didn't reproduce when I ran this against the 6.3 branch:

./gradlew :server:test \
  -Dtests.seed=709FE84094F79193 \
  -Dtests.class=org.elasticsearch.indices.recovery.RecoveryTests \
  -Dtests.method="testRetentionPolicyChangeDuringRecovery" \
  -Dtests.security.manager=true \
  -Dtests.locale=ko \
  -Dtests.timezone=Asia/Irkutsk

@davidkyle
Member

Another instance that did not reproduce:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+release-tests/926/console

Expected: <0>
     but: was <20>
	at __randomizedtesting.SeedInfo.seed([A5DAAA599AC39A8A:3FE1547C7DCB3ACE]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.indices.recovery.RecoveryTests.lambda$testRetentionPolicyChangeDuringRecovery$1(RecoveryTests.java:102)

The failure actually came from a release build, so I had to strip -Dbuild.snapshot=false and -Dtests.jvm.argline="-Dbuild.snapshot=false" from the reproduce line to get it to build. It still does not reproduce:

./gradlew :server:test \
  -Dtests.seed=A5DAAA599AC39A8A \
  -Dtests.class=org.elasticsearch.indices.recovery.RecoveryTests \
  -Dtests.method="testRetentionPolicyChangeDuringRecovery" \
  -Dtests.security.manager=true \
  -Dtests.locale=be \
  -Dtests.timezone=Indian/Mahe

@jkakavas
Member

This also failed today in 6.x: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/2276/console

REPRODUCE WITH: ./gradlew :server:test \
  -Dtests.seed=5D41AD8C90C233F3 \
  -Dtests.class=org.elasticsearch.indices.recovery.RecoveryTests \
  -Dtests.method="testRetentionPolicyChangeDuringRecovery" \
  -Dtests.security.manager=true \
  -Dtests.locale=et-EE \
  -Dtests.timezone=Etc/GMT+0 \
  -Dcompiler.java=10 \
  -Druntime.java=8
07:14:01    > Expected: <0>
07:14:01    >      but: was <20>
07:14:01    > 		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
07:14:01    > 		at org.elasticsearch.indices.recovery.RecoveryTests.lambda$testRetentionPolicyChangeDuringRecovery$1(RecoveryTests.java:106)
07:14:01    > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:835)

dnhatn added a commit that referenced this issue Aug 17, 2018
@dnhatn
Member

dnhatn commented Aug 17, 2018

I've muted this test as we should have enough information with the latest failure.

dnhatn added a commit that referenced this issue Aug 17, 2018
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Aug 17, 2018
Since elastic#28140  when the global checkpoint is advanced, we try to move the
safe commit forward, and clean old index commits if possible. However,
we forget to trim unreferenced translog.

This change makes sure that we prune both translog and index commits
when the safe commit advanced.

Relates elastic#28140
Closes elastic#32089
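
The behaviour described in this commit message can be pictured with a small model. This is only a sketch under stated assumptions, not Elasticsearch code: the class SafeCommitPruningSketch, its fields, and its method names are hypothetical, and it ignores translog retention size and age settings. It assumes the safe commit is the newest commit whose highest sequence number is at or below the global checkpoint, and that each commit records the oldest translog generation it still needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SafeCommitPruningSketch {

    /** Hypothetical commit point: highest seq no it contains and the oldest translog generation it needs. */
    static final class Commit {
        final long maxSeqNo;
        final long minTranslogGen;

        Commit(long maxSeqNo, long minTranslogGen) {
            this.maxSeqNo = maxSeqNo;
            this.minTranslogGen = minTranslogGen;
        }
    }

    private final List<Commit> commits = new ArrayList<>();     // oldest first
    private final TreeSet<Long> translogGens = new TreeSet<>(); // retained translog generations

    void addCommit(long maxSeqNo, long minTranslogGen) {
        commits.add(new Commit(maxSeqNo, minTranslogGen));
    }

    void addTranslogGen(long generation) {
        translogGens.add(generation);
    }

    int commitCount() {
        return commits.size();
    }

    List<Long> translogGenerations() {
        return new ArrayList<>(translogGens);
    }

    /** Called whenever the global checkpoint advances. */
    void onGlobalCheckpointAdvanced(long globalCheckpoint) {
        // Find the safe commit: the newest commit fully covered by the global checkpoint.
        int safeIndex = -1;
        for (int i = 0; i < commits.size(); i++) {
            if (commits.get(i).maxSeqNo <= globalCheckpoint) {
                safeIndex = i;
            }
        }
        if (safeIndex < 0) {
            return; // no safe commit yet, keep everything
        }
        Commit safe = commits.get(safeIndex);
        // Step 1: clean up index commits older than the safe commit.
        commits.subList(0, safeIndex).clear();
        // Step 2: also trim translog generations that no retained commit needs.
        // Per the commit message, this is the step that was previously forgotten.
        translogGens.headSet(safe.minTranslogGen).clear();
    }

    public static void main(String[] args) {
        SafeCommitPruningSketch engine = new SafeCommitPruningSketch();
        engine.addCommit(9, 1);
        engine.addCommit(19, 2);
        engine.addCommit(29, 3);
        engine.addTranslogGen(1);
        engine.addTranslogGen(2);
        engine.addTranslogGen(3);

        engine.onGlobalCheckpointAdvanced(19);
        System.out.println("commits kept: " + engine.commitCount());
        System.out.println("translog generations kept: " + engine.translogGenerations());
    }
}
```

In this model, skipping step 2 would leave translog generation 1 on disk even though no retained commit refers to it, which is the kind of leftover an "expected 0 operations" assertion could trip over.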
@dnhatn
Member

dnhatn commented Aug 17, 2018

I opened #32967.

dnhatn added a commit that referenced this issue Aug 20, 2018
Since #28140 when the global checkpoint is advanced, we try to move the
safe commit forward, and clean up old index commits if possible. However,
we forget to trim unreferenced translog.

This change makes sure that we prune both old translog and index commits
when the safe commit advanced.

Relates #28140
Closes #32089
jasontedor pushed a commit that referenced this issue Aug 21, 2018