[CI] RemoveCorruptedShardDataCommandTests testCorruptedIndex fails #34322

Closed
vladimirdolzhenko opened this issue Oct 5, 2018 · 2 comments
Labels
:Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. >test-failure Triaged test failures from CI

Comments

@vladimirdolzhenko
Contributor

vladimirdolzhenko commented Oct 5, 2018

logs: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=amazon/42/console

It is reproducible with:

./gradlew :server:test -Dtests.seed=F13528ED31F73FE7 -Dtests.class=org.elasticsearch.index.shard.RemoveCorruptedShardDataCommandTests -Dtests.method="testCorruptedIndex" -Dtests.security.manager=true -Dtests.locale=es-CO -Dtests.timezone=Canada/Central -Dcompiler.java=11 -Druntime.java=8

The reason:
The test corrupts the segments file. When Lucene reads it, it throws a generic IOException at https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141. Because the exception is generic, ES does not treat it as an index corruption: it is not caught at https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/store/Store.java#L188, so no corruption marker is created, while RemoveCorruptedShardDataCommand relies on that marker being present.

It would be better if Lucene threw a more specific exception rather than a generic one.

Update: opened https://issues.apache.org/jira/browse/LUCENE-8525 to address this problem.
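
To make the failure mode concrete, here is a minimal sketch of the classification problem. It is not the actual Store.java code: `CorruptionException`, `MetadataReader`, `readMetadata` and the in-memory `markers` list are hypothetical stand-ins, assuming only what is described above (a marker is written when a corruption-specific exception is caught, and nothing is written for a generic IOException).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CorruptionMarkerSketch {

    /** Hypothetical stand-in for a corruption-specific exception type. */
    public static class CorruptionException extends IOException {
        public CorruptionException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for code that reads index metadata and may fail. */
    public interface MetadataReader {
        void read() throws IOException;
    }

    /** Stands in for corruption marker files on disk. */
    final List<String> markers = new ArrayList<>();

    /** Reads metadata; records a marker only for the corruption-specific type. */
    void readMetadata(MetadataReader reader) throws IOException {
        try {
            reader.read();
        } catch (CorruptionException e) {
            markers.add("corrupted: " + e.getMessage());
            throw e;
        }
        // Any other IOException propagates to the caller without a marker.
    }

    /** Returns true if running the reader left a marker behind. */
    public static boolean markerWrittenAfter(MetadataReader reader) {
        CorruptionMarkerSketch store = new CorruptionMarkerSketch();
        try {
            store.readMetadata(reader);
        } catch (IOException e) {
            // The caller still sees the failure; only the marker matters here.
        }
        return !store.markers.isEmpty();
    }

    public static void main(String[] args) {
        // Corruption-specific exception: a marker is recorded.
        System.out.println(markerWrittenAfter(() -> {
            throw new CorruptionException("checksum mismatch");
        }));
        // Generic IOException, as in this failure: no marker is recorded.
        System.out.println(markerWrittenAfter(() -> {
            throw new IOException("Invalid vInt detected (too many bits)");
        }));
    }
}
```

Under these assumptions, a tool that looks for the marker finds nothing in the second case even though the file really is corrupted.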

@vladimirdolzhenko vladimirdolzhenko added >test-failure Triaged test failures from CI :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. labels Oct 5, 2018
@vladimirdolzhenko vladimirdolzhenko self-assigned this Oct 5, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@vladimirdolzhenko
Contributor Author

Another one:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos/58/console

REPRODUCE WITH: ./gradlew :server:test \
  -Dtests.seed=C8F1C7CB781F757A \
  -Dtests.class=org.elasticsearch.index.shard.RemoveCorruptedShardDataCommandTests \
  -Dtests.method="testCorruptedIndex" \
  -Dtests.security.manager=true \
  -Dtests.locale=pt-PT \
  -Dtests.timezone=Antarctica/Mawson \
  -Dcompiler.java=11 \
  -Druntime.java=8

stacktrace:

ERROR   0.46s J2 | RemoveCorruptedShardDataCommandTests.testCorruptedIndex <<< FAILURES!
   > Throwable #1: java.io.IOException: Invalid vInt detected (too many bits)
   > 	at __randomizedtesting.SeedInfo.seed([C8F1C7CB781F757A:F553B10F61CC26A7]:0)
   > 	at org.apache.lucene.store.DataInput.readVInt(DataInput.java:141)
   > 	at org.apache.lucene.store.DataInput.readSetOfStrings(DataInput.java:287)
   > 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:379)
   > 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
   > 	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:432)
   > 	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:429)
   > 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:680)
   > 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:632)
   > 	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434)
   > 	at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:260)
   > 	at org.elasticsearch.index.translog.TruncateTranslogAction.getCleanStatus(TruncateTranslogAction.java:71)
   > 	at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.lambda$execute$1(RemoveCorruptedShardDataCommand.java:328)
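
For context on the top frame of the trace, the following is a simplified sketch of variable-length int decoding (7 payload bits per byte, high bit meaning "more bytes follow"). It is an illustration written for this thread, not Lucene's source: `VIntSketch`, `readVInt(InputStream)` and `describe` are hypothetical names, and only the exception message is taken from the trace above. It shows how randomly corrupted bytes can end in a plain IOException rather than a corruption-specific one.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class VIntSketch {

    /** Decodes one variable-length int of at most five bytes. */
    static int readVInt(InputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("read past EOF");
            }
            if (shift == 28) {
                // Fifth byte: only its low 4 bits fit into a 32-bit int.
                if ((b & 0xF0) != 0) {
                    throw new IOException("Invalid vInt detected (too many bits)");
                }
                return value | (b << 28);
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // high bit clear: this was the last byte
            }
        }
        throw new AssertionError("unreachable");
    }

    /** Decodes the bytes and reports either the value or the exception raised. */
    public static String describe(byte... bytes) {
        try {
            return "value " + readVInt(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A well-formed two-byte encoding.
        System.out.println(describe((byte) 0xAC, (byte) 0x02));
        // Five bytes with all bits set, as corrupted data might contain.
        System.out.println(describe((byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
                (byte) 0xFF, (byte) 0xFF));
    }
}
```

The second call reports a plain `IOException` with the same message as in the trace, which is the kind of exception that carries no hint that the underlying file is corrupted.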

ywelsch added a commit that referenced this issue Feb 1, 2019
Folks at the Lucene project do not seem to be interested in classifying corruptions and
distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525),
so we'll just cop out as well.

Closes #34322
ywelsch added a commit to ywelsch/elasticsearch that referenced this issue Feb 6, 2019
ywelsch added a commit that referenced this issue Feb 11, 2019