Relax TranslogWriter#assertNoSeqNumberConflict #31569

dnhatn · 2018-06-26T03:12:31Z

If recovery and indexing happen concurrently, a replica can receive the same operation twice: once from replication and once from recovery. However, the two copies are not identical, because we don't store the versionType of operations in the Lucene index.

The TranslogWriter#assertNoSeqNumberConflict assertion has tripped several times in the CCR branch since we started using Lucene in peer recovery. This commit relaxes that assertion by excluding the versionType from the check.
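To illustrate the idea, here is a minimal, self-contained sketch of a comparison that ignores the versionType. It is not the actual TranslogWriter code: the class `RelaxedSeqNoCheck`, the simplified `IndexOp` type, and the helpers `sameStrict` and `sameIgnoringVersionType` are hypothetical names made up for this example, and the real operation carries more fields (such as the source) than are modelled here.

```java
import java.util.Objects;

public class RelaxedSeqNoCheck {

    // Hypothetical stand-in for the version types discussed in this PR.
    enum VersionType { INTERNAL, EXTERNAL, EXTERNAL_GTE }

    // Hypothetical, simplified model of an index operation.
    static final class IndexOp {
        final String id;
        final long seqNo;
        final long primaryTerm;
        final long version;
        final VersionType versionType;

        IndexOp(String id, long seqNo, long primaryTerm, long version, VersionType versionType) {
            this.id = id;
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
            this.version = version;
            this.versionType = versionType;
        }
    }

    // Strict comparison: every field, including versionType, must match.
    static boolean sameStrict(IndexOp a, IndexOp b) {
        return sameIgnoringVersionType(a, b) && a.versionType == b.versionType;
    }

    // Relaxed comparison: versionType is deliberately left out, since an
    // operation read back from Lucene cannot reproduce the original value.
    static boolean sameIgnoringVersionType(IndexOp a, IndexOp b) {
        return Objects.equals(a.id, b.id)
            && a.seqNo == b.seqNo
            && a.primaryTerm == b.primaryTerm
            && a.version == b.version;
    }

    public static void main(String[] args) {
        // Same operation seen twice, differing only in versionType.
        IndexOp fromReplication = new IndexOp("1", 0, 1, 5, VersionType.EXTERNAL_GTE);
        IndexOp fromRecovery = new IndexOp("1", 0, 1, 5, VersionType.EXTERNAL);

        System.out.println(sameStrict(fromReplication, fromRecovery));              // false
        System.out.println(sameIgnoringVersionType(fromReplication, fromRecovery)); // true
    }
}
```

With the strict comparison the two copies look like a conflict; with the relaxed one they are treated as the same operation, while a genuine difference in id, primary term, or version is still detected.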


elasticmachine · 2018-06-26T03:12:33Z

Pinging @elastic/es-distributed

dnhatn · 2018-06-26T03:12:59Z

CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+ccr+feature-branch-periodic/879/console

bleskes · 2018-06-27T10:20:19Z

do we still need this now that recovery ships an external version type (like replication)?

dnhatn · 2018-06-27T16:46:56Z

@bleskes We still need this for now. We do not convert EXTERNAL_GTE to EXTERNAL in the replication requests. Should we always use EXTERNAL in both replication and recovery?

[2018-06-27T18:39:26,830][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-0] fatal error in thread [elasticsearch[node-0][generic][T#4]], exiting
java.lang.AssertionError: seqNo [0] was processed twice in generation [2], with different data. 

prvOp [Index{id='1', type='test', seqNo=0, primaryTerm=1, 
version=5, versionType=EXTERNAL_GTE, autoGeneratedIdTimestamp=-1}], // <-- from replication

newOp [Index{id='1', type='test', seqNo=0, primaryTerm=1,
version=5, versionType=EXTERNAL, autoGeneratedIdTimestamp=-1}]  // <-- from recovery

bleskes

LGTM

bleskes · 2018-06-28T11:50:32Z

> We still need this for now.

Fair enough.

One more thing - maybe add a test for this so we won't forget external_gte again?

dnhatn · 2018-06-28T21:42:25Z

Thanks @bleskes and @s1monw


dnhatn added the >feature and :Distributed Indexing/Recovery labels Jun 26, 2018

dnhatn requested review from bleskes and s1monw June 26, 2018 03:12

bleskes approved these changes Jun 28, 2018


s1monw approved these changes Jun 28, 2018


dnhatn added 3 commits June 28, 2018 10:00

Merge branch 'ccr' into relax-version-type-check

2b779c3

add external_gte test

05bf89d

Merge branch 'ccr' into relax-version-type-check

99d9cd9

dnhatn merged commit 573df2d into elastic:ccr Jun 28, 2018

dnhatn deleted the relax-version-type-check branch June 28, 2018 21:42

dnhatn added the backport pending label Jun 28, 2018

dnhatn removed the backport pending label Jun 28, 2018
