Introduce translog no-op #22291

jasontedor · 2016-12-21T04:15:26Z

As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.
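
To illustrate why a gap in the history keeps the local checkpoint from advancing, here is a minimal sketch. It is not the Elasticsearch implementation: the class `LocalCheckpointSketch` and its methods are hypothetical, and the checkpoint is assumed to be the highest sequence number below which every sequence number has been processed.

```java
import java.util.TreeSet;

/** Hypothetical sketch of a local checkpoint; not the Elasticsearch implementation. */
final class LocalCheckpointSketch {
    // Sequence numbers that were processed but sit above a gap.
    private final TreeSet<Long> pending = new TreeSet<>();
    // Highest sequence number such that every sequence number <= it was processed.
    private long checkpoint = -1;

    /** Records that seqNo appears in the history (as an index, a delete, or a no-op). */
    void markProcessed(long seqNo) {
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        pending.add(seqNo);
        // Advance over every contiguous sequence number; stop at the first gap.
        while (pending.remove(checkpoint + 1)) {
            checkpoint++;
        }
    }

    long checkpoint() {
        return checkpoint;
    }
}
```

In this sketch, processing sequence numbers 0, 1 and 3 leaves the checkpoint at 1. Only once something is recorded for sequence number 2, for example a no-op, does the checkpoint move to 3.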

Relates #10708

bleskes

Looks great. Left some very minor comments.

bleskes · 2016-12-21T14:25:49Z

core/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java

@@ -223,6 +223,11 @@ public void writeVLong(long i) throws IOException {
        writeByte((byte) i);
    }

+    public static int lengthVLong(long i) {


I removed this method.

bleskes · 2016-12-21T14:26:37Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

@@ -910,7 +927,7 @@ public void close() {

        /** type of operation (index, delete), subclasses use static types */


nit: comment needs updating

I updated this comment.

bleskes · 2016-12-21T14:31:37Z

core/src/main/java/org/elasticsearch/index/engine/Engine.java

+
+        @Override
+        public int estimatedSizeInBytes() {
+            return 2 * reason.length() + StreamOutput.lengthVLong(seqNo()) + StreamOutput.lengthVLong(primaryTerm());


do we really need to be so correct here? I mean, it's only used for the indexing memory buffer and all other places ignore the seq nos. I think we don't need an extra method?

PS. This made me think of something else - we can't use vlongs for seq# - they can be negative when coming from an old primary.

I changed the serialization to be (read|write)Long and updated the size estimate to just add twice the number of bytes per long.
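
A minimal sketch of what that change amounts to, under stated assumptions: `NoOpSizeSketch` and its methods are hypothetical, and `java.nio.ByteBuffer` stands in for the stream classes. The estimate keeps `2 * reason.length()` from the diff above and adds two fixed-width longs.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; ByteBuffer stands in for StreamOutput/StreamInput. */
final class NoOpSizeSketch {
    /** Rough size: two bytes per char of the reason plus two fixed-width longs. */
    static int estimatedSizeInBytes(String reason) {
        return 2 * reason.length() + 2 * Long.BYTES;
    }

    /** Writes both values as fixed 8-byte longs, so negative values need no special handling. */
    static byte[] writeFixed(long seqNo, long primaryTerm) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(seqNo)
                .putLong(primaryTerm)
                .array();
    }

    /** Reads the two longs back in the order they were written. */
    static long[] readFixed(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        return new long[] { in.getLong(), in.getLong() };
    }
}
```

A fixed-width long always takes eight bytes, so a sequence number of -2 round-trips like any other value and the size estimate no longer depends on the value being written.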

bleskes · 2016-12-21T14:32:54Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            noOpResult.freeze();
+            return noOpResult;
+        } finally {
+            if (seqNo != SequenceNumbersService.UNASSIGNED_SEQ_NO) {


seq no must be assigned here, no?

Yes, but I prefer the symmetry with innerIndex and innerDelete.

bleskes · 2016-12-21T14:34:38Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

@@ -943,7 +952,7 @@ public void writeTo(StreamOutput out) throws IOException {
            out.writeOptionalString(routing);
            out.writeOptionalString(parent);
            out.writeLong(version);
-            
+
            out.writeByte(versionType.getValue());
            out.writeLong(autoGeneratedIdTimestamp);
            out.writeVLong(seqNo);


not related to this change but a (big) bug - this can be unassigned. I wonder how our BWC didn't catch it... maybe we don't run the nodes with assertions enabled?

We do run with assertions enabled; isn't it because we guard in the serialization:

    if (format >= FORMAT_SEQ_NO) {
        seqNo = in.readVLong();
        primaryTerm = in.readVLong();
    }

Either way, I think it's simpler to just use (read|write)Long everywhere.

I'm +1 on using write/read long. I still don't get the bwc aspect though - we shouldn't be able to write a negative long with writeVLong. A request coming in from a primary on an old node should have its seq# assigned to -2L

I discussed this with @bleskes via another channel. Our current theory is that assertions are not enabled for standalone nodes running in our integration tests. I will investigate and address accordingly.

Assertions are indeed not enabled on the standalone integration tests. I think they should be, they would have caught at least one bug in core (that I found after I enabled assertions there), and the issue here before changing the serialization to (read|write)Long. I will open a PR soon.
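
On the question of how -2 could have been written with writeVLong, here is a minimal sketch assuming the common 7-bit variable-length scheme. It is not a copy of StreamOutput: `VLongSketch`, `encode`, `decode` and the `assertionsEnabled` flag are hypothetical, and the flag only models whether a node runs with assertions on.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a 7-bit variable-length long encoding. */
final class VLongSketch {
    /** Encodes i; the sign check only fires when assertionsEnabled is true. */
    static byte[] encode(long i, boolean assertionsEnabled) {
        if (assertionsEnabled && i < 0) {
            throw new AssertionError("negative value passed to vlong encoder: " + i);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits at a time, setting the high bit while more bits remain.
        while ((i & ~0x7FL) != 0) {
            out.write((int) ((i & 0x7F) | 0x80));
            i >>>= 7; // unsigned shift, so a negative value still terminates
        }
        out.write((int) i);
        return out.toByteArray();
    }

    /** Reassembles the value from the 7-bit groups, lowest group first. */
    static long decode(byte[] data) {
        long value = 0;
        int shift = 0;
        for (byte b : data) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }
}
```

In this sketch, a negative value written without the check is not rejected or corrupted. It occupies ten bytes and decodes back to the same value, which would fit the theory that the problem stays invisible unless assertions are enabled.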

bleskes · 2016-12-21T14:37:01Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

+
+        @Override
+        public long estimateSize() {
+            return 2 * reason.length() + StreamOutput.lengthVLong(seqNo) + StreamOutput.lengthVLong(primaryTerm);


same issue here - we can just use the size of a long here. reason.length() is also wrong here - we write the string as utf8.

I changed the serialization to just use (read|write)Long, and the size estimate accordingly.
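
A short sketch of the UTF-8 remark above: the number of bytes a string occupies as UTF-8 is not a fixed multiple of `String.length()`. `ReasonSizeSketch` and its methods are hypothetical, and any length prefix the real stream format may add is ignored here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch comparing a char-count estimate with the UTF-8 byte count. */
final class ReasonSizeSketch {
    /** Estimate based on the in-memory UTF-16 representation: two bytes per char. */
    static int utf16Estimate(String reason) {
        return 2 * reason.length();
    }

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Bytes(String reason) {
        return reason.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For ASCII text the char-count estimate is twice the UTF-8 size, while for a character such as the euro sign (`"\u20ac"`) it comes in below the three bytes UTF-8 needs. As a rough figure for the memory buffer this may still be acceptable, which is how the thread treats it.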

bleskes · 2016-12-21T14:41:29Z

core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java


-        BytesStreamOutput out = new BytesStreamOutput();
-        total.writeTo(out);


is this total test replaced by something else?

I added a new test for the totals.

* master:
  Simplify Unicast Zen Ping (elastic#22277)
  Replace IndicesQueriesRegistry (elastic#22289)
  Fixed document mistake and fit for 5.1.1 API
  [TEST] improve error message in ESTestCase#assertWarnings
  [TEST] remove deleted test classes from checkstyle suppressions
  [TEST] make ESSingleNodeTestCase tests repeatable (elastic#22283)
  Link for setting page in elasticsearch.yml is outdated
  Factor out sort values from InternalSearchHit (elastic#22080)
  Add ID for percolate query to Java API docs
  x_refresh.yaml tests should use unique index names and doc ids to ease debugging
  IndicesStoreIntegrationIT should not use start recovery sending as an indication that the recovery started
  Added base class for testing aggregators and some initial tests for `terms`, `top_hits` and `min` aggregations.
  Add link to foreach processor to ingest-attachment.asciidoc

jasontedor · 2016-12-21T15:48:02Z

Thanks @bleskes, I pushed a commit responding to your feedback.

bleskes

Thx @jasontedor. That BWC aspect (not related to this PR) still puzzles me though - how could we have written -2 using writeVLong?

bleskes · 2016-12-21T18:26:14Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -532,7 +532,7 @@ public IndexResult index(Index index) {
     *
     * @return failure if the failure is a document specific failure (e.g. analysis chain failure)
     * or throws Exception if the failure caused the engine to fail (e.g. out of disk, lucene tragic event)
-     *
+     * <p>


nit: remove?

I guess the IDE did that? I will remove.

jasontedor · 2016-12-22T04:08:26Z

Thanks @bleskes.

jasontedor added :Sequence IDs >enhancement v6.0.0-alpha1 labels Dec 21, 2016

jasontedor requested a review from bleskes December 21, 2016 04:15

jasontedor force-pushed the translog-no-ops branch from f19e12e to 36b9edb Compare December 21, 2016 12:55

bleskes suggested changes Dec 21, 2016

jasontedor added 2 commits December 21, 2016 10:45

Introduce translog no-op (iteration)

0546377

bleskes mentioned this pull request Dec 21, 2016

Add Sequence Numbers to write operations #10708

Closed

64 tasks

Fix translog stats tests

d940346

bleskes approved these changes Dec 21, 2016

jasontedor merged commit 7946396 into elastic:master Dec 22, 2016

jasontedor deleted the translog-no-ops branch December 22, 2016 04:08
