Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce translog no-op #22291

Merged
merged 4 commits into from
Dec 22, 2016
Merged

Conversation

jasontedor
Copy link
Member

@jasontedor jasontedor commented Dec 21, 2016

As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.

Relates #10708

As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Left some very minor comments.

@@ -223,6 +223,11 @@ public void writeVLong(long i) throws IOException {
writeByte((byte) i);
}

public static int lengthVLong(long i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java doc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this method.

@@ -910,7 +927,7 @@ public void close() {

/** type of operation (index, delete), subclasses use static types */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment needs updating

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this comment.


@Override
public int estimatedSizeInBytes() {
return 2 * reason.length() + StreamOutput.lengthVLong(seqNo()) + StreamOutput.lengthVLong(primaryTerm());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to be so correct here? I mean, it's only used for the indexing memory buffer and all other places ignore the seq nos. I think we dont need an extra method?

PS. This made me think of something else - we can't use vlongs for seq# - they can be negative when coming from an old primary..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the serialization to be (read|write)Long and updated the size estimate to just add twice the number of bytes per long.

noOpResult.freeze();
return noOpResult;
} finally {
if (seqNo != SequenceNumbersService.UNASSIGNED_SEQ_NO) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seq no must be assigned here, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I prefer the symmetry with innerIndex and innerDelete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@@ -943,7 +952,7 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeOptionalString(routing);
out.writeOptionalString(parent);
out.writeLong(version);

out.writeByte(versionType.getValue());
out.writeLong(autoGeneratedIdTimestamp);
out.writeVLong(seqNo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this change but a (big) bug - this can unassigned. I wonder how our BWC didn't catch it... maybe we don't run the nodes with assertion enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do run with assertions enabled; isn't it because we guard in the serialization:

            if (format >= FORMAT_SEQ_NO) {
                seqNo = in.readVLong();
                primaryTerm = in.readVLong();
            }

Either way, I think it's simpler to just use (read|write)Long everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on using write/read long. I still don't get the bwc aspect though - we shouldn't be able to write a negative long with writeVLong. A request coming in from a primary on an old node should have it's seq# assigned to -2L

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with @bleskes via another channel. Our current theory is that assertions are not enabled for standalone nodes running in our integration tests. I will investigate and address accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions are indeed not enabled on the standalone integration tests. I think they should be, they would have caught at least one bug in core (that I found after I enabled assertions there), and the issue here before changing the serialization to (read|write)Long. I will open a PR soon.


@Override
public long estimateSize() {
return 2 * reason.length() + StreamOutput.lengthVLong(seqNo) + StreamOutput.lengthVLong(primaryTerm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue here - we can just use size of long here - reason.length() is also wrong here - we write as utf8.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the serialization to just use (read|write)Long, and the size estimate accordingly.


BytesStreamOutput out = new BytesStreamOutput();
total.writeTo(out);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this total test replaced by something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new test for the totals.

* master:
  Simplify Unicast Zen Ping (elastic#22277)
  Replace IndicesQueriesRegistry (elastic#22289)
  Fixed document mistake and fit for 5.1.1 API
  [TEST] improve error message in ESTestCase#assertWarnings
  [TEST] remove deleted test classes from checkstyle suppressions
  [TEST] make ESSingleNodeTestCase tests repeatable (elastic#22283)
  Link for setting page in elasticsearch.yml is outdated
  Factor out sort values from InternalSearchHit (elastic#22080)
  Add ID for percolate query to Java API docs
  x_refresh.yaml tests should use unique index names and doc ids to ease debugging
  IndicesStoreIntegrationIT should not use start recovery sending as an indication that the recovery started
  Added base class for testing aggregators and some initial tests for `terms`, `top_hits` and `min` aggregations.
  Add link to foreach processor to ingest-attachment.asciidoc
@jasontedor
Copy link
Member Author

Thanks @bleskes, I pushed a commit responding to your feedback.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @jasontedor . That BWC aspect (not-related to this PR) still puzzles me though - how could we have written -2 using writeVlong?

@@ -532,7 +532,7 @@ public IndexResult index(Index index) {
*
* @return failure if the failure is a document specific failure (e.g. analysis chain failure)
* or throws Exception if the failure caused the engine to fail (e.g. out of disk, lucene tragic event)
*
* <p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the IDE did that? I will remove.

@jasontedor jasontedor merged commit 7946396 into elastic:master Dec 22, 2016
@jasontedor jasontedor deleted the translog-no-ops branch December 22, 2016 04:08
@jasontedor
Copy link
Member Author

Thanks @bleskes.

@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Translog :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants