-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce translog no-op #22291
Introduce translog no-op #22291
Conversation
As the translog evolves towards a full operations log as part of the sequence numbers push, there is a need for the translog to be able to represent operations for which a sequence number was assigned, but the operation did not mutate the index. Examples of how this can arise are operations that fail after the sequence number is assigned, and gaps in this history that arise when an operation is assigned a sequence number but the operation never completed (e.g., a node crash). It is important that these operations appear in the history so that they can be replicated and replayed during recovery as otherwise the history will be incomplete and local checkpoints will not be able to advance. This commit introduces a no-op to the translog to set the stage for these efforts.
f19e12e
to
36b9edb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Left some very minor comments.
@@ -223,6 +223,11 @@ public void writeVLong(long i) throws IOException { | |||
writeByte((byte) i); | |||
} | |||
|
|||
public static int lengthVLong(long i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this method.
@@ -910,7 +927,7 @@ public void close() { | |||
|
|||
/** type of operation (index, delete), subclasses use static types */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment needs updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this comment.
|
||
@Override | ||
public int estimatedSizeInBytes() { | ||
return 2 * reason.length() + StreamOutput.lengthVLong(seqNo()) + StreamOutput.lengthVLong(primaryTerm()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to be so correct here? I mean, it's only used for the indexing memory buffer and all other places ignore the seq nos. I think we dont need an extra method?
PS. This made me think of something else - we can't use vlongs for seq# - they can be negative when coming from an old primary..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the serialization to be (read|write)Long
and updated the size estimate to just add twice the number of bytes per long.
noOpResult.freeze(); | ||
return noOpResult; | ||
} finally { | ||
if (seqNo != SequenceNumbersService.UNASSIGNED_SEQ_NO) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seq no must be assigned here, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I prefer the symmetry with innerIndex
and innerDelete
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
@@ -943,7 +952,7 @@ public void writeTo(StreamOutput out) throws IOException { | |||
out.writeOptionalString(routing); | |||
out.writeOptionalString(parent); | |||
out.writeLong(version); | |||
|
|||
out.writeByte(versionType.getValue()); | |||
out.writeLong(autoGeneratedIdTimestamp); | |||
out.writeVLong(seqNo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this change but a (big) bug - this can unassigned. I wonder how our BWC didn't catch it... maybe we don't run the nodes with assertion enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do run with assertions enabled; isn't it because we guard in the serialization:
if (format >= FORMAT_SEQ_NO) {
seqNo = in.readVLong();
primaryTerm = in.readVLong();
}
Either way, I think it's simpler to just use (read|write)Long
everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +1 on using write/read long. I still don't get the bwc aspect though - we shouldn't be able to write a negative long with writeVLong. A request coming in from a primary on an old node should have it's seq# assigned to -2L
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed this with @bleskes via another channel. Our current theory is that assertions are not enabled for standalone nodes running in our integration tests. I will investigate and address accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions are indeed not enabled on the standalone integration tests. I think they should be, they would have caught at least one bug in core (that I found after I enabled assertions there), and the issue here before changing the serialization to (read|write)Long
. I will open a PR soon.
|
||
@Override | ||
public long estimateSize() { | ||
return 2 * reason.length() + StreamOutput.lengthVLong(seqNo) + StreamOutput.lengthVLong(primaryTerm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue here - we can just use size of long here - reason.length() is also wrong here - we write as utf8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the serialization to just use (read|write)Long
, and the size estimate accordingly.
|
||
BytesStreamOutput out = new BytesStreamOutput(); | ||
total.writeTo(out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this total test replaced by something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new test for the totals.
* master: Simplify Unicast Zen Ping (elastic#22277) Replace IndicesQueriesRegistry (elastic#22289) Fixed document mistake and fit for 5.1.1 API [TEST] improve error message in ESTestCase#assertWarnings [TEST] remove deleted test classes from checkstyle suppressions [TEST] make ESSingleNodeTestCase tests repeatable (elastic#22283) Link for setting page in elasticsearch.yml is outdated Factor out sort values from InternalSearchHit (elastic#22080) Add ID for percolate query to Java API docs x_refresh.yaml tests should use unique index names and doc ids to ease debugging IndicesStoreIntegrationIT should not use start recovery sending as an indication that the recovery started Added base class for testing aggregators and some initial tests for `terms`, `top_hits` and `min` aggregations. Add link to foreach processor to ingest-attachment.asciidoc
Thanks @bleskes, I pushed a commit responding to your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @jasontedor . That BWC aspect (not-related to this PR) still puzzles me though - how could we have written -2 using writeVlong?
@@ -532,7 +532,7 @@ public IndexResult index(Index index) { | |||
* | |||
* @return failure if the failure is a document specific failure (e.g. analysis chain failure) | |||
* or throws Exception if the failure caused the engine to fail (e.g. out of disk, lucene tragic event) | |||
* | |||
* <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the IDE did that? I will remove.
Thanks @bleskes. |
As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.
Relates #10708