-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BWC layer to seq no infra and enable BWC tests #22185
Conversation
I spoke to @jpountz and indices created on 5.x do not index the timestamp field. Therefore passing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a bunch of comments - this looks great!
@@ -43,6 +43,5 @@ html_docs | |||
|
|||
# random old stuff that we should look at the necessity of... | |||
/tmp/ | |||
backwards/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? this is where our bwc index creation python tool stores their versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... maybe we need a different solution then - the problem is that with this line the this file was ignored by git.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call the folder bwc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s1monw I pushed another commit updating gitignore to be more explicit about the backwords folder (rather than using a global glob pattern). I also updated the comments around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
@@ -58,6 +58,6 @@ public void writeTo(StreamOutput out) throws IOException { | |||
|
|||
@Override | |||
public String toString() { | |||
return "flush {" + super.toString() + "}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
@@ -36,6 +38,9 @@ | |||
public abstract class ReplicatedWriteRequest<R extends ReplicatedWriteRequest<R>> extends ReplicationRequest<R> implements WriteRequest<R> { | |||
private RefreshPolicy refreshPolicy = RefreshPolicy.NONE; | |||
|
|||
long seqNo; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra whitespaces? can this be private or protected?
* Returns the sequence number for this operation. The sequence number is assigned while the operation | ||
* is performed on the primary shard. | ||
*/ | ||
public long seqNo() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get/set prefix if possible :)
@@ -204,7 +204,7 @@ public void writeTo(StreamOutput out) throws IOException { | |||
// timestamp | |||
out.writeBoolean(false); // enabled | |||
out.writeString(DateFieldMapper.DEFAULT_DATE_TIME_FORMATTER.format()); | |||
out.writeOptionalString(null); | |||
out.writeOptionalString("now"); // old default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/old/5.x/
bwcVersion = "6.0.0-alpha1-SNAPSHOT" | ||
numNodes = 4 | ||
numBwcNodes = 2 | ||
bwcVersion = "5.2.0-SNAPSHOT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w00t
retest this please |
@ywelsch and I decided to think about it some more - I noted it down in #10708 and we will adress it later. At the moment we do not test the situation where a primary on a new node fails while have two replicas - one on the old node and one on the new. |
retest this please |
@s1monw can you take another look? CI is happy. |
@@ -36,6 +38,8 @@ | |||
public abstract class ReplicatedWriteRequest<R extends ReplicatedWriteRequest<R>> extends ReplicationRequest<R> implements WriteRequest<R> { | |||
private RefreshPolicy refreshPolicy = RefreshPolicy.NONE; | |||
|
|||
private long seqNo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be initialized with SequenceNumbersService.UNASSIGNED_SEQ_NO
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied over code, but good catch I'll adapt. The problem is that serialization requires a positive number. I'll adapt that too.
@@ -524,7 +524,8 @@ public void writeTo(StreamOutput out) throws IOException { | |||
out.writeOptionalString(routing); | |||
out.writeOptionalString(parent); | |||
if (out.getVersion().before(Version.V_6_0_0_alpha1_UNRELEASED)) { | |||
out.writeOptionalString(null); | |||
// timestamp, at this point #proccess was called which for 5.x meant this was set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this explanation. why is 0 ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll copy over the explanation from the discussion on the PR
I spoke to @jpountz and indices created on 5.x do not index the timestamp field. Therefore passing a 0 for the transport layer is the simplest solution (as the value doesn't have any effect on the 5.x size but also conforms to 5.x semantics)
@@ -283,7 +283,7 @@ protected String checkActiveShardCount() { | |||
} | |||
|
|||
private void decPendingAndFinishIfNeeded() { | |||
assert pendingActions.get() > 0; | |||
assert pendingActions.get() > 0 : "pending action count goes bellow 0 for request [" + request + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/bellow/below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. thx.
protected void sendReplicaRequest(ConcreteShardRequest<ReplicaRequest> concreteShardRequest, DiscoveryNode node, | ||
ActionListener<ReplicationOperation.ReplicaResponse> listener) { | ||
transportService.sendRequest(node, transportReplicaAction, concreteShardRequest, transportOptions, | ||
// Eclipse can't handle when this is <> so we specify the type here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment still valid? I see that you removed the type..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. @jpountz maybe you can help verify eclipse is happy with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed it is not happy about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx @jpountz, I'll revert the change
allocationId = in.readString(); | ||
} else { | ||
// we use to read empty responses | ||
Empty.INSTANCE.readFrom(in); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just remove this? it's doing nothing so a comment is fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@@ -1060,6 +1070,14 @@ public void onFailure(Exception shardFailedError) { | |||
} | |||
} | |||
|
|||
/** sends the give replica request to the supplied nodes */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/give/given
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check
numBwcNodes = 1 | ||
bwcVersion = "6.0.0-alpha1-SNAPSHOT" | ||
numNodes = 4 | ||
numBwcNodes = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is a bit too much to just run the new IndexingIT. The YAML tests will then also execute with that many nodes and the Windows VMs will take forever. It might also double the heap size needed to run the tests (4 * 2GB per ES instance?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was doubting on that one too. 4 nodes total is really required to have a good bwc case for the global checkpoint management (we need two replicas and one primary). I see what you are saying but I was doubting whether it's worth introducing a whole now BWC cluster. Setting it up might be slower than just running the yaml tests..
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks so much for doing this
Sequence BWC logic consists of two elements:
For the sequence number infra to work with a mixed version clusters, we have to consider situation where the primary is on an old node and replicas are on new ones (i.e., the replicas will receive operations without seq#) and also the reverse (i.e., the primary sends operations to a replica but the replica can't process the seq# and respond with local checkpoint). An new primary with an old replica is a rare because we do not allow a replica to recover from a new primary. However, it can occur if the old primary failed and a new replica was promoted or during primary relocation where the source primary is treated as a replica until the master starts the target.
UNASSIGNED_SEQ_NO
, which doesn't confuse the local checkpoint logic (keeping it atNO_OPS_PERFORMED
)This PR also re-enables the BWC tests which were disabled. As such it had to fix any BWC issue that had crept in. Most notably an issue with the removal of the
timestamp
field in #21670, which I marked with nocommit. The problem with the removal is that old nodes still expect a non-null field value . I considered serializing the current time stamp but that causes indexing requests to different shards to be potentially different. The two other options I saw was to always send a crazy but valid value (i.e., 0) or add the logic to 5.x to deal withnull
properly. I currently opted for the first one. I'll reach out to @jpountz to discuss this one further.Last - I added some debugging tools like more sane node names and forcing replication request to implement a
toString