-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a transport action to get cluster formation info #87306
Adding a transport action to get cluster formation info #87306
Conversation
Hi @masseyke, I've created a changelog YAML for you. |
…sseyke/elasticsearch into feature/cluster-formation-info-action
Hi @masseyke, I've updated the changelog YAML for you. |
…sseyke/elasticsearch into feature/cluster-formation-info-action
Pinging @elastic/es-data-management (Team:Data Management) |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Keith
Left a few comments
in.readBoolean(), | ||
readStatusInfo(in), | ||
in.readList( | ||
in1 -> new JoinStatus( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we have a more meaningful name for the inner reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or maybe make JoinStatus
readable from the wire too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JoinStatus is now Writeable and readable from the wire.
); | ||
} | ||
|
||
private static StatusInfo readStatusInfo(StreamInput in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth making StatusInfo
Writeable
?
/** | ||
* This transport action fetches the ClusterFormationState from a remote node. | ||
*/ | ||
public static class TransportAction extends HandledTransportAction< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See @DaveCTurner 's comment on the previous PR
https://github.com/elastic/elasticsearch/pull/85782/files#r849933849
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good tho, it's just a simple HandledTransportAction
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah of course - I just realised it does the needful
transportService.registerRequestHandler(actionName, executor, false, canTripCircuitBreaker, requestReader, new TransportHandler());
Sorry for the noise Keith
|
||
public static class Request extends ActionRequest { | ||
|
||
public Request() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to request this information from all master eligible nodes so the request should be able to express that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a point-to-point request which I think makes sense as-is. The caller will have to do the work of sending one to each master of course, but that's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okido, sounds good
@@ -284,7 +284,7 @@ public Coordinator( | |||
this.nodeHealthService = nodeHealthService; | |||
} | |||
|
|||
private ClusterFormationState getClusterFormationState() { | |||
public ClusterFormationState getClusterFormationState() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaveCTurner does it make sense to send this over if the discovery cluster formation timeout has not lapsed yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so. If we only just lost the master then it might be a little incomplete but I expect it to converge pretty quickly.
); | ||
} | ||
|
||
private static boolean calculateHasDiscoveredQuorum( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would it make sense to obey the isXXX
/hasXXX
naming convention for methods returning boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we're trying to calculate the value to set hasDiscoveredQuorum
to. I think if I named this hasDiscoveredQuorum()
the expectation would be that this is a simple getter on that field. But it's not -- it's meant to be called just once to calculate what that field ought to be. Subsequent access would be through the record's access method to the hasDiscoveredQuorum
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for working on this Keith
/** | ||
* This transport action fetches the ClusterFormationState from a remote node. | ||
*/ | ||
public static class TransportAction extends HandledTransportAction< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah of course - I just realised it does the needful
transportService.registerRequestHandler(actionName, executor, false, canTripCircuitBreaker, requestReader, new TransportHandler());
Sorry for the noise Keith
out.writeOptionalString(status == null ? null : status.name()); | ||
out.writeOptionalString(info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we changing the semantics for StatusInfo
? I don't think these fields are optional now. I think we should keep them non-null (ie. mandatory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was nothing preventing them from being null. I thought it was very unlikely that someone would pass in nulls (what would be the point?). But if they do, I don't want it to blow up during serialization. From what I can tell there's no good way to enforce null checks on a record. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't change the behaviour here.
Usually, we mark nullable fields with @Nullable
or use Optional
when we accept null, but neither is present on this object.
IMO we should enforce/relax semantics in a separate and deliberate effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for enforcing non-nullable fields. I'll change the serialization to assume no nulls, and we can enforce it on a subsequent PR.
This PR exposes ClusterFormationFailureHelper.ClusterFormationState in a transport action (ClusterFormationInfoAction), with the intention that it be used as part of the master stability health check. This was started as #85782, but we realized that we needed more fine-grained information than just the string description of the reason the cluster formation failed.
This action will be used in branch 1.2.2.4 of the attached diagram (near the bottom right corner). That is, when we detect that there has been no master node in the last 30 seconds (default), and when PeerFinder does not believe that there is an elected master node, and when the current node is master eligible, we will use this transport action to asynchronously send a request to all master-eligible nodes to gather their ClusterFormationStates.