Clarify stdout summary when node starts #8650
Comments
@mberhault @sploiselle - what's actually supposed to happen when you ask an existing node to join a cluster that it wasn't joined to before? Won't it typically just overwrite the local data if the cluster it joins has a quorum that conflicts with it? I'll send a provisional PR in a little bit that adds some extra logging, but this seems like a strange use case so I may not be understanding it properly. The behavior when initializing a new node makes sense -- in that case, it blocks until it's able to join one of the provided addresses rather than initializing a new cluster.
@a-robinson When you have a node with an existing cluster ID stored in This is different from the behavior when this issue was opened. Previously, the node looked like it did start, which was confusing. Now it just looks like it fails to run start, which is better behavior. Let me know if you need me to go back through the repro steps. Given the current behavior, we could close the issue, but I think there's some room for improvement in communicating the application's behavior; e.g., identifying the directory with the cluster ID that needs to be disassociated from the node before it can join a new cluster.
@a-robinson: the node should fail to start if the clusterID in one of its stores doesn't match the clusterID of the cluster it's about to join. This is the only safety check that keeps a node from joining the wrong cluster, so we should keep it.
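To make the check described above concrete, here is a minimal Python sketch. It is not CockroachDB's implementation: `Store`, `check_cluster_id`, and `ClusterIDMismatchError` are hypothetical names, and the error wording is an assumption. It only shows the rule that a store with an existing cluster ID may join only the cluster with the same ID, and it names the offending directory in the error, as suggested earlier in the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


class ClusterIDMismatchError(Exception):
    """Raised when a store belongs to a different cluster than the join target."""


@dataclass
class Store:
    # Hypothetical stand-in for an on-disk store; not CockroachDB's real type.
    path: str
    cluster_id: Optional[str]  # None means the store was never bootstrapped


def check_cluster_id(stores: List[Store], target_cluster_id: str) -> None:
    """Refuse to join if any bootstrapped store carries a different cluster ID."""
    for store in stores:
        if store.cluster_id is None:
            continue  # a fresh store has no cluster ID to conflict with
        if store.cluster_id != target_cluster_id:
            # Name the directory so the operator knows which store conflicts.
            raise ClusterIDMismatchError(
                f"store {store.path} belongs to cluster {store.cluster_id}, "
                f"but the join target is cluster {target_cluster_id}"
            )
```

Calling `check_cluster_id([Store("cockroach-data", "cluster-a")], "cluster-b")` raises the error, while a store with no cluster ID or a matching one passes silently.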
Per @mberhault, we could improve the current stdout behavior of starting nodes/joining clusters by:
join[0] or augment its behavior when --join fails
This is a potentially better experience in general but specifically improves visibility into issues when having nodes join clusters.
Repro Steps
You are able to get an ambiguous stdout message in this scenario:
Run cockroach start --insecure, which creates the cockroach-data dir.
Run cockroach start --join=<other IP address>:26257.
stdout will then generate something like the following:
Expectation
An error message that my node is unable to join the cluster, or some indication that it's creating its own new cluster instead of joining the existing cluster.
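As a rough illustration of that expectation, the Python sketch below builds one status line that distinguishes the three possible outcomes. The function name `startup_status`, its parameters, and the message format are all hypothetical; none of this is actual cockroach output.

```python
from typing import List, Optional


def startup_status(
    join_addrs: List[str],
    joined_via: Optional[str],
    cluster_id: Optional[str],
) -> str:
    """Return one unambiguous status line for the startup summary.

    The wording is an assumed format for illustration only.
    """
    if joined_via is not None:
        return f"status: joined existing cluster {cluster_id} via {joined_via}"
    if join_addrs:
        # --join was given but no address accepted the node: say so explicitly
        # instead of only echoing the join list.
        return "status: unable to join cluster via " + ", ".join(join_addrs)
    return f"status: initialized new cluster {cluster_id}"
```

With a line like this, echoing the join address alone could no longer be mistaken for a successful join.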
Reality
The command including --join looks like it's executed, which leads me to believe the node successfully joined the cluster. The line join[0]: <other IP address>:26257 reinforces that supposition.