Clarify stdout summary when node starts #8650

sploiselle · 2016-08-18T16:36:03Z

Per @mberhault, we could improve the current stdout output when a node starts or joins a cluster by:

Including the cluster ID.
Removing join[0], or augmenting its behavior when --join fails.
Including a line such as "Initialized new cluster" vs. "Joined existing cluster", which could include the cluster ID.

This would likely be a better experience in general, and it specifically improves visibility into problems when nodes join clusters.
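To make the proposal above concrete, here is a minimal Go sketch of the extra summary line. The helper name `startupStatusLine`, the `status:` label, and the placeholder cluster ID are all assumptions for illustration; none of this is taken from the cockroach source.

```go
package main

import "fmt"

// startupStatusLine is a hypothetical helper (not CockroachDB's actual code).
// It builds the proposed one-line summary, using the same "label: value"
// layout as the existing stdout block.
func startupStatusLine(initializedNewCluster bool, clusterID string) string {
	if initializedNewCluster {
		return fmt.Sprintf("status:    initialized new cluster (cluster ID %s)", clusterID)
	}
	return fmt.Sprintf("status:    joined existing cluster (cluster ID %s)", clusterID)
}

func main() {
	// "example-cluster-id" is a placeholder, not a real cluster ID format.
	fmt.Println(startupStatusLine(true, "example-cluster-id"))
	fmt.Println(startupStatusLine(false, "example-cluster-id"))
}
```

Printing this line only after the node knows which case applies would remove the ambiguity described in the repro steps.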

Repro Steps

You can get an ambiguous stdout message in this scenario:

Create a network that prohibits communication on port 26257.
Start a node (cockroach start --insecure), which creates the cockroach-data dir.
Stop the node.
Have the node attempt to join an existing cluster (cockroach start --join=<other IP address>:26257).

stdout then shows something like the following:

build:     beta-20160728 @ 2016/07/28 17:02:34 (go1.6.3)
admin:     http://insecure-node3:8080
sql:       postgresql://root@insecure-node3:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   <other IP address>:26257

Expectation

An error message stating that my node is unable to join the cluster, or some indication that it's creating its own new cluster instead of joining the existing one.

Reality

The command that includes --join appears to execute successfully, which leads me to believe the node joined the cluster. The line join[0]: <other IP address>:26257 reinforces that supposition.


a-robinson · 2016-09-02T16:42:54Z

@mberhault @sploiselle - what's actually supposed to happen when you ask an existing node to join a cluster that it wasn't joined to before? Won't it typically just overwrite the local data if the cluster it joins has a quorum that conflicts with it?

I'll send a provisional PR in a little bit that adds some extra logging, but this seems like a strange use case so I may not be understanding it properly. The behavior when initializing a new node makes sense -- in that case, it blocks until it's able to join one of the provided addresses rather than initializing a new cluster.

sploiselle · 2016-09-02T16:57:47Z

@a-robinson When you have a node with an existing cluster ID stored in cockroach-data, regardless of what --join specifies, the node fails to start.

This differs from the behavior when this issue was opened. Previously, the node looked like it had started, which was confusing. Now it simply fails when running start, which is better behavior. Let me know if you need me to go back through the repro steps. Given the current behavior, we could close the issue, but I think there's some room for improvement in communicating the application's behavior; e.g., identifying the directory with the cluster ID that needs to be disassociated from the node before it can join a new cluster.

mberhault · 2016-09-03T12:39:11Z

@a-robinson: the node should fail to start if the clusterID in one of its stores doesn't match the clusterID of the cluster it's about to join. This is the only safety check that keeps a node from joining the wrong cluster, so we should keep it.
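A rough Go sketch of the check described above, assuming each store records the cluster ID it was initialized with. The `store` type, `checkClusterID`, and the error wording are hypothetical and not taken from the cockroach source; the error names the offending directory, along the lines suggested earlier in the thread.

```go
package main

import "fmt"

// store is a hypothetical stand-in for an on-disk store: its path plus the
// cluster ID recorded in it (empty if the store was never initialized).
type store struct {
	Path      string
	ClusterID string
}

// checkClusterID returns an error if any initialized store belongs to a
// cluster other than the one the node is about to join. Stores with no
// recorded cluster ID are treated as fresh and allowed to join.
func checkClusterID(stores []store, targetClusterID string) error {
	for _, s := range stores {
		if s.ClusterID == "" {
			continue // fresh store, nothing to conflict with
		}
		if s.ClusterID != targetClusterID {
			return fmt.Errorf(
				"store %q belongs to cluster %s, refusing to join cluster %s",
				s.Path, s.ClusterID, targetClusterID)
		}
	}
	return nil
}

func main() {
	// Placeholder IDs for illustration only.
	stores := []store{{Path: "cockroach-data", ClusterID: "cluster-a"}}
	if err := checkClusterID(stores, "cluster-b"); err != nil {
		fmt.Println("node failed to start:", err)
	}
}
```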

a-robinson self-assigned this Sep 2, 2016

a-robinson mentioned this issue Sep 2, 2016

cli: Print more info to stdout when starting a node #9066

Merged

a-robinson closed this as completed in #9066 Sep 7, 2016
