Clarify stdout summary when node starts #8650

Closed
sploiselle opened this issue Aug 18, 2016 · 3 comments

@sploiselle
Contributor

Per @mberhault, we could improve the current stdout behavior of starting nodes/joining clusters by:

  • Including cluster ID
  • Removing join[0], or augmenting its behavior when --join fails
  • Including a line such as "Initialized new cluster" or "Joined existing cluster", which could also include the cluster ID

This would likely be a better experience in general, and it specifically improves visibility into problems when nodes try to join clusters.

Repro Steps

You can get an ambiguous stdout message in this scenario:

  1. Create a network that prohibits communication on port 26257.
  2. Start a node (cockroach start --insecure), which creates the cockroach-data dir.
  3. Stop the node.
  4. Have the node attempt to join an existing cluster (cockroach start --join=<other IP address>:26257).

stdout then shows something like the following:

build:     beta-20160728 @ 2016/07/28 17:02:34 (go1.6.3)
admin:     http://insecure-node3:8080
sql:       postgresql://root@insecure-node3:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   <other IP address>:26257

Expectation

An error message that my node is unable to join the cluster, or some indication that it's creating its own new cluster instead of joining the existing cluster.

Reality

The command with --join appears to run successfully, which led me to believe the node had joined the cluster. The line join[0]: <other IP address>:26257 reinforces that impression.

@a-robinson a-robinson self-assigned this Sep 2, 2016
@a-robinson
Contributor

@mberhault @sploiselle - what's actually supposed to happen when you ask an existing node to join a cluster that it wasn't joined to before? Won't it typically just overwrite the local data if the cluster it joins has a quorum that conflicts with it?

I'll send a provisional PR in a little bit that adds some extra logging, but this seems like a strange use case so I may not be understanding it properly. The behavior when initializing a new node makes sense -- in that case, it blocks until it's able to join one of the provided addresses rather than initializing a new cluster.

@sploiselle
Contributor Author

@a-robinson When you have a node with an existing cluster ID stored in cockroach-data, regardless of what --join specifies, the node fails to start.

This is different from the behavior when this issue was opened. Previously, the node looked like it had started, which was confusing. Now it simply fails when running start, which is better behavior. Let me know if you need me to go back through the repro steps. Given the current behavior, we could close the issue, but I think there's still room to improve how the application communicates what happened; e.g., identifying the directory whose stored cluster ID must be disassociated from the node before it can join a new cluster.

@mberhault
Contributor

@a-robinson: the node should fail to start if the clusterID in one of its stores doesn't match the clusterID of the cluster it's about to join. This is the only safety check that keeps a node from joining the wrong cluster, so we should keep it.
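The check described above can be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the `Store` type, the `checkClusterID` function, the placeholder IDs, and the error wording are all hypothetical and are not taken from the cockroach source. The error names the store path, in line with the earlier suggestion to identify the directory that holds the conflicting cluster ID.

```go
package main

import "fmt"

// Store is a hypothetical view of an on-disk store. An empty ClusterID means
// the store has never been part of a cluster.
type Store struct {
	Path      string
	ClusterID string
}

// checkClusterID returns an error if any already-initialized store belongs to
// a cluster other than the one the node is about to join.
func checkClusterID(stores []Store, target string) error {
	for _, s := range stores {
		if s.ClusterID == "" {
			continue // fresh store, nothing to conflict with
		}
		if s.ClusterID != target {
			return fmt.Errorf(
				"store %q belongs to cluster %s and cannot join cluster %s",
				s.Path, s.ClusterID, target)
		}
	}
	return nil
}

func main() {
	// Placeholder IDs for illustration only.
	stores := []Store{{Path: "cockroach-data", ClusterID: "cluster-a"}}
	if err := checkClusterID(stores, "cluster-b"); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```

Returning an error instead of overwriting local data keeps the safety check intact while telling the operator which directory is responsible.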
