Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: new command 'start-single-node' #28495

Merged
merged 1 commit into from
Aug 2, 2019

Conversation

knz
Copy link
Contributor

@knz knz commented Aug 11, 2018

Fixes #24118.
Fixes #21429.

Suggested/requested by @bdarnell:

There are currently two ways to initialize a cluster: Starting the
first node without a --join flag, or starting all nodes with --join
flags and running a separate cockroach init command. This redundancy
is confusing (our docs use both methods without explaining the
difference) and it's easy to accidentally initialize a new cluster
by restarting your first node with the wrong flags. (This could have
catastrophic consequences in 1.1 because data from the two clusters
could get mixed. We've added safeguards against this in 2.0 but it's
still an easy way to break things.)
I think that we should deprecate this implicit initialization mode
and make the --join flag mandatory when starting a node (and
therefore the init command would be mandatory too). To avoid adding
too much complexity to single-node clusters, I propose a new
cockroach start-single-node command to perform the start and init
combination that is today implicit in the absence of a --join flag.
This would have a side benefit that we would know when the user
intends to keep the cluster as a single node instead of adding more
later. The start-single-node command could therefore set the
replication factor to 1 to disable the "underreplicated ranges"
warning.

This patch implements the new start-single-node command as follows:

  • the new command does not accept --join.
  • it attempts to set the replication factor to 1 upon starting up.

Meanwhile cockroach start is changed to produce a deprecation
warning if --join is not specified.

Release note (cli change): a new command cockroach start-single-node
is introduced to start single-node clusters with replication disabled.

Release note (cli change): using cockroach start without --join is
now deprecated and this mode of execution will be removed in a later
version of CockroachDB. Consider using cockroach start-single-node
instead or combine cockroach start with cockroach init.

@knz knz requested a review from a team as a code owner August 11, 2018 22:51
@knz knz requested a review from a team August 11, 2018 22:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz added the docs-todo label Aug 11, 2018
@knz knz force-pushed the 20180812-start-single-node branch 2 times, most recently from 2ca0215 to 08688e8 Compare August 12, 2018 07:45
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea, but I don't think we should merge it for 2.1 at this point because it's too disruptive to the docs. We use the start-without-join mode in some places (including the "start a local cluster" tutorial) and we'd need to audit all the docs and tutorials to make sure we aren't giving people instructions that will lead to a "don't do it this way" warning as their first impression.

Product may also want to weigh in since this is a significant change to the initial user experience.

Should start-single-node also default to --listen-addr=localhost?

Reviewed 21 of 21 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cli/start.go, line 77 at r1 (raw file):

storage devices, specified via --store flags.

Specify the --join flag to point to any other healthy node ( or list

This needs to be updated to reflect the fact that it is now normal to start nodes (at least the first one, by necessity) with --join pointing to a node that hasn't been started yet.


pkg/cli/start.go, line 676 at r1 (raw file):

			if initialBoot && disableReplication {
				cmd := []string{
					"zone", "set", "." + config.DefaultZoneName,

It's more complicated than this - setting the default zone won't automatically cascade into the other system zones that get default zone configs.

@knz
Copy link
Contributor Author

knz commented Aug 13, 2018

I'm good with keeping this open for a few weeks. I appreciate the pressure it would put on docs. However I do recommend we merge it early in the release cycle so we can start working on the UX improvements early on.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@knz
Copy link
Contributor Author

knz commented Jul 8, 2019

@bdarnell do you want to revive this?

@knz knz removed the request for review from a-robinson July 8, 2019 14:57
@bdarnell
Copy link
Contributor

bdarnell commented Jul 8, 2019

That question should probably go to PM/docs/support to see if we have the docs bandwidth to update everything, and if this would be enough of an improvement to be worth going through the deprecation cycle.

@knz knz force-pushed the 20180812-start-single-node branch from 08688e8 to a8b0fed Compare August 1, 2019 10:46
@knz knz requested a review from a team August 1, 2019 10:46
@knz knz removed the X-noremind Bots won't notify about PRs with X-noremind label Aug 1, 2019
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased, PTAL.

Should start-single-node also default to --listen-addr=localhost?

I don't know. I don't think so - the windows dev would be starting their single-node cluster in a linux vm and expect it to be reachable from the host.
What do you think?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/cli/start.go, line 77 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This needs to be updated to reflect the fact that it is now normal to start nodes (at least the first one, by necessity) with --join pointing to a node that hasn't been started yet.

Done.


pkg/cli/start.go, line 676 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's more complicated than this - setting the default zone won't automatically cascade into the other system zones that get default zone configs.

I still have to investigate this.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 19 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @knz)


pkg/cli/start.go, line 77 at r1 (raw file):

Previously, knz (kena) wrote…

Done.

This still isn't saying what I was looking for. We need to say something about how the join nodes don't need to be started yet. Something like:

Specify the --join flag to point to another node or nodes that are part of the same cluster. The other nodes do not need to be started yet, and if the address of the other nodes to be added are not yet known it is legal for the first node to join itself.


pkg/cli/start.go, line 568 at r2 (raw file):

		if waitForInit {
			initialBoot = true

This is the multi-node join path, not start-single-node. This seems to disagree with the previous comment about this variable.


pkg/cli/start.go, line 705 at r2 (raw file):

				cmd := []string{
					"sql",
					"-e", "alter range default configure zone using num_replicas = 1",

I guess this works, but I'm surprised to see this go through the full command machinery. Is there not some function in sql_util we could use more directly?

@knz knz force-pushed the 20180812-start-single-node branch from a8b0fed to 18313b9 Compare August 1, 2019 20:51
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/cli/start.go, line 77 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This still isn't saying what I was looking for. We need to say something about how the join nodes don't need to be started yet. Something like:

Specify the --join flag to point to another node or nodes that are part of the same cluster. The other nodes do not need to be started yet, and if the address of the other nodes to be added are not yet known it is legal for the first node to join itself.

Done.


pkg/cli/start.go, line 676 at r1 (raw file):

Previously, knz (kena) wrote…

I still have to investigate this.

Done.


pkg/cli/start.go, line 568 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This is the multi-node join path, not start-single-node. This seems to disagree with the previous comment about this variable.

This entire approach was incorrect/flawed. I used a simpler/better approach. PTAL.


pkg/cli/start.go, line 705 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I guess this works, but I'm surprised to see this go through the full command machinery. Is there not some function in sql_util we could use more directly?

see above

@knz knz force-pushed the 20180812-start-single-node branch 2 times, most recently from 5852081 to 6c06686 Compare August 1, 2019 21:00
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/cli/context.go, line 111 at r3 (raw file):

	serverCfg.SocketFile = ""
	serverCfg.JoinList = nil
	serverCfg.DefaultSystemZoneConfig = config.DefaultZoneConfig()

Looks like this first line should be removed (or just remove the word System?)

@knz knz force-pushed the 20180812-start-single-node branch from 6c06686 to 99e7943 Compare August 1, 2019 21:15
@knz knz force-pushed the 20180812-start-single-node branch from 99e7943 to cf1bbf1 Compare August 1, 2019 21:24
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/cli/context.go, line 111 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Looks like this first line should be removed (or just remove the word System?)

typo indeed. fixed.

Suggested/requested by @bdarnell:

> There are currently two ways to initialize a cluster: Starting the
  first node without a --join flag, or starting all nodes with --join
  flags and running a separate cockroach init command. This redundancy
  is confusing (our docs use both methods without explaining the
  difference) and it's easy to accidentally initialize a new cluster
  by restarting your first node with the wrong flags. (This could have
  catastrophic consequences in 1.1 because data from the two clusters
  could get mixed. We've added safeguards against this in 2.0 but it's
  still an easy way to break things.)
> I think that we should deprecate this implicit initialization mode
  and make the --join flag mandatory when starting a node (and
  therefore the init command would be mandatory too). To avoid adding
  too much complexity to single-node clusters, I propose a new
  `cockroach start-single-node` command to perform the start and init
  combination that is today implicit in the absence of a --join flag.
> This would have a side benefit that we would know when the user
  intends to keep the cluster as a single node instead of adding more
  later. The `start-single-node` command could therefore set the
  replication factor to 1 to disable the "underreplicated ranges"
  warning.

This patch implements the new `start-single-node` command as follows:

- the new command does not accept `--join`.
- it attempts to set the replication factor to 1 upon starting up.

Meanwhile `cockroach start` is changed to produce a deprecation
warning if `--join` is not specified.

Release note (cli change): a new command `cockroach start-single-node`
is introduced to start single-node clusters with replication disabled.

Release note (cli change): using `cockroach start` without `--join` is
now deprecated and this mode of execution will be removed in a later
version of CockroachDB. Consider using `cockroach start-single-node`
instead or combine `cockroach start` with `cockroach init`.
@knz knz force-pushed the 20180812-start-single-node branch from cf1bbf1 to 47474cb Compare August 2, 2019 01:50
@knz
Copy link
Contributor Author

knz commented Aug 2, 2019

TFYR

bors r+

craig bot pushed a commit that referenced this pull request Aug 2, 2019
28495: cli: new command 'start-single-node' r=knz a=knz

Fixes #24118.
Fixes #21429.

Suggested/requested by @bdarnell:

> There are currently two ways to initialize a cluster: Starting the
  first node without a --join flag, or starting all nodes with --join
  flags and running a separate cockroach init command. This redundancy
  is confusing (our docs use both methods without explaining the
  difference) and it's easy to accidentally initialize a new cluster
  by restarting your first node with the wrong flags. (This could have
  catastrophic consequences in 1.1 because data from the two clusters
  could get mixed. We've added safeguards against this in 2.0 but it's
  still an easy way to break things.)
> I think that we should deprecate this implicit initialization mode
  and make the --join flag mandatory when starting a node (and
  therefore the init command would be mandatory too). To avoid adding
  too much complexity to single-node clusters, I propose a new
  `cockroach start-single-node` command to perform the start and init
  combination that is today implicit in the absence of a --join flag.
> This would have a side benefit that we would know when the user
  intends to keep the cluster as a single node instead of adding more
  later. The `start-single-node` command could therefore set the
  replication factor to 1 to disable the "underreplicated ranges"
  warning.

This patch implements the new `start-single-node` command as follows:

- the new command does not accept `--join`.
- it attempts to set the replication factor to 1 upon starting up.

Meanwhile `cockroach start` is changed to produce a deprecation
warning if `--join` is not specified.

Release note (cli change): a new command `cockroach start-single-node`
is introduced to start single-node clusters with replication disabled.

Release note (cli change): using `cockroach start` without `--join` is
now deprecated and this mode of execution will be removed in a later
version of CockroachDB. Consider using `cockroach start-single-node`
instead or combine `cockroach start` with `cockroach init`.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Aug 2, 2019

Build succeeded

@craig craig bot merged commit 47474cb into cockroachdb:master Aug 2, 2019
@knz knz deleted the 20180812-start-single-node branch August 2, 2019 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants