Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: remove auto-init with cockroach start without --join #51245

Merged

Conversation

irfansharif
Copy link
Contributor

Fixes #44116. Informs #24118. Reviving #44112.

Release note (backward-incompatible change): A CockroachDB node
started with cockroach start without the --join flag no
longer automatically initializes the cluster. The cockrach init
command is now mandatory. The auto-initialization behavior had
been deprecated in version 19.2.

@irfansharif irfansharif requested a review from a team as a code owner July 9, 2020 20:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from b02a8af to 4f4b3da Compare July 9, 2020 20:50
@irfansharif irfansharif changed the title cli: remove auto-init with cockroach start without `--join cli: remove auto-init with cockroach start without --join Jul 9, 2020
@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 4f4b3da to 1e885cf Compare July 9, 2020 20:50
@irfansharif irfansharif requested review from tbg and knz July 9, 2020 21:10
@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 1e885cf to 858a6c2 Compare July 9, 2020 22:17
@irfansharif
Copy link
Contributor Author

Will have to change roachprod to start using init, hm, that'll be a bit tedious.

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 858a6c2 to 1e125e7 Compare July 10, 2020 00:11
irfansharif added a commit to irfansharif/cockroach-go that referenced this pull request Jul 10, 2020
`cockroach start` without `--join` is deprecated behavior, and something
we're looking to get rid of in cockroachdb/cockroach#51245.
@irfansharif
Copy link
Contributor Author

Updated roachprod to use explicit init, tested against v19.1-v20.2; cockroachdb/cockroach-go PR is cockroachdb/cockroach-go#81 (and I'll bump examples-orm after that lands).

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 1e125e7 to 9aa5bc1 Compare July 10, 2020 01:21
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @knz for final LGTM.

I checked out your PR to see what would happen if I run

./cockroach start --insecure

without a join flag. It does not error out. Do we already consider this a footgun? ./cockroach sql --insecure will simply hang forever. Should we force a join flag to be supplied, even if it is empty (--join=)? I expect without some additional change here we will get lots of users starting a node that waits for init but without users realizing (maybe most of these users will be CRDB engineers but still, it's a potential issue).

Reviewed 18 of 18 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)


pkg/acceptance/cluster/dockercluster.go, line 488 at r1 (raw file):

		firstNodeAddr = l.Nodes[0].nodeStr
	}
	cmd = append(cmd, "--join="+net.JoinHostPort(firstNodeAddr, base.DefaultPort))

Why give the first node --join=:26257 instead of l.Nodes[0].nodeStr? Is Nodes[0] still unavailable at this point? Is this join flag useful?


pkg/base/test_server_args.go, line 95 at r1 (raw file):

	// their config. If WaitForBootstrap is set, that behavior is disabled
	// and the test becomes responsible for initializing the cluster.
	WaitForBootstrap bool

might be easier to understand what this does if you named it NoAutoInitializeCluster but your call.


pkg/cli/init.go, line 62 at r1 (raw file):

	if _, err = c.Bootstrap(ctx, &serverpb.BootstrapRequest{}); err != nil {
		if strings.Contains(err.Error(), server.ErrClusterInitialized.Error()) {

Why did this check go away? Idempotency of init?


pkg/cmd/roachprod/install/cockroach.go, line 306 at r1 (raw file):

		var initOut string
		display = fmt.Sprintf("%s: bootstrapping cluster", c.Name)
		c.Parallel(display, 1, 0, func(i int) ([]byte, error) {

Wait, why c.Parallel()? This is just hitting a single node, no?


pkg/server/server.go, line 1293 at r1 (raw file):

		switch {
		case err == nil:
			log.Infof(ctx, "**** add additional nodes by specifying --join=%s", s.cfg.AdvertiseAddr)

Maybe just remove this log now that it's not prod code any more.


pkg/server/server.go, line 1300 at r1 (raw file):

	}

	log.Info(ctx, "awaiting init command or join with an already initialized node.")

} else {?

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 9aa5bc1 to 7a87c63 Compare July 10, 2020 23:33
Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do with the lonecockroach start --insecure, it does seem annoying. But forcing an empty join seems bad too? I wonder if it's possible/good enough to have cockroach sql prompt a warning asking the operator to ensure that the cluster is initialized, or the fact that it's waiting for initialization. @knz, thoughts?


I’m not sure how to work around #42083 to make sure my changes in cockroachdb/cockroach-go#81 and cockroachdb/examples-orms#109 all work for this branch, without me pushing to examples-orm master and letting CI pick that up. Is that the only way? Seems unwieldy.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/acceptance/cluster/dockercluster.go, line 488 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Why give the first node --join=:26257 instead of l.Nodes[0].nodeStr? Is Nodes[0] still unavailable at this point? Is this join flag useful?

Done.


pkg/base/test_server_args.go, line 95 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

might be easier to understand what this does if you named it NoAutoInitializeCluster but your call.

Done.


pkg/cli/init.go, line 62 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Why did this check go away? Idempotency of init?

init is not idempotent, we still want to surface ErrClusterInitialized but I pulled the error message generated to to MaybeDecorateGRPCError, where we seem to do a lot of the same.


pkg/cmd/roachprod/install/cockroach.go, line 306 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Wait, why c.Parallel()? This is just hitting a single node, no?

It's hitting a single node, yup. Was cargo culting here from the setting of cluster settings in the block below, it nicely captures command process/failure outputs.


pkg/server/server.go, line 1293 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Maybe just remove this log now that it's not prod code any more.

Done.


pkg/server/server.go, line 1300 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

} else {?

Done.

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 7a87c63 to d2005b5 Compare July 11, 2020 04:00
@irfansharif
Copy link
Contributor Author

irfansharif commented Jul 11, 2020

I checked out your PR to see what would happen if I run ./cockroach start --insecure without a join flag. It does not error out.

It now errors out. We apparently already emitted a warning earlier deprecating usage of cockroach start without --join, so now we can remove it entirely.

@irfansharif
Copy link
Contributor Author

Ignore the first commit in this PR, I've sent it out in #51329.

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch 5 times, most recently from b387fc5 to 506042b Compare July 15, 2020 20:44
@irfansharif
Copy link
Contributor Author

Went ahead and merged cockroachdb/cockroach-go#81 + cockroachdb/examples-orms#109, which clears out the example-orm test suite run against this branch (and master). Should be reflected in the now green CI.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM, just minor concerns on the error objects

Reviewed 1 of 18 files at r1, 18 of 18 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)


pkg/cli/start.go, line 544 at r2 (raw file):

	// Check the --join flag.
	if !flagSetForCmd(cmd).Lookup(cliflags.Join.Name).Changed {
		err := errors.New("no --join flags provided to 'cockroach start'\n" +

Please consider using an error hint for the second sentence instead. This presents better on the console.


pkg/server/init.go, line 34 at r2 (raw file):

// ErrClusterInitialized is reported when the Bootstrap RPC is run on
// a node that is already part of an initialized cluster.
var ErrClusterInitialized = fmt.Errorf("cluster has already been initialized")

Can you add the error hint here too. Especially if the CLI code now returns ErrClusterInitialized without adding the hint itself.

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch 2 times, most recently from 707d82e to 57a3b52 Compare July 17, 2020 15:57
Fixes cockroachdb#44116. Informs cockroachdb#24118. Reviving cockroachdb#44112.

We remove the deprecated behavior of crdb where we previously
auto-initialized the cluster when `cockroach start` was invoked without
a corresponding `--join` flag. We update a few tests along the way that
made use of this behavior by changing them to either explicitly init, or
now lean on `roachprod` changes (from cockroachdb#51329) that transparently manage
initialization.

Release note (backward-incompatible change): A CockroachDB node
started with cockroach start without the --join flag no
longer automatically initializes the cluster. The `cockroach init`
command is now mandatory. The auto-initialization behavior had
been deprecated in version 19.2.

Release note (backward-incompatible change): `cockroach start` without
`--join` is no longer supported. It was deprecated in a previous
release.
Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/cli/start.go, line 544 at r2 (raw file):

Previously, knz (kena) wrote…

Please consider using an error hint for the second sentence instead. This presents better on the console.

Done.


pkg/server/init.go, line 34 at r2 (raw file):

Previously, knz (kena) wrote…

Can you add the error hint here too. Especially if the CLI code now returns ErrClusterInitialized without adding the hint itself.

Discussed offline, decided to skip the hint here as the error is self-explanatory.

@irfansharif
Copy link
Contributor Author

bors r+

@irfansharif irfansharif force-pushed the 200709.remove-auto-init-no-join branch from 57a3b52 to b5e7171 Compare July 17, 2020 16:41
@craig
Copy link
Contributor

craig bot commented Jul 17, 2020

Canceled

@irfansharif
Copy link
Contributor Author

bors r+

@irfansharif
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 17, 2020

Build succeeded

@craig craig bot merged commit c09ed61 into cockroachdb:master Jul 17, 2020
@irfansharif irfansharif deleted the 200709.remove-auto-init-no-join branch July 17, 2020 21:25
irfansharif added a commit to irfansharif/jepsen that referenced this pull request Jul 23, 2020
When starting CockroachDB nodes, always include join flags. In order to
actually initialize the cluster, rely on an explicit `cockroach init`.

Previously any node started without explicit join flags was tasked with
bootstrapping the cluster. This was deprecated behavior, and was removed
in cockroachdb/cockroach#51245.
irfansharif added a commit to irfansharif/jepsen that referenced this pull request Jul 23, 2020
irfansharif added a commit to irfansharif/jepsen that referenced this pull request Jul 23, 2020
aliher1911 pushed a commit to aliher1911/jepsen that referenced this pull request Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli: remove auto-init for start without --join.
4 participants