server: simplify bootstrap via fatter init server #46843
Conversation
f86a231 to ad15c36
It was mixing NodeIDs within a Node, which was exposed when I refactored inspectEngines. Release note: None
Pre these refactors, many places in the code had to care about whether Gossip had already connected, which was used as a proxy for determining whether the clusterID was available. Now, though, we always have the clusterID before the Node even gets started. As a result, while we *do* want to log when Gossip does connect, internally it matters very little. As a result, various code paths have been simplified. In particular, the Node was awkwardly involved in verifying that we weren't cross-connecting between clusters, which has now been pulled out. After this commit, the few remaining uses of `gossip.Connected` are sane. Release note: None
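Purely as an illustration of the resulting pattern (a hypothetical helper, not code from this PR): with the cluster ID known up front, Gossip connectivity only needs to be observed for logging, off the startup path.

```
// Sketch: log when Gossip connects, without blocking startup on it.
// `connected` stands in for gossip's Connected channel.
package sketch

import (
	"context"
	"log"
)

func logWhenGossipConnects(ctx context.Context, connected <-chan struct{}) {
	go func() {
		select {
		case <-connected:
			log.Print("node connected via gossip")
		case <-ctx.Done():
		}
	}()
}
```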
ReadyFn was never called with waitForInit set to true since we were already past blocking when we'd initiate the call. Caught by `pkg/cli/interactive_tests/test_init_command.tcl`. Release note: None
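A minimal sketch of the ordering issue described in that commit message, with hypothetical names (only `ReadyFn`/`waitForInit` come from the commit message): the readiness callback has to fire before the goroutine blocks waiting for init, otherwise it can never be observed with `waitForInit` set to true.

```
// Hypothetical shape of the fix: report readiness (and whether we are
// waiting for `init`) before blocking, not after.
package sketch

import "context"

func serveAndMaybeWait(
	ctx context.Context, readyFn func(waitForInit bool), needInit bool, initDone <-chan struct{},
) error {
	if readyFn != nil {
		readyFn(needInit) // previously this ran only after the wait below
	}
	if needInit {
		select {
		case <-initDone:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```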
Please see the "so what's the idea here" comment.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/server/config.go, line 265 at r2 (raw file):
// will be waiting for an `init` command or accept bootstrapping
// from a joined node. It is set in an advisory fashion, that is,
// should be used for logging output only.
Hmm what is this new sentence trying to tell me?
pkg/server/init.go, line 35 at r2 (raw file):
// initServer manages the temporary init server used during
// bootstrapping.
mind adding more words here?
pkg/server/init.go, line 48 at r2 (raw file):
binaryVersion, binaryMinSupportedVersion roachpb.Version
bootstrapVersion                         clusterversion.ClusterVersion
would `roachpb.Version` be more appropriate here?
pkg/server/init.go, line 48 at r2 (raw file):
binaryVersion, binaryMinSupportedVersion roachpb.Version
bootstrapVersion                         clusterversion.ClusterVersion
please give `bootstrapVersion` a comment
pkg/server/init.go, line 112 at r2 (raw file):
}
// ServeAndWait sets up the initServer to accept Bootstrap requests (which will
I think there's something wrong with the first sentence; the "which" seems off.
pkg/server/init.go, line 113 at r2 (raw file):
// ServeAndWait sets up the initServer to accept Bootstrap requests (which will
// block until then). It uses the provided engines and gossip to block until
Can you explain
pkg/server/init.go, line 146 at r2 (raw file):
// now. Without this return, we could have all nodes in the cluster
// deadlocked on g.Connected below. For similar reasons, we can't ever
// hope to initialize the newEngines below, for which we would need to
can you pls add comments to the method about this - how new engines are not initialized if some of the engines were initialized already?
pkg/server/init.go, line 155 at r2 (raw file):
}
log.Info(ctx, "no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node.")
long line
pkg/server/init.go, line 217 at r2 (raw file):
	ctx context.Context, request *serverpb.BootstrapRequest,
) (response *serverpb.BootstrapResponse, err error) {
	<-s.bootstrapBlockCh // block until ServeAndWait() is active
so what's the idea here? How come we open the port / register this RPC service before we decide whether we actually want to accept a `Bootstrap` RPC? Can't we just register the service after inspecting the engines?
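For readers following the thread, here is a stripped-down, self-contained sketch of the pattern under discussion, not the PR's actual code: the Bootstrap handler is registered early but parks on a channel until `ServeAndWait` has decided whether bootstrap should be accepted (`bootstrapBlockCh` comes from the quoted snippet; `allowBootstrap` is a made-up field).

```
// Register the RPC handler early; gate its body on a readiness channel.
package sketch

import (
	"context"
	"errors"
)

type initServer struct {
	bootstrapBlockCh chan struct{} // closed once ServeAndWait is active
	allowBootstrap   bool          // decided by ServeAndWait after inspecting engines
}

// Bootstrap is callable as soon as the RPC service is registered, but it
// does no work until the rest of startup has caught up.
func (s *initServer) Bootstrap(ctx context.Context) error {
	select {
	case <-s.bootstrapBlockCh:
	case <-ctx.Done():
		return ctx.Err()
	}
	if !s.allowBootstrap {
		return errors.New("cluster already initialized")
	}
	// ... bootstrapping of the engines would run here ...
	return nil
}
```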
pkg/server/init.go, line 223 at r2 (raw file):
	return nil, err
}
state, err := bootstrapCluster(ctx, s.engs, s.bootstrapVersion, s.bootstrapZoneConfig, s.bootstrapSystemZoneConfig)
long line
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/server/init.go, line 113 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Can you explain
Can you comment more on what the deal with the engines is - that we check to see if any had been bootstrapped? And also comment about under which circumstances engines get bootstrapped by this method, and when they don't?
It seems to me that bundling the checking of engines with this method is weird. Do you think we could lift that part outside of the init server, or at least outside of anything that has to do with `close(s.bootstrapBlockCh)`?
TFTR! As we discussed in our 1:1, I am reacting to your comments over in #46975.
bors r=andreimatei
Dismissed @andreimatei from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/server/init.go, line 35 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
mind adding more words here?
Done.
pkg/server/init.go, line 48 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
would `roachpb.Version` be more appropriate here?
Done.
pkg/server/init.go, line 48 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please give `bootstrapVersion` a comment
Done.
pkg/server/init.go, line 113 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Can you comment more on what the deal with the engines is - that we check to see if any had been bootstrapped? And also comment about under which circumstances engines get bootstrapped by this method, and when they don't?
It seems to me that bundling the checking of engines with this method is weird. Do you think we could lift that part outside of the init server, or at least outside of anything that has to do with `close(s.bootstrapBlockCh)`?
Done.
pkg/server/init.go, line 146 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can you pls add comments to the method about this - how new engines are not initialized if some of the engines were initialized already?
Done. The comments will get clearer once I give the initServer a `*kv.DB`. For now, the awkwardness remains that when joining an existing cluster we only wait for Gossip and let the `*Node` handle the rest. Not for long.
pkg/server/init.go, line 155 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
long line
Done.
pkg/server/init.go, line 217 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
so what's the idea here? How come we open the port / register this RPC service before we decide whether we actually want to accept a `Bootstrap` RPC? Can't we just register the service after inspecting the engines?
I managed to address this over in #46975.
pkg/server/init.go, line 223 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
long line
Done.
Build succeeded
46756: sql: improve SHOW TABLES to show schema and type r=rohany a=otan

Resolves #46553

This change is more closely aligned with postgres' `\d` and seems useful as we begin to support user-defined schemas.

Release note (sql change): This change modifies SHOW TABLES to return schema and table type. Furthermore, sequences will now appear in SHOW TABLES. Any user who relies on SHOW TABLES to return one column can use `SELECT table_name FROM [SHOW TABLES]` to get compatible behavior between previous versions of CRDB.

46975: server: first pass at splitting up NewServer r=andreimatei a=tbg

First four commits from #46843.

----

Ultimately we want to separate the SQL and KV startup paths. This is difficult because of various cyclic dependencies; for example, the KV subsystem relies on being able to run simple SQL queries, and the SQL subsystem in turn relies on node liveness, Gossip, and others that in a multi-tenant future it won't have access to. This won't be untangled in one commit, but here is a modest start: We split out a `newSQLServer` method which we call in `NewServer` and which populates the `(Server).sqlServer` field, which now houses all that was formerly in `Server` but really ought to be owned by SQL. A few of these PRs in, you can imagine a structure emerging in which, modulo the cross-dependencies which will then clearly be visible,

```
type Server struct {
	// ...
	kvServer  *kv.Server
	sqlServer *sql.Server
}
```

Release note: None

47034: roachtest: fix sqlalchemy roachtest r=rafiss a=rafiss

This test was failing because an upstream change made the test incompatible with the latest version of pytest. It is fixed by pinning the version of pytest that gets installed. The ignore list is also updated to deal with a test result parsing issue.

fixes #45989

Release note: None

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
47616: kv/concurrency: remove TODO about impossible deadlock scenario r=nvanbenschoten a=nvanbenschoten

This commit removes a TODO that described a potential scenario in which a request-only dependency cycle could deadlock due to a simultaneous transaction abort. I realized after writing a full fix (9a9182f) that such a deadlock was not achievable in practice due to the semantics of the lockTable.

> It may appear that there is a bug here in the handling of request-only
> dependency cycles. If such a cycle was broken by simultaneously aborting
> the transactions responsible for each of the request, there would be no
> guarantee that an aborted pusher would notice that its own transaction
> was aborted before it notices that its pushee's transaction was aborted.
> For example, in the simplest case, imagine two requests deadlocked on
> each other. If their transactions are both aborted and each push notices
> the pushee is aborted first, they will both return here triumphantly and
> wait for the other to exit its lock wait-queues, leading to deadlock.
> Even if they eventually pushed each other again, there would be no
> guarantee that the same thing wouldn't happen.
>
> However, such a situation is not possible in practice because such a
> dependency cycle is never constructed by the lockTable. The lockTable
> assigns each request a monotonically increasing sequence number upon its
> initial entrance to the lockTable. This sequence number is used to
> straighten out dependency chains of requests such that a request only
> waits on conflicting requests with lower sequence numbers than its own
> sequence number. This behavior guarantees that request-only dependency
> cycles are never constructed by the lockTable. Put differently, all
> dependency cycles must include at least one dependency on a lock and,
> therefore, one call to pushLockTxn. Unlike pushRequestTxn, pushLockTxn
> actively removes the conflicting lock and removes the dependency when it
> determines that its pushee transaction is aborted. This means that the
> call to pushLockTxn will continue to make forward progress in the case of
> a simultaneous abort of all transactions behind the members of the cycle,
> preventing such a hypothesized deadlock from ever materializing.

The PR also includes a good amount of other cleanup that I was intending to land with the fix to this deadlock. For instance, it splits off a `LockAcquisition` type from the `LockUpdate` type. It also cleans up the internals of `lock_table.go`.

47727: server: various server construction cleanup r=nvanbenschoten a=nvanbenschoten

This is all cleanup I stumbled upon while catching up on the changes made in the following PRs:

- #46843
- #46975
- #47509
- #47517
- #47570

It's all pretty minor.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
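As a toy illustration of the sequence-number argument quoted above (not the actual `lock_table.go` code): if every request receives a monotonically increasing sequence number when it enters the lockTable, and only ever waits on conflicting requests with strictly smaller numbers, the waits-for graph over requests alone can never contain a cycle.

```
// Toy model of the invariant: requests wait only on earlier requests.
package sketch

type request struct {
	seq     uint64
	waitsOn []*request
}

type lockTable struct{ nextSeq uint64 }

// enter assigns the next sequence number to an incoming request.
func (lt *lockTable) enter() *request {
	lt.nextSeq++
	return &request{seq: lt.nextSeq}
}

// maybeWait records r waiting on other only if other entered earlier, so
// request-only dependency cycles cannot form by construction.
func maybeWait(r, other *request) {
	if other.seq < r.seq {
		r.waitsOn = append(r.waitsOn, other)
	}
}
```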
## Motivation
Today, starting up a server is complicated. This is especially true
when bootstrap is necessary. By this, we mean that either
- no NodeID is known. This implies that none of the engines were
  initialized yet
- the NodeID is known (i.e. at least one store is initialized), but a
  new engine was added.
When the process first starts, and a NodeID is known, a ClusterID is
also known (since they get persisted together, on every store). For the
same reason, a persisted cluster version is known. In this case, we can
start a fully initialized cluster, so this is the easy case.
It is more difficult when no NodeID is known, in which case the server
is just starting up for the first time, with its engines all blank. It
needs to somehow allocate NodeIDs (and StoreIDs) which are then written
to the engines. It also needs to, at least initially, use the lowest
possible cluster version it can tolerate (after all, the cluster may
actually run at that lowest version). Right now, there is a delicate
dance: we thread late-bound ClusterID and NodeID containers all over the
place, spin up a mostly dysfunctional Node, use its KV client to
allocate NodeID and StoreIDs once this is possible - we need Gossip
to have connected first - and update the containers. It is complex,
error prone, and ossifies any code it touches.
Cluster versions deserve an extra shout-out for complexity. Even if
a cluster version was persisted, the node may have been decommissioned
and the cluster upgraded. Much work went into our RPC layer to prevent
connections between incompatible nodes, but there is no boot-time check
that results in a swift and descriptive error - there will be a fatal
error, originating from the RPC subsystem. One aim of our work is to
simplify this by checking the version via an RPC before setting too
many gears in motion.
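A minimal, self-contained sketch of the kind of boot-time check argued for here, written against a stand-in `Version` type rather than the real `roachpb`/`clusterversion` packages:

```
// Fail fast, with a descriptive error, if this binary cannot participate
// in a cluster whose active version is `active`. Illustrative only.
package sketch

import "fmt"

// Version is a stand-in for the real cluster version types.
type Version struct{ Major, Minor int32 }

func (v Version) Less(o Version) bool {
	return v.Major < o.Major || (v.Major == o.Major && v.Minor < o.Minor)
}

func checkCompatibility(active, binary, binaryMinSupported Version) error {
	if active.Less(binaryMinSupported) {
		return fmt.Errorf("cluster version %+v is below this binary's minimum supported version %+v", active, binaryMinSupported)
	}
	if binary.Less(active) {
		return fmt.Errorf("binary version %+v is too old for cluster version %+v", binary, active)
	}
	return nil
}
```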
## Context
This marks the beginning of a series of PRs aiming at improving the
startup code. Ultimately, the goal is to have the Node and all Stores
bootstrapped as early as possible, but in particular before starting
the KV or SQL server subsystems.
Furthermore, we want to achieve this without relying on Gossip, to
prepare for a split between the SQL and KV servers in the context of
multitenancy support (SQL server won't be able to rely on Gossip, but
will still need to announce itself to the KV servers).
## This PR
This PR is an initial simplifying refactor that can help achieve these
goals. The init server (which hosts the Bootstrap RPC) is given more
responsibilities: it is now directly in charge of determining which, if
any, engines are bootstrapped, and explicitly listens to Gossip as well
as the Bootstrap RPC. It returns the cluster ID to the main server
start-up goroutine when it is available. As a result, the main startup
code has been simplified, and a further thread to pull on has appeared
and is called out in TODOs.
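As a rough, self-contained sketch of the shape this gives the startup path (simplified, with made-up field names such as `inspect`, `bootstrapCh`, and `gossipCh`; the real `ServeAndWait` in `pkg/server/init.go` differs in detail): the init server inspects its engines first, and if nothing is bootstrapped it waits for either a Bootstrap RPC or Gossip connectivity before handing a cluster ID back to the main startup goroutine.

```
// Simplified control flow of the init server's wait loop.
package sketch

import "context"

type initState struct {
	clusterID string // stand-in for a UUID
	// ... bootstrapped engines, persisted cluster version, etc.
}

type initServer struct {
	inspect     func() (*initState, bool) // reports already-bootstrapped engines
	bootstrapCh chan *initState           // fed by the Bootstrap RPC handler
	gossipCh    <-chan *initState         // fed once Gossip connects to an existing cluster
}

// serveAndWait blocks until a cluster ID is known, from whichever source
// wins: pre-existing engines, an explicit `cockroach init`, or Gossip.
func (s *initServer) serveAndWait(ctx context.Context) (*initState, error) {
	if state, ok := s.inspect(); ok {
		return state, nil
	}
	select {
	case state := <-s.bootstrapCh:
		return state, nil
	case state := <-s.gossipCh:
		return state, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```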
Down the road (i.e. in later PRs), the init server will bootstrap a
NodeID and StoreIDs even when joining an existing cluster. It will
initially mimic/front-load the strategy taken by Node today, i.e. use a
`kv.DB`, but ultimately will bypass Gossip completely and use a simple
RPC call to ask the existing cluster to assign these IDs as needed. This
RPC will also establish the active cluster version, which is required
for SQL multi-tenancy, and generally follows the ideas in #32574.
Release note: None
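To make the forward-looking part concrete, here is one possible wire shape for such an RPC, purely as a hypothetical sketch; none of these names come from the PR or its follow-ups.

```
// Hypothetical request/response for a join-style RPC that allocates IDs
// and reports the active cluster version before KV/SQL start.
package sketch

// JoinRequest is sent by a blank node to an already initialized node.
type JoinRequest struct {
	BinaryVersion             string // this binary's version
	BinaryMinSupportedVersion string // oldest cluster version it can run at
	NumStores                 int32  // how many StoreIDs to allocate
}

// JoinResponse carries everything the joiner needs before starting its
// KV and SQL subsystems.
type JoinResponse struct {
	ClusterID            string
	NodeID               int32
	StoreIDs             []int32
	ActiveClusterVersion string // becomes the joiner's initial active version
}
```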