
server: simplify bootstrap via fatter init server #46843

Merged: 4 commits merged on Apr 7, 2020

Conversation

@tbg (Member) commented Apr 1, 2020

Motivation

Today, starting up a server is complicated. This is especially true
when bootstrap is necessary. By this, we mean that either

  • no NodeID is known. This implies that none of the engines were
    initialized yet
  • the NodeID is known (i.e. at least one store is initialized), but a
    new engine was added.

When the process first starts, and a NodeID is known, a ClusterID is
also known (since they get persisted together, on every store). For the
same reason, a persisted cluster version is known. In this case, we can
start a fully initialized cluster, so this is the easy case.
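
To make this concrete, the check amounts to something like the following sketch (simplified stand-in types, not the real storage API or the `inspectEngines` signature): an initialized engine carries a persisted identity, a blank one does not.

```go
// Hedged sketch only: stand-in types illustrating that ClusterID,
// NodeID, and StoreID are persisted together on every initialized store.
package bootstrapsketch

type storeIdent struct {
	clusterID string
	nodeID    int32
	storeID   int32
}

// engine stands in for a storage engine; ident is nil while it is blank.
type engine struct {
	ident *storeIdent
}

// inspectEnginesSketch returns the persisted identity, if any engine has
// one, along with the engines that still need to be initialized.
func inspectEnginesSketch(engs []engine) (ident *storeIdent, uninitialized []engine) {
	for _, e := range engs {
		if e.ident != nil {
			ident = e.ident // easy case: NodeID and ClusterID are known
			continue
		}
		uninitialized = append(uninitialized, e)
	}
	return ident, uninitialized
}
```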

It is more difficult when no NodeID is known, in which case the server
is just starting up for the first time, with its engines all blank. It
needs to somehow allocate NodeIDs (and StoreIDs) which are then written
to the engines. It also needs to, at least initially, use the lowest
possible cluster version it can tolerate (after all, the cluster may
actually run at that lowest version). Right now, there is a delicate
dance: we thread late-bound ClusterID and NodeID containers all over the
place, spin up a mostly dysfunctional Node, use its KV client to
allocate NodeID and StoreIDs once this is possible - we need Gossip
to have connected first - and update the containers. It is complex,
error-prone, and ossifies any code it touches.
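
The "late-bound container" pattern mentioned here looks roughly like the following (an illustrative stand-in, not the actual CockroachDB container types): components capture a pointer to the container at construction time, and only much later, once Gossip has connected and an ID has been allocated, does anything set it.

```go
// Illustrative sketch of a late-bound ID container; not the real API.
package bootstrapsketch

import "sync/atomic"

// nodeIDContainer is handed out to many components before the NodeID is
// known; 0 means "not allocated yet".
type nodeIDContainer struct {
	id int32
}

func (c *nodeIDContainer) Set(id int32) { atomic.StoreInt32(&c.id, id) }

func (c *nodeIDContainer) Get() (int32, bool) {
	id := atomic.LoadInt32(&c.id)
	return id, id != 0
}
```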

Cluster versions deserve an extra shout-out for complexity. Even if
a cluster version was persisted, the node may have been decommissioned
and the cluster upgraded. Much work went into our RPC layer to prevent
connections between incompatible nodes, but there is no boot-time check
that results in a swift and descriptive error - there will be a fatal
error, originating from the RPC subsystem. One aim of our work is to
simplify this by checking the version via an RPC before setting too
many gears in motion.
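
A rough sketch of what such a boot-time check amounts to (stand-in version type, not the real roachpb.Version / clusterversion machinery): compare the cluster's active version against this binary's supported window and fail with a descriptive error before anything heavy starts.

```go
// Hedged sketch of a boot-time version compatibility check.
package bootstrapsketch

import "fmt"

// version is a simplified stand-in for a cluster version.
type version struct{ major, minor int32 }

func (v version) less(o version) bool {
	return v.major < o.major || (v.major == o.major && v.minor < o.minor)
}

// checkVersionCompat returns a descriptive error instead of letting the
// node start up and be rejected later by the RPC layer.
func checkVersionCompat(binaryMinSupported, binaryVersion, clusterActive version) error {
	if clusterActive.less(binaryMinSupported) {
		return fmt.Errorf("cluster version %v is below this binary's minimum supported version %v",
			clusterActive, binaryMinSupported)
	}
	if binaryVersion.less(clusterActive) {
		return fmt.Errorf("cluster version %v is newer than this binary's version %v",
			clusterActive, binaryVersion)
	}
	return nil
}
```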

Context

This marks the beginning of a series of PRs aiming at improving the
startup code. Ultimately, the goal is to have the Node and all Stores
bootstrapped as early as possible, but in particular before starting
the KV or SQL server subsystems.

Furthermore, we want to achieve this without relying on Gossip, to
prepare for a split between the SQL and KV servers in the context of
multitenancy support (SQL server won't be able to rely on Gossip, but
will still need to announce itself to the KV servers).

This PR

This PR is an initial simplifying refactor that can help achieve these
goals. The init server (which hosts the Bootstrap RPC) is given more
responsibilities: it is now directly in charge of determining which, if
any, engines are bootstrapped, and explicitly listens to Gossip as well
as the Bootstrap RPC. It returns the cluster ID to the main server
start-up goroutine once it is available. As a result, the main startup
code has been simplified, and further threads to pull on have appeared;
they are called out in TODOs.
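
The intended shape is roughly the following (an assumed sketch, not the actual initServer code or its method names): if an engine is already bootstrapped we have the state immediately, otherwise we block until either the Bootstrap RPC or Gossip provides a cluster ID.

```go
// Assumed sketch of the "wait for init" flow; names are illustrative.
package bootstrapsketch

import "context"

type initState struct {
	clusterID string
}

type initServerSketch struct {
	bootstrapCh  <-chan initState // fed by the Bootstrap RPC (`cockroach init`)
	gossipConnCh <-chan initState // fed once Gossip connects to an initialized cluster
}

// serveAndWait blocks until a cluster ID is available, from whichever
// source delivers it first.
func (s *initServerSketch) serveAndWait(
	ctx context.Context, persisted *initState,
) (initState, error) {
	if persisted != nil {
		// At least one store was already initialized; nothing to wait for.
		return *persisted, nil
	}
	select {
	case st := <-s.bootstrapCh:
		return st, nil
	case st := <-s.gossipConnCh:
		return st, nil
	case <-ctx.Done():
		return initState{}, ctx.Err()
	}
}
```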

Down the road (i.e. in later PRs), the init server will bootstrap a
NodeID and StoreIDs even when joining an existing cluster. It will
initially mimic/front-load the strategy taken by Node today, i.e. use a
kv.DB, but ultimately will bypass Gossip completely and use a simple
RPC call to ask the existing cluster to assign these IDs as needed. This
RPC will also establish the active cluster version, which is required
for SQL multi-tenancy, and generally follows the ideas in #32574.
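
Purely as an illustration of that direction, such an RPC could take a shape like the one below; every name here is hypothetical and nothing like it exists in this PR.

```go
// Hypothetical sketch of a future "join" RPC that hands a new node its
// IDs and the active cluster version in one round trip.
package bootstrapsketch

import "context"

type joinRequest struct {
	binaryVersion             string
	binaryMinSupportedVersion string
	numStoresNeedingIDs       int
}

type joinResponse struct {
	clusterID            string
	nodeID               int32
	storeIDs             []int32
	activeClusterVersion string
}

// joiner would be implemented by an already-initialized node.
type joiner interface {
	Join(ctx context.Context, req joinRequest) (joinResponse, error)
}
```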

Release note: None

@cockroach-teamcity (Member) commented:

This change is Reviewable

@tbg tbg force-pushed the initserver branch 3 times, most recently from f86a231 to ad15c36 on April 1, 2020 16:42
@tbg tbg changed the title from "[wip] server: simplify bootstrap via fatter init server" to "server: simplify bootstrap via fatter init server" Apr 1, 2020
@tbg tbg requested a review from andreimatei April 1, 2020 16:43
@tbg tbg marked this pull request as ready for review April 1, 2020 16:43
@tbg tbg requested a review from a team as a code owner April 1, 2020 16:43
tbg added 4 commits April 4, 2020 22:18
It was mixing NodeIDs within a Node, which was exposed when I refactored
inspectEngines.

Release note: None
Before these refactors, many places in the code had to care about
whether Gossip had already connected, which was used as a proxy for
determining whether the clusterID was available.

Now, though, we always have the clusterID before the Node even gets
started. As a result, while we *do* want to log when Gossip does
connect, internally it matters very little.

As a result, various code paths have been simplified. In particular,
the Node was awkwardly involved in verifying that we weren't
cross-connecting between clusters, which has now been pulled out.

After this commit, the few remaining uses of `gossip.Connected` are
sane.

Release note: None
ReadyFn was never called with waitForInit set to true since we were
already past blocking when we'd initiate the call.

Caught by `pkg/cli/interactive_tests/test_init_command.tcl`.

Release note: None
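
For illustration, a minimal sketch of the ordering the fix restores (names simplified, not the actual server code): the readiness callback has to fire before blocking on init, otherwise it can never observe waitForInit as true.

```go
// Sketch only: report readiness before blocking, not after.
package bootstrapsketch

func startSketch(needsInit bool, readyFn func(waitingForInit bool), blockUntilInit func()) {
	// If readyFn only ran after blockUntilInit returned, waitingForInit
	// could never be observed as true.
	readyFn(needsInit)
	if needsInit {
		blockUntilInit()
	}
}
```
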
@andreimatei (Contributor) left a comment


Please see the "so what's the idea here" comment.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/server/config.go, line 265 at r2 (raw file):

	// will be waiting for an `init` command or accept bootstrapping
	// from a joined node. It is set in an advisory fashion, that is,
	// should be used for logging output only.

Hmm what is this new sentence trying to tell me?


pkg/server/init.go, line 35 at r2 (raw file):

// initServer manages the temporary init server used during
// bootstrapping.

mind adding more words here?


pkg/server/init.go, line 48 at r2 (raw file):

	binaryVersion, binaryMinSupportedVersion       roachpb.Version
	bootstrapVersion                               clusterversion.ClusterVersion

would roachpb.Version be more appropriate here?


pkg/server/init.go, line 48 at r2 (raw file):

	binaryVersion, binaryMinSupportedVersion       roachpb.Version
	bootstrapVersion                               clusterversion.ClusterVersion

please give bootstrapVersion a comment


pkg/server/init.go, line 112 at r2 (raw file):

}

// ServeAndWait sets up the initServer to accept Bootstrap requests (which will

I think there's something wrong with the first sentence; the "which" seems off.


pkg/server/init.go, line 113 at r2 (raw file):

// ServeAndWait sets up the initServer to accept Bootstrap requests (which will
// block until then). It uses the provided engines and gossip to block until

Can you explain


pkg/server/init.go, line 146 at r2 (raw file):

		// now. Without this return, we could have all nodes in the cluster
		// deadlocked on g.Connected below. For similar reasons, we can't ever
		// hope to initialize the newEngines below, for which we would need to

can you pls add comments to the method about this - how new engines are not initialized if some of the engines were initialized already?


pkg/server/init.go, line 155 at r2 (raw file):

	}

	log.Info(ctx, "no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node.")

long line


pkg/server/init.go, line 217 at r2 (raw file):

	ctx context.Context, request *serverpb.BootstrapRequest,
) (response *serverpb.BootstrapResponse, err error) {
	<-s.bootstrapBlockCh // block until ServeAndWait() is active

so what's the idea here? How come we open the port / register this RPC service before we decide whether we actually want to accept a Bootstrap RPC? Can't we just register the service after inspecting the engines?


pkg/server/init.go, line 223 at r2 (raw file):

		return nil, err
	}
	state, err := bootstrapCluster(ctx, s.engs, s.bootstrapVersion, s.bootstrapZoneConfig, s.bootstrapSystemZoneConfig)

long line

@andreimatei (Contributor) left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/server/init.go, line 113 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Can you explain

Can you comment more on what the deal with the engines is - that we check to see if any had been bootstrapped? And also comment about under which circumstances engines get bootstrapped by this method, and when they don't?

It seems to me that bundling the checking of engines with this method is weird. Do you think we could lift that part outside of the init server, or at least outside of anything that has to do with close(s.bootstrapBlockCh)?

@tbg tbg requested a review from andreimatei April 7, 2020 10:31
@tbg (Member, Author) left a comment


TFTR! As we discussed in our 1:1, I am reacting to your comments over in #46975.

bors r=andreimatei

Dismissed @andreimatei from a discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/server/init.go, line 35 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

mind adding more words here?

Done.


pkg/server/init.go, line 48 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

would roachpb.Version be more appropriate here?

Done.


pkg/server/init.go, line 48 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please give bootstrapVersion a comment`

Done.


pkg/server/init.go, line 113 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Can you comment more on what the deal with the engines is - that we check to see if any had been bootstrapped? And also comment about under which circumstances engines get bootstrapped by this method, and when they don't?

It seems to me that bundling the checking of engines with this method is weird. Do you think we could lift that part outside of the init server, or at least outside of anything that has to do with close(s.bootstrapBlockCh)?

Done.


pkg/server/init.go, line 146 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you pls add comments to the method about this - how new engines are not initialized if some of the engines were initialized already?

Done. The comments will get clearer once I give the initServer a *kv.DB. For now, the awkwardness remains that, when joining an existing cluster, we only wait for Gossip and let the *Node handle the rest. Not for long.


pkg/server/init.go, line 155 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

long line

Done.


pkg/server/init.go, line 217 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

so what's the idea here? How come we open the port / register this RPC service before we decide whether we actually want to accept a Bootstrap RPC? Can't we just register the service after inspecting the engines?

I managed to address this over in #46975.


pkg/server/init.go, line 223 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

long line

Done.

@craig (craig bot) commented Apr 7, 2020

Build succeeded

@craig craig bot merged commit 5a3aae1 into cockroachdb:master Apr 7, 2020
craig bot pushed a commit that referenced this pull request Apr 8, 2020
46756: sql: improve SHOW TABLES to show schema and type r=rohany a=otan

Resolves #46553

This change is more closely aligned with postgres' `\d` and seems useful
as we begin to support user-defined schemas.

Release note (sql change): This change modifies SHOW TABLES to return
schema and table type. Furthermore, sequences will now appear in SHOW
TABLES. Any user who relies on SHOW TABLES to return one column can use
`SELECT table_name FROM [SHOW TABLES]` to get behavior compatible with
previous versions of CRDB.



46975: server: first pass at splitting up NewServer r=andreimatei a=tbg

First four commits from #46843.

----

Ultimately we want to separate the SQL and KV startup paths.
This is difficult because of various cyclic dependencies;
for example, the KV subsystem relies on being able to run
simple SQL queries, and the SQL subsystem in turn relies
on node liveness, Gossip, and others that in a multi-tenant
future it won't have access to.

This won't be untangled in one commit, but here is a modest start: We
split out a `newSQLServer` method which we call in `NewServer` and which
populates the `(Server).sqlServer` field, which now houses all that was
formerly in `Server` but really ought to be owned by SQL.

A few of these PRs in, you can imagine a structure emerging in which,
modulo the cross-dependencies which will then clearly be visible,

```
type Server struct {
  // ...
  kvServer  *kv.Server
  sqlServer *sql.Server
}
```

Release note: None



47034: roachtest: fix sqlalchemy roachtest r=rafiss a=rafiss

This test was failing because an upstream change made the test
incompatible with the latest version of pytest.

It is fixed by pinning the version of pytest that gets installed. The
ignore list is also updated to deal with a test result parsing issue.

fixes #45989

Release note: None

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 20, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 22, 2020
craig bot pushed a commit that referenced this pull request Apr 22, 2020
47616: kv/concurrency: remove TODO about impossible deadlock scenario r=nvanbenschoten a=nvanbenschoten

This commit removes a TODO that described a potential scenario in which a request-only dependency cycle could deadlock due to a simultaneous transaction abort. I realized after writing a full fix (9a9182f) that such a deadlock was not achievable in practice due to the semantics of the lockTable.

> It may appear that there is a bug here in the handling of request-only
> dependency cycles. If such a cycle was broken by simultaneously aborting
> the transactions responsible for each of the requests, there would be no
> guarantee that an aborted pusher would notice that its own transaction
> was aborted before it notices that its pushee's transaction was aborted.
> For example, in the simplest case, imagine two requests deadlocked on
> each other. If their transactions are both aborted and each push notices
> the pushee is aborted first, they will both return here triumphantly and
> wait for the other to exit its lock wait-queues, leading to deadlock.
> Even if they eventually pushed each other again, there would be no
> guarantee that the same thing wouldn't happen.
>
> However, such a situation is not possible in practice because such a
> dependency cycle is never constructed by the lockTable. The lockTable
> assigns each request a monotonically increasing sequence number upon its
> initial entrance to the lockTable. This sequence number is used to
> straighten out dependency chains of requests such that a request only
> waits on conflicting requests with lower sequence numbers than its own
> sequence number. This behavior guarantees that request-only dependency
> cycles are never constructed by the lockTable. Put differently, all
> dependency cycles must include at least one dependency on a lock and,
> therefore, one call to pushLockTxn. Unlike pushRequestTxn, pushLockTxn
> actively removes the conflicting lock and removes the dependency when it
> determines that its pushee transaction is aborted. This means that the
> call to pushLockTxn will continue to make forward progress in the case of
> a simultaneous abort of all transactions behind the members of the cycle,
> preventing such a hypothesized deadlock from ever materializing.

The PR also includes a good amount of other cleanup that I was intending to land with the fix to this deadlock. For instance, it splits off a `LockAcquisition` type from the `LockUpdate` type. It also cleans up the internals of `lock_table.go`.
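
For illustration, a minimal sketch of the invariant the quoted passage relies on (not the actual lockTable code): wait-for edges between requests only ever point from higher to lower sequence numbers, so a cycle made up solely of requests cannot form.

```go
// Sketch of the sequence-number ordering among waiting requests.
package locksketch

type request struct {
	seq uint64 // assigned monotonically on entry to the lockTable
}

// waitsOn reports whether r may wait on other: only on requests with a
// strictly lower sequence number, which rules out request-only cycles.
func (r request) waitsOn(other request) bool {
	return other.seq < r.seq
}
```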

47727: server: various server construction cleanup r=nvanbenschoten a=nvanbenschoten

This is all cleanup I stumbled upon while catching up on the changes made in the following PRs:
- #46843
- #46975
- #47509
- #47517
- #47570

It's all pretty minor.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@tbg tbg added the A-multitenancy Related to multi-tenancy label Apr 23, 2020