server: simplify bootstrap via fatter init server #46843
Conversation
f86a231 to ad15c36
It was mixing NodeIDs within a Node, which was exposed when I refactored inspectEngines. Release note: None
Pre these refactors, many places in the code had to care about whether Gossip had already connected, which was used as a proxy for determining whether the clusterID was available. Now, though, we always have the clusterID before the Node even gets started. As a result, while we *do* want to log when Gossip does connect, internally it matters very little. As a result, various code paths have been simplified. In particular, the Node was awkwardly involved in verifying that we weren't cross-connecting between clusters, which has now been pulled out. After this commit, the few remaining uses of `gossip.Connected` are sane. Release note: None
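Purely as an illustration of the resulting pattern (a hypothetical helper, not code from this PR): with the cluster ID known up front, Gossip connectivity only needs to be observed for logging, off the startup path.

```
// Sketch: log when Gossip connects, without blocking startup on it.
// `connected` stands in for gossip's Connected channel.
package sketch

import (
	"context"
	"log"
)

func logWhenGossipConnects(ctx context.Context, connected <-chan struct{}) {
	go func() {
		select {
		case <-connected:
			log.Print("node connected via gossip")
		case <-ctx.Done():
		}
	}()
}
```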
ReadyFn was never called with waitForInit set to true since we were already past blocking when we'd initiate the call. Caught by `pkg/cli/interactive_tests/test_init_command.tcl`. Release note: None
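A minimal sketch of the ordering issue described in that commit message, with hypothetical names (only `ReadyFn`/`waitForInit` come from the commit message): the readiness callback has to fire before the goroutine blocks waiting for init, otherwise it can never be observed with `waitForInit` set to true.

```
// Hypothetical shape of the fix: report readiness (and whether we are
// waiting for `init`) before blocking, not after.
package sketch

import "context"

func serveAndMaybeWait(
	ctx context.Context, readyFn func(waitForInit bool), needInit bool, initDone <-chan struct{},
) error {
	if readyFn != nil {
		readyFn(needInit) // previously this ran only after the wait below
	}
	if needInit {
		select {
		case <-initDone:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```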
Please see the "so what's the idea here" comment.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/server/config.go, line 265 at r2 (raw file):
// will be waiting for an `init` command or accept bootstrapping
// from a joined node. It is set in an advisory fashion, that is,
// should be used for logging output only.
Hmm what is this new sentence trying to tell me?
pkg/server/init.go, line 35 at r2 (raw file):
// initServer manages the temporary init server used during
// bootstrapping.
mind adding more words here?
pkg/server/init.go, line 48 at r2 (raw file):
binaryVersion, binaryMinSupportedVersion roachpb.Version
bootstrapVersion                         clusterversion.ClusterVersion
would `roachpb.Version` be more appropriate here?
pkg/server/init.go, line 48 at r2 (raw file):
binaryVersion, binaryMinSupportedVersion roachpb.Version
bootstrapVersion                         clusterversion.ClusterVersion
please give `bootstrapVersion` a comment
pkg/server/init.go, line 112 at r2 (raw file):
}
// ServeAndWait sets up the initServer to accept Bootstrap requests (which will
I think there's something wrong with the first sentence; the "which" seems off.
pkg/server/init.go, line 113 at r2 (raw file):
// ServeAndWait sets up the initServer to accept Bootstrap requests (which will
// block until then). It uses the provided engines and gossip to block until
Can you explain
pkg/server/init.go, line 146 at r2 (raw file):
// now. Without this return, we could have all nodes in the cluster
// deadlocked on g.Connected below. For similar reasons, we can't ever
// hope to initialize the newEngines below, for which we would need to
can you pls add comments to the method about this - how new engines are not initialized if some of the engines were initialized already?
pkg/server/init.go, line 155 at r2 (raw file):
}
log.Info(ctx, "no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node.")
long line
pkg/server/init.go, line 217 at r2 (raw file):
	ctx context.Context, request *serverpb.BootstrapRequest,
) (response *serverpb.BootstrapResponse, err error) {
	<-s.bootstrapBlockCh // block until ServeAndWait() is active
so what's the idea here? How come we open the port / register this RPC service before we decide whether we actually want to accept a `Bootstrap` RPC? Can't we just register the service after inspecting the engines?
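For readers following the thread, here is a stripped-down, self-contained sketch of the pattern under discussion, not the PR's actual code: the Bootstrap handler is registered early but parks on a channel until `ServeAndWait` has decided whether bootstrap should be accepted (`bootstrapBlockCh` comes from the quoted snippet; `allowBootstrap` is a made-up field).

```
// Register the RPC handler early; gate its body on a readiness channel.
package sketch

import (
	"context"
	"errors"
)

type initServer struct {
	bootstrapBlockCh chan struct{} // closed once ServeAndWait is active
	allowBootstrap   bool          // decided by ServeAndWait after inspecting engines
}

// Bootstrap is callable as soon as the RPC service is registered, but it
// does no work until the rest of startup has caught up.
func (s *initServer) Bootstrap(ctx context.Context) error {
	select {
	case <-s.bootstrapBlockCh:
	case <-ctx.Done():
		return ctx.Err()
	}
	if !s.allowBootstrap {
		return errors.New("cluster already initialized")
	}
	// ... bootstrapping of the engines would run here ...
	return nil
}
```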
pkg/server/init.go, line 223 at r2 (raw file):
	return nil, err
}
state, err := bootstrapCluster(ctx, s.engs, s.bootstrapVersion, s.bootstrapZoneConfig, s.bootstrapSystemZoneConfig)
long line
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/server/init.go, line 113 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Can you explain
Can you comment more on what the deal with the engines is - that we check to see if any had been bootstrapped? And also comment about under which circumstances engines get bootstrapped by this method, and when they don't?
It seems to me that bundling the checking of engines with this method is weird. Do you think we could lift that part outside of the init server, or at least outside of anything that has to do with `close(s.bootstrapBlockCh)`?
TFTR! As we discussed in our 1:1, I am reacting to your comments over in #46975.
bors r=andreimatei
Dismissed @andreimatei from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/server/init.go, line 35 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
mind adding more words here?
Done.
pkg/server/init.go, line 48 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
would `roachpb.Version` be more appropriate here?
Done.
pkg/server/init.go, line 48 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please give `bootstrapVersion` a comment
Done.
pkg/server/init.go, line 113 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Can you comment more on what the deal with the engines is - that we check to see if any had been bootstrapped? And also comment about under which circumstances engines get bootstrapped by this method, and when they don't?
It seems to me that bundling the checking of engines with this method is weird. Do you think we could lift that part outside of the init server, or at least outside of anything that has to do with `close(s.bootstrapBlockCh)`?
Done.
pkg/server/init.go, line 146 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can you pls add comments to the method about this - how new engines are not initialized if some of the engines were initialized already?
Done. The comments will get clearer once I give the initServer a `*kv.DB`. For now, the awkwardness remains that when joining an existing cluster we only wait for Gossip and let the `*Node` handle the rest. Not for long.
pkg/server/init.go, line 155 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
long line
Done.
pkg/server/init.go, line 217 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
so what's the idea here? How come we open the port / register this RPC service before we decide whether we actually want to accept a `Bootstrap` RPC? Can't we just register the service after inspecting the engines?
I managed to address this over in #46975.
pkg/server/init.go, line 223 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
long line
Done.
Build succeeded
46756: sql: improve SHOW TABLES to show schema and type r=rohany a=otan

Resolves #46553

This change is more closely aligned with postgres' `\d` and seems useful as we begin to support user-defined schemas.

Release note (sql change): This change modifies SHOW TABLES to return schema and table type. Furthermore, sequences will now appear in SHOW TABLES. Any user who relies on SHOW TABLES to return one column can use `SELECT table_name FROM [SHOW TABLES]` to get compatible behavior between previous versions of CRDB.

46975: server: first pass at splitting up NewServer r=andreimatei a=tbg

First four commits from #46843.

----

Ultimately we want to separate the SQL and KV startup paths. This is difficult because of various cyclic dependencies; for example, the KV subsystem relies on being able to run simple SQL queries, and the SQL subsystem in turn relies on node liveness, Gossip, and others that in a multi-tenant future it won't have access to. This won't be untangled in one commit, but here is a modest start: We split out a `newSQLServer` method which we call in `NewServer` and which populates the `(Server).sqlServer` field, which now houses all that was formerly in `Server` but really ought to be owned by SQL. A few of these PRs in, you can imagine a structure emerging in which, modulo the cross-dependencies which will then clearly be visible,

```
type Server struct {
	// ...
	kvServer  *kv.Server
	sqlServer *sql.Server
}
```

Release note: None

47034: roachtest: fix sqlalchemy roachtest r=rafiss a=rafiss

This test was failing because an upstream change made the test incompatible with the latest version of pytest. It is fixed by pinning the version of pytest that gets installed. The ignore list is also updated to deal with a test result parsing issue.

fixes #45989

Release note: None

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
47616: kv/concurrency: remove TODO about impossible deadlock scenario r=nvanbenschoten a=nvanbenschoten

This commit removes a TODO that described a potential scenario in which a request-only dependency cycle could deadlock due to a simultaneous transaction abort. I realized after writing a full fix (9a9182f) that such a deadlock was not achievable in practice due to the semantics of the lockTable.

> It may appear that there is a bug here in the handling of request-only
> dependency cycles. If such a cycle was broken by simultaneously aborting
> the transactions responsible for each of the request, there would be no
> guarantee that an aborted pusher would notice that its own transaction
> was aborted before it notices that its pushee's transaction was aborted.
> For example, in the simplest case, imagine two requests deadlocked on
> each other. If their transactions are both aborted and each push notices
> the pushee is aborted first, they will both return here triumphantly and
> wait for the other to exit its lock wait-queues, leading to deadlock.
> Even if they eventually pushed each other again, there would be no
> guarantee that the same thing wouldn't happen.
>
> However, such a situation is not possible in practice because such a
> dependency cycle is never constructed by the lockTable. The lockTable
> assigns each request a monotonically increasing sequence number upon its
> initial entrance to the lockTable. This sequence number is used to
> straighten out dependency chains of requests such that a request only
> waits on conflicting requests with lower sequence numbers than its own
> sequence number. This behavior guarantees that request-only dependency
> cycles are never constructed by the lockTable. Put differently, all
> dependency cycles must include at least one dependency on a lock and,
> therefore, one call to pushLockTxn. Unlike pushRequestTxn, pushLockTxn
> actively removes the conflicting lock and removes the dependency when it
> determines that its pushee transaction is aborted. This means that the
> call to pushLockTxn will continue to make forward progress in the case of
> a simultaneous abort of all transactions behind the members of the cycle,
> preventing such a hypothesized deadlock from ever materializing.

The PR also includes a good amount of other cleanup that I was intending to land with the fix to this deadlock. For instance, it splits off a `LockAcquisition` type from the `LockUpdate` type. It also cleans up the internals of `lock_table.go`.

47727: server: various server construction cleanup r=nvanbenschoten a=nvanbenschoten

This is all cleanup I stumbled upon while catching up on the changes made in the following PRs:

- #46843
- #46975
- #47509
- #47517
- #47570

It's all pretty minor.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
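As a toy illustration of the sequence-number argument quoted above (not the actual `lock_table.go` code): if every request receives a monotonically increasing sequence number when it enters the lockTable, and only ever waits on conflicting requests with strictly smaller numbers, the waits-for graph over requests alone can never contain a cycle.

```
// Toy model of the invariant: requests wait only on earlier requests.
package sketch

type request struct {
	seq     uint64
	waitsOn []*request
}

type lockTable struct{ nextSeq uint64 }

// enter assigns the next sequence number to an incoming request.
func (lt *lockTable) enter() *request {
	lt.nextSeq++
	return &request{seq: lt.nextSeq}
}

// maybeWait records r waiting on other only if other entered earlier, so
// request-only dependency cycles cannot form by construction.
func maybeWait(r, other *request) {
	if other.seq < r.seq {
		r.waitsOn = append(r.waitsOn, other)
	}
}
```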
## Motivation
Today, starting up a server is complicated. This is especially true
when bootstrap is necessary. By this, we mean that either
- no NodeID is known. This implies that none of the engines were
  initialized yet
- the NodeID is known (i.e. at least one store is initialized), but a
  new engine was added.
When the process first starts, and a NodeID is known, a ClusterID is
also known (since they get persisted together, on every store). For the
same reason, a persisted cluster version is known. In this case, we can
start a fully initialized cluster, so this is the easy case.
It is more difficult when no NodeID is known, in which case the server
is just starting up for the first time, with its engines all blank. It
needs to somehow allocate NodeIDs (and StoreIDs) which are then written
to the engines. It also needs to, at least initially, use the lowest
possible cluster version it can tolerate (after all, the cluster may
actually run at that lowest version). Right now, there is a delicate
dance: we thread late-bound ClusterID and NodeID containers all over the
place, spin up a mostly dysfunctional Node, use its KV client to
allocate NodeID and StoreIDs once this is possible - we need Gossip
to have connected first - and update the containers. It is complex,
error prone, and ossifies any code it touches.
Cluster versions deserve an extra shout-out for complexity. Even if
a cluster version was persisted, the node may have been decommissioned
and the cluster upgraded. Much work went into our RPC layer to prevent
connections between incompatible nodes, but there is no boot-time check
that results in a swift and descriptive error - there will be a fatal
error, originating from the RPC subsystem. One aim of our work is to
simplify this by checking the version via an RPC before setting too
many gears in motion.
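A minimal, self-contained sketch of the kind of boot-time check argued for here, written against a stand-in `Version` type rather than the real `roachpb`/`clusterversion` packages:

```
// Fail fast, with a descriptive error, if this binary cannot participate
// in a cluster whose active version is `active`. Illustrative only.
package sketch

import "fmt"

// Version is a stand-in for the real cluster version types.
type Version struct{ Major, Minor int32 }

func (v Version) Less(o Version) bool {
	return v.Major < o.Major || (v.Major == o.Major && v.Minor < o.Minor)
}

func checkCompatibility(active, binary, binaryMinSupported Version) error {
	if active.Less(binaryMinSupported) {
		return fmt.Errorf("cluster version %+v is below this binary's minimum supported version %+v", active, binaryMinSupported)
	}
	if binary.Less(active) {
		return fmt.Errorf("binary version %+v is too old for cluster version %+v", binary, active)
	}
	return nil
}
```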
## Context
This marks the beginning of a series of PRs aiming at improving the
startup code. Ultimately, the goal is to have the Node and all Stores
bootstrapped as early as possible, but in particular before starting
the KV or SQL server subsystems.
Furthermore, we want to achieve this without relying on Gossip, to
prepare for a split between the SQL and KV servers in the context of
multitenancy support (SQL server won't be able to rely on Gossip, but
will still need to announce itself to the KV servers).
## This PR
This PR is an initial simplifying refactor that can help achieve these
goals. The init server (which hosts the Bootstrap RPC) is given more
responsibilities: it is now directly in charge of determining which, if
any, engines are bootstrapped, and explicitly listens to Gossip as well
as the Bootstrap RPC. It returns the cluster ID to the main server
start-up goroutine when it is available. As a result, the main startup
code has been simplified, and a further thread to pull on has appeared
and is called out in TODOs.
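As a rough, self-contained sketch of the shape this gives the startup path (simplified, with made-up field names such as `inspect`, `bootstrapCh`, and `gossipCh`; the real `ServeAndWait` in `pkg/server/init.go` differs in detail): the init server inspects its engines first, and if nothing is bootstrapped it waits for either a Bootstrap RPC or Gossip connectivity before handing a cluster ID back to the main startup goroutine.

```
// Simplified control flow of the init server's wait loop.
package sketch

import "context"

type initState struct {
	clusterID string // stand-in for a UUID
	// ... bootstrapped engines, persisted cluster version, etc.
}

type initServer struct {
	inspect     func() (*initState, bool) // reports already-bootstrapped engines
	bootstrapCh chan *initState           // fed by the Bootstrap RPC handler
	gossipCh    <-chan *initState         // fed once Gossip connects to an existing cluster
}

// serveAndWait blocks until a cluster ID is known, from whichever source
// wins: pre-existing engines, an explicit `cockroach init`, or Gossip.
func (s *initServer) serveAndWait(ctx context.Context) (*initState, error) {
	if state, ok := s.inspect(); ok {
		return state, nil
	}
	select {
	case state := <-s.bootstrapCh:
		return state, nil
	case state := <-s.gossipCh:
		return state, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```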
Down the road (i.e. in later PRs), the init server will bootstrap a
NodeID and StoreIDs even when joining an existing cluster. It will
initially mimic/front-load the strategy taken by Node today, i.e. use a
`kv.DB`, but ultimately will bypass Gossip completely and use a simple
RPC call to ask the existing cluster to assign these IDs as needed. This
RPC will also establish the active cluster version, which is required
for SQL multi-tenancy, and generally follows the ideas in #32574.
Release note: None
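To make the forward-looking part concrete, here is one possible wire shape for such an RPC, purely as a hypothetical sketch; none of these names come from the PR or its follow-ups.

```
// Hypothetical request/response for a join-style RPC that allocates IDs
// and reports the active cluster version before KV/SQL start.
package sketch

// JoinRequest is sent by a blank node to an already initialized node.
type JoinRequest struct {
	BinaryVersion             string // this binary's version
	BinaryMinSupportedVersion string // oldest cluster version it can run at
	NumStores                 int32  // how many StoreIDs to allocate
}

// JoinResponse carries everything the joiner needs before starting its
// KV and SQL subsystems.
type JoinResponse struct {
	ClusterID            string
	NodeID               int32
	StoreIDs             []int32
	ActiveClusterVersion string // becomes the joiner's initial active version
}
```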