server: integrate the TLS auto-negotiation in the start commands #63850

Open
knz opened this issue Apr 19, 2021 · 3 comments
Labels
A-authentication Pertains to authn subsystems A-cli-server CLI commands that pertain to CockroachDB server processes A-kv-server Relating to the KV-level RPC server A-security C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-server-and-security DB Server & Security

Comments

@knz
Contributor

knz commented Apr 19, 2021

This is a sub-issue of #60632.

We want the start commands to determine their TLS certificates automatically when given the --num-expected-initial-nodes and --init-token flags.

There are two cases to consider:

  • the cluster is not initialized yet. In this case, we want to use the "initial handshake" protocol.
  • the cluster is initialized already. In this case, we want to use the "add node" protocol.

The complication is that at the moment the process starts, we do not yet know which of the two situations applies. In the base start code, the mode is discovered only after the new node has attempted to join another node, and the current cluster status is determined from the other side of the RPC connection.

So we cannot implement the automatic logic in start by just looking at the state of the TLS certificates on disk.
We also need to know whether the rest of the cluster is initialized already or not.

We can do this in either of two ways:

a. --join mentions other nodes. In that case, we can connect to one of those nodes to see if the cluster is already initialized.

b. --join does not mention other nodes (i.e. the current node is going to be a join target for other nodes). In that case, we don't know whether the rest of the cluster is initialized already until another already-initialized node joins.
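
As a rough illustration of the split between cases (a) and (b), the sketch below classifies a node's situation from its --join list alone. The names `discoveryPlan`, `planDiscovery`, and `selfAddr` are hypothetical and do not come from the CockroachDB code base; treating a --join list that only names the node itself as "no other nodes" is also an assumption.

```go
package main

import "fmt"

// discoveryPlan describes how a starting node could learn whether the
// cluster is already initialized. Hypothetical type, for illustration only.
type discoveryPlan int

const (
	// probePeers: --join names other nodes, so one of them can be asked.
	probePeers discoveryPlan = iota
	// waitForInbound: --join names no other node, so the status is only
	// learned once some other node connects to this one.
	waitForInbound
)

// planDiscovery returns the plan together with the peers worth probing.
// selfAddr is filtered out (an assumption of this sketch): a node that
// only lists itself has nobody else to ask.
func planDiscovery(joinAddrs []string, selfAddr string) (discoveryPlan, []string) {
	var peers []string
	for _, a := range joinAddrs {
		if a != "" && a != selfAddr {
			peers = append(peers, a)
		}
	}
	if len(peers) == 0 {
		return waitForInbound, nil
	}
	return probePeers, peers
}

func main() {
	plan, peers := planDiscovery([]string{"n1:26257", "n2:26257"}, "n1:26257")
	fmt.Println(plan == probePeers, peers)
}
```

This only decides whom to ask; it says nothing yet about how to ask safely before TLS certificates exist.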

It's possible that the second case here is bogus, because of the discussion in #61621.

So let us focus on the first case for a bit. At the moment the process starts, it knows its --join flag already, but not its TLS certificates. So it cannot connect safely to the remote node yet. What can it connect to, to determine node status?

Thanks to #63492, there is a non-authenticated endpoint that could be used for this purpose. The details need to be investigated.

Jira issue: CRDB-6796
Epic: CRDB-6663

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-server Relating to the KV-level RPC server A-security A-cli-server CLI commands that pertain to CockroachDB server processes labels Apr 19, 2021
@knz
Contributor Author

knz commented Apr 19, 2021

cc @aaron-crl @bdarnell @itsbilal ^^ do you have thoughts about the above?

It might be worth using the same RPC or HTTP endpoint for both cases ("init handshake" and "add node"), so that the code can distinguish the cases using different responses. WDYT?

@bdarnell
Contributor

I think it should use the same RequestCA endpoint as in #63492 to determine which case we're in. Then if the cluster is already initialized I'd keep things as close to the add-node flow as possible, but if it's not initialized I think the rendezvous for the initial nodes would happen on a different endpoint.

Regarding the second case, I think we have an implicit assumption that there is at most one node without a join flag per cluster (consider the case where A and B have no join flags and C has --join=A,B. The way things have worked so far there is no guarantee that A and B would ever get connected to each other). Given that, I think it's reasonable to say that the join-less node A) should appear in all other nodes' join flags and B) should be the target of the init command. If we wanted to introduce better support for mixing nodes with and without join flags we might want to reconsider that, but I don't think there's much need. A single join-less node is useful for upgrading from single to multi-node clusters and in some orchestration environments where it is difficult to predict hostnames in advance, but I don't think there's much need for multiple such nodes.
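
The constraint described above (at most one join-less node, and that node listed in every other node's join flag) can be written as a small check. This is only a sketch over a hypothetical map from node name to its --join list; `checkJoinTopology` is not an existing function.

```go
package main

import (
	"fmt"
	"sort"
)

// checkJoinTopology takes a map from node name to that node's --join
// targets and verifies that at most one node has no join targets and
// that every other node lists that join-less node.
func checkJoinTopology(joins map[string][]string) error {
	var joinless []string
	for node, targets := range joins {
		if len(targets) == 0 {
			joinless = append(joinless, node)
		}
	}
	sort.Strings(joinless) // map order is random; sort for stable messages
	if len(joinless) > 1 {
		return fmt.Errorf("more than one node without --join: %v", joinless)
	}
	if len(joinless) == 0 {
		return nil
	}
	seed := joinless[0]
	var missing []string
	for node, targets := range joins {
		if node != seed && !contains(targets, seed) {
			missing = append(missing, node)
		}
	}
	sort.Strings(missing)
	if len(missing) > 0 {
		return fmt.Errorf("nodes %v do not list join-less node %s in --join", missing, seed)
	}
	return nil
}

func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

func main() {
	// The example from the comment: A and B have no join flags, C joins both.
	err := checkJoinTopology(map[string][]string{
		"A": nil,
		"B": nil,
		"C": {"A", "B"},
	})
	fmt.Println(err)
}
```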

@knz
Contributor Author

knz commented Apr 19, 2021

FWIW:

it's reasonable to say that the join-less node A) should appear in all other nodes' join flags and B) should be the target of the init command.

This has already become a requirement since #32574 was closed (#30553 and related, see PR #52526). Irfan reminded me of this in #61621.

@jlinder jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021
@knz knz added the A-authentication Pertains to authn subsystems label Jul 29, 2021