diff --git a/_includes/prod_deployment/insecure-start-nodes.md b/_includes/prod_deployment/insecure-start-nodes.md
index a4a4432b06f..ba555756eb9 100644
--- a/_includes/prod_deployment/insecure-start-nodes.md
+++ b/_includes/prod_deployment/insecure-start-nodes.md
@@ -27,9 +27,10 @@ For each initial node of your cluster, complete the following steps:
     ~~~
     $ cockroach start --insecure \
     --host=<node1 address> \
-    --join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
+    --locality=<key-value pairs> \
     --cache=25% \
     --max-sql-memory=25% \
+    --join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
     --background
     ~~~

@@ -39,8 +40,9 @@ For each initial node of your cluster, complete the following steps:
  -----|------------
  `--insecure` | Indicates that the cluster is insecure, with no network encryption or authentication.
  `--host` | Specifies the hostname or IP address to listen on for intra-cluster and client communication, as well as to identify the node in the Admin UI. If it is a hostname, it must be resolvable from all nodes; if it is an IP address, it must be routable from all nodes.<br><br>If you want the node to listen on multiple interfaces, leave `--host` empty.<br><br>If you want the node to communicate with other nodes on an internal address (e.g., within a private network) while listening on all interfaces, leave `--host` empty and set the `--advertise-host` flag to the internal address.
- `--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
+ `--locality` | Key-value pairs that describe the location of the node, e.g., country, region, datacenter, rack. The key-value pairs should be ordered from most inclusive to least inclusive, and the keys and the order of key-value pairs must be the same on all nodes. For example:<br><br>`--locality=region=east,datacenter=us-east-1`<br>`--locality=region=west,datacenter=us-west-1`<br><br>Locality is required to use the enterprise Table Partitioning and Node Map features, as well as to benefit from CockroachDB's ability to dynamically optimize read latency for the location from which most of the workload is originating, also known as ["follow-the-workload"](demo-follow-the-workload.html).
 `--cache`<br>`--max-sql-memory` | Increases the node's cache and temporary SQL memory size to 25% of available system memory to improve read performance and increase capacity for in-memory SQL processing (see [Recommended Production Settings](recommended-production-settings.html) for more details).
+ `--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
 `--background` | Starts the node in the background so you gain control of the terminal to issue more commands.

 For other flags not explicitly set, the command uses default values. For example, the node stores data in `--store=cockroach-data`, binds internal and client communication to `--port=26257`, and binds Admin UI HTTP requests to `--http-port=8080`. To set these options manually, see [Start a Node](start-a-node.html).
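To make the reordered flags concrete, here is a sketch of the new insecure start command with the placeholders filled in. The addresses and locality values below are hypothetical, not required values:

~~~ shell
# Start the first node of a three-node cluster in the us-west-1 datacenter.
# 10.0.1.1, 10.0.1.2, and 10.0.1.3 are hypothetical internal addresses.
$ cockroach start --insecure \
--host=10.0.1.1 \
--locality=region=west,datacenter=us-west-1 \
--cache=25% \
--max-sql-memory=25% \
--join=10.0.1.1:26257,10.0.1.2:26257,10.0.1.3:26257 \
--background
~~~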
diff --git a/_includes/prod_deployment/secure-start-nodes.md b/_includes/prod_deployment/secure-start-nodes.md
index d989193854d..efa3f36638d 100644
--- a/_includes/prod_deployment/secure-start-nodes.md
+++ b/_includes/prod_deployment/secure-start-nodes.md
@@ -28,9 +28,10 @@ For each initial node of your cluster, complete the following steps:
     ~~~
     $ cockroach start \
     --certs-dir=certs \
     --host=<node1 address> \
-    --join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
+    --locality=<key-value pairs> \
     --cache=25% \
     --max-sql-memory=25% \
+    --join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
     --background
     ~~~
@@ -40,8 +41,9 @@ For each initial node of your cluster, complete the following steps:
  -----|------------
  `--certs-dir` | Specifies the directory where you placed the `ca.crt` file and the `node.crt` and `node.key` files for the node.
  `--host` | Specifies the hostname or IP address to listen on for intra-cluster and client communication, as well as to identify the node in the Admin UI. If it is a hostname, it must be resolvable from all nodes; if it is an IP address, it must be routable from all nodes.<br><br>If you want the node to listen on multiple interfaces, leave `--host` empty.<br><br>If you want the node to communicate with other nodes on an internal address (e.g., within a private network) while listening on all interfaces, leave `--host` empty and set the `--advertise-host` flag to the internal address.
- `--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
+ `--locality` | Key-value pairs that describe the location of the node, e.g., country, region, datacenter, rack. The key-value pairs should be ordered from most inclusive to least inclusive, and the keys and the order of key-value pairs must be the same on all nodes. For example:<br><br>`--locality=region=east,datacenter=us-east-1`<br>`--locality=region=west,datacenter=us-west-1`<br><br>Locality is required to use the enterprise Table Partitioning and Node Map features, as well as to benefit from CockroachDB's ability to dynamically optimize read latency for the location from which most of the workload is originating, also known as ["follow-the-workload"](demo-follow-the-workload.html).
 `--cache`<br>`--max-sql-memory` | Increases the node's cache and temporary SQL memory size to 25% of available system memory to improve read performance and increase capacity for in-memory SQL processing (see [Recommended Production Settings](recommended-production-settings.html) for more details).
+ `--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
 `--background` | Starts the node in the background so you gain control of the terminal to issue more commands.

 For other flags not explicitly set, the command uses default values. For example, the node stores data in `--store=cockroach-data`, binds internal and client communication to `--port=26257`, and binds Admin UI HTTP requests to `--http-port=8080`. To set these options manually, see [Start a Node](start-a-node.html).
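The `--certs-dir` flag above assumes the certificate files already exist on the node. A minimal sketch of generating them with the built-in `cockroach cert` commands; the directory names and node address are hypothetical:

~~~ shell
# Create the CA certificate and key; keep the CA key outside the certs directory.
$ cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Create a certificate and key for a node, listing all addresses
# by which other nodes and clients can reach it.
$ cockroach cert create-node 10.0.1.1 localhost 127.0.0.1 --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Create a client certificate and key for the root user.
$ cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key
~~~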
diff --git a/v2.0/high-availability.md b/v2.0/fault-tolerance.md
similarity index 96%
rename from v2.0/high-availability.md
rename to v2.0/fault-tolerance.md
index d10636ca7f1..e7c55580025 100644
--- a/v2.0/high-availability.md
+++ b/v2.0/fault-tolerance.md
@@ -1,7 +1,8 @@
 ---
-title: High Availability
+title: Fault Tolerance
 summary: CockroachDB is designed to survive software and hardware failures, from server restarts to datacenter outages.
 toc: false
+redirect-from: high-availability.html
 ---

 CockroachDB is designed to survive software and hardware failures, from server restarts to datacenter outages. This is accomplished without the confusing artifacts typical of other distributed systems (e.g., stale reads), using strongly-consistent replication as well as automated repair after failures.
@@ -12,7 +13,7 @@ CockroachDB replicates your data for availability and guarantees consistency between replicas:
 - Different servers within a rack to tolerate server failures
 - Different servers on different racks within a datacenter to tolerate rack power/network failures
-- Different servers in different datacenters to tolerate large scale network or power outages
+- Different servers in different datacenters to tolerate large-scale network or power outages

 When replicating across datacenters, be aware that the round-trip latency between datacenters will have a direct effect on your database's performance. Latency in cross-continent clusters will be noticeably worse than in clusters where all nodes are geographically close together.
diff --git a/v2.0/recommended-production-settings.md b/v2.0/recommended-production-settings.md
index 4d87dc47e71..dd2468b5c63 100644
--- a/v2.0/recommended-production-settings.md
+++ b/v2.0/recommended-production-settings.md
@@ -10,19 +10,56 @@ This page provides important recommendations for production deployments of CockroachDB.

 ## Cluster Topology

+### Basic Recommendations
+
 For a replicated cluster, each replica will be on a different node, and a majority of replicas must remain available for the cluster to make progress. Therefore:

 - Use at least 3 nodes to ensure that a majority of replicas (2/3) remains available if a node fails.

 - Run each node on a separate machine. Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data loss if a machine fails. Likewise, if a machine has multiple disks or SSDs, run one node with multiple `--store` flags rather than one node per disk. For more details about stores, see [Start a Node](start-a-node.html).

-- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a majority (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas.
+- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a majority (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas (see the sketch after this list). For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html).
+
+- When starting each node, use the [`--locality`](start-a-node.html#locality) flag to describe the node's location, for example, `--locality=region=west,datacenter=us-west-1`. The key-value pairs should be ordered from most to least inclusive, and the keys and order of key-value pairs must be the same on all nodes.
+
+  - When you set locality in this way, the cluster will evenly replicate across localities by default and will benefit from CockroachDB's ability to dynamically optimize read latency for the location from which most of the workload is originating, also known as ["follow-the-workload"](demo-follow-the-workload.html). Locality is also a prerequisite for using the [table partitioning](partitioning.html) and [**Node Map**](enable-node-map.html) features.
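+
+As referenced in the list above, here is a sketch of raising the default replication factor from 3 to 5 so that the cluster can survive two simultaneous node failures. It uses the v2.0 `cockroach zone set` command; `--insecure` stands in for your cluster's connection flags:
+
+~~~ shell
+# Set num_replicas in the default replication zone.
+# The cluster needs at least 5 nodes for this to take effect.
+$ echo 'num_replicas: 5' | cockroach zone set .default --insecure -f -
+~~~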
+
+### Common Patterns
+
+#### Single Datacenter
+
+When deploying in a single datacenter:
+
+- Use at least 3 nodes to ensure that the cluster can tolerate the failure of any one node.
+
+- Although network latency between nodes should be minimal within a single datacenter (~1ms), it's recommended to set the `--locality` flag on each node so that you have the flexibility to customize [replication zones](configure-replication-zones.html) based on locality as you scale.
+
+#### Multiple Datacenters
+
+When deploying across multiple datacenters in one or more regions:

-- When replicating across datacenters, it's recommended to specify which datacenter each node is in using the `--locality` flag to ensure even replication (see this [example](configure-replication-zones.html#even-replication-across-datacenters) for more details). If some of your datacenters are much farther apart than others, [specifying multiple levels of locality (such as country and region) is recommended](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes).
+- Use at least 3 datacenters to ensure that the cluster can tolerate the failure of 1 entire datacenter.

-- When replicating across datacenters, be aware that the round trip latency between datacenters will have a direct effect on your database's performance, with cross-continent clusters performing noticeably worse than clusters in which all nodes are geographically close together.
+- The round-trip latency between datacenters will have a direct effect on your cluster's performance, with cross-continent clusters performing noticeably worse than clusters in which all nodes are geographically close together.
+
+  - To optimize read latency for the location from which most of the workload is originating, also known as ["follow-the-workload"](demo-follow-the-workload.html), set the `--locality` flag when starting each node. When deploying across more than 3 datacenters, to ensure that all data benefits from "follow-the-workload", you must also increase your replication factor to match the total number of datacenters.
+
+  - To optimize read and write latency, consider using the enterprise [table partitioning](partitioning.html) feature (see the sketch after this list).

-For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html).
+- If you expect region-specific concurrency and load characteristics, consider using different numbers and types of nodes per region. For example, in the following scenario, the Central region is closer to the West region than to the East region, which means that write latency would be optimized for workloads in the West and Central regions.
+
+  ~~~
+    A---C               A---C
+     \ /                 \ /
+      B                   B
+    West------80ms------East
+       \                 /
+      20ms            60ms
+         \             /
+          \           /
+           \         /
+            \       /
+             \     /
+              \   /
+               \ /
+             Central
+              A---C
+               \ /
+                B
+  ~~~
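+
+As noted in the list above, here is a sketch of partitioning a table by locality and pinning one partition to a datacenter. The `db.vehicles` table, its `city` column, and the datacenter name are hypothetical; it assumes the partition column is a prefix of the primary key and that an enterprise license is set:
+
+~~~ shell
+# Partition a table by city (enterprise feature).
+$ cockroach sql --insecure --execute="ALTER TABLE db.vehicles PARTITION BY LIST (city) (
+    PARTITION us_west VALUES IN ('seattle', 'portland'),
+    PARTITION us_east VALUES IN ('new york', 'boston')
+)"
+
+# Constrain the us_west partition's replicas to the us-west-1 datacenter.
+$ echo 'constraints: [+datacenter=us-west-1]' | cockroach zone set db.vehicles.us_west --insecure -f -
+~~~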

 ## Hardware

@@ -71,6 +108,23 @@ Cockroach Labs recommends the following cloud-specific configurations based on our own internal testing:

     For example, Cockroach Labs has used custom VMs (8 vCPUs and 16 GiB of RAM per VM) for internal testing.

 - **Do not** use `f1` or `g1` [shared-core machines](https://cloud.google.com/compute/docs/machine-types#sharedcore), which limit the load on a single core.

+## Security
+
+An insecure cluster comes with serious risks:
+
+- Your cluster is open to any client that can access any node's IP addresses.
+- Any user, even `root`, can log in without providing a password.
+- Any user, connecting as `root`, can read or write any data in your cluster.
+- There is no network encryption or authentication, and thus no confidentiality.
+
+Therefore, to deploy CockroachDB in production, it is strongly recommended to use TLS certificates to authenticate the identity of nodes and clients and to encrypt in-flight data between nodes and clients. You can use either the built-in [`cockroach cert` commands](create-security-certificates.html) or [`openssl` commands](create-security-certificates-openssl.html) to generate security certificates for your deployment. Regardless of which option you choose, you'll need the following files:
+
+- A certificate authority (CA) certificate and key, used to sign all of the other certificates.
+- A separate certificate and key for each node in your deployment, with the common name `node`.
+- A separate certificate and key for each client and user you want to connect to your nodes, with the common name set to the username. The default user is `root`.
+
+Alternatively, CockroachDB does [support password authentication](create-and-manage-users.html#secure-cluster), although we typically recommend using client certificates instead.
+
 ## Load Balancing

 Each CockroachDB node is an equally suitable SQL gateway to a cluster, but to ensure client performance and reliability, it's important to use load balancing:
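As a concrete example of the load balancing discussed above, CockroachDB can generate a starter configuration for HAProxy, a popular open-source load balancer. A sketch, assuming a secure cluster and a hypothetical node address:

~~~ shell
# Generate an haproxy.cfg file listing every node in the cluster.
$ cockroach gen haproxy --certs-dir=certs --host=10.0.1.1 --port=26257

# Start HAProxy with the generated configuration.
$ haproxy -f haproxy.cfg
~~~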