Commit
Expand production recommendations
- Update cluster topology to mention locality and cover common
  cluster patterns. Addresses part of #2411.
- Add a security section.
- Add locality to manual deployment tutorials.
Jesse Seldess committed Apr 2, 2018
1 parent 77c1cc6 commit d1d6a55
Showing 4 changed files with 69 additions and 10 deletions.
6 changes: 4 additions & 2 deletions _includes/prod_deployment/insecure-start-nodes.md
Original file line number Diff line number Diff line change
@@ -27,9 +27,10 @@ For each initial node of your cluster, complete the following steps:
~~~
$ cockroach start --insecure \
--host=<node1 address> \
--join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
--locality=<key-value pairs> \
--cache=25% \
--max-sql-memory=25% \
--join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
--background
~~~

@@ -39,8 +40,9 @@ For each initial node of your cluster, complete the following steps:
-----|------------
`--insecure` | Indicates that the cluster is insecure, with no network encryption or authentication.
`--host` | Specifies the hostname or IP address to listen on for intra-cluster and client communication, as well as to identify the node in the Admin UI. If it is a hostname, it must be resolvable from all nodes, and if it is an IP address, it must be routable from all nodes.<br><br>If you want the node to listen on multiple interfaces, leave `--host` empty.<br><br>If you want the node to communicate with other nodes on an internal address (e.g., within a private network) while listening on all interfaces, leave `--host` empty and set the `--advertise-host` flag to the internal address.
`--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
`--locality` | Key-value pairs that describe the location of the node, e.g., country, region, datacenter, rack, etc. The key-value pairs should be ordered from most inclusive to least inclusive, and the keys and the order of key-value pairs must be the same on all nodes. For example:<br><br>`--locality=region=east,datacenter=us-east-1`<br>`--locality=region=west,datacenter=us-west-1`<br><br>When there is high latency between nodes, CockroachDB uses locality to move range leases closer to the current workload, reducing network round trips and improving read performance, also known as ["follow-the-workload"](demo-follow-the-workload.html). Locality is also a prerequisite for using the [table partitioning](partitioning.html) and [**Node Map**](enable-node-map.html) enterprise features.
`--cache`<br>`--max-sql-memory` | Increases the node's cache and temporary SQL memory size to 25% of available system memory to improve read performance and increase capacity for in-memory SQL processing (see [Recommended Production Settings](recommended-production-settings.html) for more details).
`--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
`--background` | Starts the node in the background so you gain control of the terminal to issue more commands.
For other flags not explicitly set, the command uses default values. For example, the node stores data in `--store=cockroach-data`, binds internal and client communication to `--port=26257`, and binds Admin UI HTTP requests to `--http-port=8080`. To set these options manually, see [Start a Node](start-a-node.html).
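For example, a fully specified start command for the first node of a three-node cluster spread across two datacenters might look like the following (the addresses and locality values here are illustrative placeholders, not part of the original template):

~~~
$ cockroach start --insecure \
--host=10.0.1.1 \
--locality=region=east,datacenter=us-east-1 \
--cache=25% \
--max-sql-memory=25% \
--join=10.0.1.1:26257,10.0.1.2:26257,10.0.1.3:26257 \
--background
~~~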
6 changes: 4 additions & 2 deletions _includes/prod_deployment/secure-start-nodes.md
@@ -28,9 +28,10 @@ For each initial node of your cluster, complete the following steps:
$ cockroach start \
--certs-dir=certs \
--host=<node1 address> \
--join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
--locality=<key-value pairs> \
--cache=25% \
--max-sql-memory=25% \
--join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 \
--background
~~~

@@ -40,8 +41,9 @@ For each initial node of your cluster, complete the following steps:
-----|------------
`--certs-dir` | Specifies the directory where you placed the `ca.crt` file and the `node.crt` and `node.key` files for the node.
`--host` | Specifies the hostname or IP address to listen on for intra-cluster and client communication, as well as to identify the node in the Admin UI. If it is a hostname, it must be resolvable from all nodes, and if it is an IP address, it must be routable from all nodes.<br><br>If you want the node to listen on multiple interfaces, leave `--host` empty.<br><br>If you want the node to communicate with other nodes on an internal address (e.g., within a private network) while listening on all interfaces, leave `--host` empty and set the `--advertise-host` flag to the internal address.
`--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
`--locality` | Key-value pairs that describe the location of the node, e.g., country, region, datacenter, rack, etc. The key-value pairs should be ordered from most inclusive to least inclusive, and the keys and the order of key-value pairs must be the same on all nodes. For example:<br><br>`--locality=region=east,datacenter=us-east-1`<br>`--locality=region=west,datacenter=us-west-1`<br><br>When there is high latency between nodes, CockroachDB uses locality to move range leases closer to the current workload, reducing network round trips and improving read performance, also known as ["follow-the-workload"](demo-follow-the-workload.html). Locality is also a prerequisite for using the [table partitioning](partitioning.html) and [**Node Map**](enable-node-map.html) enterprise features.
`--cache`<br>`--max-sql-memory` | Increases the node's cache and temporary SQL memory size to 25% of available system memory to improve read performance and increase capacity for in-memory SQL processing (see [Recommended Production Settings](recommended-production-settings.html) for more details).
`--join` | Identifies the address and port of 3-5 of the initial nodes of the cluster.
`--background` | Starts the node in the background so you gain control of the terminal to issue more commands.
For other flags not explicitly set, the command uses default values. For example, the node stores data in `--store=cockroach-data`, binds internal and client communication to `--port=26257`, and binds Admin UI HTTP requests to `--http-port=8080`. To set these options manually, see [Start a Node](start-a-node.html).
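As a concrete sketch (with placeholder addresses and locality values), a node in a second datacenter would use the same locality keys in the same order, with different values:

~~~
$ cockroach start \
--certs-dir=certs \
--host=10.0.2.1 \
--locality=region=west,datacenter=us-west-1 \
--cache=25% \
--max-sql-memory=25% \
--join=10.0.1.1:26257,10.0.1.2:26257,10.0.1.3:26257 \
--background
~~~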
5 changes: 3 additions & 2 deletions v2.0/high-availability.md → v2.0/fault-tolerance.md
@@ -1,7 +1,8 @@
---
title: High Availability
title: Fault Tolerance
summary: CockroachDB is designed to survive software and hardware failures, from server restarts to datacenter outages.
toc: false
redirect-from: high-availability.html
---

CockroachDB is designed to survive software and hardware failures, from server restarts to datacenter outages. This is accomplished without confusing artifacts typical of other distributed systems (e.g., stale reads) using strongly-consistent replication as well as automated repair after failures.
@@ -12,7 +13,7 @@ CockroachDB replicates your data for availability and guarantees consistency bet

- Different servers within a rack to tolerate server failures
- Different servers on different racks within a datacenter to tolerate rack power/network failures
- Different servers in different datacenters to tolerate large scale network or power outages
- Different servers in different datacenters to tolerate large scale network or power outages

When replicating across datacenters, be aware that the round-trip latency between datacenters will have a direct effect on your database's performance. Latency in cross-continent clusters will be noticeably worse than in clusters where all nodes are geographically close together.

62 changes: 58 additions & 4 deletions v2.0/recommended-production-settings.md
@@ -10,19 +10,56 @@ This page provides important recommendations for production deployments of Cockr

## Cluster Topology

### Basic Recommendations

For a replicated cluster, each replica will be on a different node and a majority of replicas must remain available for the cluster to make progress. Therefore:

- Use at least 3 nodes to ensure that a majority of replicas (2/3) remains available if a node fails.

- Run each node on a separate machine. Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data loss if a machine fails. Likewise, if a machine has multiple disks or SSDs, run one node with multiple `--store` flags and not one node per disk. For more details about stores, see [Start a Node](start-a-node.html).

- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a majority (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas.
- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a majority (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas. For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html).

- When starting each node, use the [`--locality`](start-a-node.html#locality) flag to describe the node's location, for example, `--locality=region=west,datacenter=us-west-1`. The key-value pairs should be ordered from most to least inclusive, and the keys and order of key-value pairs must be the same on all nodes.
- When there is high latency between nodes, CockroachDB uses locality to move range leases closer to the current workload, reducing network round trips and improving read performance, also known as ["follow-the-workload"](demo-follow-the-workload.html). Locality is also a prerequisite for using the [table partitioning](partitioning.html) and [**Node Map**](enable-node-map.html) enterprise features.
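For example, the three initial nodes of a cluster spanning three datacenters in two regions might be started with the following locality flags (the region and datacenter values are illustrative); every node uses the same keys in the same order:

~~~
--locality=region=east,datacenter=us-east-1
--locality=region=east,datacenter=us-east-2
--locality=region=west,datacenter=us-west-1
~~~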

### Common Patterns

#### Single Datacenter

When deploying in a single datacenter:

- Use at least 3 nodes to ensure that the cluster can tolerate the failure of any one node.

- Although network latency between nodes should be minimal within a single datacenter (~1ms), it's recommended to set the `--locality` flag on each node to have the flexibility to customize [replication zones](configure-replication-zones.html) based on locality as you scale.
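For instance, once nodes carry locality information, a replication zone can constrain a table's replicas to part of the cluster. A minimal sketch using the v2.0 `cockroach zone set` command (the database, table, and datacenter names are hypothetical):

~~~
$ cat rack.yaml
constraints: [+datacenter=us-1]
$ cockroach zone set mydb.mytable --insecure --file=rack.yaml
~~~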

#### Multiple Datacenters

When deploying across multiple datacenters in one or more regions:

- When replicating across datacenters, it's recommended to specify which datacenter each node is in using the `--locality` flag to ensure even replication (see this [example](configure-replication-zones.html#even-replication-across-datacenters) for more details). If some of your datacenters are much farther apart than others, [specifying multiple levels of locality (such as country and region) is recommended](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes).
- Use at least 3 datacenters to ensure that the cluster can tolerate the failure of 1 entire datacenter.

- When replicating across datacenters, be aware that the round trip latency between datacenters will have a direct effect on your database's performance, with cross-continent clusters performing noticeably worse than clusters in which all nodes are geographically close together.
- The round-trip latency between datacenters will have a direct effect on your cluster's performance, with cross-continent clusters performing noticeably worse than clusters in which all nodes are geographically close together.
- To optimize read latency for the location from which most of the workload is originating, also known as ["follow-the-workload"](demo-follow-the-workload.html), set the `--locality` flag when starting each node. When deploying across more than 3 datacenters, to ensure that all data benefits from "follow-the-workload", you must also increase your replication factor to match the total number of datacenters.
- To optimize read and write latency, consider using the enterprise [table partitioning](partitioning.html) feature.

For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html).
- If you expect region-specific concurrency and load characteristics, consider using different numbers and types of nodes per region. For example, in the following scenario, the Central region is closer to the West region than to the East region, which means that write latency would be optimized for workloads in the West and Central regions.

~~~
   A---C            A---C
    \ /              \ /
     B                B
   West----80ms----East
      \            /
     20ms       60ms
        \        /
         \      /
          \    /
         Central
          A---C
           \ /
            B
~~~

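To raise the replication factor when running in 5 datacenters, for example, you could update the default replication zone (a sketch against the v2.0 `cockroach zone set` interface; flags may vary with your deployment):

~~~
$ echo 'num_replicas: 5' | cockroach zone set .default --insecure --file=-
~~~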
## Hardware
@@ -71,6 +108,23 @@ Cockroach Labs recommends the following cloud-specific configurations based on o
For example, Cockroach Labs has used custom VMs (8 vCPUs and 16 GiB of RAM per VM) for internal testing.
- **Do not** use `f1` or `g1` [shared-core machines](https://cloud.google.com/compute/docs/machine-types#sharedcore), which limit the load on a single core.

## Security

An insecure cluster comes with serious risks:

- Your cluster is open to any client that can access any node's IP addresses.
- Any user, even `root`, can log in without providing a password.
- Any user, connecting as `root`, can read or write any data in your cluster.
- There is no network encryption or authentication, and thus no confidentiality.

Therefore, to deploy CockroachDB in production, it is strongly recommended to use TLS certificates to authenticate the identity of nodes and clients and to encrypt in-flight data between nodes and clients. You can use either the built-in [`cockroach cert` commands](create-security-certificates.html) or [`openssl` commands](create-security-certificates-openssl.html) to generate security certificates for your deployment. Regardless of which option you choose, you'll need the following files:

- A certificate authority (CA) certificate and key, used to sign all of the other certificates.
- A separate certificate and key for each node in your deployment, with the common name `node`.
- A separate certificate and key for each client and user you want to connect to your nodes, with the common name set to the username. The default user is `root`.

Alternatively, CockroachDB does [support password authentication](create-and-manage-users.html#secure-cluster), although we typically recommend using client certificates instead.
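
For example, the built-in `cockroach cert` commands can generate the full set of files described above (the key directory and hostname here are placeholders):

~~~
$ cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
$ cockroach cert create-node node1.example.com --certs-dir=certs --ca-key=my-safe-directory/ca.key
$ cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key
~~~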

## Load Balancing

Each CockroachDB node is an equally suitable SQL gateway to a cluster, but to ensure client performance and reliability, it's important to use load balancing:
