Skip to content

Commit

Permalink
Document how to use replication reports
Browse files Browse the repository at this point in the history
Specifically, add docs for the new system tables that provide insight
into:

- Whether data is under-replicated or unavailable

- Violations of data placement requirements

- Which localities are critical

Fixes #4730.

Addresses #2957, #5636.

Relates to #4459.
  • Loading branch information
rmloveland committed Nov 4, 2019
1 parent 44f9ff9 commit e70ec20
Show file tree
Hide file tree
Showing 9 changed files with 541 additions and 9 deletions.
6 changes: 6 additions & 0 deletions _includes/sidebar-data-v19.2.json
Original file line number Diff line number Diff line change
Expand Up @@ -494,6 +494,12 @@
"urls": [
"/${VERSION}/rotate-certificates.html"
]
},
{
"title": "Run Replication Reports",
"urls": [
"/${VERSION}/run-replication-reports.html"
]
}
]
},
Expand Down
2 changes: 1 addition & 1 deletion _includes/v19.2/misc/basic-terms.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Term | Definition
-----|------------
**Cluster** | Your CockroachDB deployment, which acts as a single logical application.
**Node** | An individual machine running CockroachDB. Many nodes join together to create your cluster.
**Range** | CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a giant sorted map of key-value pairs. This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range.<br><br>From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as that range reaches 64 MiB in size, it splits into two ranges. This process continues for these new ranges as the table and its indexes continue growing.
**Range** | <a name="architecture-range"></a> CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a giant sorted map of key-value pairs. This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range.<br><br>From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as that range reaches 64 MiB in size, it splits into two ranges. This process continues for these new ranges as the table and its indexes continue growing.
**Replica** | CockroachDB replicates each range (3 times by default) and stores each replica on a different node.
**Leaseholder** | For each range, one of the replicas holds the "range lease". This replica, referred to as the "leaseholder", is the one that receives and coordinates all read and write requests for the range.<br><br>Unlike writes, read requests access the leaseholder and send the results to the client without needing to coordinate with any of the other range replicas. This reduces the network round trips involved and is possible because the leaseholder is guaranteed to be up-to-date due to the fact that all write requests also go to the leaseholder.
**Raft Leader** | For each range, one of the replicas is the "leader" for write requests. Via the [Raft consensus protocol](https://www.cockroachlabs.com/docs/{{ page.version.version }}/architecture/replication-layer.html#raft), this replica ensures that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. The Raft leader is almost always the same replica as the leaseholder.
Expand Down
2 changes: 1 addition & 1 deletion _includes/v19.2/sql/settings/settings.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@
<tr><td><code>kv.range_split.load_qps_threshold</code></td><td>integer</td><td><code>2500</code></td><td>the QPS over which, the range becomes a candidate for load based splitting</td></tr>
<tr><td><code>kv.rangefeed.concurrent_catchup_iterators</code></td><td>integer</td><td><code>64</code></td><td>number of rangefeeds catchup iterators a store will allow concurrently before queueing</td></tr>
<tr><td><code>kv.rangefeed.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if set, rangefeed registration is enabled</td></tr>
<tr><td><code>kv.replication_reports.interval</code></td><td>duration</td><td><code>1m0s</code></td><td>the frequency for generating the replication_constraint_stats, replication_stats_report and replication_critical_localities reports (set to 0 to disable)</td></tr>
<tr><td><a name="kv-replication-reports-interval"></a><code>kv.replication_reports.interval</code></td><td>duration</td><td><code>1m0s</code></td><td>the frequency for generating the replication_constraint_stats, replication_stats_report and replication_critical_localities reports (set to 0 to disable)</td></tr>
<tr><td><code>kv.snapshot_rebalance.max_rate</code></td><td>byte size</td><td><code>8.0 MiB</code></td><td>the rate limit (bytes/sec) to use for rebalance and upreplication snapshots</td></tr>
<tr><td><code>kv.snapshot_recovery.max_rate</code></td><td>byte size</td><td><code>8.0 MiB</code></td><td>the rate limit (bytes/sec) to use for recovery snapshots</td></tr>
<tr><td><code>kv.snapshot_sst.sync_size</code></td><td>byte size</td><td><code>2.0 MiB</code></td><td>threshold after which snapshot SST writes must fsync</td></tr>
Expand Down
2 changes: 1 addition & 1 deletion _includes/v19.2/zone-configs/variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ Variable | Description
`range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.<br><br>**Default:** `16777216` (16MiB)
`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will spit it into two ranges.<br><br>**Default:** `67108864` (64MiB)
`gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
`num_replicas` | The number of replicas in the zone.<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.
`num_replicas` | <a name="num_replicas"></a> The number of replicas in the zone.<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.
`constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.<br/><br/>To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled.<br/><br/>**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities.
`lease_preferences` <a name="lease_preferences"></a> | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/overview.html#glossary). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all. <br /><br /> If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.<br /><br />Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."<br /><br /> For a usage example, see [Constrain leaseholders to specific datacenters](configure-replication-zones.html#constrain-leaseholders-to-specific-datacenters).<br /><br />**Default**: No lease location preferences are applied if this field is not specified.

Expand Down
4 changes: 2 additions & 2 deletions v19.2/admin-ui-replication-dashboard.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@ The **Replication** dashboard in the CockroachDB Admin UI enables you to monitor
- **Range**: CockroachDB stores all user data and almost all system data in a giant sorted map of key-value pairs. This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range.
- **Range Replica:** CockroachDB replicates each range (3 times by default) and stores each replica on a different node.
- **Range Lease:** For each range, one of the replicas holds the "range lease". This replica, referred to as the "leaseholder", is the one that receives and coordinates all read and write requests for the range.
- **Under-replicated Ranges:** When a cluster is first initialized, the few default starting ranges will only have a single replica, but as soon as other nodes are available, they will replicate to them until they've reached their desired replication factor, the default being 3. If a range does not have enough replicas, the range is said to be "under-replicated".
- **Unavailable Ranges:** If a majority of a range's replicas are on nodes that are unavailable, then the entire range is unavailable and will be unable to process queries.
- **Under-replicated Ranges:** <a name="under-replicated-ranges"></a> When a cluster is first initialized, the few default starting ranges will only have a single replica, but as soon as other nodes are available, they will replicate to them until they've reached their desired replication factor, the default being 3. If a range does not have enough replicas, the range is said to be "under-replicated".
- **Unavailable Ranges:** <a name="unavailable-ranges"></a> If a majority of a range's replicas are on nodes that are unavailable, then the entire range is unavailable and will be unable to process queries.

For more details, see [Scalable SQL Made Easy: How CockroachDB Automates Operations](https://www.cockroachlabs.com/blog/automated-rebalance-and-repair/)

Expand Down
2 changes: 1 addition & 1 deletion v19.2/architecture/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ CockroachDB heavily relies on the following concepts, so being familiar with the
Term | Definition
-----|-----------
**Consistency** | CockroachDB uses "consistency" in both the sense of [ACID semantics](https://en.wikipedia.org/wiki/Consistency_(database_systems)) and the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem), albeit less formally than either definition. What we try to express with this term is that your data should be anomaly-free.
**Consensus** | When a range receives a write, a quorum of nodes containing replicas of the range acknowledge the write. This means your data is safely stored and a majority of nodes agree on the database's current state, even if some of the nodes are offline.<br/><br/>When a write *doesn't* achieve consensus, forward progress halts to maintain consistency within the cluster.
**Consensus** | <a name="architecture-overview-consensus"></a> When a range receives a write, a quorum of nodes containing replicas of the range acknowledge the write. This means your data is safely stored and a majority of nodes agree on the database's current state, even if some of the nodes are offline.<br/><br/>When a write *doesn't* achieve consensus, forward progress halts to maintain consistency within the cluster.
**Replication** | Replication involves creating and distributing copies of data, as well as ensuring copies remain consistent. However, there are multiple types of replication: namely, synchronous and asynchronous.<br/><br/>Synchronous replication requires all writes to propagate to a quorum of copies of the data before being considered committed. To ensure consistency with your data, this is the kind of replication CockroachDB uses.<br/><br/>Asynchronous replication only requires a single node to receive the write to be considered committed; it's propagated to each copy of the data after the fact. This is more or less equivalent to "eventual consistency", which was popularized by NoSQL databases. This method of replication is likely to cause anomalies and loss of data.
**Transactions** | A set of operations performed on your database that satisfy the requirements of [ACID semantics](https://en.wikipedia.org/wiki/Database_transaction). This is a crucial component for a consistent system to ensure developers can trust the data in their database.
**Multi-Active Availability** | Our consensus-based notion of high availability that lets each node in the cluster handle reads and writes for a subset of the stored data (on a per-range basis). This is in contrast to active-passive replication, in which the active node receives 100% of request traffic, as well as active-active replication, in which all nodes accept requests but typically cannot guarantee that reads are both up-to-date and fast.
Expand Down
2 changes: 1 addition & 1 deletion v19.2/configure-replication-zones.md
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ When starting a node with the [`cockroach start`](start-a-node.html) command, yo

Attribute Type | Description
---------------|------------
**Node Locality** | Using the [`--locality`](start-a-node.html#locality) flag, you can assign arbitrary key-value pairs that describe the locality of the node. Locality might include country, region, datacenter, rack, etc. The key-value pairs should be ordered from most inclusive to least inclusive (e.g., country before datacenter before rack), and the keys and the order of key-value pairs must be the same on all nodes. It's typically better to include more pairs than fewer. For example:<br><br>`--locality=region=east,datacenter=us-east-1`<br>`--locality=region=east,datacenter=us-east-2`<br>`--locality=region=west,datacenter=us-west-1`<br><br>CockroachDB attempts to spread replicas evenly across the cluster based on locality, with the order determining the priority. However, locality can be used to influence the location of data replicas in various ways using replication zones.<br><br>When there is high latency between nodes, CockroachDB also uses locality to move range leases closer to the current workload, reducing network round trips and improving read performance. See [Follow-the-workload](demo-follow-the-workload.html) for more details.
**Node Locality** | <a name="zone-config-node-locality"></a> Using the [`--locality`](start-a-node.html#locality) flag, you can assign arbitrary key-value pairs that describe the location of the node. Locality might include region, country, datacenter, rack, etc. The key-value pairs should be ordered into _locality tiers_ that range from most inclusive to least inclusive (e.g., region before datacenter as in `region=eu,dc=paris`), and the keys and the order of key-value pairs must be the same on all nodes. It's typically better to include more pairs than fewer. For example:<br><br>`--locality=region=east,datacenter=us-east-1`<br>`--locality=region=east,datacenter=us-east-2`<br>`--locality=region=west,datacenter=us-west-1`<br><br>CockroachDB attempts to spread replicas evenly across the cluster based on locality, with the order of locality tiers determining the priority. Locality can also be used to influence the location of data replicas in various ways using replication zones.<br><br>When there is high latency between nodes, CockroachDB uses locality to move range leases closer to the current workload, reducing network round trips and improving read performance. See [Follow-the-workload](demo-follow-the-workload.html) for more details.
**Node Capability** | Using the `--attrs` flag, you can specify node capability, which might include specialized hardware or number of cores, for example:<br><br>`--attrs=ram:64gb`
**Store Type/Capability** | Using the `attrs` field of the `--store` flag, you can specify disk type or capability, for example:<br><br>`--store=path=/mnt/ssd01,attrs=ssd`<br>`--store=path=/mnt/hda1,attrs=hdd:7200rpm`

Expand Down
Loading

0 comments on commit e70ec20

Please sign in to comment.