Skip to content

Commit

Permalink
Merge pull request #4490 from cockroachdb/slowing-deletes
Browse files Browse the repository at this point in the history
Added "Why are my deletes getting slower?" FAQ
  • Loading branch information
rmloveland authored Jun 20, 2019
2 parents 25380ac + 44e8f34 commit 2c386c1
Show file tree
Hide file tree
Showing 23 changed files with 89 additions and 29 deletions.
2 changes: 1 addition & 1 deletion _includes/v19.1/zone-configs/variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Variable | Description
------|------------
`range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.<br><br>**Default:** `1048576` (1MiB)
`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will spit it into two ranges.<br><br>**Default:** `67108864` (64MiB)
`gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
`gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
`num_replicas` | The number of replicas in the zone.<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.
`constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.<br/><br/>To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled.<br/><br/>**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities.
`lease_preferences` <a name="lease_preferences"></a> | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/overview.html#glossary). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all. <br /><br /> If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.<br /><br />Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."<br /><br /> For a usage example, see [Constrain leaseholders to specific datacenters](configure-replication-zones.html#constrain-leaseholders-to-specific-datacenters).<br /><br />**Default**: No lease location preferences are applied if this field is not specified.
Expand Down
2 changes: 1 addition & 1 deletion _includes/v19.2/zone-configs/variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Variable | Description
------|------------
`range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.<br><br>**Default:** `1048576` (1MiB)
`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will spit it into two ranges.<br><br>**Default:** `67108864` (64MiB)
`gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
`gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
`num_replicas` | The number of replicas in the zone.<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.
`constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.<br/><br/>To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled.<br/><br/>**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities.
`lease_preferences` <a name="lease_preferences"></a> | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/overview.html#glossary). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all. <br /><br /> If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.<br /><br />Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."<br /><br /> For a usage example, see [Constrain leaseholders to specific datacenters](configure-replication-zones.html#constrain-leaseholders-to-specific-datacenters).<br /><br />**Default**: No lease location preferences are applied if this field is not specified.
Expand Down
2 changes: 1 addition & 1 deletion _includes/v2.1/zone-configs/variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Variable | Description
------|------------
`range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.<br><br>**Default:** `1048576` (1MiB)
`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will spit it into two ranges.<br><br>**Default:** `67108864` (64MiB)
`gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
<a name="gc-ttlseconds"></a> `gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could contribute to the server running out of memory or other problems.<br><br>**Default:** `90000` (25 hours)
`num_replicas` | The number of replicas in the zone.<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.
`constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.<br/><br/>To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled.<br/><br/>**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities.
`lease_preferences` | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/overview.html#glossary). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all. <br /><br /> If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.<br /><br />Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."<br /><br /> For a usage example, see [Constrain leaseholders to specific datacenters](configure-replication-zones.html#constrain-leaseholders-to-specific-datacenters).<br /><br />**Default**: No lease location preferences are applied if this field is not specified.
6 changes: 6 additions & 0 deletions v19.1/delete.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,12 @@ deleted rows more frequently.
For more information about ordering query results in general, see
[Ordering Query Results](query-order.html).

## Delete performance on large data sets

If you are deleting a large amount of data using iterative `DELETE ... LIMIT` statements, you are likely to see a drop in performance for each subsequent `DELETE` statement.

For an explanation of why this happens, and for instructions showing how to iteratively delete rows in constant time, see [Why are my deletes getting slower over time?](sql-faqs.html#why-are-my-deletes-getting-slower-over-time).

## Examples

### Delete all rows
Expand Down
8 changes: 2 additions & 6 deletions v19.1/frequently-asked-questions.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ The [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem) states that it is i
- Partition Tolerance

CockroachDB is a CP (consistent and partition tolerant) system. This means
that, in the presence of partitions, the system will become unavailable rather than do anything which might cause inconsistent results. For example, writes require acknowledgements from a majority of replicas, and reads require a lease, which can only be transferred to a different node when writes are possible.
that, in the presence of partitions, the system will become unavailable rather than do anything which might cause inconsistent results. For example, writes require acknowledgments from a majority of replicas, and reads require a lease, which can only be transferred to a different node when writes are possible.

Separately, CockroachDB is also Highly Available, although "available" here means something different than the way it is used in the CAP theorem. In the CAP theorem, availability is a binary property, but for High Availability, we talk about availability as a spectrum (using terms like "five nines" for a system that is available 99.999% of the time).

Expand Down Expand Up @@ -99,7 +99,7 @@ Yes. Every [transaction](transactions.html) in CockroachDB guarantees [ACID sema

No. CockroachDB was designed to work without atomic clocks or GPS clocks. It’s a database intended to be run on arbitrary collections of nodes, from physical servers in a corp development cluster to public cloud infrastructure using the flavor-of-the-month virtualization layer. It’d be a showstopper to require an external dependency on specialized hardware for clock synchronization. However, CockroachDB does require moderate levels of clock synchronization for correctness. If clocks drift past a maximum threshold, nodes will be taken offline. It's therefore highly recommended to run [NTP](http://www.ntp.org/) or other clock synchronization software on each node.

For more details on how CockroachDB handles unsychronized clocks, see [Clock Synchronization](recommended-production-settings.html#clock-synchronization). And for a broader discussion of clocks, and the differences between clocks in Spanner and CockroachDB, see [Living Without Atomic Clocks](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/).
For more details on how CockroachDB handles unsynchronized clocks, see [Clock Synchronization](recommended-production-settings.html#clock-synchronization). And for a broader discussion of clocks, and the differences between clocks in Spanner and CockroachDB, see [Living Without Atomic Clocks](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/).

## What languages can I use to work with CockroachDB?

Expand Down Expand Up @@ -160,10 +160,6 @@ Yes. The Managed CockroachDB offering is currently in Limited Availability and a

For more details, see the [Managed CockroachDB](../managed/v19.1/) docs.

## Can I use CockroachDB as a key-value store?

{% include {{ page.version.version }}/faq/simulate-key-value-store.html %}

## Have questions that weren’t answered?

- [CockroachDB Community Forum](https://forum.cockroachlabs.com): Ask questions, find answers, and help other users.
Expand Down
2 changes: 1 addition & 1 deletion v19.1/known-limitations.md
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,7 @@ It is currently not possible to [add a column](add-column.html) to a table when

When inserting/updating all columns of a table, and the table has no secondary indexes, we recommend using an [`UPSERT`](upsert.html) statement instead of the equivalent [`INSERT ON CONFLICT`](insert.html) statement. Whereas `INSERT ON CONFLICT` always performs a read to determine the necessary writes, the `UPSERT` statement writes without reading, making it faster.

This issue is particularly relevant when using a simple SQL table of two columns to [simulate direct KV access](frequently-asked-questions.html#can-i-use-cockroachdb-as-a-key-value-store). In this case, be sure to use the `UPSERT` statement.
This issue is particularly relevant when using a simple SQL table of two columns to [simulate direct KV access](sql-faqs.html#can-i-use-cockroachdb-as-a-key-value-store). In this case, be sure to use the `UPSERT` statement.

### Write and update limits for a single statement

Expand Down
4 changes: 4 additions & 0 deletions v19.1/performance-best-practices-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,10 @@ To bulk-insert data into an existing table, batch multiple rows in one multi-row

To bulk-insert data into a brand new table, the [`IMPORT`](import.html) statement performs better than `INSERT`.

## Bulk deletion best practices

To get the best performance when deleting large amounts of data, follow the instructions in [Why are my deletes getting slower over time?](sql-faqs.html#why-are-my-deletes-getting-slower-over-time).

## Assign column families

A column family is a group of columns in a table that is stored as a single key-value pair in the underlying key-value store.
Expand Down
Loading

0 comments on commit 2c386c1

Please sign in to comment.