Naming things: Use "hot" vs. "warm" labels for the "REALLOCATE" strategy
Previously, they have been called "hot" vs. "cold".
amotl committed Jul 7, 2023
1 parent ccb0891 commit a7d9865
Showing 4 changed files with 14 additions and 13 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -19,7 +19,7 @@ different retention strategies.
The application manages the life-cycle of data stored in CrateDB, handling
concerns of data expiry, size reduction, and archival. Within a system storing
and processing large amounts of data, it is crucial to manage data flows between
-hot and cold storage types better than using ad hoc solutions.
+"hot", "warm", and "cold" storage types better than using ad hoc solutions.

Data retention policies can be flexibly configured by adding records to the
retention policy database table, which is also stored within CrateDB.
@@ -92,30 +92,30 @@ This retention policy implements the following directive.

### REALLOCATE

-A retention policy algorithm that reallocates expired partitions from hot nodes
-to cold nodes.
+A retention policy algorithm that reallocates expired partitions from "hot" nodes
+to "warm" nodes.

Because each cluster member is assigned a designated node type by using the
-`-Cnode.attr.storage=hot|cold` parameter, this strategy is only applicable in
+`-Cnode.attr.storage=hot|warm` parameter, this strategy is only applicable in
cluster/multi-node scenarios.

On the data expiration run, corresponding partitions will get physically moved to
-cluster nodes of the `cold` type, which are mostly designated archive nodes, with
+cluster nodes of the `warm` type, which are mostly designated archive nodes, with
large amounts of storage space.

```shell
cratedb-retention create-policy --strategy=reallocate \
--table-schema=doc --table-name=raw_metrics \
--partition-column=ts_day --retention-period=60 \
---reallocation-attribute-name=storage --reallocation-attribute-value=cold \
+--reallocation-attribute-name=storage --reallocation-attribute-value=warm \
"${CRATEDB_URI}"
```

This retention policy implements the following directive.

> **Reallocate** data from the `"doc"."raw_metrics"` table, on partitions defined by
> the column `ts_day`, which is older than **60** days at the given cut-off date, to
-> nodes tagged with the `storage=cold` attribute.
+> nodes tagged with the `storage=warm` attribute.
[implementation](cratedb_retention/strategy/reallocate.py) | [tutorial](https://community.crate.io/t/cratedb-and-apache-airflow-building-a-hot-cold-storage-data-retention-policy/934)
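
The directive above boils down to changing where a single partition is allowed to live. A minimal sketch of how such a statement could be assembled, assuming the relocation is expressed through CrateDB's `routing.allocation.require.<attribute>` table setting on a partition; the helper names `quote_ident`, `quote_literal`, and `build_reallocate_sql` are hypothetical and are not taken from `cratedb_retention/strategy/reallocate.py`:

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value) -> str:
    """Render a value as an SQL literal: numbers as-is, everything else as a quoted string."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_reallocate_sql(
    schema: str,
    table: str,
    partition_column: str,
    partition_value,
    attribute_name: str,
    attribute_value: str,
) -> str:
    """
    Build a statement that pins one partition to nodes tagged with
    `<attribute_name>=<attribute_value>`.

    Assumption: the tagging uses the `routing.allocation.require.*` setting.
    """
    setting = quote_ident(f"routing.allocation.require.{attribute_name}")
    return (
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"PARTITION ({quote_ident(partition_column)} = {quote_literal(partition_value)}) "
        f"SET ({setting} = {quote_literal(attribute_value)});"
    )


if __name__ == "__main__":
    # Hypothetical partition value: a day boundary as epoch milliseconds.
    print(build_reallocate_sql("doc", "raw_metrics", "ts_day", 1688688000000, "storage", "warm"))
```

The sketch only renders the statement; selecting which partitions are older than the retention period, and executing the statement against `${CRATEDB_URI}`, are left out.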

@@ -178,7 +178,7 @@ CREATE TABLE IF NOT EXISTS "ext"."retention_policy" (
-- Target: Where data is moved/relocated to.

-- Targeting specific nodes.
--- You may want to designate dedicated nodes to be responsible for hot or cold storage types.
+-- You may want to designate dedicated nodes to be responsible for "hot" or "warm" storage types.
-- To do that, you can assign attributes to specific nodes, effectively tagging them.
-- https://crate.io/docs/crate/reference/en/latest/config/node.html#custom-attributes
"reallocation_attribute_name" TEXT, -- Name of the node-specific custom attribute.
4 changes: 2 additions & 2 deletions cratedb_retention/setup/schema.sql
@@ -21,14 +21,14 @@ CREATE TABLE IF NOT EXISTS {policy_table.fullname} (
-- Target: Where data is moved/relocated to.

-- Targeting specific nodes.
--- You may want to designate dedicated nodes to be responsible for hot or cold storage types.
+-- You may want to designate dedicated nodes to be responsible for "hot" or "warm" storage types.
-- To do that, you can assign attributes to specific nodes, effectively tagging them.
-- https://crate.io/docs/crate/reference/en/latest/config/node.html#custom-attributes
"reallocation_attribute_name" TEXT, -- Name of the node-specific custom attribute.
"reallocation_attribute_value" TEXT, -- Value of the node-specific custom attribute.

-- Targeting a repository.
-"target_repository_name" TEXT -- The name of a repository created with `CREATE REPOSITORY ...`.
+"target_repository_name" TEXT -- The name of a repository created with "CREATE REPOSITORY ...".

)
CLUSTERED INTO 1 SHARDS;
5 changes: 3 additions & 2 deletions cratedb_retention/strategy/reallocate.py
@@ -1,9 +1,10 @@
# Copyright (c) 2021-2023, Crate.io Inc.
# Distributed under the terms of the AGPLv3 license, see LICENSE.
"""
-Implements a retention policy by reallocating cold partitions
+A retention policy implementation which reallocates data to "warm" partitions.
-A detailed tutorial is available at https://community.crate.io/t/cratedb-and-apache-airflow-building-a-hot-cold-storage-data-retention-policy/934
+It is derived from a corresponding tutorial based on Apache Airflow.
+https://community.crate.io/t/cratedb-and-apache-airflow-building-a-hot-cold-storage-data-retention-policy/934
Prerequisites
-------------
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -141,7 +141,7 @@ def policies(cratedb, settings, store):
partition_column="ts_day",
retention_period=60,
reallocation_attribute_name="storage",
-reallocation_attribute_value="cold",
+reallocation_attribute_value="warm",
),
# Retention policy rule for the SNAPSHOT strategy.
RetentionPolicy(