diff --git a/README.md b/README.md
index 774a4fe2..d220a274 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ different retention strategies.
 The application manages the life-cycle of data stored in CrateDB, handling
 concerns of data expiry, size reduction, and archival. Within a system storing
 and processing large amounts of data, it is crucial to manage data flows between
-hot and cold storage types better than using ad hoc solutions.
+"hot", "warm", and "cold" storage types better than using ad hoc solutions.

 Data retention policies can be flexibly configured by adding records to the
 retention policy database table, which is also stored within CrateDB.
@@ -92,22 +92,22 @@ This retention policy implements the following directive.

 ### REALLOCATE

-A retention policy algorithm that reallocates expired partitions from hot nodes
-to cold nodes.
+A retention policy algorithm that reallocates expired partitions from "hot" nodes
+to "warm" nodes.

 Because each cluster member is assigned a designated node type by using the
-`-Cnode.attr.storage=hot|cold` parameter, this strategy is only applicable in
+`-Cnode.attr.storage=hot|warm` parameter, this strategy is only applicable in
 cluster/multi-node scenarios.

 On the data expiration run, corresponding partitions will get physically moved to
-cluster nodes of the `cold` type, which are mostly designated archive nodes, with
+cluster nodes of the `warm` type, which are mostly designated archive nodes, with
 large amounts of storage space.

 ```shell
 cratedb-retention create-policy --strategy=reallocate \
   --table-schema=doc --table-name=raw_metrics \
   --partition-column=ts_day --retention-period=60 \
-  --reallocation-attribute-name=storage --reallocation-attribute-value=cold \
+  --reallocation-attribute-name=storage --reallocation-attribute-value=warm \
   "${CRATEDB_URI}"
 ```

@@ -115,7 +115,7 @@ This retention policy implements the following directive.

 > **Reallocate** data from the `"doc"."raw_metrics"` table, on partitions defined by
 > the column `ts_day`, which is older than **60** days at the given cut-off date, to
-> nodes tagged with the `storage=cold` attribute.
+> nodes tagged with the `storage=warm` attribute.

 [implementation](cratedb_retention/strategy/reallocate.py) |
 [tutorial](https://community.crate.io/t/cratedb-and-apache-airflow-building-a-hot-cold-storage-data-retention-policy/934)
@@ -178,7 +178,7 @@ CREATE TABLE IF NOT EXISTS "ext"."retention_policy" (
     -- Target: Where data is moved/relocated to.

     -- Targeting specific nodes.
-    -- You may want to designate dedicated nodes to be responsible for hot or cold storage types.
+    -- You may want to designate dedicated nodes to be responsible for "hot" or "warm" storage types.
     -- To do that, you can assign attributes to specific nodes, effectively tagging them.
     -- https://crate.io/docs/crate/reference/en/latest/config/node.html#custom-attributes
     "reallocation_attribute_name" TEXT, -- Name of the node-specific custom attribute.
diff --git a/cratedb_retention/setup/schema.sql b/cratedb_retention/setup/schema.sql
index b447157a..87c17773 100644
--- a/cratedb_retention/setup/schema.sql
+++ b/cratedb_retention/setup/schema.sql
@@ -21,14 +21,14 @@ CREATE TABLE IF NOT EXISTS {policy_table.fullname} (
     -- Target: Where data is moved/relocated to.

     -- Targeting specific nodes.
-    -- You may want to designate dedicated nodes to be responsible for hot or cold storage types.
+    -- You may want to designate dedicated nodes to be responsible for "hot" or "warm" storage types.
     -- To do that, you can assign attributes to specific nodes, effectively tagging them.
     -- https://crate.io/docs/crate/reference/en/latest/config/node.html#custom-attributes
     "reallocation_attribute_name" TEXT, -- Name of the node-specific custom attribute.
     "reallocation_attribute_value" TEXT, -- Value of the node-specific custom attribute.

     -- Targeting a repository.
-    "target_repository_name" TEXT -- The name of a repository created with `CREATE REPOSITORY ...`.
+    "target_repository_name" TEXT -- The name of a repository created with "CREATE REPOSITORY ...".
 )
 CLUSTERED INTO 1 SHARDS;

diff --git a/cratedb_retention/strategy/reallocate.py b/cratedb_retention/strategy/reallocate.py
index cc35bc9a..d5a26e30 100644
--- a/cratedb_retention/strategy/reallocate.py
+++ b/cratedb_retention/strategy/reallocate.py
@@ -1,9 +1,10 @@
 # Copyright (c) 2021-2023, Crate.io Inc.
 # Distributed under the terms of the AGPLv3 license, see LICENSE.
 """
-Implements a retention policy by reallocating cold partitions
+A retention policy implementation which reallocates expired partitions to "warm" nodes.

-A detailed tutorial is available at https://community.crate.io/t/cratedb-and-apache-airflow-building-a-hot-cold-storage-data-retention-policy/934
+It is derived from a corresponding tutorial based on Apache Airflow.
+https://community.crate.io/t/cratedb-and-apache-airflow-building-a-hot-cold-storage-data-retention-policy/934

 Prerequisites
 -------------
diff --git a/tests/conftest.py b/tests/conftest.py
index 83d418bf..30ec3470 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -141,7 +141,7 @@ def policies(cratedb, settings, store):
             partition_column="ts_day",
             retention_period=60,
             reallocation_attribute_name="storage",
-            reallocation_attribute_value="cold",
+            reallocation_attribute_value="warm",
         ),
         # Retention policy rule for the SNAPSHOT strategy.
         RetentionPolicy(
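The REALLOCATE strategy touched by this change moves an expired partition by restricting which nodes its shards may live on. A minimal sketch of that mechanism follows. It assumes the strategy issues CrateDB's `ALTER TABLE ... PARTITION (...) SET ("routing.allocation.require.<attribute>" = '<value>')` statement, as the linked Airflow tutorial does, and that `TIMESTAMP` partition values such as `ts_day` are compared as epoch milliseconds against the cut-off date minus `retention_period` days. The helpers `expired_partitions`, `reallocation_sql`, `quote_ident`, and `quote_literal` are hypothetical and are not taken from `cratedb_retention/strategy/reallocate.py`.

```python
from datetime import date, datetime, timedelta, timezone


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def expired_partitions(partition_values, cutoff_day: date, retention_period: int):
    """Hypothetical helper: return the partition values (epoch milliseconds)
    which are older than `retention_period` days at `cutoff_day`.
    Assumption: "older than" is a strict comparison at midnight UTC."""
    threshold_day = cutoff_day - timedelta(days=retention_period)
    threshold = datetime(
        threshold_day.year, threshold_day.month, threshold_day.day, tzinfo=timezone.utc
    )
    threshold_ms = int(threshold.timestamp()) * 1000
    return [value for value in partition_values if value < threshold_ms]


def reallocation_sql(table_schema, table_name, partition_column, partition_value,
                     attribute_name, attribute_value) -> str:
    """Hypothetical helper: render a statement requiring the partition's shards
    to be allocated on nodes started with `-Cnode.attr.<name>=<value>`."""
    setting = quote_ident(f"routing.allocation.require.{attribute_name}")
    return (
        f"ALTER TABLE {quote_ident(table_schema)}.{quote_ident(table_name)} "
        f"PARTITION ({quote_ident(partition_column)} = {partition_value}) "
        f"SET ({setting} = {quote_literal(attribute_value)});"
    )


if __name__ == "__main__":
    # Hypothetical `ts_day` partition values: 2023-04-30 and 2023-05-01 (UTC).
    day_ms = 86_400_000
    partitions = [1682812800000, 1682812800000 + day_ms]
    for value in expired_partitions(partitions, date(2023, 6, 30), retention_period=60):
        print(reallocation_sql("doc", "raw_metrics", "ts_day", value, "storage", "warm"))
```

Keeping partition selection separate from SQL rendering lets both parts be checked without a running cluster; in the real strategy, the list of partitions would come from the database rather than a literal list.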