diff --git a/docs/reference/datatiers.asciidoc b/docs/reference/datatiers.asciidoc index 0981e80804383..c37f54b5c9cae 100644 --- a/docs/reference/datatiers.asciidoc +++ b/docs/reference/datatiers.asciidoc @@ -2,13 +2,24 @@ [[data-tiers]] == Data tiers -A _data tier_ is a collection of nodes with the same data role that -typically share the same hardware profile: +A _data tier_ is a collection of <> within a cluster that share the same +<>, and a hardware profile that's appropriately sized for the role. Elastic recommends that nodes in the same tier share the same +hardware profile to avoid <>. -* <> nodes handle the indexing and query load for content such as a product catalog. -* <> nodes handle the indexing load for time series data such as logs or metrics -and hold your most recent, most-frequently-accessed data. -* <> nodes hold time series data that is accessed less-frequently +The data tiers that you use, and the way that you use them, depend on the data's <>. + +The following data tiers can be used with each data category: + +Content data: + +* <> nodes handle the indexing and query load for non-timeseries +indices, such as a product catalog. + +Time series data: + +* <> nodes handle the indexing load for time series data, +such as logs or metrics. They hold your most recent, most-frequently-accessed data. +* <> nodes hold time series data that is accessed less-frequently and rarely needs to be updated. * <> nodes hold time series data that is accessed infrequently and not normally updated. To save space, you can keep @@ -16,29 +27,40 @@ infrequently and not normally updated. To save space, you can keep <> on the cold tier. These fully mounted indices eliminate the need for replicas, reducing required disk space by approximately 50% compared to the regular indices. -* <> nodes hold time series data that is accessed +* <> nodes hold time series data that is accessed rarely and never updated. The frozen tier stores <> of <> exclusively. 
This extends the storage capacity even further — by up to 20 times compared to the warm tier. -TIP: The performance of an {es} node is often limited by the performance of the underlying storage. +TIP: The performance of an {es} node is often limited by the performance of the underlying storage and hardware profile. +For example hardware profiles, refer to Elastic Cloud's {cloud}/ec-reference-hardware.html[instance configurations]. Review our recommendations for optimizing your storage for <> and <>. IMPORTANT: {es} generally expects nodes within a data tier to share the same hardware profile. Variations not following this recommendation should be carefully architected to avoid <>. -When you index documents directly to a specific index, they remain on content tier nodes indefinitely. +The way data tiers are used often depends on the data's category: + +- Content data remains on the <> for its entire +data lifecycle. -When you index documents to a data stream, they initially reside on hot tier nodes. -You can configure <> ({ilm-init}) policies -to automatically transition your time series data through the hot, warm, and cold tiers -according to your performance, resiliency and data retention requirements. +- Time series data may progress through the +descending temperature data tiers (hot, warm, cold, and frozen) according to your +performance, resiliency, and data retention requirements. ++ +You can automate these lifecycle transitions using the <>, or custom <>. + +[discrete] +[[available-tier]] +=== Available data tiers + +Learn more about each data tier, including when and how it should be used. [discrete] [[content-tier]] -=== Content tier +==== Content tier // tag::content-tier[] Data stored in the content tier is generally a collection of items such as a product catalog or article archive. @@ -53,13 +75,14 @@ While they are also responsible for indexing, content data is generally not inge as time series data such as logs and metrics. 
From a resiliency perspective the indices in this tier should be configured to use one or more replicas. -The content tier is required. System indices and other indices that aren't part -of a data stream are automatically allocated to the content tier. +The content tier is required and is often deployed within the same node +grouping as the hot tier. System indices and other indices that aren't part +of a data stream are automatically allocated to the content tier. // end::content-tier[] [discrete] [[hot-tier]] -=== Hot tier +==== Hot tier // tag::hot-tier[] The hot tier is the {es} entry point for time series data and holds your most-recent, @@ -74,7 +97,7 @@ data stream>> are automatically allocated to the hot tier. [discrete] [[warm-tier]] -=== Warm tier +==== Warm tier // tag::warm-tier[] Time series data can move to the warm tier once it is being queried less frequently @@ -87,7 +110,7 @@ For resiliency, indices in the warm tier should be configured to use one or more [discrete] [[cold-tier]] -=== Cold tier +==== Cold tier // tag::cold-tier[] When you no longer need to search time series data regularly, it can move from @@ -109,7 +132,7 @@ but doesn't reduce required disk space compared to the warm tier. [discrete] [[frozen-tier]] -=== Frozen tier +==== Frozen tier // tag::frozen-tier[] Once data is no longer being queried, or being queried rarely, it may move from @@ -123,9 +146,15 @@ sometimes fetch frozen data from the snapshot repository, searches on the frozen tier are typically slower than on the cold tier. // end::frozen-tier[] +[discrete] +[[configure-data-tiers]] +=== Configure data tiers + +Follow the instructions for your deployment type to configure data tiers. + [discrete] [[configure-data-tiers-cloud]] -=== Configure data tiers on {ess} or {ece} +==== {ess} or {ece} The default configuration for an {ecloud} deployment includes a shared tier for hot and content data. This tier is required and can't be removed. @@ -159,7 +188,7 @@ tier]. 
[discrete] [[configure-data-tiers-on-premise]] -=== Configure data tiers for self-managed deployments +==== Self-managed deployments For self-managed deployments, each node's <> is configured in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster @@ -177,25 +206,59 @@ tier. [[data-tier-allocation]] === Data tier index allocation -When you create an index, by default {es} sets -<> +The <> setting determines which tier the index should be allocated to. + +When you create an index, by default {es} sets the `_tier_preference` to `data_content` to automatically allocate the index shards to the content tier. When {es} creates an index as part of a <>, -by default {es} sets -<> +by default {es} sets the `_tier_preference` to `data_hot` to automatically allocate the index shards to the hot tier. -You can explicitly set `index.routing.allocation.include._tier_preference` -to opt out of the default tier-based allocation. +At the time of index creation, you can override the default setting by explicitly setting +the preferred value in one of two ways: + +- Using an <>. Refer to <> for details. +- Within the <> request body. + +You can override this +setting after index creation by <> to the preferred +value. + +This setting also accepts multiple tiers in order of preference. This prevents indices from remaining unallocated if no nodes are available in the preferred tier. For example, when {ilm} migrates an index to the cold phase, it sets the index `_tier_preference` to `data_cold,data_warm,data_hot`. + +To remove the data tier preference +setting, set the `_tier_preference` value to `null`. This allows the index to allocate to any data node within the cluster. Setting the `_tier_preference` to `null` does not restore the default value. Note that, in the case of managed indices, a <> action might apply a new value in its place. 
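
As a minimal sketch of the override described above, the following request sets an ordered tier preference on an existing index through the update index settings API. The index name `my-index-000001` is borrowed from the example elsewhere on this page, and the tier list is illustrative only, not a recommendation.

[source,console]
--------------------------------------------------
# Illustrative values: prefer the warm tier, fall back to the hot tier
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
--------------------------------------------------

Under the same assumptions, the preference can be removed by setting the value to `null`, after which the index may allocate to any data node:

[source,console]
--------------------------------------------------
# Removes the tier preference; this does not restore the default value
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": null
}
--------------------------------------------------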
+ +[discrete] +[[data-tier-allocation-value]] +==== Determine the current data tier preference + +You can check an existing index's data tier preference by <> for `index.routing.allocation.include._tier_preference`: + +[source,console] +-------------------------------------------------- +GET /my-index-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference +-------------------------------------------------- +// TEST[setup:my_index] + +[discrete] +[[data-tier-allocation-troubleshooting]] +==== Troubleshooting + +The `_tier_preference` setting might conflict with other allocation settings. This conflict might prevent the shard from allocating. A conflict might occur when a cluster has not yet been completely <>. + +This setting will not unallocate a currently allocated shard, but might prevent it from migrating from its current location to its designated data tier. To troubleshoot, call the <> and specify the suspected problematic shard. [discrete] [[data-tier-migration]] -=== Automatic data tier migration +==== Automatic data tier migration {ilm-init} automatically transitions managed indices through the available data tiers using the <> action. By default, this action is automatically injected in every phase. -You can explicitly specify the migrate action with `"enabled": false` to disable automatic migration, +You can explicitly specify the migrate action with `"enabled": false` to <>, for example, if you're using the <> to manually specify allocation rules.
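
The following sketch shows one way a policy might disable automatic migration in a single phase and rely on the allocate action instead. The policy name `my-policy`, the `30d` minimum age, and the custom node attribute `rack_id` with its values are all hypothetical placeholders chosen for illustration.

[source,console]
--------------------------------------------------
# Hypothetical policy: name, min_age, and rack_id values are placeholders
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "30d",
        "actions": {
          "migrate": {
            "enabled": false
          },
          "allocate": {
            "include": {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

With the migrate action disabled in the warm phase, {ilm-init} does not change the index's `_tier_preference` when the index enters that phase, so the allocation rules in the allocate action decide where the shards go.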