Reorg tiering policy sections into manage tiering (#3524)
atovpeko authored Nov 21, 2024
1 parent 362fd93 commit 5ee8610
Showing 13 changed files with 396 additions and 753 deletions.
2 changes: 1 addition & 1 deletion _troubleshooting/slow-tiering-chunks.md
@@ -4,7 +4,7 @@ section: troubleshooting
products: [cloud]
topics: [data tiering]
keywords: [tiered storage]
tags: [tiered storage]
---


2 changes: 1 addition & 1 deletion about/changelog.md
@@ -119,7 +119,7 @@ SELECT * FROM hypertable WHERE timestamp_col > now() - '100 days'::interval

For more info on queries with immutable/stable/volatile filters, check our blog post on [Implementing constraint exclusion for faster query performance](https://www.timescale.com/blog/implementing-constraint-exclusion-for-faster-query-performance/).

If you no longer want to use tiered storage for a particular hypertable, you can now disable tiering and drop the associated tiering metadata on the hypertable with a call to the [disable_tiering function](https://docs.timescale.com/use-timescale/latest/data-tiering/enabling-data-tiering/#disable-tiering).
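A minimal sketch of the call, assuming a hypertable named `metrics` (the name is illustrative; see the linked docs for preconditions):

```sql
-- "metrics" is an illustrative hypertable name.
-- Drops the tiering metadata for the hypertable; check the linked docs
-- for preconditions, such as how existing tiered chunks are handled.
SELECT disable_tiering('metrics');
```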

### Chunk interval recommendations
Timescale Console now shows recommendations for services with too many small chunks in their hypertables.
80 changes: 53 additions & 27 deletions use-timescale/data-tiering/about-data-tiering.md
@@ -11,51 +11,77 @@ cloud_ui:

# About the object storage tier

Timescale's tiered storage architecture includes a standard high-performance storage tier and a low-cost object storage tier built on Amazon S3. You can use the standard tier for data that requires quick access, and the object tier for rarely used historical data. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers. A compressed chunk uses a different storage representation after tiering.

In the high-performance storage tier, chunks are stored in the native block format. In the object storage tier, they are stored in a compressed, columnar format. This format differs from the database's internal format to allow better interoperability across platforms. It enables more efficient columnar scans over longer time periods, and Timescale Cloud uses additional metadata and query optimizations to reduce the amount of data that must be fetched from the object storage tier to satisfy a query.

Regardless of where your data is stored, you can still query it with standard SQL. A single SQL query transparently pulls data from the appropriate chunks using the chunk exclusion algorithms. You can `JOIN` against tiered data, build views, and even define continuous aggregates on it. In fact, because the implementation of continuous aggregates also uses hypertables, they can be tiered to low-cost storage as well.
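For example, tiered chunks can be joined and aggregated like any other data once tiered reads are enabled. A sketch, assuming a `metrics` hypertable with tiered chunks and a small `devices` lookup table (both names and columns are illustrative):

```sql
-- JOIN ordinary relational data against a hypertable whose
-- older chunks live in the object storage tier.
SELECT d.name, avg(m.value) AS avg_value
FROM metrics m
JOIN devices d ON d.id = m.device_id
WHERE m.ts > now() - INTERVAL '2 years'
GROUP BY d.name;

-- Continuous aggregates are backed by hypertables, so their
-- materialized data can be tiered as well.
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS day,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY day, device_id;
```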

## Benefits of the object storage tier

The object storage tier is more than an archiving solution. It is also:

* **Cost-effective:** store high volumes of data at a lower cost.
You pay only for what you store, with no extra cost for queries.

* **Scalable:** scale past the restrictions imposed by storage that can be attached
directly to a Timescale service (currently 16 TB).

* **Online:** your data is always there and can be [queried when needed][querying-tiered-data].

## Architecture

The tiered storage backend works by periodically and asynchronously moving older chunks from the high-performance storage tier to the object storage tier.
There, the data is stored in the Apache Parquet format, a compressed columnar format well suited to S3. Within a Parquet file, a set of rows is grouped together to form a row group. Within a row group, the values for a single column across multiple rows are stored together.
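A sketch of how chunks are typically moved, assuming a hypertable named `metrics` and an illustrative chunk name (the function names below match Timescale's tiering API at the time of this commit; check the current docs):

```sql
-- Tier chunks automatically once their data is older than three weeks.
SELECT add_tiering_policy('metrics', INTERVAL '3 weeks');

-- Or queue a specific chunk for tiering immediately.
SELECT tier_chunk('_timescaledb_internal._hyper_1_42_chunk');

-- Bring a chunk back to the high-performance tier if needed.
SELECT untier_chunk('_timescaledb_internal._hyper_1_42_chunk');
```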

By default, tiered data is not included when querying from a Timescale service.
However, you can access tiered data by [enabling tiered reads][querying-tiered-data] for a query, a session, or even for all sessions. After you enable tiered reads, when you run regular SQL queries, a behind-the-scenes process transparently pulls data from wherever it's located: the standard high-performance storage tier, the object storage tier, or both.
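For instance, the setting can be scoped to a single query, a session, or all sessions on a database (the hypertable name `metrics` and database name `tsdb` are illustrative):

```sql
-- For a single query, scope the setting to a transaction:
BEGIN;
SET LOCAL timescaledb.enable_tiered_reads = true;
SELECT count(*) FROM metrics;
COMMIT;

-- For the current session:
SET timescaledb.enable_tiered_reads = true;

-- For all new sessions on this database:
ALTER DATABASE tsdb SET timescaledb.enable_tiered_reads = true;
```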

Various SQL optimizations limit what needs to be read from S3:

* **Chunk pruning** - exclude the chunks that fall outside the query time window.
* **Row group pruning** - identify the row groups within the Parquet object that satisfy the query.
* **Column pruning** - fetch only columns that are requested by the query.

The result is transparent queries across high-performance storage and object storage, so your queries fetch the same data as before.

The following `EXPLAIN ANALYZE` output for a query against a tiered dataset illustrates these optimizations:

```sql
EXPLAIN ANALYZE
SELECT count(*) FROM
( SELECT device_uuid, sensor_id FROM public.device_readings
WHERE observed_at > '2023-08-28 00:00+00' and observed_at < '2023-08-29 00:00+00'
GROUP BY device_uuid, sensor_id ) q;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Aggregate (cost=7277226.78..7277226.79 rows=1 width=8) (actual time=234993.749..234993.750 rows=1 loops=1)
-> HashAggregate (cost=4929031.23..7177226.78 rows=8000000 width=68) (actual time=184256.546..234913.067 rows=1651523 loops=1)
Group Key: osm_chunk_1.device_uuid, osm_chunk_1.sensor_id
Planned Partitions: 128 Batches: 129 Memory Usage: 20497kB Disk Usage: 4429832kB
-> Foreign Scan on osm_chunk_1 (cost=0.00..0.00 rows=92509677 width=68) (actual time=345.890..128688.459 rows=92505457 loops=1)
         Filter: ((observed_at > '2023-08-28 00:00:00+00'::timestamp with time zone) AND (observed_at < '2023-08-29 00:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 4220
Match tiered objects: 3
Row Groups:
_timescaledb_internal._hyper_1_42_chunk: 0-74
_timescaledb_internal._hyper_1_43_chunk: 0-29
_timescaledb_internal._hyper_1_44_chunk: 0-71
S3 requests: 177
S3 data: 224423195 bytes
Planning Time: 6.216 ms
Execution Time: 235372.223 ms
(16 rows)
```

`EXPLAIN` illustrates which chunks are being pulled in from the object storage tier:

1. Fetch data from chunks 42, 43, and 44 from the object storage tier.
1. Prune row groups and limit the fetch to the subset of offsets in the
   Parquet object that can match the query filter. Only the data for
   `device_uuid`, `sensor_id`, and `observed_at` is fetched, because the query needs only these three columns.

## Limitations

107 changes: 0 additions & 107 deletions use-timescale/data-tiering/creating-data-tiering-policy.md

This file was deleted.

47 changes: 0 additions & 47 deletions use-timescale/data-tiering/disabling-data-tiering.md

This file was deleted.

