Reorg tiering policy sections into manage tiering (#3524)
atovpeko authored Nov 21, 2024
1 parent 362fd93 commit 5ee8610
Showing 13 changed files with 396 additions and 753 deletions.
2 changes: 1 addition & 1 deletion _troubleshooting/slow-tiering-chunks.md
@@ -4,7 +4,7 @@ section: troubleshooting
products: [cloud]
topics: [data tiering]
keywords: [tiered storage]
tags: [tiered storage]
---


2 changes: 1 addition & 1 deletion about/changelog.md
@@ -119,7 +119,7 @@ SELECT * FROM hypertable WHERE timestamp_col > now() - '100 days'::interval

For more info on queries with immutable/stable/volatile filters, check our blog post on [Implementing constraint exclusion for faster query performance](https://www.timescale.com/blog/implementing-constraint-exclusion-for-faster-query-performance/).

If you no longer want to use tiered storage for a particular hypertable, you can now disable tiering and drop the associated tiering metadata on the hypertable with a call to the [disable_tiering function](https://docs.timescale.com/use-timescale/latest/data-tiering/enabling-data-tiering/#disable-tiering).
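A minimal sketch of the call, assuming a hypertable named `metrics` (the name is illustrative; see the linked docs for preconditions):

```sql
-- "metrics" is an illustrative hypertable name.
-- Drops the tiering metadata for the hypertable; check the linked docs
-- for preconditions, such as how existing tiered chunks are handled.
SELECT disable_tiering('metrics');
```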

### Chunk interval recommendations
Timescale Console now shows recommendations for services with too many small chunks in their hypertables.
80 changes: 53 additions & 27 deletions use-timescale/data-tiering/about-data-tiering.md
@@ -11,51 +11,77 @@ cloud_ui:

# About the object storage tier

Timescale's tiered storage architecture includes a standard high-performance storage tier and a low-cost object storage tier built on Amazon S3. You can use the standard tier for data that requires quick access, and the object tier for rarely used historical data. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers. A compressed chunk uses a different storage representation after tiering.

In the high-performance storage tier, chunks are stored in the native block format. In the object storage tier, they are stored in a compressed, columnar format. This format differs from the database's internal format to allow better interoperability across platforms. It enables more efficient columnar scans over longer time periods, and Timescale Cloud uses additional metadata and query optimizations to reduce the amount of data that must be fetched from the object storage tier to satisfy a query.

Regardless of where your data is stored, you can still query it with standard SQL. A single SQL query transparently pulls data from the appropriate chunks using the chunk exclusion algorithms. You can `JOIN` against tiered data, build views, and even define continuous aggregates on it. In fact, because the implementation of continuous aggregates also uses hypertables, they can be tiered to low-cost storage as well.
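For example, tiered chunks can be joined and aggregated like any other data once tiered reads are enabled. A sketch, assuming a `metrics` hypertable with tiered chunks and a small `devices` lookup table (both names and columns are illustrative):

```sql
-- JOIN ordinary relational data against a hypertable whose
-- older chunks live in the object storage tier.
SELECT d.name, avg(m.value) AS avg_value
FROM metrics m
JOIN devices d ON d.id = m.device_id
WHERE m.ts > now() - INTERVAL '2 years'
GROUP BY d.name;

-- Continuous aggregates are backed by hypertables, so their
-- materialized data can be tiered as well.
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS day,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY day, device_id;
```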

## Benefits of the object storage tier

The object storage tier is more than an archiving solution. It is also:

* **Cost-effective:** store high volumes of data at a lower cost.
You pay only for what you store, with no extra cost for queries.

* **Scalable:** scale past the restrictions imposed by storage that can be attached
directly to a Timescale service (currently 16 TB).

* **Online:** your data is always there and can be [queried when needed][querying-tiered-data].

## Architecture

The tiered storage backend works by periodically and asynchronously moving older chunks from the high-performance storage tier to the object storage tier.
There, the data is stored in the Apache Parquet format, a compressed columnar format well suited to S3. Within a Parquet file, a set of rows is grouped together to form a row group. Within a row group, the values for a single column across multiple rows are stored together.
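A sketch of how chunks are typically moved, assuming a hypertable named `metrics` and an illustrative chunk name (the function names below match Timescale's tiering API at the time of this commit; check the current docs):

```sql
-- Tier chunks automatically once their data is older than three weeks.
SELECT add_tiering_policy('metrics', INTERVAL '3 weeks');

-- Or queue a specific chunk for tiering immediately.
SELECT tier_chunk('_timescaledb_internal._hyper_1_42_chunk');

-- Bring a chunk back to the high-performance tier if needed.
SELECT untier_chunk('_timescaledb_internal._hyper_1_42_chunk');
```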

By default, tiered data is not included when querying from a Timescale service.
However, you can access tiered data by [enabling tiered reads][querying-tiered-data] for a query, a session, or even for all sessions. After you enable tiered reads, when you run regular SQL queries, a behind-the-scenes process transparently pulls data from wherever it's located: the standard high-performance storage tier, the object storage tier, or both.
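For instance, the setting can be scoped to a single query, a session, or all sessions on a database (the hypertable name `metrics` and database name `tsdb` are illustrative):

```sql
-- For a single query, scope the setting to a transaction:
BEGIN;
SET LOCAL timescaledb.enable_tiered_reads = true;
SELECT count(*) FROM metrics;
COMMIT;

-- For the current session:
SET timescaledb.enable_tiered_reads = true;

-- For all new sessions on this database:
ALTER DATABASE tsdb SET timescaledb.enable_tiered_reads = true;
```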

Various SQL optimizations limit what needs to be read from S3:

* **Chunk pruning** - exclude the chunks that fall outside the query time window.
* **Row group pruning** - identify the row groups within the Parquet object that satisfy the query.
* **Column pruning** - fetch only columns that are requested by the query.

The result is transparent queries across high-performance storage and object storage, so your queries fetch the same data as before.

The following `EXPLAIN ANALYZE` output for a query against a tiered dataset illustrates these optimizations:

```sql
EXPLAIN ANALYZE
SELECT count(*) FROM
( SELECT device_uuid, sensor_id FROM public.device_readings
WHERE observed_at > '2023-08-28 00:00+00' and observed_at < '2023-08-29 00:00+00'
GROUP BY device_uuid, sensor_id ) q;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Aggregate (cost=7277226.78..7277226.79 rows=1 width=8) (actual time=234993.749..234993.750 rows=1 loops=1)
-> HashAggregate (cost=4929031.23..7177226.78 rows=8000000 width=68) (actual time=184256.546..234913.067 rows=1651523 loops=1)
Group Key: osm_chunk_1.device_uuid, osm_chunk_1.sensor_id
Planned Partitions: 128 Batches: 129 Memory Usage: 20497kB Disk Usage: 4429832kB
-> Foreign Scan on osm_chunk_1 (cost=0.00..0.00 rows=92509677 width=68) (actual time=345.890..128688.459 rows=92505457 loops=1)
         Filter: ((observed_at > '2023-08-28 00:00:00+00'::timestamp with time zone) AND (observed_at < '2023-08-29 00:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 4220
Match tiered objects: 3
Row Groups:
_timescaledb_internal._hyper_1_42_chunk: 0-74
_timescaledb_internal._hyper_1_43_chunk: 0-29
_timescaledb_internal._hyper_1_44_chunk: 0-71
S3 requests: 177
S3 data: 224423195 bytes
Planning Time: 6.216 ms
Execution Time: 235372.223 ms
(16 rows)
```

`EXPLAIN` illustrates which chunks are being pulled in from the object storage tier:

1. Fetch data from chunks 42, 43, and 44 from the object storage tier.
1. Prune row groups and limit the fetch to the subset of offsets in the
   Parquet object that can match the query filter. Only the data for
   `device_uuid`, `sensor_id`, and `observed_at` is fetched, because the query needs only these three columns.

## Limitations

107 changes: 0 additions & 107 deletions use-timescale/data-tiering/creating-data-tiering-policy.md

This file was deleted.

47 changes: 0 additions & 47 deletions use-timescale/data-tiering/disabling-data-tiering.md

This file was deleted.

