
Table access method for compressed hypertables #7104

Merged — 143 commits, Oct 16, 2024
Conversation

@mkindahl (Contributor) commented Jul 5, 2024

This wraps our existing compression solution in the table access method (TAM) API, effectively turning it into columnar storage with compression. This makes several features normally available to PostgreSQL tables available on tables using TimescaleDB compression, for example:

  • You can create and delete indexes on any column of the hypertable, regardless of whether the data is compressed.
  • Index plans for queries on compressed tables are generated when possible, which can significantly improve performance for queries that can take advantage of indexes.
  • Index-only plans for queries on compressed tables are generated when possible, which can improve performance by not having to decompress data.
  • Covering indexes work for compressed data.
  • Predicate (partial) indexes work for compressed data.
  • Tuple locks work as expected, meaning that SELECT FOR UPDATE (for example) will properly lock uncompressed and compressed tuples.
  • Better handling of CLUSTER and VACUUM FULL: the table is compressed before vacuum runs.
  • Improved handling of isolation levels when working with compressed data.
  • Improved handling of CHECK constraints for compressed data.
  • Skip-scan now works on compressed data.
  • Some DML and COPY directly on chunks now work as expected or give proper errors. (Previously, e.g., COPY TO directly on a chunk returned no data when compressed).
  • ANALYZE on compressed data now produces correct column statistics. This is important to produce good plans, for example JOINs.
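As a hedged sketch of what the list above enables (the `metrics` hypertable, the chunk name, and the access method name `hyperstore` are illustrative, not taken from this PR's code):

```sql
-- Sketch only: all object names here are hypothetical.
-- Convert a chunk to the new table access method.
ALTER TABLE _timescaledb_internal._hyper_1_1_chunk SET ACCESS METHOD hyperstore;

-- Indexes can be created on any column, compressed or not, and index and
-- index-only scans can then be planned over compressed data.
CREATE INDEX metrics_device_idx ON metrics (device_id, time DESC);

-- Tuple locks behave as expected on compressed and uncompressed tuples alike.
SELECT * FROM metrics WHERE device_id = 1 FOR UPDATE;
```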

Disable-check: commit-count
Disable-check: force-changelog-file

A changelog entry will be added in a follow-up PR.

@mkindahl mkindahl changed the title Hyperstore: TimescaleDB compression and columnar storage in a table access method Hyperstore: compression and columnar storage in a table access method Jul 5, 2024
@svenklemm (Member) left a comment

It looks like this PR also has unrelated commits that are independent of the table access method. It would ease review if those could be pulled out and moved into separate PRs (e.g. 96141c6). The PR also seems to be inconsistent: in the initial commit the table access method is named tscompression, and later it's changed to hyperstore. It would be useful if those commits were squashed to ease review.

@mkindahl (Contributor Author)

> It looks like this PR also has unrelated commits that are independent of the table access method. It would ease review if those could be pulled out and moved into separate PRs (e.g. 96141c6).

It makes sense to split out several changes into separate pull requests. In some cases these changes are part of other pull requests that modify Hyperstore files, but that can be dealt with by separating out the changes and then rebasing this PR.

> This PR also seems to be inconsistent: in the initial commit the table access method is named tscompression, and later it's changed to hyperstore. It would be useful if those commits were squashed to ease review.

I can make an attempt at squashing this when I get back, but there is a risk that this change causes ripple effects and might be hard to deal with.

@antekresic (Contributor) left a comment

I agree that there are a lot of unrelated changes that could have been separate PRs to reduce the amount of changes in this PR.

Looking at this, I found it very surprising that you are changing the table access method on each compression state change. Effectively, you only use the access method if you have a compressed chunk. Ultimately, in my mind, this defeats the purpose of a TAM as an encapsulation method since you have to maintain it the same way as chunk status. I personally would love to get rid of the compressed chunk status so doing this feels like a step in the wrong direction.

@akuzm (Member) commented Jul 26, 2024

For reference, a couple of tsbench runs based on this branch:

this exact branch vs the commit it was based on: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3606&var-run2=3647&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false

Interestingly, we have a regression in clickbench and join with this branch; this is probably related to the interface changes in compressed batch, but those looked pretty minor. This is something we should fix before merging.

@erimatnor (Contributor) commented Jul 29, 2024

> I agree that there are a lot of unrelated changes that could have been separate PRs to reduce the amount of changes in this PR.

Can you give examples of any unrelated changes? I think most, if not all, changes are actually there to support the new use case.

> Looking at this, I found it very surprising that you are changing the table access method on each compression state change. Effectively, you only use the access method if you have a compressed chunk. Ultimately, in my mind, this defeats the purpose of a TAM as an encapsulation method since you have to maintain it the same way as chunk status. I personally would love to get rid of the compressed chunk status so doing this feels like a step in the wrong direction.

It is not clear to me what you mean by "you only use the access method if you have a compressed chunk.", because the whole point of Hyperstore is to have compressed data, and the reason Hyperstore was created was to encapsulate compression in a TAM interface. To use Hyperstore with only non-compressed data, while possible, defeats the purpose of Hyperstore since in that case it is nothing more than a plain heap table with a bunch of downsides. In other words, the ideal state is to have only compressed data. Having non-compressed data is a transitional state.

There are, of course, situations where you will still have a lot of non-compressed data with Hyperstore, just like with compression. This happens due to DML decompression, just like before. If you want to (re)compress a Hyperstore you can use the existing APIs (compress_chunk or recompress_chunk) or VACUUM FULL (eventually VACUUM too, probably).
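For illustration (table and chunk names hypothetical), recompression of a chunk that has accumulated non-compressed data could look like:

```sql
-- Recompress via the existing API...
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
-- ...or rewrite and compress the whole table in one go:
VACUUM FULL metrics;
```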

The chunk compression status is a different and orthogonal matter. I would also like to get rid of it, but it is not strictly tied to TAM and requires additional changes we wanted to avoid in the initial version.

Just a reminder: we will, of course, continue improving and making changes to Hyperstore after this merge. We just can't do everything in one go.

@mkindahl (Contributor Author) commented Aug 5, 2024

> This PR also seems to be inconsistent: in the initial commit the table access method is named tscompression, and later it's changed to hyperstore. It would be useful if those commits were squashed to ease review.

> I can make an attempt at squashing this when I get back, but there is a risk that this change causes ripple effects and might be hard to deal with.

This also makes it hard to maintain the code should we need to use, e.g., bisect to figure out which change caused an issue.

@svenklemm (Member)

135 commits in a single PR is quite big. I think it would speed up reviewing if the PR was split up into smaller parts so it could be merged incrementally. Also it would help separate unrelated changes.

@svenklemm (Member)

Currently this PR segfaults when non-btree/non-hash indexes are present. We should probably error out in those cases instead of segfaulting.

@mkindahl (Contributor Author) commented Sep 3, 2024

> Currently this PR segfaults when non-btree/non-hash indexes are present. We should probably error out in those cases instead of segfaulting.

Definitely. Note that in the next version (in progress) we have a whitelist available for index access methods that we are supporting. I think that should solve the issue.

@antekresic (Contributor)

Additionally, I've done a quick test enabling hyperstore by default and running our test suite against it. There were a few more places with segfaults in addition to creating unsupported index types. What mainly worries me is that there was a segfault on insert in the compression_insert test.

These definitely need to be cleaned up, at least so that trying out hyperstore does not crash the database.

@erimatnor (Contributor)

> 135 commits in a single PR is quite big. I think it would speed up reviewing if the PR was split up into smaller parts so it could be merged incrementally. Also it would help separate unrelated changes.

The number of commits does not necessarily reflect the size of the PR; it is the LOC in the final artifact that matters. A lot of commits evolve code already in previous commits, fixing bugs and issues, etc., so breaking the PR up along commits would actually increase confusion and the review burden, because you'd be reviewing "historical" code rather than the final artifact.

Perhaps there is a way to break it up along files, but that would require a lot of extra work to produce "partial" artifacts that build and pass tests. Honestly, I am not sure it is worth the effort. Perhaps it is something we can discuss.

Another point is that the code is already reviewed, commit by commit. So, maybe we don't need/expect the same review burden as we normally expect?

@svenklemm (Member) left a comment

I think this PR should be split up into smaller pieces to ease review.

@erimatnor (Contributor)

> Additionally, I've done a quick test enabling hyperstore by default and running our test suite against it. There were a few more places with segfaults in addition to creating unsupported index types. What mainly worries me is that there was a segfault on insert in the compression_insert test.
>
> These definitely need to be cleaned up, at least so that trying out hyperstore does not crash the database.

This should be fixed. There are some concurrency/isolation tests that block because the lock behavior of ALTER TABLE is different (this is a PG thing). We might do work later to harmonize the locking with current compression.

@mkindahl (Contributor Author)

> Currently this PR segfaults when non-btree/non-hash indexes are present. We should probably error out in those cases instead of segfaulting.

This PR now contains a whitelist and will error out for any access method not on the whitelist. Currently, we only support btree and hash.
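A hedged illustration of the whitelist behavior (hypothetical table; the exact error message is not specified in this PR):

```sql
CREATE INDEX ON metrics USING hash (device_id);  -- on the whitelist: works
CREATE INDEX ON metrics USING brin (time);       -- not whitelisted: errors out
```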

@mkindahl mkindahl changed the title Hyperstore: compression and columnar storage in a table access method Table access method for compressed hypertables Sep 17, 2024
@mkindahl (Contributor Author)

Changed the name to clarify that this is just one piece in the puzzle of the larger effort described in *Hyperstore: A Hybrid Row-Columnar Storage Engine for ~~Time Series~~ Real-Time Analytics*.

@erimatnor erimatnor force-pushed the hyperstore-rebase-on-main branch from 20fc5f1 to 75b195f Compare September 18, 2024 07:07
svenklemm pushed a commit that referenced this pull request Jan 23, 2025
This release introduces the ability to add secondary indexes to the columnstore, improves group by and filtering performance through columnstore vectorization, and contains the highly upvoted community request of transition table support. We recommend that you upgrade at the next available opportunity.

**Highlighted features in TimescaleDB v2.18.0**

* The ability to add secondary indexes to the columnstore through the new hypercore table access method.
* Significant performance improvements through vectorization (`SIMD`) for aggregations using a group by with one column and/or using a filter clause when querying the columnstore.
* Hypertables support triggers for transition tables, which is one of the most upvoted community feature requests.
* Updated methods to manage Timescale's hybrid row-columnar store (hypercore) that highlight the usage of the columnstore which includes both an optimized columnar format as well as compression.

**Dropping support for Bitnami images**

After the recent change in Bitnami’s [LTS support policy](bitnami/containers#75671), we are no longer building Bitnami images for TimescaleDB. We recommend using the [official TimescaleDB Docker image](https://hub.docker.com/r/timescale/timescaledb-ha).

**Deprecation Notice**

We are deprecating the following parameters, functions, procedures and views. They will be removed with the next major release of TimescaleDB. Please find the replacements in the table below:

| Deprecated | Replacement | Type |
| --- | --- | --- |
| decompress_chunk | convert_to_rowstore | Procedure |
| compress_chunk | convert_to_columnstore | Procedure |
| add_compression_policy | add_columnstore_policy | Function |
| remove_compression_policy | remove_columnstore_policy | Function |
| hypertable_compression_stats | hypertable_columnstore_stats | Function |
| chunk_compression_stats | chunk_columnstore_stats | Function |
| hypertable_compression_settings | hypertable_columnstore_settings | View |
| chunk_compression_settings | chunk_columnstore_settings | View |
| compression_settings | columnstore_settings | View |
| timescaledb.compress | timescaledb.enable_columnstore | Parameter |
| timescaledb.compress_segmentby | timescaledb.segmentby | Parameter |
| timescaledb.compress_orderby  | timescaledb.orderby | Parameter |
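A sketch of the renames in the table above, on a hypothetical `metrics` hypertable and chunk:

```sql
-- Deprecated spelling:
ALTER TABLE metrics SET (timescaledb.compress,
                         timescaledb.compress_segmentby = 'device_id');
CALL compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

-- Replacement spelling:
ALTER TABLE metrics SET (timescaledb.enable_columnstore,
                         timescaledb.segmentby = 'device_id');
CALL convert_to_columnstore('_timescaledb_internal._hyper_1_1_chunk');
```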

**Features**
* #7341: Vectorized aggregation with grouping by one fixed-size by-value compressed column (such as arithmetic types).
* #7104: Hypercore table access method.
* #6901: Add hypertable support for transition tables.
* #7482: Optimize recompression of partially compressed chunks.
* #7458: Support vectorized aggregation with aggregate `filter` clauses that are also vectorizable.
* #7433: Add support for merging chunks.
* #7271: Push down `order by` in real-time continuous aggregate queries.
* #7455: Support `drop not null` on compressed hypertables.
* #7295: Support `alter table set access method` on hypertable.
* #7411: Change parameter name to enable hypercore table access method.
* #7436: Add index creation on `order by` columns.
* #7443: Add hypercore function and view aliases.
* #7521: Add optional `force` argument to `refresh_continuous_aggregate`.
* #7528: Transform sorting on `time_bucket` to sorting on time for compressed chunks in some cases.
* #7565: Add hint when hypertable creation fails.
* #7390: Disable custom `hashagg` planner code.
* #7587: Add `include_tiered_data` parameter to `add_continuous_aggregate_policy` API.
* #7486: Prevent building against PostgreSQL versions with broken ABI.
* #7412: Add [GUC](https://www.postgresql.org/docs/current/acronyms.html#:~:text=GUC) for the `hypercore_use_access_method` default.
* #7413: Add GUC for segmentwise recompression.

**Bugfixes**
* #7378: Remove obsolete job referencing `policy_job_error_retention`.
* #7409: Update `bgw_job` table when altering procedure.
* #7410: Fix the `aggregated compressed column not found` error on aggregation query.
* #7426: Fix `datetime` parsing error in chunk constraint creation.
* #7432: Verify that the heap tuple is valid before using.
* #7434: Fix the segfault when internally setting the replica identity for a given chunk.
* #7488: Emit error for transition table trigger on chunks.
* #7514: Fix the error: `invalid child of chunk append`.
* #7517: Fix the performance regression on the `cagg_migrate` procedure.
* #7527: Restart scheduler on error.
* #7557: Fix null handling for in-memory tuple filtering.
* #7566: Improve transaction check in CAGG refresh.
* #7584: Fix NaN-handling for vectorized aggregation.
* #7598: Match the Postgres NaN comparison behavior in WHERE clause over compressed tables.

**Thanks**
* @bharrisau for reporting the segfault when creating chunks.
* @jakehedlund for reporting the incompatible NaN behavior in WHERE clause over compressed tables.
* @k-rus for suggesting that we add a hint when hypertable creation fails.
* @pgloader for reporting the issue in an internal background job.
* @staticlibs for sending the pull request that improves the transaction check in CAGG refresh.
* @uasiddiqi for reporting the `aggregated compressed column not found` error.
This release introduces the ability to add secondary indexes to the columnstore, improves group-by and filtering performance through columnstore vectorization, and adds support for transition tables, a highly upvoted community request. We recommend that you upgrade at the next available opportunity.

**Highlighted features in TimescaleDB v2.18.0**

* The ability to add secondary indexes to the columnstore through the new hypercore table access method.
* Significant performance improvements through vectorization (`SIMD`) for aggregations that group by a single column and/or use a `FILTER` clause when querying the columnstore.
* Hypertables support triggers for transition tables, which is one of the most upvoted community feature requests.
* Updated methods to manage Timescale's hybrid row-columnar store (hypercore), highlighting the columnstore, which combines an optimized columnar format with compression.

**Dropping support for Bitnami images**

After the recent change in Bitnami’s [LTS support policy](bitnami/containers#75671), we are no longer building Bitnami images for TimescaleDB. We recommend using the [official TimescaleDB Docker image](https://hub.docker.com/r/timescale/timescaledb-ha) instead.

**Deprecation Notice**

We are deprecating the following parameters, functions, procedures, and views. They will be removed in the next major release of TimescaleDB. The replacements are listed in the table below:

| Deprecated | Replacement | Type |
| --- | --- | --- |
| decompress_chunk | convert_to_rowstore | Procedure |
| compress_chunk | convert_to_columnstore | Procedure |
| add_compression_policy | add_columnstore_policy | Function |
| remove_compression_policy | remove_columnstore_policy | Function |
| hypertable_compression_stats | hypertable_columnstore_stats | Function |
| chunk_compression_stats | chunk_columnstore_stats | Function |
| hypertable_compression_settings | hypertable_columnstore_settings | View |
| chunk_compression_settings | chunk_columnstore_settings | View |
| compression_settings | columnstore_settings | View |
| timescaledb.compress | timescaledb.enable_columnstore | Parameter |
| timescaledb.compress_segmentby | timescaledb.segmentby | Parameter |
| timescaledb.compress_orderby | timescaledb.orderby | Parameter |
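As a sketch of the rename, an existing compression setup maps to the new columnstore spelling roughly as follows (the table and chunk names are hypothetical, the boolean values are assumptions, and the calling conventions follow the Type column above):

```sql
-- Deprecated spelling:
ALTER TABLE readings
  SET (timescaledb.compress = true,
       timescaledb.compress_segmentby = 'location_id',
       timescaledb.compress_orderby = 'created_at');
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

-- Replacement spelling (TimescaleDB 2.18):
ALTER TABLE readings
  SET (timescaledb.enable_columnstore = true,
       timescaledb.segmentby = 'location_id',
       timescaledb.orderby = 'created_at');
CALL convert_to_columnstore('_timescaledb_internal._hyper_1_1_chunk');
```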

**Features**
* timescale#7341: Vectorized aggregation with grouping by one fixed-size by-value compressed column (such as arithmetic types).
* timescale#7104: Hypercore table access method.
* timescale#6901: Add hypertable support for transition tables.
* timescale#7482: Optimize recompression of partially compressed chunks.
* timescale#7458: Support vectorized aggregation with aggregate `filter` clauses that are also vectorizable.
* timescale#7433: Add support for merging chunks.
* timescale#7271: Push down `order by` in real-time continuous aggregate queries.
* timescale#7455: Support `drop not null` on compressed hypertables.
* timescale#7295: Support `alter table set access method` on hypertable.
* timescale#7411: Change parameter name to enable hypercore table access method.
* timescale#7436: Add index creation on `order by` columns.
* timescale#7443: Add hypercore function and view aliases.
* timescale#7521: Add optional `force` argument to `refresh_continuous_aggregate`.
* timescale#7528: Transform sorting on `time_bucket` to sorting on time for compressed chunks in some cases.
* timescale#7565: Add hint when hypertable creation fails.
* timescale#7390: Disable custom `hashagg` planner code.
* timescale#7587: Add `include_tiered_data` parameter to `add_continuous_aggregate_policy` API.
* timescale#7486: Prevent building against PostgreSQL versions with broken ABI.
* timescale#7412: Add [GUC](https://www.postgresql.org/docs/current/acronyms.html#:~:text=GUC) for the `hypercore_use_access_method` default.
* timescale#7413: Add GUC for segmentwise recompression.
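For example, the new GUC named in the feature list above can be set per session; the `timescaledb.` prefix and the boolean value shown here are assumptions:

```sql
-- Make operations default to the hypercore table access method (per #7412):
SET timescaledb.hypercore_use_access_method = true;
```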

**Bugfixes**
* timescale#7378: Remove obsolete job referencing `policy_job_error_retention`.
* timescale#7409: Update `bgw_job` table when altering procedure.
* timescale#7410: Fix the `aggregated compressed column not found` error on aggregation query.
* timescale#7426: Fix `datetime` parsing error in chunk constraint creation.
* timescale#7432: Verify that the heap tuple is valid before using.
* timescale#7434: Fix the segfault when internally setting the replica identity for a given chunk.
* timescale#7488: Emit error for transition table trigger on chunks.
* timescale#7514: Fix the error: `invalid child of chunk append`.
* timescale#7517: Fix the performance regression on the `cagg_migrate` procedure.
* timescale#7527: Restart scheduler on error.
* timescale#7557: Fix null handling for in-memory tuple filtering.
* timescale#7566: Improve transaction check in CAGG refresh.
* timescale#7584: Fix NaN-handling for vectorized aggregation.
* timescale#7598: Match the Postgres NaN comparison behavior in WHERE clause over compressed tables.

**Thanks**
* @bharrisau for reporting the segfault when creating chunks.
* @jakehedlund for reporting the incompatible NaN behavior in WHERE clause over compressed tables.
* @k-rus for suggesting that we add a hint when hypertable creation fails.
* @staticlibs for sending the pull request that improves the transaction check in CAGG refresh.
* @uasiddiqi for reporting the `aggregated compressed column not found` error.
@pantonis

I tested this on a hypertable with compression enabled, using a B-tree index on two columns, and observed that the index was not utilized.

@mkindahl

@pantonis I am not sure what you did, but here is an example based on the tests in the commit. Unfortunately, we do not have documentation yet, but that is coming.

Load the extension and disable columnar scan (it is a little too efficient, so for an example table this small it would otherwise always be chosen over an index scan).

Expanded display is used automatically.
Null display is "[NULL]".
SET
psql (17.2 (Ubuntu 17.2-1.pgdg24.04+1), server 16.6 (Ubuntu 16.6-1.pgdg24.04+1))
Type "help" for help.

demo_hypercore=# create extension timescaledb;
CREATE EXTENSION
demo_hypercore=# \dx
                                                List of installed extensions
    Name     | Version |   Schema   |                                      Description                                      
-------------+---------+------------+---------------------------------------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb | 2.18.0  | public     | Enables scalable inserts and complex queries for time-series data (Community Edition)
(2 rows)
demo_hypercore=# set timescaledb.enable_columnarscan to false;
SET

Create a hypertable:

demo_hypercore=# create table readings(
       metric_id serial,
       created_at timestamptz not null unique,
       location_id smallint,
       owner_id bigint,
       device_id bigint,
       temp float8,
       humidity float4
);
CREATE TABLE
demo_hypercore=# select create_hypertable('readings', by_range('created_at'));
 create_hypertable 
-------------------
 (1,t)
(1 row)

Set compression parameters and set the default access method for the chunks to hypercore:

demo_hypercore=# alter table readings
      set (timescaledb.compress_orderby = 'created_at',
           timescaledb.compress_segmentby = 'location_id');
ALTER TABLE
demo_hypercore=# alter table readings set access method hypercore;
ALTER TABLE

Insert some data:

demo_hypercore=# insert into readings (created_at, location_id, device_id, owner_id, temp, humidity)
select t, ceil(random()*10), ceil(random()*30), ceil(random() * 5), random()*40, random()*100
from generate_series('2022-06-01'::timestamptz, '2022-07-01', '1s') t;
INSERT 0 2592001

All chunks are now using the hypercore table access method:

demo_hypercore=# select * from chunk_info where hypertable = 'readings'::regclass;
 hypertable |                  chunk                  |  amname   
------------+-----------------------------------------+-----------
 readings   | _timescaledb_internal._hyper_3_13_chunk | hypercore
 readings   | _timescaledb_internal._hyper_3_15_chunk | hypercore
 readings   | _timescaledb_internal._hyper_3_17_chunk | hypercore
 readings   | _timescaledb_internal._hyper_3_19_chunk | hypercore
 readings   | _timescaledb_internal._hyper_3_21_chunk | hypercore
 readings   | _timescaledb_internal._hyper_3_23_chunk | hypercore
(6 rows)

Compress the chunks (note that the new procedure `convert_to_columnstore` replaces `compress_chunk`):

demo_hypercore=# select compress_chunk(show_chunks('readings'));
             compress_chunk              
-----------------------------------------
 _timescaledb_internal._hyper_3_13_chunk
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_17_chunk
 _timescaledb_internal._hyper_3_19_chunk
 _timescaledb_internal._hyper_3_21_chunk
 _timescaledb_internal._hyper_3_23_chunk
(6 rows)

Test a query that does not use an index scan:

demo_hypercore=# explain (analyze, buffers) select * from readings where metric_id = 4711;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..42976.60 rows=12960 width=42) (actual time=13.428..72.719 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 1
   Buffers: shared hit=5311 read=29429
   ->  Parallel Append  (cost=0.00..40680.60 rows=5400 width=42) (actual time=35.454..64.195 rows=0 loops=2)
         Buffers: shared hit=5311 read=29429
         ->  Parallel Seq Scan on _hyper_3_15_chunk  (cost=0.00..9399.00 rows=1260 width=42) (actual time=27.661..27.661 rows=0 loops=1)
               Filter: (metric_id = 4711)
               Rows Removed by Filter: 604800
               Buffers: shared hit=1246 read=6883
         ->  Parallel Seq Scan on _hyper_3_17_chunk  (cost=0.00..9399.00 rows=1260 width=42) (actual time=27.898..27.898 rows=0 loops=1)
               Filter: (metric_id = 4711)
               Rows Removed by Filter: 604800
               Buffers: shared hit=1241 read=6861
         ->  Parallel Seq Scan on _hyper_3_19_chunk  (cost=0.00..9399.00 rows=1260 width=42) (actual time=14.256..14.256 rows=0 loops=2)
               Filter: (metric_id = 4711)
               Rows Removed by Filter: 302400
               Buffers: shared hit=1218 read=6869
         ->  Parallel Seq Scan on _hyper_3_21_chunk  (cost=0.00..9399.00 rows=1260 width=42) (actual time=27.539..27.539 rows=0 loops=1)
               Filter: (metric_id = 4711)
               Rows Removed by Filter: 604800
               Buffers: shared hit=1239 read=6848
         ->  Parallel Seq Scan on _hyper_3_13_chunk  (cost=0.00..1656.24 rows=275 width=42) (actual time=0.355..4.308 rows=1 loops=1)
               Filter: (metric_id = 4711)
               Rows Removed by Filter: 93599
               Buffers: shared hit=204 read=1072
         ->  Parallel Seq Scan on _hyper_3_23_chunk  (cost=0.00..1401.36 rows=233 width=42) (actual time=12.458..12.458 rows=0 loops=1)
               Filter: (metric_id = 4711)
               Rows Removed by Filter: 79201
               Buffers: shared hit=163 read=896
 Planning:
   Buffers: shared hit=43 read=10
 Planning Time: 0.596 ms
 Execution Time: 72.785 ms
(34 rows)

Add an index for metric_id:

demo_hypercore=# create index my_index on readings (metric_id);
CREATE INDEX

Run the query again and see that the index scan is used:

demo_hypercore=# explain (analyze, buffers) select * from readings where metric_id = 4711;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.29..42197.89 rows=12960 width=42) (actual time=0.011..0.044 rows=1 loops=1)
   Buffers: shared read=17 written=1
   ->  Index Scan using _hyper_3_13_chunk_my_index on _hyper_3_13_chunk  (cost=0.29..1524.48 rows=468 width=42) (actual time=0.010..0.011 rows=1 loops=1)
         Index Cond: (metric_id = 4711)
         Buffers: shared read=3
   ->  Index Scan using _hyper_3_15_chunk_my_index on _hyper_3_15_chunk  (cost=0.42..9829.34 rows=3024 width=42) (actual time=0.008..0.008 rows=0 loops=1)
         Index Cond: (metric_id = 4711)
         Buffers: shared read=3 written=1
   ->  Index Scan using _hyper_3_17_chunk_my_index on _hyper_3_17_chunk  (cost=0.42..9829.34 rows=3024 width=42) (actual time=0.007..0.007 rows=0 loops=1)
         Index Cond: (metric_id = 4711)
         Buffers: shared read=3
   ->  Index Scan using _hyper_3_19_chunk_my_index on _hyper_3_19_chunk  (cost=0.42..9829.34 rows=3024 width=42) (actual time=0.007..0.007 rows=0 loops=1)
         Index Cond: (metric_id = 4711)
         Buffers: shared read=3
   ->  Index Scan using _hyper_3_21_chunk_my_index on _hyper_3_21_chunk  (cost=0.42..9829.34 rows=3024 width=42) (actual time=0.006..0.006 rows=0 loops=1)
         Index Cond: (metric_id = 4711)
         Buffers: shared read=3
   ->  Index Scan using _hyper_3_23_chunk_my_index on _hyper_3_23_chunk  (cost=0.29..1291.22 rows=396 width=42) (actual time=0.004..0.004 rows=0 loops=1)
         Index Cond: (metric_id = 4711)
         Buffers: shared read=2
 Planning:
   Buffers: shared hit=227 read=72 dirtied=8 written=6
 Planning Time: 0.530 ms
 Execution Time: 0.068 ms
(24 rows)

@jflambert

jflambert commented Jan 31, 2025

@mkindahl thanks for the example! I was looking for documentation but this is just as good!

I would like your opinion over the following scenario. Consider a hypertable (around 1TB over a couple billion rows) on which I want to run timescaledb.enable_columnstore, with timestamp as timescaledb.orderby and device_id as timescaledb.segmentby. Other than the timestamp index which is automatically created by create_hypertable, I also have 3 "secondary" b-tree indexes (eerily similar to yours: device_id, owner_id and location_id)

  1. I have the option of truncating all data, enabling the compression, and "re-synchronizing" this deleted data from an external source (over several hours). Would you recommend that I truncate the table first, or preserve the data and deal with a presumably lengthy initial compression? (I think this question is relevant considering your example inserted the data after you enabled the compression, but I may be overthinking this)

  2. In the case preserving data is fine, and finally on topic, would you recommend I drop the 3 existing indexes before enabling the compression, then rebuild them immediately after?

  3. If I use device_id as timescaledb.segmentby do I still need an index for it?

  4. Finally a generic compression question while I have your attention. My default retention policy is 7 days, chunk interval 1 day, and I planned on setting add_columnstore_policy to 1 day. Does this make sense to you? The past 24 hours are append-heavy. Wouldn't it be strange to use a compression policy shorter than the chunk interval?

Thank you so much!

@jflambert

jflambert commented Jan 31, 2025

Ah, unfortunately, I must semi-confirm what @pantonis is reporting. I've tried your example line by line and

  1. The metric_id index is not used (I've also tweaked the generate_series to 1 year instead of 1 month and tried indexing device_id and owner_id without success)
  2. This doesn't work for me: select * from chunk_info where hypertable = 'readings'::regclass; (relation "chunk_info" does not exist)
  3. Answering my own previous question, filtering on location_id does use an "index", so I guess the answer is "no you don't need an index on a compress_segmentby column."

@mkindahl
Contributor Author

Ah, unfortunately, I must semi-confirm what @pantonis is reporting. I've tried your example line by line and

1. The `metric_id` index is not used (I've also tweaked the generate_series to 1 year instead of 1 month and tried indexing `device_id` and `owner_id` without success)

Did you use my example literally, or did you use something else?

2. This doesn't work for me: `select * from chunk_info where hypertable = 'readings'::regclass;` (relation "chunk_info" does not exist)

Here is the view:

create view chunk_info as
select inh.inhparent::regclass as hypertable,
       cl.oid::regclass as chunk,
       am.amname
  from pg_class cl
  join pg_am am on cl.relam = am.oid
  join pg_inherits inh on inh.inhrelid = cl.oid;
3. Answering my own previous question, filtering on `location_id` does use an "index", so I guess the answer is "no you don't need an index on a `compress_segmentby` column."

You can add an index on a segment-by column, but it indexes the compressed data directly.

@mkindahl
Contributor Author

@mkindahl thanks for the example! I was looking for documentation but this is just as good!

I would like your opinion over the following scenario. Consider a hypertable (around 1TB over a couple billion rows) on which I want to run timescaledb.enable_columnstore, with timestamp as timescaledb.orderby and device_id as timescaledb.segmentby. Other than the timestamp index which is automatically created by create_hypertable, I also have 3 "secondary" b-tree indexes (eerily similar to yours: device_id, owner_id and location_id)

1. I have the option of truncating all data, enabling the compression, and "re-synchronizing" this deleted data from an external source (over several hours). Would you recommend that I truncate the table first, or preserve the data and deal with a presumably lengthy initial compression? (I think this question is relevant considering your example inserted the data _after_ you enabled the compression, but I may be overthinking this)

Inserted data is added to the uncompressed region, so it is not automatically compressed as you insert.

2. In the case preserving data is fine, and finally on topic, would you recommend I drop the 3 existing indexes before enabling the compression, then rebuild them immediately after?

Updating the indexes while you insert is likely to increase the total time and I/O of the operation, so if you have that option, rebuilding the indexes afterwards will probably give a lower total execution time and lower total I/O. You would have to measure to be sure, though, since it depends a lot on factors like the amount of memory, disk speed, etc.
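As a rough sketch of that workflow (the index and table names below are hypothetical examples, not taken from the thread):

```sql
-- Drop the secondary indexes before the bulk re-synchronization
-- so inserts do not pay for index maintenance.
DROP INDEX IF EXISTS readings_device_id_idx;
DROP INDEX IF EXISTS readings_owner_id_idx;
DROP INDEX IF EXISTS readings_location_id_idx;

-- ... bulk load / re-synchronize the data here ...

-- Rebuild the indexes once the data is in place.
CREATE INDEX readings_device_id_idx ON readings (device_id);
CREATE INDEX readings_owner_id_idx ON readings (owner_id);
CREATE INDEX readings_location_id_idx ON readings (location_id);
```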

3. If I use `device_id` as `timescaledb.segmentby` do I still need an index for it?

If you don't have an index, a filter will be used:

mats=# explain select * from readings where location_id = 10;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Append  (cost=0.00..59446.69 rows=256535 width=42)
   ->  Seq Scan on _hyper_5_19_chunk  (cost=0.00..1173.00 rows=9382 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_21_chunk  (cost=0.00..13796.00 rows=59391 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_23_chunk  (cost=0.00..13796.00 rows=57960 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_25_chunk  (cost=0.00..13796.00 rows=61831 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_27_chunk  (cost=0.00..13796.00 rows=60077 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_29_chunk  (cost=0.00..1807.01 rows=7894 width=42)
         Filter: (location_id = 10)
(13 rows)

If you add an index, it will be more compact. Note that for small tables, a sequential scan is very efficient in itself, so its cost estimate will be lower. To get an index scan you need to disable sequential scans (which does not actually disable them, just makes them more expensive, reducing the likelihood that they are chosen).

mats=# create index on readings (location_id);
CREATE INDEX
mats=# explain select * from readings where location_id = 10;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Append  (cost=0.00..60422.68 rows=256535 width=42)
   ->  Seq Scan on _hyper_5_19_chunk  (cost=0.00..2148.99 rows=9382 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_21_chunk  (cost=0.00..13796.00 rows=59391 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_23_chunk  (cost=0.00..13796.00 rows=57960 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_25_chunk  (cost=0.00..13796.00 rows=61831 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_27_chunk  (cost=0.00..13796.00 rows=60077 width=42)
         Filter: (location_id = 10)
   ->  Seq Scan on _hyper_5_29_chunk  (cost=0.00..1807.01 rows=7894 width=42)
         Filter: (location_id = 10)
(13 rows)

mats=# set enable_seqscan to false;
SET
mats=# explain select * from readings where location_id = 10;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.17..112532.23 rows=256535 width=42)
   ->  Index Scan using _hyper_5_19_chunk_readings_location_id_idx on _hyper_5_19_chunk  (cost=0.17..4043.17 rows=9382 width=42)
         Index Cond: (location_id = 10)
   ->  Index Scan using _hyper_5_21_chunk_readings_location_id_idx on _hyper_5_21_chunk  (cost=0.42..25950.81 rows=59391 width=42)
         Index Cond: (location_id = 10)
   ->  Index Scan using _hyper_5_23_chunk_readings_location_id_idx on _hyper_5_23_chunk  (cost=0.42..25899.86 rows=57960 width=42)
         Index Cond: (location_id = 10)
   ->  Index Scan using _hyper_5_25_chunk_readings_location_id_idx on _hyper_5_25_chunk  (cost=0.42..26005.53 rows=61831 width=42)
         Index Cond: (location_id = 10)
   ->  Index Scan using _hyper_5_27_chunk_readings_location_id_idx on _hyper_5_27_chunk  (cost=0.42..25945.02 rows=60077 width=42)
         Index Cond: (location_id = 10)
   ->  Index Scan using _hyper_5_29_chunk_readings_location_id_idx on _hyper_5_29_chunk  (cost=0.29..3405.16 rows=7894 width=42)
         Index Cond: (location_id = 10)
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(16 rows)
4. Finally a generic compression question while I have your attention. My default retention policy is 7 days, chunk interval 1 day, and I planned on setting `add_columnstore_policy` to 1 day. Does this make sense to you? The past 24 hours are append-heavy. Wouldn't it be strange to use a compression policy shorter than the chunk interval?

This is impossible to decide without testing it out:

  1. Smaller chunks mean you can compress more aggressively, but on the other hand you might not have enough data to fill the compressed rows.
  2. Larger chunks mean you might get a better compression ratio, but then you cannot compress as aggressively.
  3. If the policy is compressing a chunk while you are inserting into it, this might block the inserting operations, so smaller chunks are better here.
  4. Compression policies that start executing soon after a chunk is "closed" might still interfere with the insert sessions simply because the timing is bad.
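For reference, the setup described in the question could be sketched with the 2.18-style columnstore API along these lines (the table and column names are assumptions, not from the thread):

```sql
-- Hypothetical hypertable with 1-day chunks.
SELECT create_hypertable('readings', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- Enable the columnstore with the segmentby/orderby settings discussed above.
ALTER TABLE readings SET (
    timescaledb.enable_columnstore = true,
    timescaledb.segmentby = 'device_id',
    timescaledb.orderby = 'time'
);

-- Convert chunks to the columnstore once they are a day old,
-- and drop them entirely after 7 days.
CALL add_columnstore_policy('readings', after => INTERVAL '1 day');
SELECT add_retention_policy('readings', INTERVAL '7 days');
```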

@pantonis

I created my hypertable, created the btree index, and filled the table with data. The next morning (without compressing manually) I can see that all chunks are compressed.

this is my index

CREATE INDEX IF NOT EXISTS "IX_Order_DimDataSourceKey_Ticket"
    ON dw."Order" USING btree
    ("DimDataSourceKey" ASC NULLS LAST, "Ticket" COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

When I run the following query

EXPLAIN ANALYZE
SELECT *
FROM dw."Order"
WHERE "DimDataSourceKey" = 1
AND "Ticket" = '123456'

I get the following execution plan where I can see that the btree index is not used

"Append  (cost=0.30..110160.85 rows=20871001 width=4081) (actual time=9163.038..16989.204 rows=2 loops=1)"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_1_chunk  (cost=0.30..421.38 rows=1398000 width=2068) (actual time=1085.782..1085.783 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 1298377"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_102_chunk  (cost=0.00..421.38 rows=1398 width=3192) (actual time=32.672..33.372 rows=1398 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 561"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_5_chunk  (cost=1.00..33.97 rows=34000 width=2058) (actual time=2.255..2.256 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 313"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_135_chunk  (cost=0.00..33.97 rows=34 width=3192) (actual time=0.020..0.054 rows=34 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 97"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_7_chunk  (cost=10.15..10.15 rows=1000 width=2067) (actual time=0.553..0.554 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 95"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_136_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.006..0.014 rows=8 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 17"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_9_chunk  (cost=10.15..10.15 rows=1000 width=2067) (actual time=0.844..0.844 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 250"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_137_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.005..0.014 rows=11 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 30"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_11_chunk  (cost=0.54..16.82 rows=31000 width=2061) (actual time=1.890..1.891 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 143"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_138_chunk  (cost=0.00..16.82 rows=31 width=3192) (actual time=0.006..0.022 rows=31 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 24"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_18_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.124..0.124 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 16"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_139_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.005..0.006 rows=2 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_19_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.223..0.223 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 17"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_140_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.006..0.007 rows=4 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_22_chunk  (cost=0.31..1238.33 rows=4000000 width=2067) (actual time=3506.518..3506.519 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 3896363"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_141_chunk  (cost=0.00..1238.33 rows=4000 width=3192) (actual time=0.006..2.197 rows=4000 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 1755"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_24_chunk  (cost=0.27..1509.20 rows=5685000 width=2067) (actual time=4553.551..4553.552 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 5594835"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_142_chunk  (cost=0.00..1509.20 rows=5685 width=3192) (actual time=0.021..2.628 rows=5685 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 1062"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_30_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.144..0.144 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 13"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_143_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.021..0.021 rows=2 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_31_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.118..0.119 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 6"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_144_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.005..0.006 rows=2 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_36_chunk  (cost=0.25..2472.14 rows=9712000 width=2070) (actual time=11.030..7834.682 rows=2 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 9613716"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_145_chunk  (cost=0.00..2472.14 rows=9712 width=3192) (actual time=0.005..4.487 rows=9712 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 1497"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_44_chunk  (cost=10.15..10.15 rows=1000 width=2059) (actual time=1.007..1.007 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 57"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_146_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.024..0.032 rows=15 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 16"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_68_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.114..0.114 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 6"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_147_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.009..0.009 rows=2 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_69_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.062..0.062 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 1"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_148_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.005..0.006 rows=1 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_70_chunk  (cost=10.15..10.15 rows=1000 width=7178) (actual time=0.060..0.060 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 1"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_149_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.006..0.007 rows=1 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"  ->  Custom Scan (DecompressChunk) on _hyper_4_76_chunk  (cost=10.15..10.15 rows=1000 width=2058) (actual time=1.256..1.256 rows=0 loops=1)"
"        Filter: ((""Ticket"")::text = '123456'::text)"
"        Rows Removed by Filter: 173"
"        Vectorized Filter: (""DimDataSourceKey"" = 1)"
"        ->  Seq Scan on compress_hyper_13_150_chunk  (cost=0.00..10.15 rows=1 width=3192) (actual time=0.005..0.013 rows=18 loops=1)"
"              Filter: ((""_ts_meta_v2_min_DimDataSourceKey"" <= 1) AND (""_ts_meta_v2_max_DimDataSourceKey"" >= 1))"
"              Rows Removed by Filter: 4"
"  ->  Index Scan using ""_hyper_4_84_chunk_IX_Order_DimDataSourceKey_Ticket"" on _hyper_4_84_chunk  (cost=0.14..2.36 rows=1 width=2575) (actual time=0.006..0.006 rows=0 loops=1)"
"        Index Cond: ((""DimDataSourceKey"" = 1) AND ((""Ticket"")::text = '123456'::text))"
"Planning Time: 7.730 ms"
"JIT:"
"  Functions: 53"
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 2.343 ms, Inlining 0.000 ms, Optimization 1.520 ms, Emission 31.263 ms, Total 35.127 ms"
"Execution Time: 16992.049 ms"

@jflambert

jflambert commented Jan 31, 2025

@mkindahl thank you so much for your feedback!

@pantonis I should mention this, I am running pg16.6 and tsdb 2.18.0. How about you?

Is there a chance this new functionality is for pg17 only?

Did you use my example literally, or did you use something else?

Yes, line for line as I said. I will try a few more times, then put up a new issue if I can't get the index to work.

@pantonis

@jflambert same here pg16.6 with timescaledb 2.18.0

@jflambert

@mkindahl false alarm, I guess? I spent two hours on this yesterday with no success on indexes. This morning I pressed F5 in pgadmin without changing anything and suddenly indexes kicked in. I've since recreated my test setup several times and indexes always work immediately. I'll put this in production and I'll let you know if I have any other issues. Thanks!

@jflambert

jflambert commented Jan 31, 2025

@pantonis to be precise, we see the index being used in the uncompressed chunk (search for _hyper_4_84_chunk_IX_Order_DimDataSourceKey_Ticket)

but yeah, you'd expect it to be used for compressed chunks as well, unless the scan is just more efficient (but I doubt that's the situation here). Could you try with set enable_seqscan to false; first?

@pantonis

Still the same.

@mkindahl
Contributor Author

mkindahl commented Feb 3, 2025

@mkindahl thank you so much for your feedback!

Is there a chance this new functionality is for pg17 only?

Tests are from PG15 and upwards.

@mkindahl false alert I guess? I spent two hours on this yesterday, no success on indexes. This morning I press F5 in pgadmin without changing anything and suddenly indexes kicked in.

Good that you got it to work, but it's weird that you had the problem in the first place.

I've then recreated my test setup several times and indexes always work immediately. I'll put this in production and I'll let you know if I have any other issues. Thanks!

One thing that might affect the situation is not having up-to-date stats; that could also explain why it suddenly started working. You could try running vacuum analyze after you have compressed the chunks.

@mkindahl
Contributor Author

mkindahl commented Feb 3, 2025

@pantonis to be precise, we see the index being used in the uncompressed chunk (search for _hyper_4_84_chunk_IX_Order_DimDataSourceKey_Ticket)

but yeah you'd expect it to be used for compressed chunks as well, unless the scan is just more efficient (but I doubt that's the situation here) could you try with set enable_seqscan to false; first?

The compressed chunk is "internal" when using the TAM, similar to how TOAST tables are internal. It is being used, but indirectly through the TAM API.

For @pantonis it looks more like the TAM is not used at all. Check with this view:

create view chunk_info as
select inh.inhparent::regclass as hypertable,
       cl.oid::regclass as chunk,
       am.amname
  from pg_class cl
  join pg_am am on cl.relam = am.oid
  join pg_inherits inh on inh.inhrelid = cl.oid;

And do something like:

select * from chunk_info where hypertable = 'dw."Order"'::regclass

@pantonis

pantonis commented Feb 3, 2025

What shall I check on that view?

@jflambert

jflambert commented Feb 3, 2025

What shall I check on that view?

I think he's interested in knowing if hypercore is the access method.

@pantonis

pantonis commented Feb 3, 2025

"hypertable"	"chunk"	"amname"
dw."OrderFact"	_timescaledb_internal._hyper_4_1_chunk	heap

I see 18 chunks like the above

@mkindahl
Contributor Author

mkindahl commented Feb 3, 2025

"hypertable"	"chunk"	"amname"
"dw.""OrderFact"""	"_timescaledb_internal._hyper_4_1_chunk"	"heap"

I see 18 chunks like the above

@pantonis This means you're not using the hypercore access method, which is why you do not get any index scans.

If you try something like this on the chunk I can see:

alter table _timescaledb_internal._hyper_4_1_chunk set access method hypercore;

If you then try a query that touches that chunk, you should hopefully see an index scan on it. You can verify it with the view.

@pantonis

pantonis commented Feb 3, 2025

Will give it a try later. But may I ask, do I have to run this for every chunk that gets created? Any documentation about it?

@jflambert

jflambert commented Feb 4, 2025

@pantonis I don't think there's any documentation yet, no. If you check this example (third code block), this line applies the hypercore access method to the entire table (not just a chunk):

alter table readings set access method hypercore;

I feel that's what's missing for you. And if not, also try updating the stats with analyze table_name;
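Setting the access method on the hypertable only affects it going forward; for chunks that already exist, a loop along these lines could convert them all at once (a sketch; adjust the hypertable name to your own):

```sql
DO $$
DECLARE
    chunk regclass;
BEGIN
    -- show_chunks() lists all chunks of the hypertable.
    FOR chunk IN SELECT show_chunks('dw."Order"')
    LOOP
        EXECUTE format('ALTER TABLE %s SET ACCESS METHOD hypercore', chunk);
    END LOOP;
END;
$$;
```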
