-
Notifications
You must be signed in to change notification settings - Fork 99
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
chore: add first cookbook recipe. (#3550)
* chore: add first cookbook recipe. Co-authored-by: atovpeko <114177030+atovpeko@users.noreply.github.com> Co-authored-by: Jônatas Davi Paganini <jonatas@timescale.com>
- Loading branch information
1 parent
20262e0
commit 0afb88f
Showing
4 changed files
with
236 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
|
||
## Hypertable recipes | ||
|
||
This section contains recipes about hypertables. | ||
|
||
### Remove duplicates from an existing hypertable | ||
|
||
Looking to remove duplicates from an existing hypertable? One method is to run a `PARTITION BY` query to get | ||
`ROW_NUMBER()` and then the `ctid` of rows where `row_number>1`. You then delete these rows. However, | ||
you need to check `tableoid` and `ctid`. This is because `ctid` is not unique and might be duplicated in | ||
different chunks. The following code example took 17 hours to process a table with 40 million rows: | ||
|
||
```sql | ||
CREATE OR REPLACE FUNCTION deduplicate_chunks(ht_name TEXT, partition_columns TEXT, bot_id INT DEFAULT NULL) | ||
RETURNS TABLE | ||
( | ||
chunk_schema name, | ||
chunk_name name, | ||
deleted_count INT | ||
) | ||
AS | ||
$$ | ||
DECLARE | ||
chunk RECORD; | ||
where_clause TEXT := ''; | ||
deleted_count INT; | ||
BEGIN | ||
IF bot_id IS NOT NULL THEN | ||
where_clause := FORMAT('WHERE bot_id = %s', bot_id); | ||
END IF; | ||
|
||
FOR chunk IN | ||
SELECT c.chunk_schema, c.chunk_name | ||
FROM timescaledb_information.chunks c | ||
WHERE c.hypertable_name = ht_name | ||
LOOP | ||
EXECUTE FORMAT(' | ||
WITH cte AS ( | ||
SELECT ctid, | ||
ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s ASC) AS row_num, | ||
* | ||
FROM %I.%I | ||
%s | ||
) | ||
DELETE FROM %I.%I | ||
WHERE ctid IN ( | ||
SELECT ctid | ||
FROM cte | ||
WHERE row_num > 1 | ||
) | ||
RETURNING 1; | ||
', partition_columns, partition_columns, chunk.chunk_schema, chunk.chunk_name, where_clause, chunk.chunk_schema, | ||
chunk.chunk_name) | ||
INTO deleted_count; | ||
|
||
RETURN QUERY SELECT chunk.chunk_schema, chunk.chunk_name, COALESCE(deleted_count, 0); | ||
END LOOP; | ||
END | ||
$$ LANGUAGE plpgsql; | ||
|
||
|
||
SELECT * | ||
FROM deduplicate_chunks('nudge_events', 'bot_id, session_id, nudge_id, time', 2540); | ||
``` | ||
|
||
Shoutout to **Mathias Ose** and **Christopher Piggott** for this recipe. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
## IoT recipes | ||
|
||
This section contains recipes for IoT issues: | ||
|
||
### Work with columnar IoT data | ||
|
||
Narrow and medium width tables are a great way to store IoT data. A lot of reasons are outlined in | ||
[Designing Your Database Schema: Wide vs. Narrow Postgres Tables][blog-wide-vs-narrow]. | ||
|
||
One of the key advantages of narrow tables is that the schema does not have to change when you add new | ||
sensors. Another big advantage is that each sensor can sample at different rates and times. This helps | ||
support things like hysteresis, where new values are written infrequently unless the value changes by a | ||
certain amount. | ||
|
||
#### Narrow table format example | ||
|
||
Working with narrow table data structures presents a few challenges. In the IoT world one concern is that | ||
many data analysis approaches - including machine learning as well as more traditional data analysis - | ||
require that your data is resampled and synchronized to a common time basis. Fortunately, TimescaleDB provides | ||
you with [hyperfunctions][hyperfunctions] and other tools to help you work with this data. | ||
|
||
An example of a narrow table format is: | ||
|
||
| ts | sensor_id | value | | ||
|-------------------------|-----------|-------| | ||
| 2024-10-31 11:17:30.000 | 1007 | 23.45 | | ||
|
||
Typically you would couple this with a sensor table: | ||
|
||
| sensor_id | sensor_name | units | | ||
|-----------|--------------|--------------------------| | ||
| 1007 | temperature | degreesC | | ||
| 1012 | heat_mode | on/off | | ||
| 1013 | cooling_mode | on/off | | ||
| 1041 | occupancy | number of people in room | | ||
|
||
A medium table retains the generic structure but adds columns of various types so that you can | ||
use the same table to store float, int, bool, or even JSON (jsonb) data: | ||
|
||
| ts | sensor_id | d | i | b | t | j | | ||
|-------------------------|-----------|-------|------|------|------|------| | ||
| 2024-10-31 11:17:30.000 | 1007 | 23.45 | null | null | null | null | | ||
| 2024-10-31 11:17:47.000 | 1012 | null | null | TRUE | null | null | | ||
| 2024-10-31 11:18:01.000 | 1041 | null | 4 | null | null | null | | ||
|
||
To remove all-null entries, use an optional constraint such as: | ||
|
||
```sql | ||
CONSTRAINT at_least_one_not_null | ||
CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL)) | ||
``` | ||
|
||
#### Get the last value of every sensor | ||
|
||
There are several ways to get the latest value of every sensor. The following examples use the | ||
structure defined in [Narrow table format example][setup-a-narrow-table-format] as a reference: | ||
|
||
- [SELECT DISTINCT ON][select-distinct-on] | ||
- [JOIN LATERAL][join-lateral] | ||
|
||
##### SELECT DISTINCT ON | ||
|
||
If you have a list of sensors, the easy way to get the latest value of every sensor is to use | ||
`SELECT DISTINCT ON`: | ||
|
||
```sql | ||
WITH latest_data AS ( | ||
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d | ||
FROM iot_data | ||
WHERE d is not null | ||
AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important | ||
ORDER BY sensor_id, ts DESC | ||
) | ||
SELECT | ||
sensor_id, sensors.name, ts, d | ||
FROM latest_data | ||
LEFT OUTER JOIN sensors ON latest_data.sensor_id = sensors.id | ||
WHERE latest_data.d is not null | ||
ORDER BY sensor_id, ts; -- Optional, for displaying results ordered by sensor_id | ||
``` | ||
|
||
The common table expression (CTE) used above is not strictly necessary. However, it is an elegant way to join | ||
to the sensor list to get a sensor name in the output. If this is not something you care about, | ||
you can leave it out: | ||
|
||
```sql | ||
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d | ||
FROM iot_data | ||
WHERE d is not null | ||
AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important | ||
ORDER BY sensor_id, ts DESC | ||
``` | ||
|
||
It is important to take care when down-selecting this data. In the previous examples, | ||
the time that the query would scan back was limited. However, if there any sensors that have either | ||
not reported in a long time or in the worst case, never reported, this query devolves to a full table scan. | ||
In a database with 1000+ sensors and 41 million rows, an unconstrained query takes over an hour. | ||
|
||
#### JOIN LATERAL | ||
|
||
An alternative to [SELECT DISTINCT ON][select-distinct-on] is to use a `JOIN LATERAL`. By selecting your entire | ||
sensor list from the sensors table rather than pulling the IDs out using `SELECT DISTINCT`, `JOIN LATERAL` can offer | ||
some improvements in performance: | ||
|
||
```sql | ||
SELECT sensor_list.id, latest_data.ts, latest_data.d | ||
FROM sensors sensor_list | ||
-- Add a WHERE clause here to downselect the sensor list, if you wish | ||
LEFT JOIN LATERAL ( | ||
SELECT ts, d | ||
FROM iot_data raw_data | ||
WHERE sensor_id = sensor_list.id | ||
ORDER BY ts DESC | ||
LIMIT 1 | ||
) latest_data ON true | ||
WHERE latest_data.d is not null -- only pulling out float values ("d" column) in this example | ||
AND latest_data.ts > CURRENT_TIMESTAMP - interval '1 week' -- important | ||
ORDER BY sensor_list.id, latest_data.ts; | ||
``` | ||
|
||
Limiting the time range is important, especially if you have a lot of data. Best practice is to use these | ||
kinds of queries for dashboards and quick status checks. To query over a much larger time range, encapsulate | ||
the previous example into a materialized query that refreshes infrequently, perhaps once a day. | ||
|
||
Shoutout to **Christopher Piggott** for this recipe. | ||
|
||
[blog-wide-vs-narrow]: https://www.timescale.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables | ||
[setup-a-narrow-table-format]: /tutorials/:currentVersion:/cookbook/#narrow-table-format-example | ||
[select-distinct-on]: /tutorials/:currentVersion:/cookbook/#select-distinct-on | ||
[join-lateral]: /tutorials/:currentVersion:/cookbook/#join-lateral | ||
[hyperfunctions]: /use/:currentVersion:/hyperfunctions/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
--- | ||
title: Timescale cookbook | ||
excerpt: Code examples from the community that help you with loads of common conundrums. | ||
product: [cloud, mst, self_hosted] | ||
--- | ||
|
||
import Hypertables from "versionContent/_partials/_cookbook-hypertables.mdx"; | ||
import IOT from "versionContent/_partials/_cookbook-iot.mdx"; | ||
|
||
|
||
# Timescale community cookbook | ||
|
||
This page contains suggestions from the [TimescaleDB Community](https://timescaledb.slack.com/) about how to resolve | ||
common issues. Use these code examples as guidance to work with your own data. | ||
|
||
|
||
## Prerequisites | ||
|
||
To follow the examples in this page, you need a: | ||
|
||
- [Target Timescale Cloud service][create-a-service] | ||
- [Connection to your service][connect-to-service] | ||
|
||
|
||
|
||
<Hypertables /> | ||
|
||
<IOT /> | ||
|
||
|
||
|
||
[create-a-service]: /getting-started/:currentVersion:/services/#create-a-timescale-cloud-service | ||
[connect-to-service]: /getting-started/:currentVersion:/run-queries-from-console/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters