diff --git a/_partials/_cookbook-hypertables.md b/_partials/_cookbook-hypertables.md new file mode 100644 index 0000000000..011e42f51f --- /dev/null +++ b/_partials/_cookbook-hypertables.md @@ -0,0 +1,67 @@ + +## Hypertable recipes + +This section contains recipes about hypertables. + +### Remove duplicates from an existing hypertable + +Looking to remove duplicates from an existing hypertable? One method is to run a `PARTITION BY` query to get +`ROW_NUMBER()` and then the `ctid` of rows where `row_number>1`. You then delete these rows. However, +you need to check `tableoid` and `ctid`. This is because `ctid` is not unique and might be duplicated in +different chunks. The following code example took 17 hours to process a table with 40 million rows: + +```sql +CREATE OR REPLACE FUNCTION deduplicate_chunks(ht_name TEXT, partition_columns TEXT, bot_id INT DEFAULT NULL) + RETURNS TABLE + ( + chunk_schema name, + chunk_name name, + deleted_count INT + ) +AS +$$ +DECLARE + chunk RECORD; + where_clause TEXT := ''; + deleted_count INT; +BEGIN + IF bot_id IS NOT NULL THEN + where_clause := FORMAT('WHERE bot_id = %s', bot_id); + END IF; + + FOR chunk IN + SELECT c.chunk_schema, c.chunk_name + FROM timescaledb_information.chunks c + WHERE c.hypertable_name = ht_name + LOOP + EXECUTE FORMAT(' + WITH cte AS ( + SELECT ctid, + ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s ASC) AS row_num, + * + FROM %I.%I + %s + ) + DELETE FROM %I.%I + WHERE ctid IN ( + SELECT ctid + FROM cte + WHERE row_num > 1 + ) + RETURNING 1; + ', partition_columns, partition_columns, chunk.chunk_schema, chunk.chunk_name, where_clause, chunk.chunk_schema, + chunk.chunk_name) + INTO deleted_count; + + RETURN QUERY SELECT chunk.chunk_schema, chunk.chunk_name, COALESCE(deleted_count, 0); + END LOOP; +END +$$ LANGUAGE plpgsql; + + +SELECT * +FROM deduplicate_chunks('nudge_events', 'bot_id, session_id, nudge_id, time', 2540); +``` + +Shoutout to **Mathias Ose** and **Christopher Piggott** for this recipe. + diff --git a/_partials/_cookbook-iot.md b/_partials/_cookbook-iot.md new file mode 100644 index 0000000000..5394d98f09 --- /dev/null +++ b/_partials/_cookbook-iot.md @@ -0,0 +1,131 @@ +## IoT recipes + +This section contains recipes for IoT issues: + +### Work with columnar IoT data + +Narrow and medium width tables are a great way to store IoT data. A lot of reasons are outlined in +[Designing Your Database Schema: Wide vs. Narrow Postgres Tables][blog-wide-vs-narrow]. + +One of the key advantages of narrow tables is that the schema does not have to change when you add new +sensors. Another big advantage is that each sensor can sample at different rates and times. This helps +support things like hysteresis, where new values are written infrequently unless the value changes by a +certain amount. + +#### Narrow table format example + +Working with narrow table data structures presents a few challenges. In the IoT world one concern is that +many data analysis approaches - including machine learning as well as more traditional data analysis - +require that your data is resampled and synchronized to a common time basis. Fortunately, TimescaleDB provides +you with [hyperfunctions][hyperfunctions] and other tools to help you work with this data. + +An example of a narrow table format is: + +| ts | sensor_id | value | +|-------------------------|-----------|-------| +| 2024-10-31 11:17:30.000 | 1007 | 23.45 | + +Typically you would couple this with a sensor table: + +| sensor_id | sensor_name | units | +|-----------|--------------|--------------------------| +| 1007 | temperature | degreesC | +| 1012 | heat_mode | on/off | +| 1013 | cooling_mode | on/off | +| 1041 | occupancy | number of people in room | + +A medium table retains the generic structure but adds columns of various types so that you can +use the same table to store float, int, bool, or even JSON (jsonb) data: + +| ts | sensor_id | d | i | b | t | j | +|-------------------------|-----------|-------|------|------|------|------| +| 2024-10-31 11:17:30.000 | 1007 | 23.45 | null | null | null | null | +| 2024-10-31 11:17:47.000 | 1012 | null | null | TRUE | null | null | +| 2024-10-31 11:18:01.000 | 1041 | null | 4 | null | null | null | + +To remove all-null entries, use an optional constraint such as: + +```sql + CONSTRAINT at_least_one_not_null + CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL)) +``` + +#### Get the last value of every sensor + +There are several ways to get the latest value of every sensor. The following examples use the +structure defined in [Narrow table format example][setup-a-narrow-table-format] as a reference: + +- [SELECT DISTINCT ON][select-distinct-on] +- [JOIN LATERAL][join-lateral] + +##### SELECT DISTINCT ON + +If you have a list of sensors, the easy way to get the latest value of every sensor is to use +`SELECT DISTINCT ON`: + +```sql +WITH latest_data AS ( + SELECT DISTINCT ON (sensor_id) ts, sensor_id, d + FROM iot_data + WHERE d is not null + AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important + ORDER BY sensor_id, ts DESC +) +SELECT + sensor_id, sensors.name, ts, d +FROM latest_data +LEFT OUTER JOIN sensors ON latest_data.sensor_id = sensors.id +WHERE latest_data.d is not null +ORDER BY sensor_id, ts; -- Optional, for displaying results ordered by sensor_id +``` + +The common table expression (CTE) used above is not strictly necessary. However, it is an elegant way to join +to the sensor list to get a sensor name in the output. If this is not something you care about, +you can leave it out: + +```sql +SELECT DISTINCT ON (sensor_id) ts, sensor_id, d + FROM iot_data + WHERE d is not null + AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important + ORDER BY sensor_id, ts DESC +``` + +It is important to take care when down-selecting this data. In the previous examples, +the time that the query would scan back was limited. However, if there any sensors that have either +not reported in a long time or in the worst case, never reported, this query devolves to a full table scan. +In a database with 1000+ sensors and 41 million rows, an unconstrained query takes over an hour. + +#### JOIN LATERAL + +An alternative to [SELECT DISTINCT ON][select-distinct-on] is to use a `JOIN LATERAL`. By selecting your entire +sensor list from the sensors table rather than pulling the IDs out using `SELECT DISTINCT`, `JOIN LATERAL` can offer +some improvements in performance: + +```sql +SELECT sensor_list.id, latest_data.ts, latest_data.d +FROM sensors sensor_list + -- Add a WHERE clause here to downselect the sensor list, if you wish +LEFT JOIN LATERAL ( + SELECT ts, d + FROM iot_data raw_data + WHERE sensor_id = sensor_list.id + ORDER BY ts DESC + LIMIT 1 +) latest_data ON true +WHERE latest_data.d is not null -- only pulling out float values ("d" column) in this example + AND latest_data.ts > CURRENT_TIMESTAMP - interval '1 week' -- important +ORDER BY sensor_list.id, latest_data.ts; +``` + +Limiting the time range is important, especially if you have a lot of data. Best practice is to use these +kinds of queries for dashboards and quick status checks. To query over a much larger time range, encapsulate +the previous example into a materialized query that refreshes infrequently, perhaps once a day. + +Shoutout to **Christopher Piggott** for this recipe. + +[blog-wide-vs-narrow]: https://www.timescale.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables +[setup-a-narrow-table-format]: /tutorials/:currentVersion:/cookbook/#narrow-table-format-example +[select-distinct-on]: /tutorials/:currentVersion:/cookbook/#select-distinct-on +[join-lateral]: /tutorials/:currentVersion:/cookbook/#join-lateral +[hyperfunctions]: /use/:currentVersion:/hyperfunctions/ diff --git a/tutorials/cookbook.md b/tutorials/cookbook.md new file mode 100644 index 0000000000..d14aa04e2c --- /dev/null +++ b/tutorials/cookbook.md @@ -0,0 +1,33 @@ +--- +title: Timescale cookbook +excerpt: Code examples from the community that help you with loads of common conundrums. +product: [cloud, mst, self_hosted] +--- + +import Hypertables from "versionContent/_partials/_cookbook-hypertables.mdx"; +import IOT from "versionContent/_partials/_cookbook-iot.mdx"; + + +# Timescale community cookbook + +This page contains suggestions from the [TimescaleDB Community](https://timescaledb.slack.com/) about how to resolve +common issues. Use these code examples as guidance to work with your own data. + + +## Prerequisites + +To follow the examples in this page, you need a: + +- [Target Timescale Cloud service][create-a-service] +- [Connection to your service][connect-to-service] + + + + + + + + + +[create-a-service]: /getting-started/:currentVersion:/services/#create-a-timescale-cloud-service +[connect-to-service]: /getting-started/:currentVersion:/run-queries-from-console/ diff --git a/tutorials/page-index/page-index.js b/tutorials/page-index/page-index.js index 1e9510f4c7..8b2281dca2 100644 --- a/tutorials/page-index/page-index.js +++ b/tutorials/page-index/page-index.js @@ -154,6 +154,11 @@ module.exports = [ href: "simulate-iot-sensor-data", excerpt: "Simulate and query an IoT sensor dataset", }, + { + title: "Timescale community cookbook", + href: "cookbook", + excerpt: "Code examples from the community that help you with loads of common conundrums.", + }, ], }, ];