Cleanup routine using table partitioning #659

eskebab · 2025-02-11T13:13:26Z

Problem
We want to keep data for as long as necessary to do proper analysis. When log entries are considered useless for troubleshooting purposes, they should be deleted or partitioned

Solution
We implement database table partitioning using built-in partitioning in postgres. We can use existing cron.schedule function IF we are able to get next month from context

eskebab · 2025-03-04T09:19:20Z

Following these guidelines for creating a partitioned table:
https://www.postgresql.org/docs/current/ddl-partitioning.html

primary key must include partitioned column

To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint's columns must include all of the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

create index on range column

Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This automatically creates a matching index on each partition, and any partitions you create or attach later will also have such an index. An index or unique constraint declared on a partitioned table is “virtual” in the same way that the partitioned table is: the actual data is in child indexes on the individual partition tables.

Ensure that the enable_partition_pruning configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.

eskebab · 2025-03-04T10:59:46Z

CREATE TABLE events.trace_log_y2024m03 PARTITION OF events.trace_log
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

creates a range partition where this insert will fall into partition:
INSERT INTO events.trace_log(
cloudeventid, resource, eventtype, consumer, "time", subscriptionid, responsecode, subscriberendpoint, activity)
VALUES ('ec10976a-e507-4ef3-9ee8-1e44a52501be', 'resource', 'eventtype', 'consumer', '2025-03-01', 1, 200, 'http://localhost', 'TEST');

while this entry will create an error, given that the above partition is the only available partition:
INSERT INTO events.trace_log(
cloudeventid, resource, eventtype, consumer, "time", subscriptionid, responsecode, subscriberendpoint, activity)
VALUES ('ec10976a-e507-4ef3-9ee8-1e44a52501be', 'resource', 'eventtype', 'consumer', '2025-04-01', 1, 200, 'http://localhost', 'TEST');

ERROR: no partition of relation "trace_log" found for row
Partition key of the failing row contains ("time") = (2025-04-01 00:00:00+02).

eskebab · 2025-03-04T11:02:34Z

If one tries to create a partition that overlaps with another, this error will be given:
partition "trace_log_y2024m04" would overlap partition "trace_log_y2024m03"

eskebab · 2025-03-11T12:58:31Z

since creating a table partition is an idempotent operation, using a background service in .NET could be a good alternative, because we would then have control over the range of the coming month/partition.

eskebab · 2025-03-17T12:54:19Z

@SandGrainOne

I tried creating some partitions of various sizes. First, I created a partition with a duration of 1 month. Then I created one that has a six month range. I then created one only 24 hours long. This worked fine as long as there was no overlap and the names were unique.

I then tested inserting data into all three partitions with success

example executions

1 day:
CREATE TABLE events.trace_log_y2024m10d01 PARTITION OF events.trace_log
FOR VALUES FROM ('2025-10-01') TO ('2025-10-02');

6 months:
CREATE TABLE events.trace_log_y2024m04m10 PARTITION OF events.trace_log
FOR VALUES FROM ('2025-04-01') TO ('2025-10-01');

1 month:
CREATE TABLE events.trace_log_y2024m04 PARTITION OF events.trace_log
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01')

eskebab · 2025-03-19T08:43:20Z

Decided: look at alertManager or exception handling to prevent failure from happening silently.
Create partition for remainder of 2025.

eskebab · 2025-03-19T12:16:54Z

funksjon pg_partition_tree() i Postgres for å hente ut partisjoner som tilhører en tabell:

select * from pg_partition_tree('events.trace_log'::regclass) where parentrelid is not null;

relid	parentrelid	isleaf	level
"events.trace_log_y2024m03"	"events.trace_log"	true	1
"events.trace_log_y2024m0410"	"events.trace_log"	true	1
"events.trace_log_y2024m10d01"	"events.trace_log"	true	1

Med riktig navnekonvensjon kan vi da lage en spørring som kan sjekke om vi har en partisjon klar for neste periode. Dette kunne ha vært en background service som kjører ved ny deploy og lager en metrikk for om neste periode er klar eller ikke og eventuelt en counter for antall partisjoner, hvis det er interessant.

for mer info:
https://www.postgresql.org/docs/current/functions-admin.html

jeg antar at isleaf og level er mest interessant hvis man har flere nivåer, partisjoner av partisjonene osv.

eskebab changed the title ~~cleanup routine~~ Cleanup routine Feb 13, 2025

eskebab changed the title ~~Cleanup routine~~ Cleanup routine using table partitioning Feb 25, 2025

eskebab self-assigned this Feb 28, 2025

eskebab mentioned this issue Mar 11, 2025

Add migration script for partitioning table trace_log #678

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanup routine using table partitioning #659

Cleanup routine using table partitioning #659

eskebab commented Feb 11, 2025 •

edited

Loading

eskebab commented Mar 4, 2025 •

edited

Loading

eskebab commented Mar 4, 2025

eskebab commented Mar 4, 2025

eskebab commented Mar 11, 2025

eskebab commented Mar 17, 2025 •

edited

Loading

eskebab commented Mar 19, 2025

eskebab commented Mar 19, 2025 •

edited

Loading

Cleanup routine using table partitioning #659

Cleanup routine using table partitioning #659

Comments

eskebab commented Feb 11, 2025 • edited Loading

eskebab commented Mar 4, 2025 • edited Loading

primary key must include partitioned column

create index on range column

eskebab commented Mar 4, 2025

eskebab commented Mar 4, 2025

eskebab commented Mar 11, 2025

eskebab commented Mar 17, 2025 • edited Loading

example executions

eskebab commented Mar 19, 2025

eskebab commented Mar 19, 2025 • edited Loading

eskebab commented Feb 11, 2025 •

edited

Loading

eskebab commented Mar 4, 2025 •

edited

Loading

eskebab commented Mar 17, 2025 •

edited

Loading

eskebab commented Mar 19, 2025 •

edited

Loading