Document unique IDs (#46)

levkk · web-flow · commit 04ce381b6dd1 · 2025-11-26T17:21:34.000-08:00
diff --git a/docs/features/sharding/unique-ids.md b/docs/features/sharding/unique-ids.md
@@ -0,0 +1,124 @@
+---
+icon: material/identifier
+---
+# Unique IDs
+
+!!! note "Experimental feature"
+    This feature is new and experimental. Please [report](https://github.com/pgdogdev/pgdog/issues) any issues you encounter.
+
+To generate unique identifiers, regular PostgreSQL databases use [sequences](https://www.postgresql.org/docs/current/sql-createsequence.html). For example, `BIGSERIAL` and `SERIAL` columns get their values by calling:
+
+```postgresql
+SELECT nextval('sequence_name');
+```
+
+This guarantees that these columns contain unique and monotonically increasing integers.
+
+<!--The `BIGSERIAL` data type is used to identify rows in ORMs like ActiveRecord (Rails) and Django, so making them work in sharded databases is pretty important.-->
+
+If your database is sharded, however, each shard maintains its own sequences, so they will generate identical IDs for different rows on different shards. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.
+
+## How it works
+
+The unique ID algorithm implemented by PgDog is based on three inputs:
+
+- Current system time in milliseconds
+- Unique identifier for the PgDog node (e.g. `hostname`)
+- An internal sequence
+
+The unique node identifier ensures that two different instances of PgDog can't produce the same ID at the same time. Additionally, the internal sequence allows multiple IDs to be created within the same millisecond in very busy deployments.
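
A minimal sketch of how the three inputs could be packed into a single 64-bit signed integer. The bit widths are assumptions inferred from the limits stated on this page (41 timestamp bits, 1024 nodes, 4,096 IDs per millisecond). The field order and the names `compose_id` and `decompose_id` are hypothetical and are not taken from PgDog's source.

```python
from typing import Tuple

# Assumed layout (not confirmed by PgDog's source):
# [ 41 bits: elapsed milliseconds | 10 bits: node | 12 bits: sequence ]
TIMESTAMP_BITS = 41  # milliseconds since a custom epoch
NODE_BITS = 10       # 2**10 = 1024 nodes
SEQUENCE_BITS = 12   # 2**12 = 4096 IDs per millisecond


def compose_id(elapsed_ms: int, node: int, sequence: int) -> int:
    """Pack the three inputs into one non-negative 63-bit integer."""
    if not 0 <= elapsed_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError("node out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (elapsed_ms << (NODE_BITS + SEQUENCE_BITS))
        | (node << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(unique_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (elapsed_ms, node, sequence)."""
    sequence = unique_id & ((1 << SEQUENCE_BITS) - 1)
    node = (unique_id >> SEQUENCE_BITS) & ((1 << NODE_BITS) - 1)
    elapsed_ms = unique_id >> (NODE_BITS + SEQUENCE_BITS)
    return elapsed_ms, node, sequence
```

With the timestamp in the highest bits, IDs from one node sort by creation time, and the node field keeps two nodes from colliding within the same millisecond.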
+
+Once configured, you can fetch unique IDs using a standard SQL command:
+
+=== "Command"
+    ```postgresql
+    SHOW pgdog.unique_id;
+    ```
+=== "Output"
+    ```
+       unique_id
+    ----------------
+     29888761298944
+
+    ```
+
+
+### Configuration
+
+#### Node identifier
+
+To make IDs _globally_ unique, a different node identifier is required for each instance in a PgDog deployment.
+
+If you're using our [Helm chart](https://github.com/pgdogdev/helm), this is taken care of automatically when deploying it as a [`StatefulSet`](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) resource:
+
+```yaml
+statefulSet:
+  enabled: true
+```
+
+Otherwise, you need to ensure each PgDog instance is started with a different value for the **`NODE_ID`** environment variable. The value can contain anything, as long as it ends with `-<number>` (a hyphen followed by a number).
+
+For example, if you have a three-node deployment, the nodes could be identified as follows:
+
+=== "Node 1"
+    ```bash
+    export NODE_ID=pgdog-prod-0
+    ```
+=== "Node 2"
+    ```bash
+    export NODE_ID=pgdog-prod-1
+    ```
+=== "Node 3"
+    ```bash
+    export NODE_ID=pgdog-prod-2
+    ```
+
+When configured correctly, you can get each node's identifier by querying the [admin](../../administration/index.md) database, for example:
+
+=== "Command"
+    ```postgresql
+    SHOW INSTANCE_ID;
+    ```
+=== "Output"
+    ```
+      instance_id
+    ----------------
+      pgdog-prod-0
+    ```
+
+!!! note "Maximum number of nodes"
+    Due to how the ID generation algorithm is implemented, PgDog supports at most
+    **1024** instances in the same deployment, numbered 0 through 1023.
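
Before deploying, it can help to check that every `NODE_ID` value follows the `-<number>` convention and stays within the instance limit. The sketch below is a hypothetical helper, not part of PgDog. It assumes the numeric suffix is what distinguishes the nodes and that valid numbers run from 0 to 1023.

```python
import re

MAX_NODES = 1024  # limit stated in the note above


def node_number(node_id: str) -> int:
    """Return the numeric suffix of a NODE_ID such as 'pgdog-prod-2'."""
    match = re.fullmatch(r".*-([0-9]+)", node_id)
    if match is None:
        raise ValueError(f"{node_id!r} does not end with -<number>")
    number = int(match.group(1))
    if number >= MAX_NODES:
        raise ValueError(f"node number {number} exceeds the {MAX_NODES}-instance limit")
    return number


if __name__ == "__main__":
    for value in ("pgdog-prod-0", "pgdog-prod-1", "pgdog-prod-2"):
        print(value, "->", node_number(value))
```

Running a check like this over all planned identifiers and confirming that the returned numbers are distinct catches duplicate or malformed values before they reach production.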
+
+
+#### Minimum ID
+
+If you're migrating data from an existing database, you can ensure that all IDs generated by PgDog start at a minimum value. This is configurable in [`pgdog.toml`](../../configuration/pgdog.toml/general.md), like so:
+
+```toml
+[general]
+unique_id_min = 5_000_000
+```
+
+When set, all generated IDs are guaranteed to be larger than this value.
+
+## Limitations
+
+The generated unique IDs are 64-bit signed integers, matching the `BIGINT` (and `BIGSERIAL`) PostgreSQL format. However, since they are time-based, consecutively generated IDs will have gaps between them, for example:
+
+```
+678973936041
+678944576104
+678948770152
+```
+
+This is normally not an issue, since PostgreSQL sequences are not guaranteed to be gap-free either, but it is something to be aware of for applications that attempt to detect rolled-back transactions by looking for gaps in IDs.
+
+Additionally, because PgDog reserves only 41 bits for the timestamp portion of the identifier, the IDs have a maximum value. The available
+ID range currently spans **69.73 years** and will overflow on **August 3, 2095**. We expect databases to use 128-bit integers by then, expanding the ID range almost indefinitely.
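
The 69.73-year figure can be checked with a little arithmetic. The sketch below assumes a 365-day year. It also works backwards from the stated overflow date to an implied starting epoch, which is an inference rather than a documented value.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41

# Number of distinct millisecond values a 41-bit field can hold.
span_ms = 2 ** TIMESTAMP_BITS

# Convert milliseconds to years, assuming 365-day years.
span_years = span_ms / 1000 / 86_400 / 365
print(f"{span_years:.2f} years")  # 69.73 years

# Working backwards from the overflow date gives the implied epoch
# (assumption: the overflow is measured from midnight UTC on that day).
overflow = datetime(2095, 8, 3, tzinfo=timezone.utc)
implied_epoch = overflow - timedelta(milliseconds=span_ms)
print(implied_epoch.date())
```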
+
+### Generation rate
+
+Since the identifiers are time-based, to ensure uniqueness, PgDog limits how many IDs can be generated per unit of time. This limit is currently **4,096** IDs per millisecond.
+
+When the limit is reached, PgDog pauses ID generation until the clock ticks to the next millisecond. This gives it an effective ID generation rate of _4,096,000 IDs per second per node_, which should be sufficient for most deployments.
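
The pause-until-next-millisecond behavior can be illustrated with a small generator. This is a sketch under stated assumptions, not PgDog's implementation: the class name `UniqueIdGenerator`, the bit layout (10 node bits, 12 sequence bits), the Unix epoch default, and the handling of a clock that moves backwards are all hypothetical. It is also not thread-safe.

```python
import time
from typing import Callable

NODE_BITS = 10                            # assumed: 1024 nodes
SEQUENCE_BITS = 12                        # assumed: 4096 IDs per millisecond
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


def now_ms() -> int:
    """Current wall-clock time in milliseconds."""
    return time.time_ns() // 1_000_000


class UniqueIdGenerator:
    def __init__(self, node: int, epoch_ms: int = 0,
                 clock: Callable[[], int] = now_ms) -> None:
        if not 0 <= node < (1 << NODE_BITS):
            raise ValueError("node out of range")
        self.node = node
        self.epoch_ms = epoch_ms  # Unix epoch by default, for simplicity
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        now = self.clock()
        if now < self.last_ms:
            # Clock moved backwards: keep using the last seen millisecond.
            now = self.last_ms
        if now == self.last_ms:
            if self.sequence == MAX_SEQUENCE:
                # Limit reached: wait until the clock ticks forward.
                while now <= self.last_ms:
                    now = self.clock()
                self.sequence = 0
            else:
                self.sequence += 1
        else:
            self.sequence = 0
        self.last_ms = now
        elapsed = now - self.epoch_ms
        return (
            (elapsed << (NODE_BITS + SEQUENCE_BITS))
            | (self.node << SEQUENCE_BITS)
            | self.sequence
        )
```

The `clock` parameter exists only so the limit can be exercised deterministically, for example by passing a function that returns the same millisecond for several thousand calls.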