Commit 04ce381

Document unique IDs (#46)
1 parent a2ea674 commit 04ce381

1 file changed

+124
-0
lines changed

@@ -0,0 +1,124 @@
1+
---
2+
icon: material/identifier
3+
---
4+
# Unique IDs
5+
6+
!!! note "Experimental feature"
7+
This feature is new and experimental. Please [report](https://github.com/pgdogdev/pgdog/issues) any issues you encounter.
8+
9+
To generate unique identifiers, regular PostgreSQL databases use [sequences](https://www.postgresql.org/docs/current/sql-createsequence.html). For example, `BIGSERIAL` and `SERIAL` columns get their values by calling:
10+
11+
```postgresql
12+
SELECT nextval('sequence_name');
13+
```
14+
15+
Within a single database, this guarantees that these columns contain unique, monotonically increasing integers.
16+
17+
<!--The `BIGSERIAL` data type is used to identify rows in ORMs like ActiveRecord (Rails) and Django, so making them work in sharded databases is pretty important.-->
18+
19+
If your database is sharded, however, each shard maintains its own sequences, so different rows on different shards can end up with identical IDs. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.
20+
21+
## How it works
22+
23+
The unique ID algorithm implemented by PgDog is based on three inputs:
24+
25+
- Current system time in milliseconds
26+
- Unique identifier for the PgDog node (e.g. `hostname`)
27+
- An internal sequence
28+
29+
The unique node identifier ensures that two different instances of PgDog can't produce the same ID at the same time. The internal sequence, in turn, lets a single instance generate more than one ID within the same millisecond, which matters in very busy deployments.
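
The following is a minimal sketch of how these three inputs could be packed into a single 64-bit signed integer. Only the 41-bit timestamp width is stated in this document (under Limitations). The 10-bit node field and 12-bit sequence field are inferred from the 1024-instance and 4,096-IDs-per-millisecond limits, and the field order is an assumption borrowed from Snowflake-style ID schemes, not a description of PgDog's actual layout. The names `compose_id` and `decompose_id` are hypothetical.

```python
# Illustrative bit layout only; PgDog's real layout may differ.
TIMESTAMP_BITS = 41  # stated in the Limitations section
NODE_BITS = 10       # assumption: 2**10 = 1024 instances
SEQUENCE_BITS = 12   # assumption: 2**12 = 4096 IDs per millisecond


def _check(name: str, value: int, bits: int) -> None:
    """Reject values that do not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits: {value}")


def compose_id(elapsed_ms: int, node: int, sequence: int) -> int:
    """Pack milliseconds since some epoch, node number, and sequence."""
    _check("elapsed_ms", elapsed_ms, TIMESTAMP_BITS)
    _check("node", node, NODE_BITS)
    _check("sequence", sequence, SEQUENCE_BITS)
    return (
        (elapsed_ms << (NODE_BITS + SEQUENCE_BITS))
        | (node << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(value: int) -> tuple[int, int, int]:
    """Split an ID back into (elapsed_ms, node, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    node = (value >> SEQUENCE_BITS) & ((1 << NODE_BITS) - 1)
    elapsed_ms = value >> (NODE_BITS + SEQUENCE_BITS)
    return elapsed_ms, node, sequence


print(compose_id(1, 2, 3))    # 4202499
print(decompose_id(4202499))  # (1, 2, 3)
```

With these widths the three fields use 63 bits in total, so the largest possible value is `2**63 - 1` and every ID stays positive in a signed `BIGINT`.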
30+
31+
Once configured, you can fetch unique IDs using a standard SQL command:
32+
33+
=== "Command"
34+
```postgresql
35+
SHOW pgdog.unique_id;
36+
```
37+
=== "Output"
38+
```
39+
unique_id
40+
----------------
41+
29888761298944
42+
43+
```
44+
45+
46+
### Configuration
47+
48+
#### Node identifier
49+
50+
To make IDs _globally_ unique, a different node identifier is required for each instance in a PgDog deployment.
51+
52+
If you're using our [Helm chart](https://github.com/pgdogdev/helm), this is taken care of automatically when deploying it as a [`StatefulSet`](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) resource:
53+
54+
```yaml
55+
statefulSet:
56+
enabled: true
57+
```
58+
59+
Otherwise, you need to ensure each PgDog instance starts with a different value in the **`NODE_ID`** environment variable. The value can contain anything, as long as it ends with `-<number>` (a hyphen followed by a number).
60+
61+
For example, in a three-node deployment, the nodes could be identified as follows:
62+
63+
=== "Node 1"
64+
```bash
65+
export NODE_ID=pgdog-prod-0
66+
```
67+
=== "Node 2"
68+
```bash
69+
export NODE_ID=pgdog-prod-1
70+
```
71+
=== "Node 3"
72+
```bash
73+
export NODE_ID=pgdog-prod-2
74+
```
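
To check `NODE_ID` values before deploying, the naming rule above can be sketched as a small validator. This is a hypothetical helper, not PgDog's own parsing code: the function name `node_number`, the regular expression, and the decision to reject numbers of 1024 and above (based on the instance limit described in this section) are all assumptions.

```python
import re

MAX_NODES = 1024  # instance limit described in this section


def node_number(node_id: str) -> int:
    """Extract the trailing number from a NODE_ID such as 'pgdog-prod-2'."""
    # Anything may come first, but the value must end in a hyphen and digits.
    match = re.fullmatch(r".*-([0-9]+)", node_id)
    if match is None:
        raise ValueError(f"NODE_ID must end with -<number>: {node_id!r}")
    number = int(match.group(1))
    if number >= MAX_NODES:
        raise ValueError(f"node number must be below {MAX_NODES}: {number}")
    return number


for value in ("pgdog-prod-0", "pgdog-prod-1", "pgdog-prod-2"):
    print(value, "->", node_number(value))
```

Running a check like this over all planned identifiers, and confirming the resulting numbers are distinct, catches duplicate or malformed values before any instance starts.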
75+
76+
Once this is configured correctly, you can get each node's identifier by querying the [admin](../../administration/index.md) database, for example:
77+
78+
=== "Command"
79+
```postgresql
80+
SHOW INSTANCE_ID;
81+
```
82+
=== "Output"
83+
```
84+
instance_id
85+
----------------
86+
pgdog-prod-0
87+
```
88+
89+
!!! note "Maximum number of nodes"
90+
Due to how the ID generation algorithm is implemented, PgDog allows up to a maximum
91+
of **1024** instances (numbered 0 through 1023) in the same deployment.
92+
93+
94+
#### Minimum ID
95+
96+
If you're migrating data from an existing database, you can ensure that all IDs generated by PgDog start at a minimum value. This is configurable in [`pgdog.toml`](../../configuration/pgdog.toml/general.md), like so:
97+
98+
```toml
99+
[general]
100+
unique_id_min = 5_000_000
101+
```
102+
103+
When set, all generated IDs are guaranteed to be larger than this value.
104+
105+
## Limitations
106+
107+
The generated unique IDs are 64-bit signed integers, matching the `BIGINT` (and `BIGSERIAL`) PostgreSQL format. However, since they are time-based, IDs generated one after another are not consecutive integers and will have gaps between them, for example:
108+
109+
```
110+
678973936041
111+
678944576104
112+
678948770152
113+
```
114+
115+
This is normally not an issue, since PostgreSQL sequences are not guaranteed to be gap-free either, but it is something to be aware of if your application attempts to detect rolled-back transactions.
116+
117+
Additionally, because PgDog reserves only 41 bits for the timestamp portion of the identifier, the IDs have a maximum value. Currently, the available
118+
ID range is **69.73 years**, set to overflow on **August 3, 2095**. We expect databases to use 128-bit integers by then, expanding the ID range almost indefinitely.
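
The quoted range can be reproduced with a short calculation. This is only a sketch of the arithmetic: it assumes the timestamp counts milliseconds and that a year is taken as 365 days, which is the convention that yields the 69.73 figure.

```python
TIMESTAMP_BITS = 41  # from the paragraph above

# Assumption: 365-day years, which reproduces the quoted figure.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365

range_ms = 2 ** TIMESTAMP_BITS        # distinct millisecond values available
range_years = range_ms / MS_PER_YEAR  # how long until they run out

print(f"{range_years:.2f} years")  # 69.73 years
```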
119+
120+
### Generation rate
121+
122+
Since the identifiers are time-based, to ensure uniqueness, PgDog limits how many IDs can be generated per unit of time. This limit is currently **4,096** IDs per millisecond.
123+
124+
When the limit is reached, PgDog pauses ID generation until the clock ticks over to the next millisecond. This gives an effective maximum generation rate of _4,096,000 IDs / second / node_, which should be sufficient for most deployments.
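
The pause-until-next-millisecond behavior can be illustrated with a small sketch. This is not PgDog's implementation: the class name `MillisecondLimiter`, the busy-wait loop, and the handling of a clock that moves backwards are assumptions made for illustration. The clock is injectable so the behavior can be exercised without real waiting.

```python
import time
from typing import Callable

MAX_PER_MS = 4096  # limit stated above


def _system_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class MillisecondLimiter:
    """Hands out (millisecond, sequence) pairs, at most MAX_PER_MS per ms."""

    def __init__(self, clock_ms: Callable[[], int] = _system_clock_ms) -> None:
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0

    def next_slot(self) -> tuple[int, int]:
        # Assumption: if the clock moves backwards, keep using the last value.
        now = max(self._clock_ms(), self._last_ms)
        if now == self._last_ms:
            if self._sequence + 1 >= MAX_PER_MS:
                # Sequence exhausted: wait for the clock to reach the next ms.
                while now <= self._last_ms:
                    now = self._clock_ms()
                self._sequence = 0
            else:
                self._sequence += 1
        else:
            # A new millisecond started, so the sequence restarts at zero.
            self._sequence = 0
        self._last_ms = now
        return now, self._sequence


limiter = MillisecondLimiter()
print(limiter.next_slot())
print(MAX_PER_MS * 1000, "IDs per second per node at most")
```

Because no pair is ever handed out twice, combining each pair with a distinct node number keeps the resulting IDs unique across the deployment.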
