Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit 15754d7

Browse files
author
David Robertson
authored
Update UPSERT comment now that native upserts are the default (#13924)
1 parent ebd9e2d commit 15754d7

File tree

2 files changed

+51
-10
lines changed

2 files changed

+51
-10
lines changed

changelog.d/13924.misc

+1
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Update an innaccurate comment in Synapse's upsert database helper.

synapse/storage/database.py

+50-10
Original file line numberDiff line numberDiff line change
@@ -1141,17 +1141,57 @@ async def simple_upsert(
11411141
desc: str = "simple_upsert",
11421142
lock: bool = True,
11431143
) -> bool:
1144-
"""
1144+
"""Insert a row with values + insertion_values; on conflict, update with values.
1145+
1146+
All of our supported databases accept the nonstandard "upsert" statement in
1147+
their dialect of SQL. We call this a "native upsert". The syntax looks roughly
1148+
like:
1149+
1150+
INSERT INTO table VALUES (values + insertion_values)
1151+
ON CONFLICT (keyvalues)
1152+
DO UPDATE SET (values); -- overwrite `values` columns only
1153+
1154+
If (values) is empty, the resulting query is slighlty simpler:
1155+
1156+
INSERT INTO table VALUES (insertion_values)
1157+
ON CONFLICT (keyvalues)
1158+
DO NOTHING; -- do not overwrite any columns
1159+
1160+
This function is a helper to build such queries.
1161+
1162+
In order for upserts to make sense, the database must be able to determine when
1163+
an upsert CONFLICTs with an existing row. Postgres and SQLite ensure this by
1164+
requiring that a unique index exist on the column names used to detect a
1165+
conflict (i.e. `keyvalues.keys()`).
1166+
1167+
If there is no such index, we can "emulate" an upsert with a SELECT followed
1168+
by either an INSERT or an UPDATE. This is unsafe: we cannot make the same
1169+
atomicity guarantees that a native upsert can and are very vulnerable to races
1170+
and crashes. Therefore if we wish to upsert without an appropriate unique index,
1171+
we must either:
1172+
1173+
1. Acquire a table-level lock before the emulated upsert (`lock=True`), or
1174+
2. VERY CAREFULLY ensure that we are the only thread and worker which will be
1175+
writing to this table, in which case we can proceed without a lock
1176+
(`lock=False`).
1177+
1178+
Generally speaking, you should use `lock=True`. If the table in question has a
1179+
unique index[*], this class will use a native upsert (which is atomic and so can
1180+
ignore the `lock` argument). Otherwise this class will use an emulated upsert,
1181+
in which case we want the safer option unless we been VERY CAREFUL.
1182+
1183+
[*]: Some tables have unique indices added to them in the background. Those
1184+
tables `T` are keys in the dictionary UNIQUE_INDEX_BACKGROUND_UPDATES,
1185+
where `T` maps to the background update that adds a unique index to `T`.
1186+
This dictionary is maintained by hand.
1187+
1188+
At runtime, we constantly check to see if each of these background updates
1189+
has run. If so, we deem the coresponding table safe to upsert into, because
1190+
we can now use a native insert to do so. If not, we deem the table unsafe
1191+
to upsert into and require an emulated upsert.
11451192
1146-
`lock` should generally be set to True (the default), but can be set
1147-
to False if either of the following are true:
1148-
1. there is a UNIQUE INDEX on the key columns. In this case a conflict
1149-
will cause an IntegrityError in which case this function will retry
1150-
the update.
1151-
2. we somehow know that we are the only thread which will be updating
1152-
this table.
1153-
As an additional note, this parameter only matters for old SQLite versions
1154-
because we will use native upserts otherwise.
1193+
Tables that do not appear in this dictionary are assumed to have an
1194+
appropriate unique index and therefore be safe to upsert into.
11551195
11561196
Args:
11571197
table: The table to upsert into

0 commit comments

Comments
 (0)