@@ -1141,17 +1141,57 @@ async def simple_upsert(
1141
1141
desc : str = "simple_upsert" ,
1142
1142
lock : bool = True ,
1143
1143
) -> bool :
1144
- """
1144
+ """Insert a row with values + insertion_values; on conflict, update with values.
1145
+
1146
+ All of our supported databases accept the nonstandard "upsert" statement in
1147
+ their dialect of SQL. We call this a "native upsert". The syntax looks roughly
1148
+ like:
1149
+
1150
+ INSERT INTO table VALUES (values + insertion_values)
1151
+ ON CONFLICT (keyvalues)
1152
+ DO UPDATE SET (values); -- overwrite `values` columns only
1153
+
1154
+ If (values) is empty, the resulting query is slighlty simpler:
1155
+
1156
+ INSERT INTO table VALUES (insertion_values)
1157
+ ON CONFLICT (keyvalues)
1158
+ DO NOTHING; -- do not overwrite any columns
1159
+
1160
+ This function is a helper to build such queries.
1161
+
1162
+ In order for upserts to make sense, the database must be able to determine when
1163
+ an upsert CONFLICTs with an existing row. Postgres and SQLite ensure this by
1164
+ requiring that a unique index exist on the column names used to detect a
1165
+ conflict (i.e. `keyvalues.keys()`).
1166
+
1167
+ If there is no such index, we can "emulate" an upsert with a SELECT followed
1168
+ by either an INSERT or an UPDATE. This is unsafe: we cannot make the same
1169
+ atomicity guarantees that a native upsert can and are very vulnerable to races
1170
+ and crashes. Therefore if we wish to upsert without an appropriate unique index,
1171
+ we must either:
1172
+
1173
+ 1. Acquire a table-level lock before the emulated upsert (`lock=True`), or
1174
+ 2. VERY CAREFULLY ensure that we are the only thread and worker which will be
1175
+ writing to this table, in which case we can proceed without a lock
1176
+ (`lock=False`).
1177
+
1178
+ Generally speaking, you should use `lock=True`. If the table in question has a
1179
+ unique index[*], this class will use a native upsert (which is atomic and so can
1180
+ ignore the `lock` argument). Otherwise this class will use an emulated upsert,
1181
+ in which case we want the safer option unless we been VERY CAREFUL.
1182
+
1183
+ [*]: Some tables have unique indices added to them in the background. Those
1184
+ tables `T` are keys in the dictionary UNIQUE_INDEX_BACKGROUND_UPDATES,
1185
+ where `T` maps to the background update that adds a unique index to `T`.
1186
+ This dictionary is maintained by hand.
1187
+
1188
+ At runtime, we constantly check to see if each of these background updates
1189
+ has run. If so, we deem the coresponding table safe to upsert into, because
1190
+ we can now use a native insert to do so. If not, we deem the table unsafe
1191
+ to upsert into and require an emulated upsert.
1145
1192
1146
- `lock` should generally be set to True (the default), but can be set
1147
- to False if either of the following are true:
1148
- 1. there is a UNIQUE INDEX on the key columns. In this case a conflict
1149
- will cause an IntegrityError in which case this function will retry
1150
- the update.
1151
- 2. we somehow know that we are the only thread which will be updating
1152
- this table.
1153
- As an additional note, this parameter only matters for old SQLite versions
1154
- because we will use native upserts otherwise.
1193
+ Tables that do not appear in this dictionary are assumed to have an
1194
+ appropriate unique index and therefore be safe to upsert into.
1155
1195
1156
1196
Args:
1157
1197
table: The table to upsert into
0 commit comments