We recently ran out of ID space on the external_ids table and solved the immediate problem by updating the id column’s type from integer to bigint, which won’t run out for ~4.3 billion years at the current usage level. However! It might be nice to solve the underlying issue and do some other cleanup on this table.
Some things that might be worth addressing here:
- The cause of the above problem: we keep the table up-to-date by using `INSERT ... ON CONFLICT (provider_location_id, system, value) DO ...` and just sending every `(provider_location_id, system, value)` tuple we have for the location (see `addExternalIds()` in `db.ts`). That’s convenient—we can have a quick, atomic update that is handled entirely in the database. However, it turns out this causes Postgres to increment the sequence we use for the `id` column on every potential update, not just the ones that wind up writing new rows. It would be nice to avoid that by doing one of the following:
  1. Dropping the `id` column and using `(provider_location_id, system, value)` as a compound primary key. Since #206 (“Support multiple external IDs with the same system”), we never change or update rows in this table (only add or delete them), so we don’t get a whole lot out of having a separate `id`.
  2. Using a non-atomic query-and-insert SQL statement instead of the atomic `INSERT ... ON CONFLICT ...` statement. That is, use an `INSERT ... SELECT ... WHERE NOT EXISTS (...)` pattern:

     ```sql
     INSERT INTO "external_ids" ("provider_location_id", "system", "value")
     SELECT 'c2a16e1d-3a7d-461e-9a42-7a4f1e3a183c', 'cvs', '7156'
     WHERE NOT EXISTS (
       SELECT 1 FROM external_ids
       WHERE provider_location_id = 'c2a16e1d-3a7d-461e-9a42-7a4f1e3a183c'
         AND system = 'cvs'
         AND value = '7156'
     );
     ```
This would be slightly slower to execute, would make for somewhat messier code, would require a separate statement for each tuple (with the current approach we can do the whole batch in a single statement), and would make it possible for the statement to fail if two inserts wind up happening in parallel (extremely unlikely in this particular system, but possible). On the other hand, it wouldn’t touch the `id` sequence unless we’re almost certain to actually be inserting new data.
  3. Compare the current IDs and the IDs we’re inserting in JS land, and only try to insert the ones we don’t already have. That is, option 2 above, but in JS instead of SQL.
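For option 3, the JS-side filtering could be as simple as a set-difference over serialized tuples. This is a sketch with hypothetical names (`ExternalId`, `newTuplesOnly` are not in the codebase); the real code would sit near `addExternalIds()` in `db.ts` and pass only the surviving tuples to the insert:

```typescript
// Sketch (not the actual db.ts code): filter out tuples that already exist
// before inserting, so the id sequence is only touched for genuinely new rows.

type ExternalId = {
  provider_location_id: string;
  system: string;
  value: string;
};

// Serialize a tuple into a stable key for set-membership tests.
// U+001F (unit separator) is unlikely to appear in real values.
function tupleKey(t: ExternalId): string {
  return [t.provider_location_id, t.system, t.value].join("\u001f");
}

// Return only the tuples that are not already present in `existing`.
function newTuplesOnly(
  existing: ExternalId[],
  incoming: ExternalId[]
): ExternalId[] {
  const seen = new Set(existing.map(tupleKey));
  return incoming.filter((t) => !seen.has(tupleKey(t)));
}
```

The `existing` list would come from a plain `SELECT` on the location’s current external IDs, so the only race is the same unlikely parallel-insert case as option 2.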
- We currently do `INSERT ... ON CONFLICT DO UPDATE ...`, but the update just rewrites the conflicting rows with the values they already have, which is pointless. We should use `INSERT ... ON CONFLICT DO NOTHING` instead. This is probably an accidental holdover from before #206 (“Support multiple external IDs with the same system”), when we used to sometimes change external IDs because we could only have one value per `(provider_location_id, system)` pair. (The relevant code is `addExternalIds()` in univaf/server/src/db.ts, lines 132 to 149 at a95243a.)
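A `DO NOTHING` version of the batched statement could look something like this (a sketch; the second tuple is a made-up example, not real data):

```sql
INSERT INTO external_ids (provider_location_id, system, value)
VALUES
  ('c2a16e1d-3a7d-461e-9a42-7a4f1e3a183c', 'cvs', '7156'),
  ('c2a16e1d-3a7d-461e-9a42-7a4f1e3a183c', 'some_other_system', 'example')
ON CONFLICT (provider_location_id, system, value) DO NOTHING;
```

Note this only removes the useless update; on its own it does not stop the `id` sequence from being incremented for conflicting rows.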
- Drop the `updated_at` column, since we never do updates at this point (see above). We don’t even have lingering data from when we once did:

  ```
  > select count(*) from external_ids where created_at != updated_at;
   count
  -------
       0
  ```
- Change the `system` and `value` columns from `text` to `varchar(125)`. Our current longest values for these two are 43 and 102 characters, respectively, and they probably should have some reasonable size cap to prevent them from getting out of hand. (It doesn’t need to be 125; we just get slightly more efficient storage at that size or below.) This is probably a bigger deal if we drop the `id` column as the primary key.
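Before picking a cap, the current maximum lengths can be rechecked with a quick query:

```sql
SELECT max(length(system)) AS max_system_len,
       max(length(value))  AS max_value_len
FROM external_ids;
```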
- Make `provider_location_id`, `system`, and `value` `NOT NULL`. I’m not sure why they are currently nullable. 🤷
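Taken together, the schema changes above might boil down to a migration along these lines. This is an untested sketch: it assumes Postgres, assumes nothing else references `external_ids.id`, and the statement details haven’t been verified against the actual schema:

```sql
BEGIN;

-- Columns must be NOT NULL before they can form a primary key.
ALTER TABLE external_ids
  ALTER COLUMN provider_location_id SET NOT NULL,
  ALTER COLUMN system SET NOT NULL,
  ALTER COLUMN value SET NOT NULL;

-- Cap the sizes of the text columns.
ALTER TABLE external_ids
  ALTER COLUMN system TYPE varchar(125),
  ALTER COLUMN value TYPE varchar(125);

-- Replace the serial id with a compound primary key.
-- (Dropping id also drops the primary key constraint that depends on it.)
ALTER TABLE external_ids DROP COLUMN id;
ALTER TABLE external_ids
  ADD PRIMARY KEY (provider_location_id, system, value);

-- We never update rows anymore, so updated_at is dead weight.
ALTER TABLE external_ids DROP COLUMN updated_at;

COMMIT;
```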
This project has been shut down. This issue has been left open as a guide for anyone forking this project — addressing this issue (or at least knowing about it!) is likely to be worthwhile for you if you are maintaining a running copy or fork of UNIVAF.