This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

synapse_port_db does not correctly create indexes #4877

Closed
richvdh opened this issue Mar 18, 2019 · 38 comments
Labels: z-bug (Deprecated Label)

@richvdh
Member

richvdh commented Mar 18, 2019

There are a few reports of people having migrated their database from sqlite to postgres and ending up with missing or completely wrong indexes.

Tables known to be affected include:

  • user_ips (missing unique index so upserts fail)
  • device_lists_remote_cache (missing unique index so upserts fail)
  • state_groups_state (index is on (state_group) instead of (state_group, type, state_key))
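One way to check whether a given database is affected (a sketch; index names vary by schema version, so compare the output against a known-good schema rather than trusting any single list) is to list the indexes on the affected tables in psql:

SELECT tablename, indexname, indexdef
  FROM pg_indexes
 WHERE schemaname = 'public'
   AND tablename IN ('user_ips', 'device_lists_remote_cache', 'state_groups_state');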
@richvdh
Member Author

richvdh commented Mar 18, 2019

The indexes in question are created on background updates. We don't run the background updates on a migration (synapse_port_db truncates the background update table), which is normally fine because we expect the data in the sqlite database to have been updated. However, we create the postgres data schema by walking the delta files, so we'll end up with entirely the wrong schema.

The best fix for now is probably to create a new schema snapshot; however it would be good to think about how we could avoid this becoming a problem again in future.

We'll also need to think about how to put everyone's schema back together again now that we have messed it up.

@MparkG

MparkG commented Apr 1, 2019

I get the following error as well (after applying the corrections from the top post); however, the first two points are not clear to me.

#2210 error says:
psycopg2.ProgrammingError: there is no unique or exclusion constraint matching the ON CONFLICT specification

Does that mean that for the device_lists_remote_cache table, not only are indexes misconfigured, but constraints are also missing?

Where can I look up how it should look, so I can sort it out once and for all?
Is there a .sql file with the base structure of the PostgreSQL DB where everything is listed?

Thanks

@huguesdk

huguesdk commented Apr 7, 2019

With Synapse 0.99.3 (schema version 53), this is how to fix the database (I diffed the schemas from a normal PostgreSQL installation and from one migrated from SQLite):

CREATE INDEX access_tokens_device_id ON public.access_tokens USING btree (user_id, device_id);
CREATE INDEX current_state_events_member_index ON public.current_state_events USING btree (state_key) WHERE (type = 'm.room.member'::text);
DROP INDEX device_inbox_stream_id;
CREATE INDEX device_inbox_stream_id_user_id ON public.device_inbox USING btree (stream_id, user_id);
CREATE UNIQUE INDEX device_lists_remote_cache_unique_id ON public.device_lists_remote_cache USING btree (user_id, device_id);
CREATE UNIQUE INDEX device_lists_remote_extremeties_unique_idx ON public.device_lists_remote_extremeties USING btree (user_id);
CREATE INDEX device_lists_stream_user_id ON public.device_lists_stream USING btree (user_id, device_id);
CREATE INDEX event_contains_url_index ON public.events USING btree (room_id, topological_ordering, stream_ordering) WHERE ((contains_url = true) AND (outlier = false));
CREATE INDEX event_push_actions_highlights_index ON public.event_push_actions USING btree (user_id, room_id, topological_ordering, stream_ordering) WHERE (highlight = 1);
CREATE INDEX event_push_actions_u_highlight ON public.event_push_actions USING btree (user_id, stream_ordering);
CREATE UNIQUE INDEX event_search_event_id_idx ON public.event_search USING btree (event_id);
CREATE INDEX event_to_state_groups_sg_index ON public.event_to_state_groups USING btree (state_group);
CREATE INDEX local_media_repository_url_idx ON public.local_media_repository USING btree (created_ts) WHERE (url_cache IS NOT NULL);
DROP INDEX state_groups_state_id;
CREATE INDEX state_groups_state_type_idx ON public.state_groups_state USING btree (state_group, type, state_key);
CREATE INDEX user_ips_device_id ON public.user_ips USING btree (user_id, device_id, last_seen);
CREATE INDEX user_ips_last_seen ON public.user_ips USING btree (user_id, last_seen);
CREATE INDEX user_ips_last_seen_only ON public.user_ips USING btree (last_seen);
DROP INDEX user_ips_user_ip;
CREATE UNIQUE INDEX user_ips_user_token_ip_unique_index ON public.user_ips USING btree (user_id, access_token, ip);
CREATE INDEX users_creation_ts ON public.users USING btree (creation_ts);

After running this, the two schemas are exactly the same (except for the presence of the port_from_sqlite3 table, of course). This solved the problem for me. 🎉

@julianfoad

Thank you, @huguesdk! Can you also post how you compared the schemas, and the expected pristine schema of 0.99.3, so I can check mine? (I have run your fixes, but I don't know which Synapse version I was running when I migrated from SQLite, so the fixes I need might be different from yours.)

@huguesdk

huguesdk commented Apr 9, 2019

Glad to help! I’m running PostgreSQL in a Docker container, so to dump the schema I use:

docker exec <container> pg_dump -s -d homeserver -U synapse > synapse_schema.sql

If you are not using Docker, just use the last part:

pg_dump -s -d homeserver -U synapse > synapse_schema.sql

I simply compared the two schemas with diff:

diff synapse_schema_expected.sql synapse_schema.sql

The expected schema is here. (It’s gzipped, because GitHub doesn’t support attaching .sql files.)

@marcotuna

Thanks @huguesdk, had the same issue after migrating from Sqlite to Postgres, this solved my problem also.

@jayavanth

jayavanth commented Apr 22, 2019

@huguesdk I get this error when I run the first query

synapse=# CREATE INDEX access_tokens_device_id ON public.access_tokens USING btree (user_id, device_id);
ERROR:  relation "access_tokens_device_id" already exists

@huguesdk

@jayavanth: Was your database migrated from SQLite? To be sure what's missing, dump the schema and diff it as explained above. What issue are you actually seeing?

@jayavanth

jayavanth commented Apr 23, 2019

@huguesdk Here's the diff: https://pastebin.com/hK3Nv5Dm. We can ignore the db username in the diff, right?

(Posted this in Synapse Admins, but it got lost in the threads.)
I'm trying to purge older messages in a particular room. I can see the older messages in Riot, but when I run

SELECT event_id FROM events WHERE type='m.room.message' AND received_ts<'1553290290477' AND room_id=<room_id>;

I get 0 rows.

Even when I run SELECT event_id FROM events WHERE type='m.room.message'; I only get about 1000 events. We've been using it for a long time, so there are far more messages than that.

So, that's why I think something is wrong with my port.

@huguesdk

@jayavanth: Looking at your diff, it seems that the only difference (besides the port_from_sqlite3 table of course) is the owner name, so your schema seems correct.

I have no experience in purging older messages, and I know little about the database structure. Did you actually do any operation on the database to purge older messages? If you did and still see the older messages in Riot, maybe it’s because they are still in Riot’s cache? Did you try clearing the cache? In Riot web, you can do this by going to Settings > Help & About > Clear Cache and Reload.

@jayavanth

@huguesdk No, I didn't do any operation to purge it. Just cleared the cache but I can still see the messages 🙁. What's a good place to ask this question? A new issue maybe?

@huguesdk

@jayavanth: Are you trying to purge messages manually by making changes directly in the database? I would advise against this (unless you know the structure very well). Did you take a look at the Purge History API? Currently, the best place to ask questions about this is in the Synapse Admins room. Unfortunately, questions quickly get lost in the conversation, since there is no proper threading yet. I think a forum (like Discourse, for example) would be more appropriate for this, but currently there is none.

@jayavanth

@huguesdk I'm using this script https://github.com/matrix-org/synapse/blob/master/contrib/purge_api/purge_history.sh which calls the Purge History API. The script does a query similar to what I described above.

Totally agree with having different channels to avoid messages getting lost in the pile

@peterhoeg

I have 2 missing indices, which cannot be created because of duplicate values:

psycopg2.IntegrityError: could not create unique index "device_lists_remote_cache_unique_id"
psycopg2.IntegrityError: could not create unique index "user_ips_user_token_ip_unique_index"

Is it safe to shut down Synapse and then wipe the 2 tables, as they seem to be cache tables only?
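For reference, the rows blocking a unique index can be listed with a GROUP BY over the index's columns. A sketch for user_ips (the columns match the unique index named in the error above):

SELECT user_id, access_token, ip, count(*) AS copies
  FROM user_ips
 GROUP BY user_id, access_token, ip
HAVING count(*) > 1;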

@babolivier babolivier self-assigned this May 30, 2019
@peterhoeg

peterhoeg commented May 30, 2019

Further to my earlier message, I can confirm that the following made Synapse properly create the indices automatically, with no adverse effects that I have seen (but please see the next message for a warning about federated E2E-encrypted chats):

DELETE FROM public.device_lists_remote_cache;
DELETE FROM public.device_lists_remote_extremeties;
DELETE FROM public.user_ips;

I hope I didn't break anything but so far it looks good.

@richvdh
Member Author

richvdh commented May 30, 2019

DELETE FROM public.device_lists_remote_cache;
DELETE FROM public.device_lists_remote_extremeties;

Just as a warning to others tempted to try the same thing: this has the potential to break e2e messaging. If you're not using e2e over federation, you'll be fine. Otherwise, you get to keep both parts when it breaks ;)

@peterhoeg

Do you by any chance have a safe set of SQL statements that can be run to clean things up? I am thinking of cleaning up duplicates and the like so that the indices can be created properly.

@Tokodomo

Tokodomo commented Jun 2, 2019

@ara4n @richvdh @erikjohnston
I'm sorry guys, but the ongoing problems with synapse_port_db are outrageously ridiculous... How many people have to run into that knife? Can you please concentrate on fixing the critical bugs in the official codebase (like synapse_port_db)? Testing such sensitive parts of Synapse is essential, isn't it? Remove the script from the official GitHub repo if it doesn't work as expected; otherwise, please fix it. Luckily, I tried it beforehand (on a test system):

cat homeserver.log
...
2019-06-02 13:39:53,824 - synapse.storage.client_ips - 359 - ERROR - update_client_ips-853 - Failed to insert client IP (('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1'), ('Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559475590506)): InvalidColumnReference('there is no unique or exclusion constraint matching the ON CONFLICT specification\n')
2019-06-02 13:41:23,824 - synapse.storage.client_ips - 359 - ERROR - update_client_ips-871 - Failed to insert client IP (('@carol:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBjYXJvbDpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSA9ak8tZmtnKkB-dUtPcVlxCjAwMmZzaWduYXR1cmUgCi9fTVCB-1JJNy6av5APpbjoTy2YMZ8zMeToN0rpa2AK', '::1'), ('Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'ATQMZGBFYH', 1559475681482)): InvalidColumnReference('there is no unique or exclusion constraint matching the ON CONFLICT specification\n')
2019-06-02 13:42:13,824 - synapse.storage.client_ips - 359 - ERROR - update_client_ips-881 - Failed to insert client IP (('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1'), ('Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559475729288)): InvalidColumnReference('there is no unique or exclusion constraint matching the ON CONFLICT specification\n')
2019-06-02 13:43:48,824 - synapse.storage.client_ips - 359 - ERROR - update_client_ips-900 - Failed to insert client IP (('@carol:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBjYXJvbDpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSA9ak8tZmtnKkB-dUtPcVlxCjAwMmZzaWduYXR1cmUgCi9fTVCB-1JJNy6av5APpbjoTy2YMZ8zMeToN0rpa2AK', '::1'), ('Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'ATQMZGBFYH', 1559475824841)): InvalidColumnReference('there is no unique or exclusion constraint matching the ON CONFLICT specification\n')
2019-06-02 13:44:18,824 - synapse.storage.client_ips - 359 - ERROR - update_client_ips-906 - Failed to insert client IP (('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1'), ('Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559475854569)): InvalidColumnReference('there is no unique or exclusion constraint matching the ON CONFLICT specification\n')
cat pgsql_store/postgresql.log
2019-06-02 11:22:11.263 CEST [23641] LOG:  listening on IPv6 address "::1", port 54320
2019-06-02 11:22:11.263 CEST [23641] LOG:  listening on IPv4 address "127.0.0.1", port 54320
2019-06-02 11:22:11.329 CEST [23641] LOG:  listening on Unix socket "~/pgsql_store/.s.PGSQL.54320"
2019-06-02 11:22:11.448 CEST [23642] LOG:  database system was shut down at 2019-06-02 11:21:50 CEST
2019-06-02 11:22:11.492 CEST [23641] LOG:  database system is ready to accept connections
2019-06-02 11:38:11.504 CEST [28307] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:38:11.504 CEST [28307] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@bob:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWJjaWQgdXNlcl9pZCA9IEBib2I6bmFzCjAwMTZjaWQgdHlwZSA9IGFjY2VzcwowMDIxY2lkIG5vbmNlID0gQUhRKjJtS3dPRExEMkx6NAowMDJmc2lnbmF0dXJlILxzIZUbG2qcu4bTR0QSFi1-Gz_jqbUuPRYnkziqCfcdCg', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'LVVTBWVFVR', 1559468290864) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:38:11.510 CEST [28307] ERROR:  current transaction is aborted, commands ignored until end of transaction block
2019-06-02 11:38:11.510 CEST [28307] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559468291411) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:40:11.505 CEST [28306] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:40:11.505 CEST [28306] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@bob:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWJjaWQgdXNlcl9pZCA9IEBib2I6bmFzCjAwMTZjaWQgdHlwZSA9IGFjY2VzcwowMDIxY2lkIG5vbmNlID0gQUhRKjJtS3dPRExEMkx6NAowMDJmc2lnbmF0dXJlILxzIZUbG2qcu4bTR0QSFi1-Gz_jqbUuPRYnkziqCfcdCg', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'LVVTBWVFVR', 1559468411294) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:40:16.504 CEST [28307] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:40:16.504 CEST [28307] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559468411785) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:42:10.741 CEST [28301] ERROR:  duplicate key value violates unique constraint "device_uniqueness"
2019-06-02 11:42:10.741 CEST [28301] DETAIL:  Key (user_id, device_id)=(@alice:nas, VUONEXXEBP) already exists.
2019-06-02 11:42:10.741 CEST [28301] STATEMENT:  INSERT INTO devices (user_id, device_id, display_name) VALUES('@alice:nas', 'VUONEXXEBP', NULL)
2019-06-02 11:42:16.504 CEST [28308] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:42:16.504 CEST [28308] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@bob:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWJjaWQgdXNlcl9pZCA9IEBib2I6bmFzCjAwMTZjaWQgdHlwZSA9IGFjY2VzcwowMDIxY2lkIG5vbmNlID0gQUhRKjJtS3dPRExEMkx6NAowMDJmc2lnbmF0dXJlILxzIZUbG2qcu4bTR0QSFi1-Gz_jqbUuPRYnkziqCfcdCg', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'LVVTBWVFVR', 1559468531571) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:42:19.028 CEST [28306] ERROR:  duplicate key value violates unique constraint "device_uniqueness"
2019-06-02 11:42:19.028 CEST [28306] DETAIL:  Key (user_id, device_id)=(@bob:nas, LVVTBWVFVR) already exists.
2019-06-02 11:42:19.028 CEST [28306] STATEMENT:  INSERT INTO devices (user_id, device_id, display_name) VALUES('@bob:nas', 'LVVTBWVFVR', NULL)
2019-06-02 11:42:41.504 CEST [28301] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:42:41.504 CEST [28301] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559468561455) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:44:26.504 CEST [28297] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:44:26.504 CEST [28297] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@bob:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWJjaWQgdXNlcl9pZCA9IEBib2I6bmFzCjAwMTZjaWQgdHlwZSA9IGFjY2VzcwowMDIxY2lkIG5vbmNlID0gQUhRKjJtS3dPRExEMkx6NAowMDJmc2lnbmF0dXJlILxzIZUbG2qcu4bTR0QSFi1-Gz_jqbUuPRYnkziqCfcdCg', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'LVVTBWVFVR', 1559468666410) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 11:45:01.504 CEST [29593] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 11:45:01.504 CEST [29593] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '-', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559468697985) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
...
2019-06-02 12:28:48.826 CEST [10630] ERROR:  current transaction is aborted, commands ignored until end of transaction block
2019-06-02 12:28:48.826 CEST [10630] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559471328473) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 12:29:39.294 CEST [10629] ERROR:  duplicate key value violates unique constraint "device_uniqueness"
2019-06-02 12:29:39.294 CEST [10629] DETAIL:  Key (user_id, device_id)=(@bob:nas, KMOALBDPRY) already exists.
2019-06-02 12:29:39.294 CEST [10629] STATEMENT:  INSERT INTO devices (user_id, device_id, display_name) VALUES('@bob:nas', 'KMOALBDPRY', NULL)
2019-06-02 12:29:48.386 CEST [10630] ERROR:  duplicate key value violates unique constraint "device_uniqueness"
2019-06-02 12:29:48.386 CEST [10630] DETAIL:  Key (user_id, device_id)=(@alice:nas, VUONEXXEBP) already exists.
2019-06-02 12:29:48.386 CEST [10630] STATEMENT:  INSERT INTO devices (user_id, device_id, display_name) VALUES('@alice:nas', 'VUONEXXEBP', NULL)
2019-06-02 12:31:08.823 CEST [10629] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 12:31:08.823 CEST [10629] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@carol:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBjYXJvbDpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSA9ak8tZmtnKkB-dUtPcVlxCjAwMmZzaWduYXR1cmUgCi9fTVCB-1JJNy6av5APpbjoTy2YMZ8zMeToN0rpa2AK', '::1', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'ATQMZGBFYH', 1559471466274) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 12:31:18.823 CEST [10620] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 12:31:18.823 CEST [10620] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559471476459) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
...
2019-06-02 13:42:13.824 CEST [10620] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 13:42:13.824 CEST [10620] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559475729288) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 13:43:48.824 CEST [11460] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 13:43:48.824 CEST [11460] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@carol:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBjYXJvbDpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSA9ak8tZmtnKkB-dUtPcVlxCjAwMmZzaWduYXR1cmUgCi9fTVCB-1JJNy6av5APpbjoTy2YMZ8zMeToN0rpa2AK', '::1', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'ATQMZGBFYH', 1559475824841) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2019-06-02 13:44:18.823 CEST [11458] ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
2019-06-02 13:44:18.823 CEST [11458] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@alice:nas', 'MDAxMWxvY2F0aW9uIG5hcwowMDEzaWRlbnRpZmllciBrZXkKMDAxMGNpZCBnZW4gPSAxCjAwMWRjaWQgdXNlcl9pZCA9IEBhbGljZTpuYXMKMDAxNmNpZCB0eXBlID0gYWNjZXNzCjAwMjFjaWQgbm9uY2UgPSBHNyprV3cwUU5YTkh1LlcrCjAwMmZzaWduYXR1cmUgf0HxJE-hAR0Trao_F6LlgqzpFjU4mZ4A40gymOKLveYK', '::1', 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36', 'VUONEXXEBP', 1559475854569) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen

Finally, two sql dumps (if anybody is interested):
before_synapse_port_db_(sqlite3).zip
after_synapse_port_db_(pgsql).zip

@Tokodomo

Tokodomo commented Jun 2, 2019

#4628 #5306 #5312

@z3ntu
Contributor

z3ntu commented Jun 27, 2019

What's the recommended action to take after such a failed migration? Continue using the sqlite db or is it safe to use the "broken" postgres db?

@maakuth

maakuth commented Jul 11, 2019

What's the recommended action to take after such a failed migration? Continue using the sqlite db or is it safe to use the "broken" postgres db?

I just applied @huguesdk's schema changes after running with the wrong schema for a few days after migrating from SQLite, so I'm very much about to find out.

@babolivier
Contributor

I did some investigation and fiddling around with this, but I won't be able to go much further due to a lack of time to dedicate to it. Here's what we've decided needs to be done to fix this issue:

  • Make the script fail with an informative message if there are pending background updates in the SQLite DB
  • Once the PostgreSQL DB is created, make the script run all of the background updates on it before it begins inserting the data

To do the latter, we need to create a new class in Synapse that holds the code for all background updates but doesn't require access to the hs object; otherwise the script needs to instantiate one, and it becomes much trickier to manage. This should be doable, especially as a new full schema was created in v1.0.0 (which effectively decreases the number of background updates to run on a new DB).
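For context: pending updates are tracked in the background_updates table (the one synapse_port_db truncates), so whether any are outstanding can be checked by hand with a query like this (a sketch, assuming the default schema; an empty result means nothing is pending):

SELECT update_name, ordering
  FROM background_updates
 ORDER BY ordering;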

@jptsroh

jptsroh commented Aug 16, 2019

I have a similar problem: I can no longer upload/attach files due to missing indices. This only manifested after upgrading to Synapse v1.2.1. I would very much appreciate a quick fix that re-establishes the mandatory DB structure.

@maakuth

maakuth commented Aug 16, 2019

@jptsroh did you try the SQL script in huguesdk's comment?

@jptsroh

jptsroh commented Aug 17, 2019

After I selected the right DB, I could execute those SQL statements, and now I can upload files again. Thanks!
However, the Postgres error log still shows errors, now complaining about duplicate entries:

ERROR: duplicate key value violates unique constraint "device_uniqueness"
2019-08-17 08:11:56.416 UTC [486] matrix_synapse@dopadb DETAIL: Key (user_id, device_id)=(@....) already exists.
STATEMENT: INSERT INTO devices (user_id, device_id, display_name) VALUES('@.....', 'xxx', NULL)

@babolivier
Contributor

Good news, a fix to this is on its way: #6102! :)

babolivier added a commit that referenced this issue Oct 23, 2019
Make `synapse_port_db` correctly create indexes in the PostgreSQL database, by having it run the background updates on the database before migrating the data.

To ensure we're migrating the right data, also block the port if the SQLite3 database still has pending or ongoing background updates.

Fixes #4877
@babolivier
Contributor

Fixed in #6102 (and #6243)! :)

@z3ntu
Contributor

z3ntu commented Oct 23, 2019

Awesome! Which version will this be included in? 1.4.2, 1.5.0?

@babolivier
Contributor

Awesome! Which version will this be included in? 1.4.2, 1.5.0?

1.5.0, which should be out in one or two weeks (don't take my word for it though)

@richvdh
Member Author

richvdh commented Oct 29, 2019

I'm going to reopen this I'm afraid, because I feel like an important part of it (helping out all those people who have migrated to postgres at some point in the last 4 years) hasn't been done.

@richvdh richvdh reopened this Oct 29, 2019
@pstn

pstn commented Dec 1, 2019

Since it might be related and might help people here: I also saw wrong behaviour. Read markers didn't work correctly, and (far more dangerous for the network, if my instance wasn't the only one doing it) the timeout for destinations didn't get saved properly. It turned out public.destinations and public.room_account_data were missing constraints that should have been there. I manually deleted the rows that broke the unique constraints and re-added the constraints with

ALTER TABLE ONLY public.destinations ADD CONSTRAINT destinations_pkey PRIMARY KEY (destination);

and

ALTER TABLE ONLY public.room_account_data ADD CONSTRAINT room_account_data_uniqueness UNIQUE (user_id, room_id, account_data_type);

I apologize to anyone my instance might have spammed while their instance was already struggling.
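In case it helps others doing the same cleanup: one way to delete the duplicate rows before re-adding the room_account_data constraint is a self-join on ctid (a sketch, untested against a production database; it keeps an arbitrary one of the duplicate rows per key, so take a backup first):

DELETE FROM public.room_account_data a
 USING public.room_account_data b
 WHERE a.ctid < b.ctid
   AND a.user_id = b.user_id
   AND a.room_id = b.room_id
   AND a.account_data_type = b.account_data_type;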

@richvdh Do you think that there is a way to find other instances with that behavior (if there are any) and notify the administrators of them?

@Ralith
Contributor

Ralith commented Dec 1, 2019

A good way to do that would be to configure a future version of synapse to detect inconsistencies on first startup and correct them, or at least produce a suitably informative fatal error.

@richvdh
Member Author

richvdh commented Dec 2, 2019

the initial plan here is to produce a document to help admins fix their databases. Hopefully we'll get to that in the next few days.

Automated detection of the missing indexes might happen later but there are no current plans to implement that.

@pstn

pstn commented Dec 2, 2019

I think a document to fix the indexes is appropriate, but the bug I found with the missing constraints leads to much more serious behaviour, since it floods the network of hosts that are already struggling to answer requests in time. I think it would be more appropriate not to start an instance (at least without a switch) than to just let it continue to do harm to the network.

@richvdh
Member Author

richvdh commented Dec 24, 2019

I'll try and make some progress here.

Here are some schema dumps for various clean databases:

  • A fresh postgres database, created with synapse v1.7.2: full.172.sql.txt
  • A database created in sqlite and ported to postgres by synapse 1.4.1 (the last version with a broken synapse_port_db script), then upgraded to v1.7.2: full.ported_141.sql.txt
  • A database created in sqlite and ported to postgres by synapse 0.99.5 (the last version before we recreated the full schema file which synapse_port_db uses to initialise the postgres database), then upgraded to v1.7.2: full.ported_0995.sql.txt
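Schema dumps like these can be compared mechanically. Assuming dumps produced with something like `pg_dump --schema-only`, a plain diff of the ported schema against the clean one surfaces the missing indexes; the sketch below shows the idea with Python's difflib and two toy schema strings (the table and index names are taken from this issue, the dump contents are invented):

```python
import difflib

# Toy schema dumps: the "clean" schema has an index the ported one lacks.
clean = """\
CREATE TABLE user_ips (user_id text, access_token text, ip text);
CREATE UNIQUE INDEX user_ips_user_token_ip_unique_index ON user_ips (user_id, access_token, ip);
"""
ported = """\
CREATE TABLE user_ips (user_id text, access_token text, ip text);
"""

# Lines prefixed with "+" are present in the clean schema but missing
# from the ported one, i.e. the indexes that need to be recreated.
diff = list(
    difflib.unified_diff(
        ported.splitlines(), clean.splitlines(), "ported.sql", "clean.sql", lineterm=""
    )
)
print("\n".join(diff))
```

In practice the dumps need some normalisation first (consistent ordering, no comments or ownership lines) for the diff to be readable.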

@richvdh
Member Author

richvdh commented Dec 24, 2019

Diffs between the ported database and the clean db are at https://gist.github.com/richvdh/f1d84edf7c3da1fce2347675dd3d55e5.

So, to fix a broken database, you should be able to run the following SQL in psql:

CREATE INDEX IF NOT EXISTS access_tokens_device_id ON access_tokens USING btree (user_id, device_id);
CREATE INDEX IF NOT EXISTS current_state_events_member_index ON current_state_events USING btree (state_key) WHERE (type = 'm.room.member'::text);
CREATE INDEX IF NOT EXISTS device_inbox_stream_id_user_id ON device_inbox USING btree (stream_id, user_id);
DROP INDEX IF EXISTS device_inbox_stream_id;

CREATE UNIQUE INDEX IF NOT EXISTS device_lists_remote_cache_unique_id ON device_lists_remote_cache USING btree (user_id, device_id);
CREATE UNIQUE INDEX IF NOT EXISTS device_lists_remote_extremeties_unique_idx ON device_lists_remote_extremeties USING btree (user_id);

CREATE INDEX IF NOT EXISTS device_lists_stream_user_id ON device_lists_stream USING btree (user_id, device_id);
CREATE INDEX IF NOT EXISTS event_contains_url_index ON events USING btree (room_id, topological_ordering, stream_ordering) WHERE ((contains_url = true) AND (outlier = false));
CREATE INDEX IF NOT EXISTS event_push_actions_highlights_index ON event_push_actions USING btree (user_id, room_id, topological_ordering, stream_ordering) WHERE (highlight = 1);
CREATE INDEX IF NOT EXISTS event_push_actions_u_highlight ON event_push_actions USING btree (user_id, stream_ordering);
CREATE UNIQUE INDEX IF NOT EXISTS event_search_event_id_idx ON event_search USING btree (event_id);
CREATE INDEX IF NOT EXISTS event_to_state_groups_sg_index ON event_to_state_groups USING btree (state_group);
CREATE INDEX IF NOT EXISTS local_media_repository_url_idx ON local_media_repository USING btree (created_ts) WHERE (url_cache IS NOT NULL);

CREATE INDEX IF NOT EXISTS state_groups_state_type_idx ON state_groups_state USING btree (state_group, type, state_key);
DROP INDEX IF EXISTS state_groups_state_id;

CREATE UNIQUE INDEX IF NOT EXISTS user_ips_user_token_ip_unique_index ON user_ips USING btree (user_id, access_token, ip);
CREATE INDEX IF NOT EXISTS user_ips_device_id ON user_ips USING btree (user_id, device_id, last_seen);
CREATE INDEX IF NOT EXISTS user_ips_last_seen ON user_ips USING btree (user_id, last_seen);
CREATE INDEX IF NOT EXISTS user_ips_last_seen_only ON user_ips USING btree (last_seen);
DROP INDEX IF EXISTS user_ips_user_ip;

CREATE INDEX IF NOT EXISTS users_creation_ts ON users USING btree (creation_ts);

CREATE INDEX IF NOT EXISTS room_memberships_user_room_forgotten ON room_memberships USING btree (user_id, room_id) WHERE (forgotten = 1);

Note that some of these tables (especially state_groups_state!) may be large, and adding an index will lock the table, potentially making synapse hang while it waits for database operations to complete, so it may be best to either schedule synapse downtime, or to add CONCURRENTLY to the CREATE INDEX statements (in which case, be sure to run them individually rather than pasting all of the above at once).

Note also that some of the CREATE UNIQUE INDEX commands may fail if there are duplicates in the table. user_ips is particularly likely to be affected in this way; the easiest solution there may be simply to delete the existing data in the table (TRUNCATE TABLE user_ips;): its primary use is in populating the 'last seen IP address' in a user's device list so if that information is not valuable to you or your users, it can be deleted.
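The failure mode described in that last paragraph, and the blunt fix for it, can be demonstrated in miniature. The sketch below uses a toy sqlite3 copy of user_ips (the same sequence applies to the PostgreSQL table, with `TRUNCATE TABLE user_ips;` in place of the `DELETE`):

```python
import sqlite3

# Toy user_ips table containing two identical rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_ips (user_id TEXT, access_token TEXT, ip TEXT)")
con.executemany(
    "INSERT INTO user_ips VALUES (?, ?, ?)",
    [("@a:example.org", "tok", "1.2.3.4")] * 2,  # duplicate rows
)

# Creating the unique index fails while the duplicates are present.
try:
    con.execute(
        "CREATE UNIQUE INDEX user_ips_user_token_ip_unique_index "
        "ON user_ips (user_id, access_token, ip)"
    )
    created = True
except sqlite3.IntegrityError:
    created = False
print("index created on first attempt:", created)

# After discarding the data (TRUNCATE in Postgres), the index succeeds.
con.execute("DELETE FROM user_ips")
con.execute(
    "CREATE UNIQUE INDEX user_ips_user_token_ip_unique_index "
    "ON user_ips (user_id, access_token, ip)"
)
```

A gentler alternative to discarding everything is to delete only the duplicate rows (keeping one row per unique key) before creating the index.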

@richvdh
Member Author

richvdh commented Dec 24, 2019

I'd welcome any feedback from administrators who have success or otherwise running the above.

@richvdh
Member Author

richvdh commented Jan 2, 2020

so we think the above looks about right. It would be nice to figure out a way to either automate the repair process, or at least point people to the commands when necessary, but I'm going to consider that as a potential issue for the future and close this one.
