Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Can't join #rust:matrix.org, violating constraint event_reference_hashes_event_id_algorithm_key #8323

Closed
MatMaul opened this issue Sep 16, 2020 · 15 comments
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@MatMaul
Copy link
Contributor

MatMaul commented Sep 16, 2020

Description

I can't join #rust:matrix.org .
I am pretty certain I already joined it before and left it later on.

Steps to reproduce

  • try to join #rust:matrix.org
  • get an Internal server error
sept. 16 12:06:47 matrix-server synapse[229645]: 2020-09-16 10:06:47,419 - synapse.http.server - 81 - ERROR - POST-50930 - Failed handle request via 'JoinRoomAliasServlet': <XForwardedForRequest at 0x7f1dd87b8910 method='POST' uri='/_matrix/client/r0/join/%23rust%3Amatrix.org' clientproto='HTTP/1.1' site=8008>
sept. 16 12:06:47 matrix-server synapse[229645]: Traceback (most recent call last):
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 230, in _async_render_wrapper
sept. 16 12:06:47 matrix-server synapse[229645]:     callback_return = await self._async_render(request)
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 407, in _async_render
sept. 16 12:06:47 matrix-server synapse[229645]:     callback_return = await raw_callback_return
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/v1/room.py", line 320, in on_POST
sept. 16 12:06:47 matrix-server synapse[229645]:     await self.room_member_handler.update_membership(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/handlers/room_member.py", line 339, in update_membership
sept. 16 12:06:47 matrix-server synapse[229645]:     result = await self._update_membership(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/handlers/room_member.py", line 534, in _update_membership
sept. 16 12:06:47 matrix-server synapse[229645]:     remote_join_response = await self._remote_join(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/handlers/room_member.py", line 1085, in _remote_join
sept. 16 12:06:47 matrix-server synapse[229645]:     event_id, stream_id = await self.federation_handler.do_invite_join(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1365, in do_invite_join
sept. 16 12:06:47 matrix-server synapse[229645]:     max_stream_id = await self._persist_auth_tree(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2044, in _persist_auth_tree
sept. 16 12:06:47 matrix-server synapse[229645]:     await self.persist_events_and_notify(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2926, in persist_events_and_notify
sept. 16 12:06:47 matrix-server synapse[229645]:     max_stream_id = await self.storage.persistence.persist_events(
sept. 16 12:06:47 matrix-server synapse[229645]:   File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 225, in persist_events
sept. 16 12:06:47 matrix-server synapse[229645]:     await make_deferred_yieldable(
sept. 16 12:06:47 matrix-server synapse[229645]: twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "event_reference_hashes_event_id_algorithm_key"
sept. 16 12:06:47 matrix-server synapse[229645]: DETAIL:  Key (event_id, algorithm)=($1430066785567IObof:jki.re, sha256) already exists.
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:654:_runCallbacks
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:1475:gotResult
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
sept. 16 12:06:47 matrix-server synapse[229645]: --- <exception caught here> ---
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py:148:handle_queue_loop
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py:253:persisting_queue
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py:425:_persist_events
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:168:_persist_events_and_state_updates
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:534:runInteraction
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:589:runWithConnection
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py:250:inContext
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py:266:<lambda>
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/python/context.py:122:callWithContext
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/python/context.py:85:callWithContext
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:306:_runWithConnection
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/python/compat.py:464:reraise
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py:297:_runWithConnection
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:587:inner_func
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:429:new_transaction
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/logging/utils.py:71:wrapped
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:400:_persist_events_txn
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:946:_update_metadata_tables_txn
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py:1091:_store_event_reference_hashes_txn
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:741:simple_insert_many_txn
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:215:executemany
sept. 16 12:06:47 matrix-server synapse[229645]: /usr/local/lib/python3.8/site-packages/synapse/storage/database.py:238:_do_execute

Version information

  • Version:
    1.20.0rc2

  • Install method:
    Docker

  • Platform:
    CentOS 8

@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 16, 2020

Now getting spammed regularly in my logs with the same error, but for this event:
Key (event_id, algorithm)=($TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg, sha256) already exists.

Updated to 1.20.0rc4 since then.

@richvdh
Copy link
Member

richvdh commented Sep 21, 2020

related: #6561

@anoadragon453
Copy link
Member

possibly related: #8340

@anoadragon453
Copy link
Member

At first I thought this might be due to malformed events given recent happenings, but it looks like this has been a problem with v1 rooms in the past (see linked issues), which #rust:matrix.org is.

@richvdh notes that this is likely due to an entry in event_reference_hashes, but not in events/events_json.

For a temporary fix, you could just remove any references to $TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg in your event_reference_hashes table. A more long term solution would involve a proper fix, or in fact stop using this table altogether.

@anoadragon453 anoadragon453 added z-bug (Deprecated Label) z-p2 (Deprecated Label) labels Sep 21, 2020
@richvdh
Copy link
Member

richvdh commented Sep 21, 2020

@richvdh notes that this is likely due to an entry in event_reference_hashes, but not in events/events_json.

the question is, what could cause that?

@anoadragon453
Copy link
Member

anoadragon453 commented Sep 21, 2020

Well, the referenced event is in my DB (after joining and leaving #rust:matrix.org), and nothing really looks out of the ordinary with it:

synapse=# select * from events where event_id = '$TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg';
 stream_ordering | topological_ordering |                   event_id                   |     type      |            room_id             | content | unrecognized_keys | processed | outlier | depth | origin_server_ts |  received_ts  |            sender             | contains_url 
-----------------+----------------------+----------------------------------------------+---------------+--------------------------------+---------+-------------------+-----------+---------+-------+------------------+---------------+-------------------------------+--------------
         4352316 |                11949 | $TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg | m.room.member | !mjbDjyNsRXndKLkHIe:matrix.org |         |                   | t         | f       | 11949 |    1566200040292 | 1566200042231 | @freenode_h4ckz0n3:matrix.org | f
(1 row)

synapse=#
synapse=# select * from event_json where event_id = '$TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg';
                   event_id                   |            room_id             |               internal_metadata                |                                                                                                                                                                                                                                                                                                                                                                                                                  json                                                                                                                                                                                                                                                                                                                                                                                                                  | format_version 
----------------------------------------------+--------------------------------+------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------
 $TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg | !mjbDjyNsRXndKLkHIe:matrix.org | {"stream_ordering": 4352316, "outlier": false} | {"auth_events": ["$1xQcgZTO4XfR1D0-xZ5OAeqngk2QzyTVOoOVEUIWqV0", "$44vIat4gq6zdBgaaPeWPRZDjuU09wigV-dDKUhh4ffw", "$BdKKS-ChF_ULi15oPrfcm4YEpst8JhAWEQw6b4KzbR0"], "prev_events": ["$5G84YlpIwlI7I3uHiECiIni6ndx6XtwV7gAQr5attZE"], "origin_server_ts": 1566200040292, "depth": 11949, "hashes": {"sha256": "niUOs4ixFX1l5GmzS75MLfV6ETQGEtI5a5V6VUItpkU"}, "prev_state": [], "content": {"avatar_url": null, "membership": "join", "displayname": "h4ckz0n3"}, "sender": "@freenode_h4ckz0n3:matrix.org", "room_id": "!mjbDjyNsRXndKLkHIe:matrix.org", "signatures": {"matrix.org": {"ed25519:auto": "pbhsEZwwyXtXvkx1/XVd6oe9G9Y4N+SFKEIX14zYQIc3k6jC5xLHJ5zwqQ257z42h0S7Kdt0ZRMpp7ZIGQ9EDA"}}, "type": "m.room.member", "unsigned": {"age_ts": 1566200040292}, "state_key": "@freenode_h4ckz0n3:matrix.org", "origin": "matrix.org"} |              3

I also had no troubles rejoining #rust:matrix.org in the hopes of reproducing the bug.

I suppose looking around where event_reference_hashes is inserted into:

# Insert event_reference_hashes table.
self._store_event_reference_hashes_txn(
txn, [event for event, _ in events_and_contexts]
)

It would be worthwhile to check whether the state_events and room_membership tables contain this event.

self._store_event_txn(txn, events_and_contexts=events_and_contexts)
is the point in the chain where this should be getting inserted into events and event_json.

And tbh all of this happens in one transaction, so it's more likely that the data is getting removed from events and events_json somehow then it is that it never made it in in the first place.

I wonder whether the server is somehow deleting events from event and event_json on leave... or perhaps when purging rooms? It would make sense that #rust:matrix.org would be purged from the DB as it's a massive room (>6000 members) that's attached to Freenode, so lots of state events.

OK, well this looks plausible. The event_reference_hashes table does not have a room_id column, yet when we purge a room we delete event IDs from tables using a room_id on each table:

for table in (
"event_auth",
"event_edges",
"event_push_actions_staging",
"event_reference_hashes",
"event_relations",
"event_to_state_groups",
"redactions",
"rejections",
"state_events",
):
logger.info("[purge] removing %s from %s", room_id, table)
txn.execute(
"""
DELETE FROM %s WHERE event_id IN (
SELECT event_id FROM events WHERE room_id=?
)
"""
% (table,),
(room_id,),
)

Edit: nevermind, I can't read. That query is asking for room_id on the events table, which certainly does exist.

Additionally, I tried leaving #rust:matrix.org on my homeserver, purging it and then rejoining. No problems.

@anoadragon453
Copy link
Member

@MatMaul could you run the following queries for us please?

select * from events where event_id = '$TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg';
select * from event_json where event_id = '$TBYij1X86poI2AreS6vd78aEHBvx7_Buu4Dk4ivjWrg';

@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 21, 2020

@anoadragon453 no luck, not there anymore.
Perhaps some leftover of a fixed bug in purge ?

@richvdh
Copy link
Member

richvdh commented Sep 21, 2020

AFAIK we have never had a bug in the purge code around failing to remove entries in event_reference_hashes.

@MatMaul do you happen still to have logs from when the failure happened?

@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 21, 2020

You mean the potential purge or the failed join ? The purge for sure not, I don't even remember doing it. The failed join that's easy to reproduce but I think I got you all the interesting logs already.

@richvdh
Copy link
Member

richvdh commented Sep 22, 2020

the (first) failed join.

@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 23, 2020

It's out of reach unfortunately :/

@richvdh
Copy link
Member

richvdh commented Sep 24, 2020

yeah; I don't think we're going to get to the bottom of this one. Suggest you remove the offending rows from event_reference_hashes and we move on. In general any entry in event_reference_hashes without a corresponding row in events shouldn't be there and can be removed.

@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 24, 2020

Done, closing then, thx both for having a look.

@richvdh
Copy link
Member

richvdh commented Dec 9, 2020

a note for the record: one thing that can cause this is running synapse_janitor.sql from https://github.com/xwiki-labs/synapse_scripts, since that script doesn't correctly remove entries from event_reference_hashes.

A sql command to clean up the mess is:

DELETE FROM event_reference_hashes AS refs WHERE NOT EXISTS (SELECT 1 FROM events AS es WHERE es.event_id = refs.event_id);

(The above should be safe to run, though as ever when doing this sort of operation, having a backup is recommended)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

3 participants