Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse users lose access to all rooms after logging out and in again #11568

Closed
ginkel opened this issue Dec 13, 2021 · 15 comments
Closed

Synapse users lose access to all rooms after logging out and in again #11568

ginkel opened this issue Dec 13, 2021 · 15 comments

Comments

@ginkel
Copy link

ginkel commented Dec 13, 2021

Description

I was somewhat uncertain whether this may quailify as a support issue, but considering it involves catastrophic data loss, where the synapse instance is unable to locate rooms despite them being present in the database, I am inclined to open a bug report.

I am operating a Synapse (currently v1.48) instance along with an instance of mxisd. The trouble started when I migrated mxisd to ma1sd and - due to a misconfigured URL - Synapse could no longer reach ma1sd. This caused Element (Android) clients to be logged out.

Upon fixing the configuration error and logging in the client again, I was greeted with a room list that just contains one "Empty room" entry.

I do have a backup of the database and file system of the Matrix instance that pre-dates the configuration glitch.

Unfortunately, restoring it to that state does not remediate the situation, i.e., rooms are still gone.

I verified in the database that rooms are present in the rooms table and that the current_state_events table contains an m.room.create event for those rooms.

In the log I see messages like:

Exception: Room !QaxEOpfBhMxtIQrNBv:<redacted> for event $16386208139DWpYC:<redacted> is unknown

But what I wrote above about the room being present on the DB holds true for the room mentioned in the error message.

I can't rule out that the underlying problem has been lingering for a while and was only uncovered by the auth issues that logged out some clients. The Synapse instance is > 2 years old, so it has undergone all database migrations that occurred over time as the software evolved.

I'd appreciate your assistance to figure out what may be causing this and to eventually restore access for all users' rooms.

Thanks,
Thilo

P.S.: Most channels were encrypted if that makes any difference

Steps to reproduce

I am afraid that I do not have clear clear instructions how to reproduce the problem except from what I wrote above.

Version information

  • Homeserver: My privately hosted homeserver

If not matrix.org:

  • Version: {"server_version":"1.48.0","python_version":"3.8.12"}

  • Install method: Docker

  • Platform: Linux
@ginkel
Copy link
Author

ginkel commented Dec 13, 2021

Update: I have captured a pcap of the initial login process. The following requests are sent by the client:

image

Which request's response should contain the rooms of the user?

@ginkel ginkel changed the title All Synapse users lost access to all rooms after logging in again Synapse users lose access to all rooms after logging out and in again Dec 13, 2021
@richvdh
Copy link
Member

richvdh commented Dec 13, 2021

Which request's response should contain the rooms of the user?

/sync.

I suggest you share your homeserver logs, including the full exception - it's unlikely we'll be able to shed much light without them.

@ginkel
Copy link
Author

ginkel commented Dec 13, 2021

Thanks for your reply! Please find the server log at info level (w/ some IP addresses redacted) at: https://gist.github.com/ginkel/d954e86015f514df9a49a19dc0178806. The login operation (that ends up with a basically empty client) starts at time index 2021-12-13 23:04:55,879.

Please let me know if you need additional details / more verbose logs.

@ginkel
Copy link
Author

ginkel commented Dec 14, 2021

Please find the server log at info level

I can easily create a log at DEBUG level, but I'd welcome any hint which sensitive information I'd need to remove...

@DMRobertson
Copy link
Contributor

DMRobertson commented Dec 14, 2021

Thanks @ginkel. I'm surprised to read that restoring the DB from backup didn't resolve the problem.

Not much to go on in those logs, I'm afraid. No logging in the /sync responses, and they all ended up returning with status 200 OK. If possible, could you try logging out and in again on a client and capturing the server logs at DEBUG level? I'm particularly interested in the logs for the first call to /sync after a login (which should have no since= query parameter in the URL, or any sync that doesn't return with a 200 OK.

I can easily create a log at DEBUG level, but I'd welcome any hint which sensitive information I'd need to remove...

Don't have an authoritative answer for you here. You could try emailing them to me (davidr@element.io) or sending them over matrix (@dmrobertson:matrix.org), though I appreciate the latter is hard at the moment. I can see if there are any smoking guns.

@DMRobertson
Copy link
Contributor

DMRobertson commented Dec 14, 2021

Exception: Room !QaxEOpfBhMxtIQrNBv:<redacted> for event $16386208139DWpYC:<redacted> is unknown

I also wanted to note that this looks like a room an event ID for an old room version (it contains a homeserver name). I wonder if that's causing pain somewhere?

@ginkel
Copy link
Author

ginkel commented Dec 14, 2021

Thanks @ginkel. I'm surprised to read that restoring the DB from backup didn't resolve the problem.

Me too. ;-) I have run the Element Android client in a debugger and AFAICS room membership info gets synced to the client, but somehow has no effect:

image

I can easily create a log at DEBUG level, but I'd welcome any hint which sensitive information I'd need to remove...

Don't have an authoritative answer for you here. You could try emailing them to me (davidr@element.io) or sending them over matrix (@dmrobertson:matrix.org), though I appreciate the latter is hard at the moment. I can see if there are any smoking guns.

I sent you the debug log via Matrix.

@ginkel
Copy link
Author

ginkel commented Dec 14, 2021

Exception: Room !QaxEOpfBhMxtIQrNBv:<redacted> for event $16386208139DWpYC:<redacted> is unknown

I also wanted to note that this looks like a room an event ID for an old room version (it contains a homeserver name). I wonder if that's causing pain somewhere?

That room is actually version 1. But v6 rooms (that's the newest I have) also have the homeserver name on the room id.

@DMRobertson
Copy link
Contributor

I sent you the debug log via Matrix.

Thanks for this. I don't see an invite as such yet, perhaps your homeserver isn't federating due to this problem? I'll check again tomorrow, but it might be worth forwarding a copy over email too as a fallback.

@ginkel
Copy link
Author

ginkel commented Dec 14, 2021

@DMRobertson Weird... Anyway, I just sent you a copy via email. Thanks for your support! :-)

@richvdh
Copy link
Member

richvdh commented Dec 14, 2021

The logs you've sent to @DMRobertson might shed more light, but in the meantime it might be worth a quick sanity-check on your database. Could you try this query?

                SELECT room_id
                FROM current_state_events AS c
                INNER JOIN events AS e USING (room_id, event_id)
                WHERE
                    c.type = 'm.room.member'
                    AND c.state_key = '@tg:tgbyte.de'
                    AND c.membership = 'join';

It should return a list of rooms your server thinks your user is a member of.

@ginkel
Copy link
Author

ginkel commented Dec 14, 2021

Interesting. That query comes back empty for my user as well as any other user I tried...

current_state_events contains join events for plenty of room / user combinations, though. It's as if the event IDs didn't match...

@richvdh
Copy link
Member

richvdh commented Dec 14, 2021

iiinteresting. are you able to DM me? I'm @richvdh:sw1v.org.

@ginkel
Copy link
Author

ginkel commented Dec 14, 2021

As it turned out, a database locale problem caused JOINs not to work correctly after the restore. This has now been fixed. Thanks for your help!

@ginkel ginkel closed this as completed Dec 14, 2021
@richvdh
Copy link
Member

richvdh commented Dec 14, 2021

related: #6696

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants