This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Login fails after update from 1.25.0 to 1.26.0 - syncing forever ... many rooms #9264

Closed
greinick opened this issue Jan 29, 2021 · 24 comments
Labels
S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Release-Blocker Must be resolved before making a release

Comments

@greinick

greinick commented Jan 29, 2021

Description

We updated our system today from 1.25.0 to the recent 1.26.0. Since then, no users can log in anymore.

Users who are still logged in do not see any problem, even after a restart of Synapse.

Steps to reproduce

Point the web browser (Firefox or Safari on Mac, or Firefox on Linux) to the Element web page, or open the Element app on Mac, Linux or iOS and point it to the URL.

Enter the login username and password for an AD user.

In the web browser I get the message: "Syncing, If you've joined lots of rooms, this might take a while."
On the iOS iPad I get a spinning logo.

I'd expect to be logged in and be granted access to all my rooms etc.

Version information

  • Homeserver: matrix.dangermouse.filmakademie.de

  • Version: {"server_version":"1.26.0","python_version":"3.6.9"}

  • Install method: install with apt install matrix-synapse-py3, update with apt upgrade

  • Platform: Ubuntu 18.04.5 LTS, virtual server

Note:

Nevertheless: Amazing project, great Work !!!! - Kowtow

@greinick greinick changed the title Login fails after update from 1.25.0 to 1.26.0 - syncing forever ... joined many rooms Login fails after update from 1.25.0 to 1.26.0 - syncing forever ... many rooms Jan 29, 2021
@JimmyPesto

JimmyPesto commented Jan 30, 2021

I can confirm the same problem.
While the update provided a new homeserver.yaml I decided to keep my old config file. Do I need to update the yaml file?

Here are some logs that came up while trying to log in with @username:example.org.
The FluffyChat Session with the incorrectly configured pusher WARNING is still active but login with Electron / Element fails.

2021-01-31 12:34:03,401 - synapse.app.homeserver - 163 - INFO - None - Synapse now listening on TCP port 8008
2021-01-31 12:34:03,405 - synapse.storage.background_updates - 110 - INFO - background_updates-0 - Starting background schema updates
2021-01-31 12:34:03,405 - synapse.handlers.deactivate_account - 212 - INFO - user_parter_loop-0 - User parter finished: stopping
2021-01-31 12:34:03,468 - synapse.push.pusherpool - 333 - WARNING - start_pushers-0 - Pusher incorrectly configured id=23, user=@username:example.org, appid=fluffychat.christianpauly_fluffychat, pushkey=SomeKeyHere: 'url' must have a path of '/_matrix/push/v1/notify'
2021-01-31 12:34:03,473 - synapse.push.pusherpool - 308 - INFO - start_pushers-0 - Started pushers
2021-01-31 12:34:04,407 - synapse.storage.background_updates - 124 - INFO - background_updates-0 - No more background updates to do. Unscheduling background update task.
2021-01-31 12:36:28,275 - synapse.metrics - 576 - INFO - None - Collecting gc 1
2021-01-31 12:37:15,153 - synapse.access.http.8008 - 316 - INFO - GET-0 - - - 8008 - {None} Processed request: 0.001sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 54B 200 "GET /.well-known/matrix/client HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:15,183 - synapse.access.http.8008 - 316 - INFO - GET-1 - - - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 353B 200 "GET /_matrix/client/versions HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:16,738 - synapse.access.http.8008 - 316 - INFO - GET-2 - - - 8008 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 54B 200 "GET /.well-known/matrix/client HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:16,768 - synapse.access.http.8008 - 316 - INFO - GET-3 - - - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 353B 200 "GET /_matrix/client/versions HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:16,784 - synapse.access.http.8008 - 316 - INFO - GET-4 - - - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 353B 200 "GET /_matrix/client/versions HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:16,801 - synapse.access.http.8008 - 316 - INFO - GET-5 - - - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 97B 200 "GET /_matrix/client/r0/login HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:24,325 - synapse.access.http.8008 - 316 - INFO - GET-6 - - - 8008 - {None} Processed request: 0.001sec/-0.000sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 353B 200 "GET /_matrix/client/versions HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:37:24,357 - synapse.rest.client.v1.login - 191 - INFO - POST-8 - Got login request with identifier: {'type': 'm.id.user', 'user': 'username'}, medium: None, address: None, user: None
2021-01-31 12:37:24,724 - synapse.handlers.auth - 800 - INFO - POST-8 - Logging in user @username:example.org on device GZJEZDGEDB
2021-01-31 12:37:24,730 - synapse.access.http.8008 - 316 - INFO - POST-8 - - - 8008 - {None} Processed request: 0.373sec/-0.000sec (0.352sec, 0.000sec) (0.006sec/0.005sec/18) 448B 200 "POST /_matrix/client/r0/login HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.7.17 Chrome/<IP_ADDR> Electron/10.2.0 Safari/537.36" [0 dbevts]
2021-01-31 12:39:03,405 - synapse.storage.databases.main.metrics - 279 - INFO - generate_user_daily_visits-0 - Calling _generate_user_daily_visits
2021-01-31 12:41:33,275 - synapse.metrics - 576 - INFO - None - Collecting gc 1
2021-01-31 12:44:03,265 - synapse.storage.databases.main.event_push_actions - 530 - INFO - event_push_action_stream_orderings-0 - Searching for stream ordering 1 month ago
2021-01-31 12:44:03,266 - synapse.storage.databases.main.event_push_actions - 535 - INFO - event_push_action_stream_orderings-0 - Found stream ordering 1 month ago: it's 2
2021-01-31 12:44:03,266 - synapse.storage.databases.main.event_push_actions - 537 - INFO - event_push_action_stream_orderings-0 - Searching for stream ordering 1 day ago
2021-01-31 12:44:03,267 - synapse.storage.databases.main.event_push_actions - 542 - INFO - event_push_action_stream_orderings-0 - Found stream ordering 1 day ago: it's 1381
2021-01-31 12:44:03,405 - synapse.storage.databases.main.metrics - 279 - INFO - generate_user_daily_visits-1 - Calling _generate_user_daily_visits

@thomasgraf

I have the same problem with version 1.26.0 on a new installation on a new homeserver domain.
I generated the config with the new version.

Homeserver:
matrix.waldkirch-chat.de

Version:
Python: 3.8.5
Synapse: 1.26.0

Install method:
pip install matrix-synapse

Database:
postgres

OS:
Ubuntu 20.04.2 LTS

The problem occurs on Element Android clients and web clients.

@thomasgraf

An update to my problem. The same problem occurs when using 1.25.0.

I downgraded, each time starting with a fresh config and database.

With 1.24.0 the problem was gone.

@JimmyPesto

Thanks for the update!

Is it possible to apply the database commands from "Rolling back to v1.25.0 after a failed upgrade" and install 1.24 to go directly back to a working state?

@menschentier

Maybe no one needs to revert to older server versions - I had the same problem and it was solved by correcting a misconfigured homeserver.yaml.

  1. My installation is Debian, but either I had server_name uncommented by accident or the update uncommented this line.
    The line above it says: # This is set in /etc/matrix-synapse/conf.d/server_name.yaml for Debian installations.

  2. public_baseurl was https://servername but had to be https://servername:8448.

After I changed these two settings, logging in on new element devices (including nextcloud client) worked again.

Hope this helps anyone...

Kind regards

Simon

@kovrom

kovrom commented Feb 1, 2021

Same behavior. @menschentier's suggestion was not applicable/didn't help, since this is a new installation.
New install of Synapse: 1.26.0 on Ubuntu 20.04 LTS

@menschentier

What I found while diagnosing the error was this here:
element-hq/element-android#983
And actually the above mentioned solution is mentioned there as well.

What I tried before (and what is still active in my current working configuration) is setting filter_timeline_limit down to 25. I'm not much of a developer, but it would make sense as this login process seems to run into a timeout after around 90 seconds (nextcloud app).

@greinick
Author

greinick commented Feb 1, 2021

What I found while diagnosing the error was this here:
vector-im/element-android#983
And actually the above mentioned solution is mentioned there as well.

What I tried before (and what is still active in my current working configuration) is setting filter_timeline_limit down to 25. I'm not much of a developer, but it would make sense as this login process seems to run into a timeout after around 90 seconds (nextcloud app).

This did not help on our installation.

@greinick
Author

greinick commented Feb 1, 2021

Maybe no one needs to revert to older server versions - I had the same problem and it was solved by correcting a misconfigured homeserver.yaml.

  1. My installation is Debian, but either I had server_name uncommented by accident or the update uncommented this line.
    The line above it says: # This is set in /etc/matrix-synapse/conf.d/server_name.yaml for Debian installations.
  2. public_baseurl was https://servername but had to be https://servername:8448.

After I changed these two settings, logging in on new element devices (including nextcloud client) worked again.

Hope this helps anyone...

Kind regards

Simon

This did not help on our installation.

@richvdh
Member

richvdh commented Feb 1, 2021

2. public_baseurl was https://servername but had to be https://servername:8448.

A change introduced in 1.26.0 is that public_baseurl is now set by default (it defaults to https://servername). We thought this would be a harmless change but it's certainly possible that it will break login for installations where that is not correct.

So I do recommend that people double-check their public_baseurl setting. It is likely that https://servername:8448 will not be correct for most people either.
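
One way to double-check the setting is to request the client versions endpoint under the candidate URL, which is among the first things Element fetches (see the GET /_matrix/client/versions lines in the log earlier in this thread). The following is only a minimal sketch in Python, not part of Synapse: the function names are made up for illustration, and matrix.example.org is a placeholder.

```python
import json
from urllib.parse import urlsplit
from urllib.request import urlopen


def versions_url(public_baseurl: str) -> str:
    """Build the client versions URL under a candidate public_baseurl."""
    parts = urlsplit(public_baseurl)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"public_baseurl must be a full URL, got {public_baseurl!r}")
    return public_baseurl.rstrip("/") + "/_matrix/client/versions"


def looks_like_homeserver(body: bytes) -> bool:
    """True if the body is JSON carrying a non-empty 'versions' list."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON at all, e.g. an HTML error page
        return False
    versions = data.get("versions") if isinstance(data, dict) else None
    return isinstance(versions, list) and bool(versions)


def check_public_baseurl(public_baseurl: str, timeout: float = 10.0) -> bool:
    """Fetch the versions endpoint; raises URLError if nothing answers there."""
    with urlopen(versions_url(public_baseurl), timeout=timeout) as resp:
        return looks_like_homeserver(resp.read())
```

Running check_public_baseurl("https://matrix.example.org") from a machine outside your own network is the more telling test, since that is where most clients connect from.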

@kovrom

kovrom commented Feb 1, 2021

Uncommenting public_baseurl, which is commented out by default, fixes the issue. public_baseurl should be uncommented and set to matrix.domain.com or wherever your Synapse lives.

@greinick
Author

greinick commented Feb 1, 2021

2. public_baseurl was https://servername but had to be https://servername:8448.

A change introduced in 1.26.0 is that public_baseurl is now set by default (it defaults to https://servername). We thought this would be a harmless change but it's certainly possible that it will break login for installations where that is not correct.

So I do recommend that people double-check their public_baseurl setting. It is likely that https://servername:8448 will not be correct for most people either.

I double cross checked my public_baseurl setting and now it is pointing to the right URL! Sometimes you have to look twice!

Thx. works for me now.

@greinick greinick closed this as completed Feb 1, 2021
@JimmyPesto

2. public_baseurl was https://servername but had to be https://servername:8448.

A change introduced in 1.26.0 is that public_baseurl is now set by default (it defaults to https://servername). We thought this would be a harmless change but it's certainly possible that it will break login for installations where that is not correct.

So I do recommend that people double-check their public_baseurl setting. It is likely that https://servername:8448 will not be correct for most people either.

Solved the issue for me too. Thanks!

@aaronraimist
Contributor

I don't think this issue should be closed. Lots of people are running into this. UPGRADE.rst should be updated to mention this change.

@greinick
Author

greinick commented Feb 3, 2021

I don't think this issue should be closed. Lots of people are running into this. UPGRADE.rst should be updated to mention this change.

Good point on updating UPGRADE.rst. I closed the ticket because the issue as such is solved. I'm reopening it so that maybe someone can update the docs.

@richvdh
Member

richvdh commented Feb 3, 2021

I agree, this does need addressing.

The reason it is breaking existing installations is that the response to /login includes the URI of the homeserver if public_baseurl is set. If there is no URI in the /login response, clients default to using the same URI as they used for login. Now that public_baseurl is always set, that URI is always included - but the default may be wrong, hence clients are being directed to the wrong place.

So, one solution might be to only include the URI in the login response if public_baseurl is set explicitly, rather than inferred. However, that feels a bit magical, and to be honest having an incorrect public_baseurl is probably going to bring you problems with other features later on. So on balance I think I prefer updating the docs.

We would probably need to update the INSTALL.md doc as well as UPGRADE.rst.
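
The client-side behaviour described above can be sketched roughly as follows. This is an illustration in Python rather than code from any real client: resolve_homeserver_url is a made-up name, and the well_known / m.homeserver / base_url field names are taken from the Matrix client-server spec rather than from this thread.

```python
from typing import Any, Dict


def resolve_homeserver_url(login_url: str, login_response: Dict[str, Any]) -> str:
    """Pick the base URL a client uses for /sync after a successful /login."""
    well_known = login_response.get("well_known") or {}
    homeserver = well_known.get("m.homeserver") or {}
    base_url = homeserver.get("base_url")
    if isinstance(base_url, str) and base_url:
        # The server advertised a URI, so the client switches to it,
        # even if that URI is a wrong default.
        return base_url.rstrip("/")
    # No URI in the response: keep talking to the URL used for login.
    return login_url.rstrip("/")
```

With public_baseurl unset on 1.25.0, the second branch applied and clients stayed on the URL the user typed. On 1.26.0, a server reachable at, say, https://matrix.example.org but named example.org would advertise https://example.org, and the client would then sync against a host where Synapse is not listening.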

@Nourio

Nourio commented Feb 3, 2021

This change broke my specific setup as well, but I'm confused as to what I should be adding to public_baseurl. I don't own a domain name and I host a server just for me and small group of friends. My friends access it via my external IP and I access it via my LAN IP.

To clarify, I've had public_baseurl commented out this whole time and my setup worked fine.

@callahad callahad added the X-Release-Blocker Must be resolved before making a release label Feb 3, 2021
@callahad
Contributor

callahad commented Feb 3, 2021

We need to make a decision on this before releasing 1.27.

Obvious options:

  • Revert the change
  • Add documentation, do nothing else
  • Fail loudly if unset, making it required for startup (would also need to verify our setup docs).

@callahad callahad closed this as completed Feb 3, 2021
@callahad callahad reopened this Feb 3, 2021
@callahad
Contributor

callahad commented Feb 3, 2021

This was introduced in #9159

@richvdh
Member

richvdh commented Feb 3, 2021

Revert the change

Note that a bunch of later PRs have built on top of the change, so this is a somewhat intrusive option.

@callahad
Contributor

callahad commented Feb 3, 2021

We believe we'll want to move toward making this mandatory in the long term. We will need to determine a graceful way to do that. In the shorter term, we need to find a way to unbreak users in the common case when upgrading.

Regardless, we will retroactively add a note to the upgrade docs.

@callahad
Contributor

callahad commented Feb 3, 2021

Our current plan is to attempt to revert this and figure out how painful that actually is.

If it turns out to be too painful, we'll go with temporarily avoiding well-known generation when public_baseurl is not explicitly set.
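
For illustration only, the fallback option could look something like the sketch below. It is hypothetical Python and not Synapse's actual code: build_login_response and its parameters are invented names, and the well_known layout follows the Matrix client-server spec.

```python
from typing import Any, Dict, Optional


def build_login_response(
    user_id: str,
    access_token: str,
    device_id: str,
    explicit_public_baseurl: Optional[str],
) -> Dict[str, Any]:
    """Build a /login response body, advertising a homeserver URI only
    when the admin set public_baseurl themselves (None means unset)."""
    response: Dict[str, Any] = {
        "user_id": user_id,
        "access_token": access_token,
        "device_id": device_id,
    }
    if explicit_public_baseurl is not None:
        response["well_known"] = {
            "m.homeserver": {"base_url": explicit_public_baseurl}
        }
    return response
```

With an inferred default left out of the response, clients fall back to the URL they logged in with, which is the pre-1.26.0 behaviour for servers that never configured public_baseurl.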

@clokep clokep self-assigned this Feb 3, 2021
@erikjohnston erikjohnston added S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Feb 4, 2021
@petrov-adg

Same problem. Uncommenting public_baseurl was the solution

@clokep
Member

clokep commented Feb 11, 2021

Should be fixed in v1.27.0rc2.
