This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Federation sender doesn't start federating for 20+ minutes after coming up #3852

Closed
turt2live opened this issue Sep 12, 2018 · 3 comments
Labels
z-p2 (Deprecated Label)

Comments

@turt2live
Member

The logs show the federation sender acquiring a large number of locks while working out the joined hosts for each room, and so far this has only been observed on t2bot.io. Given those two observations, it's probably voyager or something similar sending some sort of event right at startup, which effectively deadlocks federation while it calculates the joined hosts for all 3000 rooms.

@richvdh richvdh self-assigned this Oct 2, 2018
@richvdh
Member

richvdh commented Oct 2, 2018

can you share logs demonstrating the problem pleeeease?

@richvdh
Member

richvdh commented Oct 2, 2018

Having spent a while looking at @turt2live's logs and metrics, I have the following conclusions:

The outgoing transaction rate doesn't actually drop to zero, but it does sit at around 1 Hz (with the odd spike) for some time after a restart, compared to a more normal rate of around 30-40 Hz.

However, if you exclude transactions that do not contain any events (i.e., those carrying only presence, to_device messages, device list updates, etc.), then the rate of event-carrying transactions is consistently about 1 Hz, with no significant drop after a restart.
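To make that filtering concrete, here is a minimal sketch, assuming a hypothetical in-memory list of outgoing transactions with timestamps. It follows the federation API's split between PDUs (room events) and EDUs (presence, to-device messages, device list updates, etc.); the `Transaction` class and `transaction_rate` helper are illustrative only and are not Synapse code.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transaction:
    """Hypothetical, simplified outgoing federation transaction."""
    timestamp: float  # seconds, on any consistent clock
    pdus: List[Any] = field(default_factory=list)  # room events
    edus: List[Any] = field(default_factory=list)  # presence, to_device, device lists, ...


def transaction_rate(txns: List[Transaction], start: float, end: float,
                     events_only: bool = False) -> float:
    """Transactions per second within [start, end).

    With events_only=True, EDU-only transactions (no PDUs) are skipped,
    which separates event traffic from presence/to_device noise.
    """
    if end <= start:
        raise ValueError("end must be after start")
    counted = sum(
        1
        for t in txns
        if start <= t.timestamp < end and (t.pdus or not events_only)
    )
    return counted / (end - start)
```

For example, over a 10-second window with 3 event-carrying transactions and 7 presence-only ones, the overall rate is 1.0 Hz but the events-only rate is 0.3 Hz, so a high overall rate can hide a low rate of actual event delivery.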

My suspicion is therefore that, most of the time, what we're actually seeing is the server transmitting unnecessary presence storms, as described in #3962. Accordingly, my inclination is to fix #3962 first and then see how this looks afterwards.

@neilisfragile neilisfragile added the z-p2 (Deprecated Label) label Oct 5, 2018
@richvdh richvdh removed their assignment Oct 10, 2018
@turt2live
Member Author

#3962 fixed this
