Pintobyte.com regularly receives thundering herds of presence EDUs which badly overload it. #2514
This is still happening, and it's really quite bizarre. I think there are two problems: Synapse fails to handle spikes of EDU traffic well, but we're also seeing weird spikes in presence traffic in the first place. Jeremy has shared logs of this happening at DEBUG, and looking at the pattern of RX and TX federation traffic, we see huge spikes in RX during which CPU goes to 100% and the reactor melts:
Looking at the biggest RX spike, this is 27(!) fed /sends arriving at 15:54:15:
Most of these presence EDUs are nothing remotely interesting - users like cadair who happen to be offline and were last active 1.1 days ago. I have no idea what is causing this thundering herd. @erikjohnston any idea? @leonerd any chance this rings a bell from your original implementation?
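For context, each federation /send transaction can carry a batch of m.presence EDUs, so 27 transactions in one second can mean a lot of inline work. A minimal sketch of one such payload, per the Matrix federation spec (field values are illustrative; this is not Synapse code):

```python
# Hypothetical sketch of a federation /send transaction carrying an
# m.presence EDU, per the Matrix federation spec. Values are illustrative.
transaction = {
    "origin": "example.org",
    "origin_server_ts": 1234567890123,
    "pdus": [],
    "edus": [
        {
            "edu_type": "m.presence",
            "content": {
                "push": [
                    {
                        "user_id": "@someone:example.org",
                        "presence": "offline",
                        "last_active_ago": 95040000,  # ~1.1 days, in ms
                    }
                ]
            },
        }
    ],
}

# Count the presence EDUs in a received transaction.
presence_edus = [e for e in transaction["edus"] if e["edu_type"] == "m.presence"]
print(len(presence_edus))
```

Unlike PDUs, EDUs are ephemeral and unpersisted, so a burst of these has to be decoded and handled on arrival - consistent with the CPU spikes if that handling happens inline on the reactor.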
Having had a very cursory look at the code, the presence handler has this (synapse/synapse/handlers/presence.py, lines 1270 to 1276 in eaaabc6):
Didn't this get fixed?
Not afaik.
It may also be a dup of #1324.
Assuming this is caused by #3962, it's fixed.
The box apparently spikes CPU when this happens, and the logs look like:
However, in the preceding 15s there are pauses of 2-3s in the logs. It looks like the reactor is getting wedged on something, but there's no indication of what, nor does the stall appear to be covered by a log context. Seemingly it's not getting stuck behind a linearizer lock.
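One way to confirm the wedged-reactor theory (a minimal sketch, not Synapse code): schedule a heartbeat with Twisted's LoopingCall and log whenever it fires much later than scheduled, since a late heartbeat means something blocked the reactor for that long.

```python
# Minimal reactor-lag monitor: if the reactor is blocked, the heartbeat
# fires late, and the measured lag approximates how long it was wedged.
import time
from twisted.internet import reactor, task

INTERVAL = 0.5   # seconds between heartbeats
THRESHOLD = 1.0  # log if the reactor was blocked longer than this

last = time.monotonic()

def heartbeat():
    global last
    now = time.monotonic()
    lag = now - last - INTERVAL
    if lag > THRESHOLD:
        # A 2-3s pause in the logs would show up here as lag ~= 2-3.
        print(f"reactor blocked for ~{lag:.1f}s")
    last = now

task.LoopingCall(heartbeat).start(INTERVAL)
reactor.run()
```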
I'm wondering if there's some queueing going on in processing the requests, such that they all suddenly end up being processed at the same point. I vaguely remember seeing this when investigating the Christmas meltdown - that Twisted has some kind of internal queue where requests start to stack up if the reactor is too busy?
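If that's what's happening, the general shape of the mitigation would be to avoid draining a whole backlog synchronously. A hedged sketch using Twisted's cooperator, where handle_edu and edus are hypothetical stand-ins rather than Synapse's actual API:

```python
# Sketch: process a burst of EDUs as a cooperative task that yields control
# back to the reactor between items, instead of one long synchronous loop
# that starves every other event. handle_edu/edus are hypothetical.
from twisted.internet import task

def handle_edu(edu):
    ...  # whatever per-EDU work is needed

def process_edus(edus):
    def work():
        for edu in edus:
            handle_edu(edu)
            yield  # hand control back to the reactor between items
    # cooperate() schedules the generator in small slices; whenDone()
    # returns a Deferred that fires once the whole batch is processed.
    return task.cooperate(work()).whenDone()
```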