
Suspending accounts causes huge delays in federation #9377

Closed
ChatonneLibertaire opened this issue Nov 27, 2018 · 18 comments
Labels: partially a bug (Architecture or design-imposed shortcomings) · performance (Runtime performance)

Comments

@ChatonneLibertaire

Expected behaviour

Suspending accounts should be smooth and shouldn't create any delays in the federation

Actual behaviour

anarchism.space is currently (like many other instances) dealing with a spambot surge. However, suspending the bots causes my instance to have huge delays in posting things to the federation (not in receiving). The CPU is fine (8 cores at about 5% each), but the nginx log is filled with attempted accesses to the suspended accounts from all the other instances I'm federated with (I'm not sure whether that's the cause, but maybe).

Steps to reproduce the problem

  1. Have a decently federated instance
  2. Suspend >20 accounts
  3. Send a DM to someone on another instance
  4. Wait 30 minutes for the DM to be received

Specifications

Mastodon: 2.6.1

@Gargron
Member

Gargron commented Nov 27, 2018

Would I be correct to guess that you participate in a relay, and do not have proxy caching configured in nginx?

@ChatonneLibertaire
Author

@Gargron what do you mean by relay? Also, I have exactly the same nginx configuration as noted in the old Production guide (https://github.com/tootsuite/documentation/blob/master/Running-Mastodon/Production-guide.md#nginx-configuration) and the default configuration otherwise, which I guess doesn't have proxy caching?

@Gargron
Member

Gargron commented Nov 27, 2018

Compare your configuration with: https://github.com/tootsuite/mastodon/blob/master/dist/nginx.conf

I don't see a way for new accounts to cause requests from other servers unless they get followers from those servers or you are using a relay service which broadcasts their posts. So new accounts are likely unrelated to your issues.
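
For reference, the proxy caching mentioned here looks roughly like this in nginx (a minimal sketch; the cache path, zone name, sizes, and backend address are illustrative, and the dist/nginx.conf linked above is the canonical reference):

# In the http {} context: define an on-disk cache zone.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=CACHE:10m inactive=7d max_size=1g;

server {
    # ...
    # In the location that proxies to the Rails backend
    # (Mastodon web listens on 127.0.0.1:3000 by default):
    location @proxy {
        proxy_pass http://127.0.0.1:3000;

        proxy_cache CACHE;
        proxy_cache_valid 200 7d;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;

        # Handy for checking whether a response was served from the cache.
        add_header X-Cached $upstream_cache_status;
    }
}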

@ChatonneLibertaire
Author

So, indeed, there was no proxy caching. I patched the nginx config, restarted nginx, and tried suspending just one ad-bot (no followers, no toots, nothing), and I still get flooded with requests from across the federation. This is an example of the lines I get in the nginx log:

[27/Nov/2018:22:43:37 +0000] "GET /users/<bot name> HTTP/1.1" 410 36 "-" "http.rb/3.3.0 (Mastodon/2.6.2; +<mastodon instance>)"

And the same behaviour persists when I send a message to another instance (i.e. a very, very long delay).

@Gargron
Member

Gargron commented Nov 27, 2018

Ah, I know what's happening. You're right after all. When an account is deleted, we want to make sure that everyone deletes it. So, we forward the delete to every known server. Ironically, for new accounts, most servers don't know them, and have to look them up to get the public key to even read the delete message.

That is a consequence of #8305

@nightpool
Member

nightpool commented Nov 27, 2018 via email

@nightpool
Member

nightpool commented Nov 27, 2018 via email

@ClearlyClaire
Contributor

ClearlyClaire commented Nov 27, 2018

I guess the issue isn't so much the fact that the remote servers do a useless query, but that mass-suspending users queues a hell of a lot of Delete delivery jobs (one per suspended account per unique known remote inbox).

@nightpool
Member

I guess? I wouldn't expect Delete jobs to be particularly more expensive or numerous than other types of jobs. Maybe anarchism.space has a lot of known remote inboxes but comparatively few remote followers?

@ClearlyClaire
Contributor

On my single-user instance, Account.inboxes returns 1802 entries. I assume it's slightly higher for anarchism.space, so suspending 20 accounts means tens of thousands of network-bound jobs (on the order of 1,800 × 20 ≈ 36,000), which can definitely cause some delay in job processing.

I'm not too sure how we can make this more efficient, as we have no way to track who has seen (and copied) accounts. We have the same issue with toot deletion to be honest.

@ChatonneLibertaire
Author

ChatonneLibertaire commented Feb 11, 2019

Can someone take care of this? I can't do this anymore... I had more than a hundred spam bots to remove this morning, and now my instance is cut off from federation for probably the whole day because of this issue.

Please do something. Who do I have to implore to do something about either this issue or the spam bot problem in general?

@ClearlyClaire
Contributor

At the very least, I think we should move Deletes sent to instances without followers to the lowest-priority queue (i.e., pull at the time of writing).

@penartur

penartur commented Mar 20, 2019

So what should we do when we're flooded with these GET requests for suspended users? Just wait it out?

An odd thing is that I see some instances doing several requests for a single suspension, e.g. toot.cafe, masto.themimitoof.fr and mastodon.huloop.com on this screenshot:

[screenshot: access log entries]

(Also, it's been about 4 hours since I suspended some dozens of bots on my small, unpopular instance, and the torrent does not seem to fade... if only Cloudflare supported caching 410 Gone responses and spared me some CPU load...)

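When nginx itself sits in front of Mastodon with proxy caching enabled, those repeated lookups can be absorbed there instead; a hedged sketch (assuming the caching location block sketched earlier, and that the upstream 410 response doesn't forbid caching; the 24h lifetime is illustrative):

# Cache 410 Gone responses so repeated lookups of a suspended account
# are answered from the nginx cache instead of hitting the Rails backend.
proxy_cache_valid 410 24h;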

@ClearlyClaire
Contributor

I guess we could add a shortcut to not try fetching the key for an account deletion when we don't know the account… it would be a bit awkward to add such special casing at this point in the flow, but it would definitely make sense

@penartur

penartur commented Mar 20, 2019

So the wave has finally abated, after ~5 hours of flooding my instance with requests at 100x (!) the average rate and pushing my CPU to 100% load. Usually I get that number of requests (170k) over a good month...

And I've only suspended ~100 bots with zero followers and zero posts.


Thankfully my instance runs on decent hardware and was able to survive this without significant disruption of service. However, I believe that for some other instances this could feel more like a DDoS attack.

So the fix is definitely warranted IMHO, even if it will look a bit awkward.

@ClearlyClaire
Contributor

The awkward fix is in master: #10326
It won't stop other software from performing such requests though.

@angristan
Contributor

Getting hit with the issue:

root@mstdn ~# grep -c "/users/Eagleeyeadventures" /var/log/nginx/mstdn-access.log
22290

My instance became unavailable because of one account.

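As a purely local stopgap while such a flood lasts, nginx can also answer for the gone account directly so the requests never reach Rails at all (an illustrative sketch only, reusing the account path from the grep above; a cached 410 as sketched earlier achieves much the same with less manual work):

# Hypothetical emergency measure: serve 410 Gone for this one suspended
# account straight from nginx, bypassing the backend entirely.
location = /users/Eagleeyeadventures {
    return 410;
}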

ClearlyClaire added a commit to ClearlyClaire/mastodon that referenced this issue Mar 21, 2019:
This will help a great deal with mastodon#9377 when a caching reverse proxy is configured.

Gargron pushed a commit that referenced this issue Mar 21, 2019:
This will help a great deal with #9377 when a caching reverse proxy is configured.
Gargron added the "partially a bug" label May 1, 2019.
@mjankowski
Contributor

Last update here was ~6 years ago... I'm going to close this on the assumption that the previously referenced commit, while not a full or exhaustive solution, is an "as good as we can come up with" solution. Please reopen or comment if there's something specific that still exists on current versions and could use renewed, robust, respectful contemplation.

trwnh added the "performance" label Nov 27, 2024.

8 participants