
Conversation

@erikjohnston (Member) commented Nov 3, 2025

I noticed this in some profiling. Currently we prune the ratelimiters by copying and iterating over every entry every 60 seconds. Instead, let's use a wheel timer to track when we should potentially prune a given key, so that we a) check fewer keys, and b) can run the prune more frequently. Hopefully this means we don't get a large pause every time we prune a ratelimiter with lots of keys.

Also fixes a bug where we didn't prune entries that were added via record_action and never subsequently updated. This affected the media and joins-per-room ratelimiters.
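To make the shape of the change concrete, here is a minimal, self-contained sketch of the approach (illustrative only, not the PR's actual code: the `BucketedTimer` class is a toy stand-in for Synapse's `WheelTimer`, and the `record_action` signature, expiry rule and 0.1s buffer handling are simplified assumptions):

```python
import time
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class BucketedTimer:
    """Toy stand-in for a wheel timer: keys fall into coarse time buckets,
    and fetch() returns every key whose bucket has already passed."""

    def __init__(self, bucket_size_s: float = 5.0) -> None:
        self.bucket_size_s = bucket_size_s
        self.buckets: Dict[int, List[Hashable]] = defaultdict(list)

    def insert(self, key: Hashable, then_s: float) -> None:
        self.buckets[int(then_s // self.bucket_size_s)].append(key)

    def fetch(self, now_s: float) -> List[Hashable]:
        current = int(now_s // self.bucket_size_s)
        due = [b for b in list(self.buckets) if b < current]
        keys: List[Hashable] = []
        for b in due:
            keys.extend(self.buckets.pop(b))
        return keys


class SketchRatelimiter:
    """Schedules a per-key prune time instead of scanning every entry."""

    def __init__(self, rate_hz: float, burst_count: int) -> None:
        self.rate_hz = rate_hz
        self.burst_count = burst_count
        # key -> (action_count, time_start, rate_hz)
        self.actions: Dict[Hashable, Tuple[float, float, float]] = {}
        self._timer = BucketedTimer()

    def record_action(self, key: Hashable, n_actions: int = 1) -> None:
        now = time.monotonic()
        action_count, time_start, rate_hz = self.actions.get(
            key, (0.0, now, self.rate_hz)
        )
        action_count += n_actions
        self.actions[key] = (action_count, time_start, rate_hz)

        # Schedule a prune for when the accumulated count will have decayed
        # to zero, plus a small buffer so the check runs strictly afterwards.
        prune_time_s = time_start + action_count / rate_hz + 0.1
        self._timer.insert(key, prune_time_s)

    def _prune(self) -> None:
        """Runs frequently, but only inspects keys whose timers have fired."""
        now = time.monotonic()
        for key in self._timer.fetch(now):
            value = self.actions.get(key)
            if value is None:
                # An earlier scheduled prune already removed this key.
                continue
            action_count, time_start, rate_hz = value
            # Re-check before deleting: if the entry was updated after this
            # prune was scheduled, a later prune is already queued for it.
            if action_count - (now - time_start) * rate_hz <= 0:
                del self.actions[key]
```

The point is that `_prune` only ever touches keys whose timers have fired, rather than copying and scanning the whole `actions` dict every 60 seconds.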

@erikjohnston marked this pull request as ready for review November 3, 2025 19:20
@erikjohnston requested a review from a team as a code owner November 3, 2025 19:20
@anoadragon453 (Member) left a comment

A few questions below as I try to grok this code.


```python
self.actions[key] = (action_count, time_start, rate_hz)

prune_time_s += 0.1  # Add a buffer to ensure we don't try and prune too early
```
Member:

What does this buffer do? I worry it might cause difficult-to-debug bugs in unit tests.

Should we only add this buffer if prune_time_s is within a certain amount of time from now? i.e. prune_time_s += min(0.1, prune_time_s - time_now_s)?

Member Author:

It is mostly me being paranoid about timings. When the prune function gets called it checks whether the entry can be pruned by doing the comparison again; if it's not ready to be pruned, the function will not schedule another prune (one should already have been scheduled when we updated the entry). So the only thing that really matters is that the prune function gets called after the entry expires, and we ensure that by scheduling the prune 0.1s after we think the entry should expire.

Member Author:

Will expand comment
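Isolated as a pair of hypothetical helper functions (the names and the decay rule here are assumptions, mirroring the sketch above rather than the PR's exact code), the argument is:

```python
def scheduled_prune_time(action_count: float, time_start: float, rate_hz: float) -> float:
    """When we expect the entry to have fully decayed, plus a small buffer so
    the scheduled check runs strictly after expiry instead of racing it."""
    return time_start + action_count / rate_hz + 0.1


def should_prune(action_count: float, time_start: float, rate_hz: float, now_s: float) -> bool:
    """The comparison is redone when the prune actually fires. If the entry
    has not yet expired (e.g. it was updated since this prune was scheduled),
    it is left alone and no new prune is scheduled here: the update that
    pushed the expiry forward already queued its own prune."""
    return action_count - (now_s - time_start) * rate_hz <= 0
```

Because only "strictly after expiry" matters, the exact size of the buffer is not significant.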


```python
for key in to_prune:
    value = self.actions.get(key)
    if value is None:
```
Member:

When would this happen (I don't think we delete keys from actions anywhere other than in this function)? Is this just a sanity check?

Member Author:

For a given entry that has been updated multiple times we'll schedule multiple calls to prune; I think, given the nature of the updates, only the last prune will actually delete the entry. However, it really isn't obvious that there aren't edge cases where an earlier prune call removes the entry even though later prune calls are still scheduled.
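A hypothetical timeline (numbers invented) showing how several prunes end up queued for one key, and why the lookup has to tolerate the key already being gone:

```python
rate_hz = 0.1     # one action per 10 seconds
time_start = 0.0  # first action for key "k"

# record_action("k") at t=0 -> count 1 -> prune queued for ~t=10.1
# record_action("k") at t=5 -> count 2 -> prune queued for ~t=20.1
for action_count in (1, 2):
    print(time_start + action_count / rate_hz + 0.1)  # 10.1, then 20.1

# Both prunes stay queued. Normally only the later one finds the entry fully
# decayed and deletes it, but since it is hard to rule out an earlier prune
# removing the entry while later prunes are still pending, the prune loop
# uses self.actions.get(key) and skips keys that are already gone.
```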

```python
else:
    del self.actions[key]

del self.actions[key]
```
Member:

nit: we use `del self.actions[key]` here and `self.actions.pop(key, None)` above.

Member Author:

Well, we know the key is in the map as we just looked the key up in the map.

Member:

Oh yes, they do different things, don't they. Carry on!
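For reference, the difference in plain Python terms:

```python
d = {"key": 1}

d.pop("key", None)  # removes "key" if present; a no-op if it is absent
d.pop("key", None)  # safe to call again

d["key"] = 1
del d["key"]        # removes "key", but raises KeyError if it is absent
```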


```python
self.clock.looping_call(self._prune_message_counts, 60 * 1000)
# Records when actions should potentially be pruned.
self._timer: WheelTimer[Hashable] = WheelTimer()
```
Member:

Is the default accuracy of 5s per bucket enough? Will clients be told to "retry in 3s" and then still get an error if the current bucket has yet to expire?

Member Author:

The timer and pruning have no observable effect; it's just a cleanup of stale entries to keep the actions dict from expanding indefinitely.

Member:

Ahh, I see. I didn't realise that this was all separate from the actual rate-limiting functionality. This makes the timing a lot less critical, and was the missing bit in my understanding.
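A sketch of why the prune timing is invisible to clients, assuming (as in Synapse, roughly) that the rate-limit check decays the stored count at read time; `can_do_action` here is a simplified, synchronous stand-in for the real method:

```python
import time
from typing import Dict, Hashable, Tuple


def can_do_action(
    actions: Dict[Hashable, Tuple[float, float, float]],
    key: Hashable,
    rate_hz: float,
    burst_count: int,
) -> bool:
    """The decision depends only on the stored (action_count, time_start,
    rate_hz) tuple and the current time. A stale, unpruned entry whose count
    has fully decayed behaves exactly like a missing one, so whether the
    prune fires a few seconds early or late (the wheel timer's bucket
    granularity) affects memory usage, not the answer clients see."""
    now = time.monotonic()
    action_count, time_start, stored_rate_hz = actions.get(key, (0.0, now, rate_hz))
    performed = max(0.0, action_count - (now - time_start) * stored_rate_hz)
    return performed + 1 <= burst_count
```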

@anoadragon453 (Member) left a comment

LGTM


@erikjohnston merged commit 5408101 into develop Nov 4, 2025
41 of 43 checks passed
@erikjohnston deleted the erikj/faster_ratelimit branch November 4, 2025 12:44