This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

[WIP] External sharded cache #12955

Closed · wants to merge 11 commits

Conversation

@Fizzadar (Contributor) commented Jun 4, 2022:

WIP experimental external sharded Redis caching. Our synapse instance experiences an extremely high rate of calls to the _get_joined_profile_from_event_id cache, which also happens to never be invalidated. During normal running the misses (~5k/s min) cause a reasonable amount of DB load, and during restarts this becomes a real problem.

Thus, because the cache is never invalidated, I've been looking into the possibility of another caching layer shared between workers, and Redis seems like the best fit as it's already used. I chose to make this a new config section and recommend separate Redis instances which handle eviction of items from the cache (specifically, the MSET Redis command does not support expiry, although it can be applied key-by-key afterwards).
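For illustration, here is what that limitation looks like with a plain synchronous redis-py client (the PR itself goes through Synapse's Twisted Redis connection; the key names and TTL below are made up):

```python
import redis

r = redis.Redis()  # assumed local Redis shard, purely for illustration

# MSET writes several keys atomically in a single round trip, but it has no
# TTL argument...
r.mset({"cache:profile:$event_a": b"...", "cache:profile:$event_b": b"..."})

# ...so expiry has to be applied key-by-key after the fact.
for key in ("cache:profile:$event_a", "cache:profile:$event_b"):
    r.expire(key, 3600)  # hypothetical one-hour TTL
```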

The sharding is client-side, using the jump hashing algorithm for speed and ease of deployment; plus txredis doesn't support Redis cluster (and all the complexities it brings).
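For reference, a minimal sketch of jump consistent hashing (Lamport & Veach's algorithm) used to pick a shard client-side; the MD5 step for reducing a string key to a 64-bit integer and the shard count are assumptions for illustration, not necessarily what this PR does:

```python
import hashlib


def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map a 64-bit key onto one of num_buckets buckets."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def shard_for(cache_key: str, num_shards: int) -> int:
    # Reduce the string cache key to a 64-bit integer before jump hashing.
    digest = hashlib.md5(cache_key.encode("utf-8")).digest()
    return jump_hash(int.from_bytes(digest[:8], "big"), num_shards)


# e.g. shard_for("_get_joined_profile_from_event_id:$event_id", 4) -> 0..3
```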

Feel free to close/cancel this if it's not desired; this work is primarily to be rolled into our own synapse fork, but I thought it may be of interest, especially as an optional configurable. I'm going to leave a few comments on the PR itself as well where I'm unsure about things.

Related: #2123

Signed off by Nick @ Beeper


@Fizzadar Fizzadar requested a review from a team as a code owner June 4, 2022 12:26
batch_size=500,
desc="_get_joined_profiles_from_event_ids",
)
sharded_cache = self.hs.get_external_sharded_cache()
@Fizzadar (Contributor, Author) commented:

This all feels ripe for some kind of wrapper - I had a brief look at cachedList but I don't think it should live there because it's already pretty complex! Maybe some kind of util wrapper? Or out of scope for this PR?
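One hypothetical shape such a util wrapper could take, sketched outside this PR's actual API (the sharded_cache.mget/mset bulk methods, the prefix argument and the function name are all assumptions):

```python
from typing import Awaitable, Callable, Dict, Hashable, Iterable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


async def lookup_via_sharded_cache(
    sharded_cache,  # hypothetical ExternalShardedCache instance
    prefix: str,
    keys: Iterable[K],
    fetch_from_db: Callable[[List[K]], Awaitable[Dict[K, V]]],
) -> Dict[K, V]:
    """Try the external sharded cache first, fall back to the DB for any
    misses, then backfill the external cache with the DB results."""
    keys = list(keys)
    results: Dict[K, V] = await sharded_cache.mget(prefix, keys)  # assumed bulk get
    missing = [k for k in keys if k not in results]
    if missing:
        from_db = await fetch_from_db(missing)
        if from_db:
            await sharded_cache.mset(prefix, from_db)  # assumed bulk set
        results.update(from_db)
    return results
```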

#
#cache_shards:
# enabled: false
# expire_caches: false
@Fizzadar (Contributor, Author) commented:

Not currently supported, but it could be added fairly easily using a Redis transaction block and setting each key one by one.
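A sketch of that approach, again using a plain redis-py client for brevity (the TTL and key names are illustrative):

```python
import redis

r = redis.Redis()  # assumed Redis shard, purely for illustration

values = {"cache:profile:$event_a": b"...", "cache:profile:$event_b": b"..."}

# Inside a MULTI/EXEC transaction, set each key individually so every key can
# carry its own TTL (one command per key instead of a single MSET).
with r.pipeline(transaction=True) as pipe:
    for key, value in values.items():
        pipe.set(key, value, ex=3600)  # hypothetical one-hour expiry
    pipe.execute()
```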

@Fizzadar Fizzadar changed the title from "External sharded cache" to "[WIP] External sharded cache" Jun 4, 2022
@Fizzadar Fizzadar marked this pull request as draft June 4, 2022 12:32
@erikjohnston (Member) commented:

This looks super interesting! We're actually currently doing a bunch of work to reduce the amount of state we pull out of the DB, which may (or may not) help a lot. My inclination would then be to revisit this after that work has landed to see whether this is still useful.

On which point: I'd love to know what is triggering calls to _get_joined_profiles_from_event_ids for you, e.g. I think I've mostly seen it happen when calculating push actions for new events?

@erikjohnston erikjohnston removed the request for review from a team June 6, 2022 08:36
synapse/config/redis.py: outdated review thread (resolved)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
@Fizzadar (Contributor, Author) commented:

@erikjohnston the _get_joined_profiles_from_event_ids calls are all from our worker and federation sender instances, which seems unexpected! I'm currently trying to track down exactly where they're being called from and will get back to you here; it seems the function is really only triggered in a handful of locations...

@erikjohnston (Member) commented:

> @erikjohnston the _get_joined_profiles_from_event_ids calls are all from our worker and federation sender instances, which seems unexpected! I'm currently trying to track down exactly where they're being called from and will get back to you here; it seems the function is really only triggered in a handful of locations...

I think the federation sender instance one will be fixed by #12964, at the very least.

@Fizzadar (Contributor, Author) commented:

@erikjohnston very happy to report 1.61 has removed nearly all requests to that cache from the federation sender!

[Screenshot attached, 2022-06-22 09:20, showing the drop in requests to that cache]

@erikjohnston (Member) commented:

Hi @Fizzadar, sorry I was out last week.

Great news that the cache request rate has dropped! I think this means that this PR is no longer needed for this particular case, but we'd certainly be open to using this approach for other cases if needed. So I'm going to close this, but feel free to reopen if/when there are other caches that would benefit from this.

@Fizzadar Fizzadar mentioned this pull request Jul 11, 2022
richvdh pushed a commit that referenced this pull request Jul 15, 2022
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.

Signed off by Nick @ Beeper (@Fizzadar)
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Aug 23, 2022
Some experimental prep work to enable external event caching based on matrix-org#9379 & matrix-org#12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.

Signed off by Nick @ Beeper (@Fizzadar)