Separating schema and user/session cache #6060
Which version of Parse Server are you using? Improvements have been made, such as less schema validation and reducing the bottleneck on the RedisCacheAdapter. Have you tried enableSingleSchemaCache: true?
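For reference, wiring up the Redis cache adapter together with the single schema cache option looks roughly like the sketch below; the connection strings, app ID, and keys are placeholders, and the exact option set depends on the deployment (enableSingleSchemaCache is a 3.x-era option):

```js
// Minimal sketch of a Parse Server config combining RedisCacheAdapter with
// enableSingleSchemaCache. URLs, keys, and IDs below are placeholders.
const { ParseServer, RedisCacheAdapter } = require('parse-server');

const redisCache = new RedisCacheAdapter({ url: 'redis://localhost:6379' });

const api = new ParseServer({
  databaseURI: 'mongodb://localhost:27017/dev',
  appId: 'myAppId',
  masterKey: 'myMasterKey',
  serverURL: 'http://localhost:1337/parse',
  cacheAdapter: redisCache,
  // Cache one shared copy of the schema instead of one per request.
  enableSingleSchemaCache: true,
});
```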
@dplewis Thanks for the reply.

Version: 3.7.2

We saw a reduction in the total number of redis ops after #5612 and #5616 (thanks btw! was a really nice change), as well as when we changed to enableSingleSchemaCache: true. However, even with all of that, there is still a very high volume of MAIN_SCHEMA lookups.

When there is a cache hit on both the schema and the session, there aren't that many MAIN_SCHEMA lookups happening. It seems like 1 for the cloud function and 1 for the collection lookup (we are using directAccess: true, although that didn't seem to change much with redis).

However, for the same cloud function on a session cache miss (still a schema cache hit) we wind up with 6 lookups to MAIN_SCHEMA.

With hundreds of cloud function executions per second, many of which involve multiple collections and even more schema lookups, you can see how this all starts to add up to significant raw throughput between redis and the servers. Given how large our schema is, even if it always made only 1 MAIN_SCHEMA lookup per "request" like it does in the cache-hit example, it still doesn't seem appropriate to store it in redis.
Just tried running 3.9.0 locally and I'm no longer seeing the extra MAIN_SCHEMA lookups.
Do you have a specific query / use case that uses a lot of get lookups? Can you write a test case? As many as possible. I only added tests for basic queries and writes: https://github.com/parse-community/parse-server/blob/master/spec/RedisCacheAdapter.spec.js#L187 This way we can see how many lookups happen and find a way to deal with the bottleneck. There should be a minimum of 1 get lookup per request.
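A rough sketch of the kind of test that could surface this is below; it follows the Jasmine style of the linked spec file, but the import path, the reconfigureServer helper usage, and the lookup counts are assumptions rather than exact project internals:

```js
// Sketch only: count RedisCacheAdapter.get calls issued for a single query.
// The adapter import path and reconfigureServer come from parse-server's own
// spec harness and are assumed here, not verified against the repo.
const RedisCacheAdapter = require('../lib/Adapters/Cache/RedisCacheAdapter').default;

describe('RedisCacheAdapter lookup count', () => {
  it('counts cache get calls for a simple query', async () => {
    const cacheAdapter = new RedisCacheAdapter({ url: 'redis://localhost:6379' });
    await reconfigureServer({ cacheAdapter });

    const getSpy = spyOn(cacheAdapter, 'get').and.callThrough();

    const obj = new Parse.Object('TestObject');
    obj.set('foo', 'bar');
    await obj.save();

    getSpy.calls.reset();
    await new Parse.Query('TestObject').find();

    // Log rather than assert an exact number, since schema/session cache hits
    // and misses change how many lookups a single request performs.
    console.log('cache get calls for one query:', getSpy.calls.count());
    expect(getSpy.calls.count()).toBeGreaterThan(0);
  });
});
```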
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is your feature request related to a problem? Please describe.
We have a large schema which is currently being cached in redis. Our schema cache uses significant resources because of its size (~64KB) and because it gets accessed multiple times per request (we process several hundred requests per second).
The schema cache is putting an amount of pressure on redis (and even on the raw networking) that really isn't sustainable.
Describe the solution you'd like
Being able to have the schema cache managed in memory while keeping the user/session cache in redis.
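One way this could look (a minimal sketch, not a worked-out implementation) is a wrapper adapter that keeps schema entries in an in-process map and sends everything else to the existing RedisCacheAdapter; the MAIN_SCHEMA key check and the class name are assumptions about how schema entries are keyed:

```js
// Hypothetical adapter: schema entries stay in local memory, all other cache
// traffic (users/sessions) still goes to redis. Key matching is an assumption.
const { RedisCacheAdapter } = require('parse-server');

class SplitCacheAdapter {
  constructor(redisOptions) {
    this.redis = new RedisCacheAdapter(redisOptions);
    this.local = new Map(); // in-process store for schema entries
  }

  _isSchemaKey(key) {
    // Assumption: schema cache entries can be recognized by their key.
    return typeof key === 'string' && key.includes('MAIN_SCHEMA');
  }

  async get(key) {
    if (this._isSchemaKey(key)) {
      return this.local.has(key) ? this.local.get(key) : null;
    }
    return this.redis.get(key);
  }

  async put(key, value, ttl) {
    if (this._isSchemaKey(key)) {
      this.local.set(key, value);
      return;
    }
    return this.redis.put(key, value, ttl);
  }

  async del(key) {
    this.local.delete(key);
    return this.redis.del(key);
  }

  async clear() {
    this.local.clear();
    return this.redis.clear();
  }
}

module.exports = SplitCacheAdapter;
```

Passed as cacheAdapter, something like this would keep the ~64KB schema entry out of redis entirely while the user/session caches still share state across servers; the trade-off is that each server would hold and invalidate its own schema copy independently.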
Describe alternatives you've considered
Using the in-memory cache for everything:
We tried this in the past but had issues with the in-memory cache leaking memory. That was several versions ago; maybe it has been fixed? Are there any drawbacks to the in-memory cache other than memory usage?
Additional context
Overall, the implementation of the schema cache, particularly the number of times it is accessed per request and its potential size, does not seem well suited to being stored remotely.
Our schema rarely changes without the backend getting redeployed anyway, so we are not really concerned with schema invalidation.
Looked into how we would be able to use a different cache for the schema, and it seems like it would require forking and modifying DatabaseController, SchemaCache, and SchemaController so they can be configured with a different CacheController. This seems like it would be useful for other users as well, and we would be happy to contribute it back into the project if there was interest.
Appreciate thoughts/advice on the specific problem we are facing and the proposed solution.
Many thanks!
New Relic graph showing that time spent in redis takes up 1/3 of all request time, much more than even mongo.
Even with the single schema cache enabled, we still get thousands of redis operations per second, most just reading the same ~64KB schema over and over.
A large amount of our network IO is from the schema cache being read from redis multiple times per request. It's enough bandwidth that it overwhelmed many of the redis configurations we tried.