Separating schema and user/session cache #6060

Closed
kcolton opened this issue Sep 17, 2019 · 5 comments

kcolton commented Sep 17, 2019

Is your feature request related to a problem? Please describe.
We have a large schema that is currently cached in redis. The schema cache uses significant resources because of its size (~64KB) and because it is read multiple times per request (we process several hundred requests per second).

Our schema cache is putting an amount of pressure on redis (and even the raw networking) that really isn't sustainable.

Describe the solution you'd like
Being able to have the schema cache be managed in memory, but have the user/session cache in redis.
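
A minimal sketch of what that split could look like at the cache adapter level, assuming adapters expose the `get`/`put`/`del`/`clear` shape and that schema keys can be recognized by the `:__SCHEMA` segment seen in the key names in this thread. `MemoryAdapter`, `SplitCacheAdapter`, and `isSchemaKey` are hypothetical names for illustration, not part of Parse Server.

```javascript
// Hypothetical sketch: keep schema keys in process memory and send
// everything else (user/session entries) to a remote adapter.

// Minimal in-memory adapter; ttl is in milliseconds, null means no expiry.
class MemoryAdapter {
  constructor() {
    this.store = new Map();
  }
  get(key) {
    const entry = this.store.get(key);
    if (!entry) return Promise.resolve(null);
    if (entry.expiresAt !== null && entry.expiresAt <= Date.now()) {
      this.store.delete(key); // drop expired entries lazily on read
      return Promise.resolve(null);
    }
    return Promise.resolve(entry.value);
  }
  put(key, value, ttl) {
    const hasTtl = typeof ttl === 'number' && isFinite(ttl);
    this.store.set(key, { value, expiresAt: hasTtl ? Date.now() + ttl : null });
    return Promise.resolve();
  }
  del(key) {
    this.store.delete(key);
    return Promise.resolve();
  }
  clear() {
    this.store.clear();
    return Promise.resolve();
  }
}

// Routes each key to the local or the remote adapter.
class SplitCacheAdapter {
  constructor({ local, remote, isLocalKey }) {
    this.local = local;
    this.remote = remote;
    this.isLocalKey = isLocalKey;
  }
  pick(key) {
    return this.isLocalKey(key) ? this.local : this.remote;
  }
  get(key) {
    return this.pick(key).get(key);
  }
  put(key, value, ttl) {
    return this.pick(key).put(key, value, ttl);
  }
  del(key) {
    return this.pick(key).del(key);
  }
  clear() {
    return Promise.all([this.local.clear(), this.remote.clear()]).then(() => undefined);
  }
}

// Assumption: keys look like "<appId>:__SCHEMA...", as in the redis output in this thread.
const isSchemaKey = key => key.includes(':__SCHEMA');
```

With this shape the remote adapter would be whatever redis-backed adapter is already in use. The trade-off is that each server process holds its own schema copy, so schema changes are not shared between processes until the local entry expires or the process restarts.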

Describe alternatives you've considered
Using in-memory cache for everything:
We have tried this in the past but had issues with in-memory cache leaking memory. This was several versions ago. Maybe that has been fixed? Any drawbacks with in-memory cache other than memory usage?

Additional context
Overall the implementation of the schema cache, particularly the number of times it is accessed per request and its potential size, does not seem well tailored for being stored remotely.

Our schema does not often change w/o the backend getting redeployed anyway so we are not really concerned with schema invalidation.

Looked into how we would be able to use a different cache for schema and it seems like it would require forking and modifying DatabaseController, SchemaCache, and SchemaController to be configured with a different CacheController.

This seems like it would be useful for other users as well, and we would be happy to contribute it back to the project if there is interest.

Appreciate any thoughts/advice on the specific problem we are facing and the proposed solution.

Many thanks!


New Relic Graph showing time in redis is taking up 1/3 of all request time. Much more than even mongo.

Even with single schema cache enabled we still get thousands of redis operations per second. Most just reading the same ~64KB schema over and over.

A large amount of our network IO is from the schema cache being read from redis multiple times per request. It's enough bandwidth that it overwhelmed many of the redis configurations we tried.

dplewis (Member) commented Sep 17, 2019

Which version of Parse Server are you using? Improvements have been made such as less validation of schema and reducing bottleneck on RedisCacheAdapter.

Have you tried enableSingleSchemaCache: true?
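
For reference, a fragment showing where that option sits, together with the `directAccess` option mentioned elsewhere in this thread. This is only a sketch of the two options under discussion; the other settings a real deployment needs (database URI, keys, cache adapter) are deliberately left out, and `schemaCacheOptions` is a hypothetical variable name.

```javascript
// Fragment of a Parse Server options object; only the two options
// discussed in this thread are shown, everything else is omitted.
const schemaCacheOptions = {
  enableSingleSchemaCache: true, // share one schema cache instead of one per request
  directAccess: true,
};

// These would be merged into the full options passed to Parse Server, e.g.
// Object.assign({}, otherOptions, schemaCacheOptions), where otherOptions
// stands for the rest of the deployment's configuration.
console.log(Object.keys(schemaCacheOptions).join(', '));
```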

kcolton (Author) commented Sep 18, 2019

@dplewis Thanks for the reply.

Version: 3.7.2

We saw a reduction in total number of redis ops after #5612 and #5616 (thanks btw! was a really nice change) as well as when we changed to enableSingleSchemaCache: true which did dramatically increase our cache hit ratio. We are also using directAccess: true although that did not seem to affect redis ops.

However, even with all of that, there's still a very high volume of "get" "badpanda:__SCHEMA__MAIN_SCHEMA" ops (multiple per request); and due to our schema being pretty large, that key winds up being ~64KB.

Here is MONITOR output from redis-cli for one of our more simple cloud functions + example of schema key set to show size:
https://gist.github.com/kcolton/8999d06c24f0c00d0c24d87857d2ed51

When there is a cache hit on both the schema and session, there aren't that many MAIN_SCHEMA lookups happening. Seems like 1 for the cloud function and 1 for the collection lookup (we are using directAccess: true, although that didn't seem to change much w/ redis).

# simple cloud function execution w/ session and schema cache hit 
1568837359.546346 [0 172.24.0.5:40150] "get" "badpanda:user:r:a98661dac0819f408ea2bb7f4bd24e60"
1568837359.550306 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"
# cloud function that executes a lookup on `UserData` collection (we are using directAccess: true)
1568837359.552400 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMAUserData"
1568837359.552687 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"

However for the same cloud function on a session cache miss (still schema cache hit) we wind up with 6 lookups to MAIN_SCHEMA.

# simple cloud function execution w/ schema cache hit, but session cache miss
1568837495.823021 [0 172.24.0.5:40150] "get" "badpanda:user:r:a98661dac0819f408ea2bb7f4bd24e60" # session cache miss
1568837495.823508 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA" # each one of these has to transmit 64KB back 
1568837495.827997 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA_Session" # triggers additional schema lookups for same execution
1568837495.828373 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"
1568837495.833024 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"
1568837495.835295 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA_User"
1568837495.835573 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"
1568837495.840052 [0 172.24.0.5:40150] "psetex" "badpanda:user:r:a98661dac0819f408ea2bb7f4bd24e60" "30000" "<redacted>"
1568837495.844669 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"
1568837495.844291 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMAUserData"
1568837495.844669 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"
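
Counts like the ones above can be tallied from `redis-cli MONITOR` output with a small script. The sketch below is one way to do it, assuming the line format shown in this thread; `countGets` is a hypothetical helper, and the sample is the session-cache-miss trace above with the trailing annotations removed.

```javascript
// Tally "get" commands per key from redis-cli MONITOR output.
function countGets(monitorText) {
  const counts = new Map();
  // Matches lines like: 1568837359.550306 [0 172.24.0.5:40150] "get" "some:key"
  const pattern = /^\d+\.\d+ \[[^\]]*\] "get" "([^"]*)"/i;
  for (const line of monitorText.split('\n')) {
    const match = pattern.exec(line.trim());
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  return counts;
}

const sample = [
  '# simple cloud function execution w/ schema cache hit, but session cache miss',
  '1568837495.823021 [0 172.24.0.5:40150] "get" "badpanda:user:r:a98661dac0819f408ea2bb7f4bd24e60"',
  '1568837495.823508 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"',
  '1568837495.827997 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA_Session"',
  '1568837495.828373 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"',
  '1568837495.833024 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"',
  '1568837495.835295 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA_User"',
  '1568837495.835573 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"',
  '1568837495.840052 [0 172.24.0.5:40150] "psetex" "badpanda:user:r:a98661dac0819f408ea2bb7f4bd24e60" "30000" "<redacted>"',
  '1568837495.844669 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"',
  '1568837495.844291 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMAUserData"',
  '1568837495.844669 [0 172.24.0.5:40150] "get" "badpanda:__SCHEMA__MAIN_SCHEMA"',
].join('\n');

const counts = countGets(sample);
console.log(counts.get('badpanda:__SCHEMA__MAIN_SCHEMA')); // 6
```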

With hundreds of cloud function executions per second, many of which involve multiple collections and even more schema lookups, you can see how this all adds up to significant raw throughput between redis and the servers.
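
A back-of-the-envelope estimate makes the scale concrete. The ~64KB schema size comes from this thread; the request rate of 300 per second is an assumed stand-in for "several hundred", and `schemaBytesPerSecond` is a hypothetical helper.

```javascript
// Rough schema-read throughput: bytes per lookup x lookups per request x requests per second.
function schemaBytesPerSecond({ schemaBytes, lookupsPerRequest, requestsPerSecond }) {
  return schemaBytes * lookupsPerRequest * requestsPerSecond;
}

const MIB = 1024 * 1024;
const schemaBytes = 64 * 1024; // ~64KB MAIN_SCHEMA value, as reported above
const requestsPerSecond = 300; // assumption: stands in for "several hundred"

// 1 lookup is the best case, 2 matches the cache-hit trace, 6 the session-miss trace.
for (const lookupsPerRequest of [1, 2, 6]) {
  const bytes = schemaBytesPerSecond({ schemaBytes, lookupsPerRequest, requestsPerSecond });
  console.log(`${lookupsPerRequest} lookup(s)/request: ${(bytes / MIB).toFixed(2)} MiB/s`);
}
```

Under these assumptions the schema key alone accounts for roughly 18.75 MiB/s at one lookup per request, 37.5 MiB/s at two, and 112.5 MiB/s at six.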

With how large our schema is, even if it always only made 1 MAIN_SCHEMA lookup per "request" like it does in the cache hit example, it still doesn't seem appropriate to be storing in redis.

kcolton (Author) commented Sep 18, 2019

Just tried running 3.9.0 locally and I'm no longer seeing the __SCHEMA<Collection> lookups, but there seem to be the same number of "get" "badpanda:__SCHEMA__MAIN_SCHEMA" ops, which are the more problematic ones and the main bottleneck.

dplewis (Member) commented Sep 20, 2019

Do you have a specific query / use case that uses a lot of get lookups?

Can you write a test case? As many as possible. I only added tests for basic queries and writes: https://github.com/parse-community/parse-server/blob/master/spec/RedisCacheAdapter.spec.js#L187

This way we can see how many lookups occur and find a way to deal with the bottleneck.

There should be a minimum of 1 get lookup per request.
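
One possible building block for such test cases is a wrapper that counts `get` calls per key on whatever cache adapter is under test. The sketch below assumes the adapter exposes `get`/`put`/`del`/`clear`; `withGetCounter` is a hypothetical helper and is not taken from the linked spec file.

```javascript
// Wraps a cache adapter and records how many times each key is read.
function withGetCounter(adapter) {
  const getCounts = new Map();
  const counted = {
    get(key) {
      getCounts.set(key, (getCounts.get(key) || 0) + 1);
      return adapter.get(key);
    },
    put: (key, value, ttl) => adapter.put(key, value, ttl),
    del: key => adapter.del(key),
    clear: () => adapter.clear(),
  };
  return {
    adapter: counted, // pass this wherever the original adapter was used
    getCounts, // Map of key -> number of get calls so far
    reset: () => getCounts.clear(), // call between test cases
  };
}

// Usage with a stand-in adapter (a real test would wrap the adapter under test).
const standIn = {
  get: () => Promise.resolve(null),
  put: () => Promise.resolve(),
  del: () => Promise.resolve(),
  clear: () => Promise.resolve(),
};
const counter = withGetCounter(standIn);
counter.adapter.get('badpanda:__SCHEMA__MAIN_SCHEMA');
counter.adapter.get('badpanda:__SCHEMA__MAIN_SCHEMA');
console.log(counter.getCounts.get('badpanda:__SCHEMA__MAIN_SCHEMA')); // 2
```

A test could then run a single query or cloud function and assert on the count for the MAIN_SCHEMA key, which is the number this thread is concerned with.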

stale bot commented Nov 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
