-
-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: performance issue when database contains a lot of schemas #480
Conversation
The failing test case shows why we chose the current behavior. Consider the following case:
If the user has permission to access After the change made in this PR, we would fail to recognize the referenced type and it would be marked as Thats admittedly an edge case but its one we've decided to support in the past so we'll need to think through options before merging this. The schema is cached on a per-connection basis across graphql requests and the cache is busted only when a DDL event occurs. I could see it having a big impact on your p99, but am surprised if it was a consistent issue. What kind of performance characteristics are you seeing with ~200 schemas? |
Hey @olirice, Thanks a lot for your answer, in our test we add around 400 schemas, the |
make sense, thanks for the additional info
Yes, that would be safer. Its possible that doing the lookups on references made functions and tables could actually be slower but its definitely worth a test. Are you up for updating this PR with that approach? |
Hey @olirice, Thanks for the swift feedback and the suggestion – much appreciated! We'll definitely take a stab at updating the PR with the approach you mentioned. We'll try to tackle that in the January month. On a slightly different note, our team is also keen on nested query filters. We're practically dreaming of them at this point 😅. Do you have any plans to include this feature in |
Nice, looking forward to it
could you give me a concrete example for this? |
Yes it's exactly this ticket 🙂 |
Hey @olirice and @magrinj we are also experiencing this same issue. We have 3000+ schemas in a multi-tenancy environment and after version 1.2.3 our queries resolve in over 1 minute even when there is no data in the schema. We are currently locked at v1.2.3 because of this issue. We would love to see this fix get committed and could offer additional performance testing for it. For additional context, each schema of ours has approximately 10-15 tables in it. |
@etherealhedgehog @magrinj proposal available in #493 |
@etherealhedgehog 3000 schemas is a pretty serious project! I'd love to hear more about your application and how you're using pg_graphql e.g. public facing API, admin API, etc? |
resolved by #493 |
What kind of change does this PR introduce?
Performance issue
What is the current behavior?
Currently, the query time increases linearly with the number of schemas in our system.
What is the new behavior?
With the proposed changes, the query time remains constant, regardless of the number of schemas.
Additional context
At Twenty, we've identified a performance bottleneck in
pg_graphql
related to our multi-tenancy approach, which involves creating a separate schema for each workspace. In our production environment, which includes hundreds of schemas, we've noticed a significant slowdown in query performance.Upon investigation, we found that the
types
andcomposites
inload_sql_context.sql
are being loaded for all schemas, rather than just the relevant schema. This seems to be the root cause of the performance issue.We have identified a potential fix but are seeking additional insights and suggestions to refine our approach.