[Timebox] fix: performance degradation of flexible back-end with many workspaces #2722

Closed
magrinj opened this issue Nov 27, 2023 · 2 comments · Fixed by #3204
Labels
scope: backend Issues that are affecting the backend side only type: chore


magrinj commented Nov 27, 2023

Scope & Context

The flexible backend was officially released on November 24th. As the production database accumulates more workspaces, we've observed performance degradation that grows in direct proportion to the number of workspaces. The issue appears to be tied to pg_graphql and the way we currently use it.

Current behavior

Currently, we assign a dedicated data source to each workspace. This approach defeats pg_graphql's caching mechanism, so the GraphQL schema is regenerated frequently. While generating the GraphQL schema, pg_graphql iterates over all database schemas. Since we create a separate database schema for each workspace, each generation gets slower as workspaces are added.
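
To make the cost described above concrete, here is a minimal toy model. It is not pg_graphql's real internals: `FakeDataSource` and its fields are hypothetical, and it assumes that the generated GraphQL schema is cached per data source and that one generation visits every database schema once.

```typescript
// Toy model only: it mirrors the behavior described above,
// not the actual implementation of pg_graphql.
class FakeDataSource {
  // Assumption: the generated GraphQL schema is cached per data source.
  private cached: string | null = null;
  private readonly databaseSchemas: string[];
  schemasVisited = 0;

  constructor(databaseSchemas: string[]) {
    this.databaseSchemas = databaseSchemas;
  }

  resolveGraphqlSchema(): string {
    if (this.cached === null) {
      // Assumption: a generation iterates over every database schema.
      this.schemasVisited += this.databaseSchemas.length;
      this.cached = `generated from ${this.databaseSchemas.length} database schemas`;
    }
    return this.cached;
  }
}

const workspaceCount = 100;
const databaseSchemas = Array.from(
  { length: workspaceCount },
  (_, i) => `workspace_${i}`,
);

// One data source per workspace: every workspace pays for a full generation.
let perWorkspaceVisits = 0;
for (let i = 0; i < workspaceCount; i++) {
  const dataSource = new FakeDataSource(databaseSchemas);
  dataSource.resolveGraphqlSchema();
  perWorkspaceVisits += dataSource.schemasVisited;
}

// One shared data source: the first request generates, later ones hit the cache.
const sharedDataSource = new FakeDataSource(databaseSchemas);
for (let i = 0; i < workspaceCount; i++) {
  sharedDataSource.resolveGraphqlSchema();
}
const sharedVisits = sharedDataSource.schemasVisited;

console.log({ perWorkspaceVisits, sharedVisits });
```

Under these assumptions the per-workspace setup does work that grows with the square of the workspace count, while the shared setup grows linearly.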

Expected behavior

The optimal solution would be to use a single data source for all workspaces and to create a new data source only when querying foreign objects (a feature that is not yet implemented). Further investigation into the internals of pg_graphql is needed, and may lead to a better way of generating and versioning the GraphQL schema based on the database schema.
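
The selection rule described above could look roughly like the sketch below. All names (`DataSourceHandle`, `DataSourceRegistry`, `queriesForeignObject`) are hypothetical and only illustrate the idea; a real implementation would hold actual database connections rather than plain objects.

```typescript
// Hypothetical stand-in for a real database connection handle.
interface DataSourceHandle {
  id: string;
}

class DataSourceRegistry {
  // The single data source shared by every workspace.
  private readonly main: DataSourceHandle = { id: "main" };
  // Dedicated data sources, created lazily and only for foreign objects.
  private readonly foreign = new Map<string, DataSourceHandle>();

  get(workspaceId: string, queriesForeignObject: boolean): DataSourceHandle {
    if (!queriesForeignObject) {
      return this.main;
    }
    let handle = this.foreign.get(workspaceId);
    if (handle === undefined) {
      handle = { id: `foreign:${workspaceId}` };
      this.foreign.set(workspaceId, handle);
    }
    return handle;
  }
}

const registry = new DataSourceRegistry();
const forWorkspaceA = registry.get("workspace-a", false);
const forWorkspaceB = registry.get("workspace-b", false);
const foreignForA = registry.get("workspace-a", true);

console.log(forWorkspaceA === forWorkspaceB); // both workspaces share the main data source
console.log(foreignForA.id);
```

Keeping the foreign data sources in a separate map means the shared path stays free of per-workspace state.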

Technical inputs

  1. Data Source Optimization: Explore consolidating the multiple data sources into a singular, centralized source. This change aims to reduce the load on pg_graphql by minimizing schema generation processes.
  2. Schema Caching Strategy: Investigate alternative caching strategies within pg_graphql. This could involve implementing a more efficient caching mechanism that doesn't require schema regeneration for each workspace.
  3. Performance Profiling: Conduct thorough performance profiling to pinpoint the specific bottlenecks in pg_graphql when handling multiple schemas. Tools like EXPLAIN ANALYZE in PostgreSQL may be useful for this analysis.
  4. Code Review and Refactoring: Review the current implementation code for pg_graphql integration. Look for any inefficient practices or potential improvements that could be contributing to the performance degradation.
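
For the profiling step above, one possible starting point is to collect request latencies at different workspace counts and compare percentiles. The sketch below is an assumption about how that could be done, not an existing tool: `measureMs` and `percentile` are hypothetical helpers, and `Date.now()` only gives millisecond resolution.

```typescript
// Runs an async operation `runs` times and returns each duration in milliseconds.
async function measureMs(
  operation: () => Promise<unknown>,
  runs: number,
): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await operation();
    samples.push(Date.now() - start);
  }
  return samples;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("no samples");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Illustrative, made-up latencies; real values would come from measureMs.
const exampleSamplesMs = [90, 10, 50, 30, 70, 20, 100, 40, 80, 60];
const medianMs = percentile(exampleSamplesMs, 50);
const p95Ms = percentile(exampleSamplesMs, 95);

console.log({ medianMs, p95Ms });
```

Running the same measurement against databases with, say, 10, 100, and 1,000 workspace schemas would show whether latency tracks the workspace count.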
@magrinj magrinj moved this from 🆕 New to 🔖 Planned in Product development ✅ Nov 27, 2023
@magrinj magrinj self-assigned this Nov 27, 2023
@magrinj magrinj added the scope: backend Issues that are affecting the backend side only label Nov 27, 2023
@charlesBochet

In scope for this ticket: #2190

@charlesBochet

Scope of this ticket:

  • understand the issue and have an idea of how to solve it
  • make a decision on per-workspace data sources

@magrinj magrinj moved this from 🔖 Planned to 🏗 In progress in Product development ✅ Dec 20, 2023
@magrinj magrinj linked a pull request Jan 2, 2024 that will close this issue
@magrinj magrinj moved this from 🏗 In progress to ✅ Done in Product development ✅ Jan 2, 2024