feat(replays): Improve index page query performance #45098

Merged: 9 commits merged into master on Feb 28, 2023

@cmanallen commented Feb 24, 2023

The /replays index page slows down on large datasets, and given a large enough dataset some queries run out of memory (OOM). By keeping fewer values in memory and making several optimizations to the query's structure, we improve query performance by 5x for our largest customers.

  • Uses non-unique counting for the count_errors field.
    • Breaking change from previous design.
    • Reduces memory usage.
  • Removes urls_sorted dependency for count_urls and replaces with simple count.
    • Improves activity performance.
    • Reduces memory usage.
  • For ALL grouped scalar fields we have replaced groupUniqArray with groupArray(1). This is faster (no unique requirements) and more memory efficient (we only ever have one value in memory).
  • project_id is no longer a grouped scalar value. It is included in the GROUP BY clause.
  • Fixed an error where trace and error ids were filtering against the column rather than the aggregation.
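To make the trade-offs in the list above concrete, here is a minimal plain-Python model of the two aggregation strategies. It is only a sketch, not the actual Snuba/ClickHouse query: the row layout, the sample data, and the function names `aggregate_old` and `aggregate_new` are hypothetical, and it assumes `count_errors` is derived from per-segment error id lists.

```python
from collections import defaultdict

# Hypothetical segment rows: (replay_id, project_id, user_id, error_ids).
SEGMENTS = [
    ("r1", 1, "alice", ["e1"]),
    ("r1", 1, "alice", ["e1", "e2"]),
    ("r2", 2, "bob", ["e3"]),
]


def aggregate_old(segments):
    """Group by replay_id and keep every distinct value (groupUniqArray-like)."""
    groups = defaultdict(
        lambda: {"project_ids": set(), "user_ids": set(), "error_ids": set()}
    )
    for replay_id, project_id, user_id, error_ids in segments:
        group = groups[replay_id]
        group["project_ids"].add(project_id)
        group["user_ids"].add(user_id)
        group["error_ids"].update(error_ids)
    return {
        replay_id: {
            "project_id": min(g["project_ids"]),
            "user_id": min(g["user_ids"]),
            "count_errors": len(g["error_ids"]),  # unique count
        }
        for replay_id, g in groups.items()
    }


def aggregate_new(segments):
    """Group by (project_id, replay_id) and keep at most one value per field
    (groupArray(1)-like)."""
    groups = defaultdict(lambda: {"user_id": [], "count_errors": 0})
    for replay_id, project_id, user_id, error_ids in segments:
        group = groups[(project_id, replay_id)]
        if len(group["user_id"]) < 1:  # cap of one element per group
            group["user_id"].append(user_id)
        group["count_errors"] += len(error_ids)  # non-unique count
    return {
        key: {"user_id": g["user_id"][0], "count_errors": g["count_errors"]}
        for key, g in groups.items()
    }
```

In this model the new strategy never holds more than one value per scalar field, and the error count for `r1` changes because the repeated `e1` is counted twice. That difference is the breaking change noted for count_errors.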

@github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Feb 24, 2023
@cmanallen requested a review from JoshFerge February 24, 2023 21:25
@JoshFerge

@cmanallen yeah i definitely think we can optimize project_id and user too. for project_id, i don't think we need to aggregate them all before selecting the first one. the project_id is guaranteed to be the same for each replay segment. we can likely simply group by project_id, or use a max_size of 1 and groupArray.

for user_id, we can run a quick query to verify but it should be the same on all segments, so the strategy w/ max_size and groupArray should work there.
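The "quick query to verify" could look something like the sketch below, written in plain Python against in-memory rows rather than the real dataset. The function name `replays_with_conflicting_values` and the sample rows are hypothetical. It checks the assumption that a field holds the same value on every segment of a replay, which is what makes keeping a single value (or grouping by the field) safe.

```python
from collections import defaultdict


def replays_with_conflicting_values(segments, field):
    """Return the sorted replay ids whose segments disagree on `field`."""
    values_by_replay = defaultdict(set)
    for segment in segments:
        values_by_replay[segment["replay_id"]].add(segment[field])
    return sorted(
        replay_id
        for replay_id, values in values_by_replay.items()
        if len(values) > 1
    )


# Hypothetical sample data; real rows would come from the replays dataset.
SAMPLE_SEGMENTS = [
    {"replay_id": "r1", "project_id": 1, "user_id": "alice"},
    {"replay_id": "r1", "project_id": 1, "user_id": "alice"},
    {"replay_id": "r2", "project_id": 2, "user_id": "bob"},
    {"replay_id": "r2", "project_id": 2, "user_id": "carol"},
]
```

An empty result for a field means every replay in the sample has one consistent value for it. A non-empty result lists the replays where picking a single value would discard information.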

@cmanallen

> we can likely simply group by project_id

@JoshFerge Wow, yes. Very astute observation. 33% speed-up on large datasets.

@cmanallen

> for user_id, we can run a quick query to verify but it should be the same on all segments, so the strategy w/ max_size and groupArray should work there.

Added. 25% speed-up.

@cmanallen marked this pull request as ready for review February 28, 2023 15:48
@cmanallen requested a review from a team as a code owner February 28, 2023 15:48

@JoshFerge left a comment

🚀

@cmanallen merged commit a80a05f into master Feb 28, 2023
@cmanallen deleted the replays-index-page-performance branch February 28, 2023 22:22
jan-auer added a commit that referenced this pull request Mar 1, 2023
* master: (37 commits)
  ref(ppf): Don't use --commit-batch-size for futures queue length (#45182)
  feat(codecov-v2): Add more logging (#45225)
  fix(alerts): Center table items on alert history page (#45226)
  feat(CapMan): Pass `tenant_ids` to Snuba (#44788)
  ref(db): Drop `project_id` from Environment (model state) (#45207)
  chore(profiling): Rename context in profiles task (#45208)
  feat(replays): Improve index page query performance (#45098)
  chore(issue assignment): Add logging for`GroupOwner` auto assignment (#45142)
  fix(hybrid-cloud): Uncache organization when queueing it for deletion (#45213)
  fix(perf): Navigating to Transaction Summary from Trends widget should persist custom date selection (#45190)
  fix(pageFilter): Fix overflow (#45169)
  ref(git hooks): Only suggest autoupdate variable when pulling if not already set (#45179)
  fix(dashboard): Include dashboard filters in widget viewer (#45106)
  fix(alerts): Remove null projects from alerts list (#45202)
  feat(replay): Update Inline replay onboarding img to support dark mode (#45084)
  __iexact reduce call has default value now. (#45206)
  feat(replay): Use SDK value for LCP (#44868)
  chore(hybrid-cloud): breaking foreign keys (#45203)
  Revert "ref(db): Drop `project_id` from Environment (model state) (#45094)"
  ref(db): Drop `project_id` from Environment (model state) (#45094)
  ...
@github-actions bot locked and limited conversation to collaborators Mar 16, 2023