Skip to content

Avoid DISTINCT on COUNT queries when possible #55029

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

chrisyuska
Copy link

@chrisyuska chrisyuska commented May 7, 2025

Motivation / Background

Fixes #55027

When calling .count on an eager-loaded query where the base table belongs_to the eager-loaded table, AR switches COUNT(*) to a COUNT(DISTINCT "id"). However, this is unnecessary as there will never be duplicates in a belongs_to relationship like this.

This isn't normally an issue, but when running COUNT queries on larger tables (at least in Postgres), I've seen this turn ~100ms queries into ~30s queries.

This Pull Request has been created because a very common scenario is paginating data that might have an order clause on an included table, which is then paginated with gems like Kaminari that will call count on the relation without additional reworking of the query. e.g.:

Comment.includes(:post).order(posts: { title: :asc }, comments: { created_at: :desc }).count
# => SELECT COUNT(DISTINCT "comments"."id") FROM "comments" LEFT OUTER JOIN "posts" ON "posts"."id" = "comments"."post_id"

Detail

This Pull Request changes ActiveRecord's Calculation logic around COUNT queries to only add a DISTINCT column when any eager-loaded associations are not belongs_to associations.

Additional information

One workaround we've used is to avoid the eager-load and just change the query to:

Comment.left_outer_joins(:post).order(posts: { title: :asc }, comments: { created_at: :desc }).count

This works fine, but we lose the eager-loaded post association in doing so.

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@chrisyuska chrisyuska force-pushed the hotfix/unnecessary-distinct-on-count branch from 954bbce to a1f0d17 Compare May 7, 2025 02:16
@chrisyuska
Copy link
Author

I suspect this PR will need some work before it can be merged. Totally open to alternative approaches or just closing it too if the complexity trade-off isn't worth it.

@chrisyuska chrisyuska force-pushed the hotfix/unnecessary-distinct-on-count branch from a1f0d17 to ff3c66c Compare May 7, 2025 02:35
* Fixes rails#55027

Previously, all queries with eager-loaded associations resulted in  queries.
However, this addition of DISTINCT is unnecessary and can be expensive on
larger tables when the base table belongs_to all eager-loaded associations.

This change adds a check for any referenced or eager-loaded tables on
calculations and skips the DISTINCT addition if all joined tables ar
belongs_to associations.
@chrisyuska chrisyuska force-pushed the hotfix/unnecessary-distinct-on-count branch from ff3c66c to 8651929 Compare May 7, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Counts with eager loaded associations include unnecessary DISTINCT in certain scenarios
1 participant