Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect negative conditions for collection associations #645

Merged
merged 11 commits into from
Mar 8, 2016

Conversation

avit
Copy link
Contributor

@avit avit commented Feb 16, 2016

When searching against attributes from a collection association (has_many / :through / has_and_belongs_to_many), a normal JOIN will include false positives when asking to exclude values (!=, NOT IN, IS NOT NULL) and another tuple is matched.

This is the basic case, see specs for details:

  • A book exists with two tags: "medieval" and "fantasy"
  • You search for books that are NOT "fantasy"
  • The result would still include it, because it matches the other tag for NOT "fantasy"

In situations where this would happen, I've moved the LEFT OUTER JOIN into a subquery with inverted conditions. This is implemented with a correlated primary key which should help the query plan to avoid scanning entire tables in the subquery. The conditions now look like this:

WHERE root.id NOT IN (
  SELECT child.parent_id FROM child
   ... WHERE ... AND child.parent_id = root.id
)

Note that root.id is referenced in the inner query. The EXPLAIN query plan for this looks sensible, at least in MySQL where I explored it.

As far as I know this works with all standard ActiveRecord joins: I have not yet verified with composite keys or polymorphic joins or Nth-degree associations but it should be close to working if it isn't already.

I will start testing this against our own real-world data to see how it performs, please check it out too.

Known issues:

  • Fix mongoid tests failures.
  • Errors with not_null predicate: seems like a positive case and should use the original code branch.

@avit
Copy link
Contributor Author

avit commented Feb 16, 2016

Yes I'm fixing mongoid... 🔨

avit added 7 commits February 16, 2016 11:56
When a collection association (has_many, etc) is searched for negative
conditions (NOT...), a JOIN will still include other rows that match.

The implied meaning is that it should only select where *none* of the
associations match, but the actual result still selects records where
*any* of the joined associations match.

This implementation removes joins that were added while building the
conditions and moves them into a subquery if needed.
@avit avit force-pushed the negative-association branch from a131fa2 to 7f1235a Compare February 16, 2016 20:31
jonatack added a commit that referenced this pull request Mar 8, 2016
Respect negative conditions for collection associations + refactoring.
@jonatack jonatack merged commit 3675584 into activerecord-hackery:master Mar 8, 2016
@jonatack
Copy link
Contributor

jonatack commented Mar 8, 2016

Awesome 🎉 great work @avit 💜

@jonatack
Copy link
Contributor

jonatack commented Mar 8, 2016

Fixes #381.

@avit
Copy link
Contributor Author

avit commented Mar 8, 2016

@jonatack Thanks for the merge on this.

I have one more thing to add: the "NOT NULL" predicate still needs to be handled specially. I will add another PR for this.

@janklimo
Copy link

Hey @avit thank you very much for your work on this. I was trying to use this functionality with conditional has_many ... through relationships with little success.

In my case:

  • a Course has_many :user_assigned_content_skills, -> { where(source: 'user') }, class_name: "ContentSkill"
  • and a ContentSkill belongs_to :skill and belongs_to :course

Then Course.ransack({user_assigned_content_skills_skill_name_not_eq: 'ruby'}).result.to_sql returns the following (i.e. false positives again if a course has multiple content_skills):

"SELECT courses.* FROM courses LEFT OUTER JOIN content_skills ON content_skills.course_id = courses.id AND content_skills.source = 'user' LEFT OUTER JOIN skills ON skills.id = content_skills.skill_id WHERE (skills.name != 'ruby')"

I'm sorry to hijack this old thread but I was wondering if:

a) you know of a way to make this query work
b) there is a plan to have this implemented in Ransack?

Many thanks for any insights!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants