Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False Positive for Rails/PluckInWhere #310

Closed
natematykiewicz opened this issue Jul 26, 2020 · 12 comments · Fixed by #317
Closed

False Positive for Rails/PluckInWhere #310

natematykiewicz opened this issue Jul 26, 2020 · 12 comments · Fixed by #317

Comments

@natematykiewicz
Copy link
Contributor

natematykiewicz commented Jul 26, 2020

Rails/PluckInWhere assumes that the pluck is being called on an ActiveRecord relation.

data = Rails.cache.fetch('iterations') { Api.iteration_overview }
indexed_iterations = Iteration.where(number: data.pluck('number')).index_by(&:number)

data is not an ActiveRecord relation, it's an array of hashes. The pluck here is using ActiveSupport's Enumerable#pluck.


Expected behavior

Call pluck on an array of hashes, and not have Rails/PluckInWhere report an issue.

Actual behavior

Call pluck on an array of hashes, and have Rails/PluckInWhere report "Use select instead of pluck within where query method."

Steps to reproduce the problem

arr = [{foo: 1}, {foo: 2}]
Foo.where(val: arr.pluck(:foo))

RuboCop version

Rubocop 0.88.0, Rubocop-Rails 2.6.0

@mobilutz
Copy link
Contributor

I just stumbled over this false positive as well.

Maybe a check needs to be added if the class that pluck is being called on is an ActiveRecord::Relation or something else.

koic added a commit to koic/rubocop-rails that referenced this issue Aug 2, 2020
Fixes rubocop#310.

This PR adds `EnforcedStyle` to `Rails/PluckInWhere`. By default,
it does not register an offense if `pluck` method's receiver is a variable.
koic added a commit to koic/rubocop-rails that referenced this issue Aug 2, 2020
Fixes rubocop#310.

This PR adds `EnforcedStyle` to `Rails/PluckInWhere`. By default,
it does not register an offense if `pluck` method's receiver is a variable.
koic added a commit to koic/rubocop-rails that referenced this issue Aug 2, 2020
Fixes rubocop#310.

This PR adds `EnforcedStyle` to `Rails/PluckInWhere`. By default,
it does not register an offense if `pluck` method's receiver is a variable.
@koic koic closed this as completed in #317 Aug 3, 2020
koic added a commit that referenced this issue Aug 3, 2020
[Fix #310] Add `EnforcedStyle` to `Rails/PluckInWhere`
@tilo
Copy link

tilo commented Jun 9, 2022

we had a situation where this Rubocop rule caused the MySQL query planner to not optimize the query correctly, and overwhelmed the DB.

.pluck(:id) returns an array of numeric IDs
.select(:id) returns the instances with an 'id' attribute - which is incorrect in the context we observed

the correct "select" statement in our case would have been .select(:id).map(&:id)
which is not preferable to .pluck(:id)

@koic I'm not convinced that this is a good and general Rubocop rule

@natematykiewicz
Copy link
Contributor Author

natematykiewicz commented Jun 9, 2022

@tilo can you provide more details?

.select(:id) returns the instances with an 'id' attribute

This is only true if you've called load on the relation, which can happen if you directly call load or any enumerable method such as each or map.

If you're just nesting relations, it does a subquery.

Foo.where(bar_id: Bar.active.select(:id))

This does 1 query:

SELECT *
FROM foos
WHERE bar_id IN (
  SELECT id FROM bars WHERE active
);

If you use pluck, it does 2 queries:

Foo.where(bar_id: Bar.active.pluck(:id))

1 to grab the ids of all the relevant Bars.

SELECT id FROM bars WHERE active;

And 1 to get the Foo records using those Bar ids.

SELECT *
FROM foos
WHERE bar_id IN (1, 2, 3);

@natematykiewicz
Copy link
Contributor Author

Also you said:

the correct "select" statement in our case would have been .select(:id).map(&:id)
which is not preferable to .pluck(:id)

But .pluck(:id) would result in exactly the same query plan and SQL as .select(:id).map(&:id), just the latter will instantiate a bunch of ActiveRecord objects to get the IDs, where the former will just have an array of IDs.

Both of those would do 2 queries -- 1 to select the ids and 1 to grab your rows.

Can you provide some relevant code?

@natematykiewicz
Copy link
Contributor Author

Perhaps what you're meaning to say is that you wanted .pluck(:id) because a subquery of .select(:id) overwhelmed the query planner, but this cop pushed you to use a subquery?

@tilo
Copy link

tilo commented Jun 9, 2022

I wanted to use pluck(:id), but this cop auto-corrected it to use select(:id).

Original PR, using pluck(:id) resulted in this Query Plan:

> HugeTable.where(item_type: 'MyModel', item_id: MyModel.where(user_id: @user.id).pluck(:id)).explain
=>
EXPLAIN for: SELECT `hugetable`.* FROM `hugetable` WHERE `hugetable`.`item_type` = 'MyModel' AND `hugetable`.`item_id` IN (1442298, 2304566, 2304574, 2608158)
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table    | partitions | type  | possible_keys                           | key                                     | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | hugetable | NULL       | range | index_hugetable_on_item_type_and_item_id | index_hugetable_on_item_type_and_item_id | 1026    | NULL |   36 |    100.0 | Using index condition |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+

uses an index

Modified PR, after using rubocop -A, using select(:id), resulted in this Query Plan: (rule Rails/PluckInWhere)

> HugeTable.where(item_type: 'MyModel', item_id: MyModel.where(user_id: @user.id).select(:id)).explain
=>
EXPLAIN for: SELECT `hugetable`.* FROM `hugetable` WHERE `hugetable`.`item_type` = 'MyModel' AND `hugetable`.`item_id` IN (SELECT `my_models`.`id` FROM `my_models` WHERE `my_models`.`user_id` = 1516073) 
+----+-------------+-----------------------+------------+------+------------------------------------------------+-----------------------------------------+---------+----------------------------------------------+------+----------+-------------+
| id | select_type | table                 | partitions | type | possible_keys                                  | key                                     | key_len | ref                                          | rows | filtered | Extra       |
+----+-------------+-----------------------+------------+------+------------------------------------------------+-----------------------------------------+---------+----------------------------------------------+------+----------+-------------+
|  1 | SIMPLE      | my_models | NULL       | ref  | PRIMARY,index_my_models_on_user_id | index_my_models_on_user_id  | 5       | const                                        |    4 |    100.0 | Using index |
|  1 | SIMPLE      | hugetable              | NULL       | ref  | index_hugetable_on_item_type_and_item_id        | index_hugetable_on_item_type_and_item_id | 1026    | const,prod_db.my_models.id |    5 |    100.0 | NULL        |
+----+-------------+-----------------------+------------+------+------------------------------------------------+-----------------------------------------+---------+----------------------------------------------+------+----------+-------------+

This added a sub-query, and the MySQL optimizer did no longer use an index on HugeTable! 😱 e.g. NULL
HugeTable has around 1.6B rows.. 😱 (yes, we need to get rid of that table)

The select which caused the sub-query was added by a RuboCop auto-correct for rule Rails/PluckInWhere

> MyModel.where(user_id: @user.id).select(:id)
   => #<ActiveRecord::Relation [#<MyModel id: 1442298>, #<MyModel id: 2304566>, #<MyModel id: 2304574>, #<MyModel id: 2608158>]>

this is returning instances of MyModel, instead of numeric IDs

Perhaps the auto-corrected query with select should have been: .map(&:id):

> HugeTable.where(item_type: 'MyModel', item_id: MyModel.where(user_id: @user.id).select(:id).map(&:id)).explain
=>
EXPLAIN for: SELECT `hugetable`.* FROM `hugetable` WHERE `hugetable`.`item_type` = 'MyModel' AND `hugetable`.`item_id` IN (1442298, 2304566, 2304574, 2608158) 
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table    | partitions | type  | possible_keys                           | key                                     | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | hugetable | NULL       | range | index_hugetable_on_item_type_and_item_id | index_hugetable_on_item_type_and_item_id | 1026    | NULL |   36 |    100.0 | Using index condition |
+----+-------------+----------+------------+-------+-----------------------------------------+-----------------------------------------+---------+------+------+----------+-----------------------+

and pluck(:id) is simpler than .select(:id).map(&:id) so maybe not use this rule..

@tilo
Copy link

tilo commented Jun 9, 2022

PM me for more details :)

@tilo
Copy link

tilo commented Jun 10, 2022

@natematykiewicz it did not result in the same query plan (see above)

@natematykiewicz
Copy link
Contributor Author

natematykiewicz commented Jun 10, 2022

What I was saying is .select(:id).map(&:id) is the same SQL and Query Plan as .pluck(:id), but much less efficient in terms of Ruby object allocations. There's never a time when you want to use .select(:id).map(&:id). You should be using either .pluck(:id) (which does 2 queries) or .select(:id) (which does 1 query using subselect).

I'm a bit shocked that your subselect query performed so poorly (.select(:id)). I wouldn't be surprised if huge_tables.item_id was not the same datatype as my_models.id; perhaps one is bigint and the other is int, and it's doing a poor job typecasting them mid-query. Or maybe you need to ANALYZE or VACUUM the tables at play here.

@koic Thoughts on marking this cop "unsafe", since there's no guarantee that a subquery will actually perform better than 2 queries?

@tilo
Copy link

tilo commented Jun 13, 2022

@koic WDYT?

@koic
Copy link
Member

koic commented Jun 14, 2022

This cop has already been marked as unsafe. OTOH, the doc could be updated.
https://docs.rubocop.org/rubocop-rails/cops_rails.html#railspluckinwhere

@tilo
Copy link

tilo commented Jun 15, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants