Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't avoid under/overfetching #1697

Closed
sedubois opened this issue Jul 4, 2020 · 16 comments
Closed

Can't avoid under/overfetching #1697

sedubois opened this issue Jul 4, 2020 · 16 comments
Labels
bug breakages in functionality that is implemented guidance-offered models models, associations and fetching the underlying data

Comments

@sedubois
Copy link
Contributor

sedubois commented Jul 4, 2020

What were you trying to do?

I added bullet to my project, which shows that I should add eager loading for my dashboard relationships to avoid N+1 queries when showing the index. For instance I have a Course with program: Field::BelongsTo, and this program is visible on the course index.

Bullet throws:

Bullet::Notification::UnoptimizedQueryError: user: sdubois
GET /admin/courses
USE eager loading detected
  Program => [:translations]
  Add to your query: .includes([:translations])

The issue is gone if I remove program from the Course's COLLECTION_ATTRIBUTES.

So following the documentation, I changed program: to Field::BelongsTo.with_options(scope: -> { Program.eager_load(:translations) }).

What did you end up with (logs, or, even better, example apps are great!)?

The option has no effect. In fact even if I change to Field::BelongsTo.with_options(scope: -> { raise "I got called!" }), it does not raise. It seems that this scope option is not applied.

What versions are you running?

  • Rails 6.0.3.2
  • administrate 0.14.0
@sedubois sedubois added the bug breakages in functionality that is implemented label Jul 4, 2020
@pablobm
Copy link
Collaborator

pablobm commented Jul 4, 2020

Good catch! I think that what you describe should have been caught by BaseDashboard#collection_includes. Maybe worth looking into there to see what's going on?

As for :scope, I'm entertaining the possibility of removing it altogether. I think that the same effect can be reached by other means, using ActiveRecord's own features. I'm not 100% sure yet though.

@sedubois
Copy link
Contributor Author

sedubois commented Jul 6, 2020

Issue 1: BelongsTo in course collection

@pablobm based on your hint, the following workaround in Course dashboard allows to get rid of the N+1 when viewing my course index with a BelongsTo association:

  def collection_includes
    super + [program: :translations]
  end

(The above is necessary because ProgramDashboard#display_resource(program) calls program.title where title is loaded through a translations association by Mobility.)

OK so now the course index works.

Issue 2: HasMany/BelongsTo circular reference in course show

Now I try to view a course. I have a data over-fetching of Course because Course has an associative attribute of type HasMany: course_modules: Field::HasMany, included in SHOW_PAGE_ATTRIBUTES, and CourseModule has a course: Field::BelongsTo attribute included in its COLLECTION_ATTRIBUTES (i.e. the Course and CourseModule show each other).

Bullet::Notification::UnoptimizedQueryError at /admin/courses/my-course
user: sdubois
GET /admin/courses/my-course
AVOID eager loading detected
  CourseModule => [:course]
  Remove from your query: .includes([:course])

As a workaround and because I know no other way to solve the issue, I can live with removing course from the CourseModule's COLLECTION_ATTRIBUTES. So at this point I am able to view a course.

Issue 3: BelongsTo in course form

Now I try to edit a course. I get an N+1 in the form because it has a drop-down program select with the different program titles. The collection_includes override defined above has no effect for the form, so I have to define program: Field::BelongsTo.with_options(scope: -> { Program.eager_load(:translations) }).

Issue 4: HasMany in form

Now I continue to try to edit a course. I get another N+1 problem:

Bullet::Notification::UnoptimizedQueryError at /admin/courses/my-course/edit
user: sdubois
GET /admin/courses/my-course/edit
USE eager loading detected
  CourseModule => [:translations]
  Add to your query: .includes([:translations])

HasMany has no scope option. Instead the controller documentation suggests I should override find_resource with the scope:

    def find_resource(param)
      scoped_resource.eager_load(course_modules: :translations).find_by(slug: param)
    end

However this has no effect, the error about missing eager loading remains. Now I am stuck and don't know how to make the edit form work without having N+1 queries. My ultimate goal is to be able to allow bullet to raise errors whenever it detects an issue in development and test.


@pablobm would you know how to unblock the issues above?

I find the different ways that are required to apply scoping in the different actions very confusing. For example it feels odd that I have to write the same kind of information in the dashboard BelongsTo attribute's scope option, in the dashboard method collection_includes and in the controller method find_resource. How do other projects deal with this issue?

It feels to me the best would be to have a single way to define scoping for each dashboard's associative attributes, which by default would apply to all actions (index/show/edit), with an option to narrow the scope to some actions or to exclude some actions.

@sedubois sedubois changed the title BelongsTo scope option for eager loading has no effect Unable to uniformly fix data under- and over-fetching Jul 6, 2020
@nickcharlton nickcharlton added the models models, associations and fetching the underlying data label Jul 6, 2020
@pablobm
Copy link
Collaborator

pablobm commented Jul 9, 2020

Thank you for that comprehensive information @sedubois. I agree that this part of the code is a bit of a mess at the moment, and it's difficult to work with it. I'm looking into ways to refactor/redesign Administrate to overcome these difficulties.

Regarding each specific issue you have, a main problem is that you have an association, translations, that Administrate is not aware of (and that makes sense), so it doesn't know how to deal with it. I can't be 100% sure of the solutions, but these are the ideas that come to mind for each one:

  1. Would it work if you redefined Admin::CoursesController#scoped_resource to do the eager loading?
  2. Not sure how to proceed on this one. This needs more research. Would you be able to replicate this one by altering the example app, and show it in a branch? This may help shed light into what's going on.
  3. This is actually the only place where the option :scope applies, which comes to show that it's a misnomer and probably should go away. I wonder if we could add a controller method edit_resource, similar to the existing new_resource, and do some eager loading there?
  4. Try this: define a new association has_many :admin_translations, class_name: 'Translation', ->{ eager_load(course_modules: :translations) } , and use this for the field instead. Does this help at all? This is the path I've had in mind lately, for replacing field options with ActiveRecord's own idiomatic tools.

@AFlowOfCode
Copy link

I just spent a while trying to figure out how to avoid N + 1 on a resource's index page, what finally solved it for me was overriding scoped_resource in its controller, like so:

def scoped_resource
  MyResource.includes(:associated_resources)
end

This also worked for the show page. I didn't check to see if it solves all of OP's issues, but this appears to be the best way to quickly fix index & show query problems. I similarly thought that in the dashboard adding the below would help, after finding a related issue & PR, but it did not appear to do anything.

  ATTRIBUTE_TYPES = {
    associated_resources: Field::HasMany.with_options(includes: [:associated_resources]),
    ...
  }.freeze

I then found this issue searching issues for "scoped_resource" to see if anything had been metioned about this, since I couldn't find anything in relation to eager loading issues anywhere except what was said related to with_options.

@pablobm
Copy link
Collaborator

pablobm commented Oct 15, 2020

Thank you for your feedback, @AFlowOfCode. I've been looking into this a bit today, using the Bullet gem and the Administrate demo app.

On first approach, I'm not sure that Administrate can reliably infer what should be loaded eagerly and what shouldn't. For example, in the demo app, on the Customer index page, Bullet advises that we eager-load :line_items (actually orders: :line_items). This is caused by this custom code in the Order model, which Administrate can't predict:

def total_price
line_items.map(&:total_price).reduce(0, :+)
end

Also on Customer index, Bullet advises that we remove an includes for :log_entries. Administrate includes it because it's listed in COLLECTION_ATTRIBUTES, but then it's not used in a way that the eager loading makes sense. Administrate's heuristic fails here.

Other setups will have other use cases, and I don't think it's possible to predict them. What we may have to do is to document this, and provide hooks (perhaps in Administrate::ApplicationController) for integrators to fine-tune the includes for each case if required.

For example, I just gave it a quick go and refactored apply_collection_includes to delegate the detection of includes to a new method called collection_includes, like so:

     def apply_collection_includes(relation)
-      resource_includes = dashboard.collection_includes
+      resource_includes = collection_includes
       return relation if resource_includes.empty?
       relation.includes(*resource_includes)
     end
 
+    def collection_includes
+      dashboard.collection_includes
+    end
+

An integrator could then override collection_includes in their controllers and adjust as necessary.

I'm not sure this is necessarily the solution we need. For one it only works for index actions. It probably should be extended to work for other actions (eg: customers#show also warns of excessive eager loading). I'm not sure right now of what the use cases might be and their implications, so we may want to do more research.

@sedubois sedubois changed the title Unable to uniformly fix data under- and over-fetching Can't avoid under/overfetching Oct 15, 2020
@cguess
Copy link

cguess commented Mar 25, 2021

I would like to voice my need for this fix. Specifically in my use case I have a lot of associated records (~19000 associated with just one object). Loading all of these just for a count is completely breaking all of my RAM and forcing reloads on Heroku. Obviously, this is an edge case, but it seems reasonable for the index page to be able to display this.

Thanks for the excellent project, and I'd be happy to help build up a PR on this.

@pablobm
Copy link
Collaborator

pablobm commented Apr 16, 2021

@cguess - Which count are you referring to? Is this something that can be fixed by overriding methods in the controller?

PRs are welcome. I must admit that there's always the problem that they may inadvertently introduce design decisions that depart from the general direction of the project, or cause other issues that are not obvious. For this reason, sometimes (more often than I'd like) they can end up unmerged and abandoned. However they are useful, as experiments necessary for figuring out solutions.

@cguess
Copy link

cguess commented Apr 29, 2021

@pablobm when there's a has_many association on an object the backend will display a column with the count of the number of associations with each line item. However, instead of doing a COUNT it's loading every object of the association into memory, which, with 19,000 object promptly hits every memory limit and crashes the app.

I'll look into putting together a PR for this.

@jamie
Copy link
Contributor

jamie commented Oct 5, 2021

@cguess did you get anywhere with a PR for this? I'm running into the same problem on an app (Rails 6.1, Administrate 0.16.0):

  • Account has_many :secrets
  • account_dashboard.rb
    • ATTRIBUTE_TYPES includes secrets: Field::HasMany
    • COLLECTION_ATTRIBUTES includes :secrets
  • and then when index.html loads up resources to pass to _collection.html.erb it's set up to preload secrets, kicking off a SELECT "secrets".* FROM "secrets" WHERE "secrets"."account_id" IN ($1, $2) query to just render "350000 secrets" in the table row (actual production numbers, give or take - page takes ~90sec to render).

Problem also occurs on nested pages, eg. User has many Accounts, and I'm rendering a few inline Account entries there, same problem with over-eager loading that final association.

@pablobm
Copy link
Collaborator

pablobm commented Nov 5, 2021

Perhaps this should be solved outside Administrate? I'm thinking of using the counter_cache option to belongs_to, and show this in the index page, instead of the HasMany field. See: https://guides.rubyonrails.org/association_basics.html#options-for-belongs-to-counter-cache

@TomasBarry
Copy link

TomasBarry commented Dec 20, 2021

Having dug into it a little bit, there seems to be two possible improvements:

Field::HasMany.with_options(includes: [:associated_resources]) per @AFlowOfCode

Relevant code:

def includes
  associated_dashboard.collection_includes
end

A handy addition to Field::HasMany would be to allow for a dynamic includes option:

ATTRIBUTE_TYPES = {
    associated_resources: Field::HasMany.with_options(includes: [:associated_resources]),
    ...
  }.freeze

The problem is that the includes method in Field::HasMany does not take any argument and instead reverts to including whatever is defined in COLLECTION_ATTRIBUTES.

A solution here would be to update the includes method in Field::HasMany to take an argument that can override the COLLECTION_ATTRIBUTES and if the argument is not present to revert to using COLLECTION_ATTRIBUTES. That would be my suggestion anyway.

Overfetching a has_many relationship per @cguess

The problem is that if a has_many association is present then all associated records are loaded into memory even though the default behaviour only uses a COUNT of these records. This causes issues when there are a lot of associated records and it is flagged in the default behaviour by Bullet.

Relevant code:

<%= pluralize(field.data.size, field.attribute.to_s.humanize.downcase.singularize) %>

Specifically, the field.data.sze. I believe data in this case comes from here:

def data
  @data ||= associated_class.none
end

I'm not quite clear as to why data is eager loaded. Not quite sure what the fix here would be though since you wouldn't want to assume how the end-user uses the data.

It really does seem up to the end-user to specifically handle how they want to render data and if your has_many relationship has too many associated records then you should deal with that specifically rather than the framework changing the default behaviour?

You as the end user can customize the page yourself - https://administrate-demo.herokuapp.com/customizing_page_views

A solution for those looking to get around the issue, say in the show page where you want to override the default behaviour would be to first run:

# This generates a file
rails generate administrate:views:show MyModel

And then update the section to:

<section class="main-content__body">
  <dl>
    <% page.attributes.each do |attribute| %>
      <dt class="attribute-label" id="<%= attribute.name %>">
      <%= t(
        "helpers.label.#{resource_name}.#{attribute.name}",
        default: page.resource.class.human_attribute_name(attribute.name),
      ) %>
      </dt>

      <% case attribute
      when Administrate::Field::HasMany %>
        <dd class="attribute-data attribute-data--number"
            ><%= attribute.data.size %></dd>
      <% else %>
        <dd class="attribute-data attribute-data--<%=attribute.html_class%>"
            ><%= render_field attribute, page: page %></dd>
      <% end %>
    <% end %>
  </dl>
</section>

(note the case statement that explicitly handles the Field::HasMany as a number (to show the size of the association) while implementing the default behaviour for all other fields. Also not the odd syntax for the case statement. This is valid ERB and is different to what I had assumed it would be).

This could be abstracted away to a partial that accepts a lambda or something else if you really wanted.

@pablobm
Copy link
Collaborator

pablobm commented Jan 4, 2022

Hello all. I've been looking into this in the last few days. I think I may have a solution, but let's see what you think first.

1. Administrate's implementation of eager loads

I think that the first problem is that Administrate eager-loads all associative fields, and this is not useful. The default behaviour in collections (ie: index page and has_many tables) is the following:

  • BelongsTo, Polymorphic, HasOne: show the display_resource of the data.
  • HasMany: show the size of the data.

Of these behaviours, it only makes sense to eager-load by default in the first case (display_resource). In the second case, ActiveRecord is going to SELECT COUNT(*) automatically, ignoring the eager-loaded data.

So a first fix would be to change the following:

def self.associative?
self < Associative
end

To the following:

      def self.associative?
        self <= BelongsTo || self <= HasOne
      end

(The name of the method should change too, and also be moved elsewhere, but this is the general idea).

That should fix @jamie's issue at #2088

2. Counter caches

The above should improve things, but they could be further improved by using counter caches in your models: https://guides.rubyonrails.org/association_basics.html#options-for-belongs-to-counter-cache However the over-eager load has to be removed first as per above.

3. An :includes options to HasMany

@ThomasBarry, @AFlowOfCode: after my research above, I'm not sure of an option Field::HasMany.with_options(includes: [:associated_resources]) would help. Now it looks to me as though the change I propose above should be enough. Please show me use cases to prove me wrong here :-)

4. sedubois's case

After re-reading @sedubois's case, my hypothesis now is that several of his display_resource methods reach out to the translations association, causing a lazy load.

To turn this into an eager load, I think the best option would be to override #scoped_resource and #requested_resource in the appropriate controllers (children of Admin::ApplicationController) to do the eager load there. It's not possible for Administrate to automatically detect this.

5. CALL TO ACTION!

Would someone be able to provide a PR with an improvement as proposed in point 1? :-)
No need to provide a fully-fledged PR with tests, etc just yet. For now let's have just a prototype to explore this and see if it solves the problems we are discussing, then we can polish it if we think it's good.

@jamie
Copy link
Contributor

jamie commented Mar 24, 2022

Finally got some time to look at this again - @pablobm I think you're on the nose. I can confirm that on my reproduction app if I make that change to Administrate::Field::Base.associative? to self < BelongsTo || self < HasOne that it turns the associated select * into a series of select count(*), but I'll take that bounded N+1 to a huge data load. (Timing comparisons with 2 users, having 50k and 40k messages each: 1.7s total, 221ms ActiveRecord, vs 55ms total, 4.1ms ActiveRecord)

I'd argue that a proper fix for this use case should run counts without needing N queries for it, maybe leverage https://github.com/k0kubun/activerecord-precounter (or just borrow the technique)?

Relevant code flow that might need tweaking:

  • Administrate::ApplicationController#index
  • Administrate::ApplicationController#apply_collection_includes (this, or a sibling method, is the place to add precounts)
  • Administrate::BaseDashboard#collection_includes
  • Administrate::BaseDashboard#attribute_includes
  • Administrate::Field::Base.associative?

@jamie
Copy link
Contributor

jamie commented Mar 24, 2022

As a workaround for anyone following this thread, you can spot fix this in the current administrate version by tweaking the impacted dashboards in your app:

  def collection_includes
    super.without(:messages, :logs)
    # arguments are the COLLECTION_ATTRIBUTES entries you want to suppress eager loading
  end

@pablobm
Copy link
Collaborator

pablobm commented Mar 31, 2022

That precounter gem looks neat. On one hand, I think that Administrate should use Rails's own default, conventional choices which in this case would mean using counter caches, but it would be neat if we could offer some sort of hook for users to plug this as an alternative. I haven't looked into it, but I suspect it won't be trivial 🙁 Having said that, if someone wants to experiment I'd be happy to review.

@pablobm
Copy link
Collaborator

pablobm commented Apr 6, 2023

Closing this as I think that #2211 solves some of the problems, while others are out of scope. Happy to continue discussing this if you think there's more to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug breakages in functionality that is implemented guidance-offered models models, associations and fetching the underlying data
Projects
None yet
Development

No branches or pull requests

7 participants