Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid DB hits for JSONAPI relationships #1325

Closed
richmolj opened this issue Nov 10, 2015 · 23 comments
Closed

Avoid DB hits for JSONAPI relationships #1325

richmolj opened this issue Nov 10, 2015 · 23 comments

Comments

@richmolj
Copy link
Contributor

I've been searching through the issues, and this may be relevant to bringing filter to 0.10.x, but I couldn't find anything on this specifically.

If I use JSONAPI adapter, and put has_many :comments in my PostSerializer, there will be a DB hit to fill out all the ids for the comment relationships. Even when no include is specified.

I thought of avoiding this by something like

def comments
  object.comments.loaded ? object.comments : []
end

Unfortunately the key comments will still appear in the response. This means if my client (Ember) has already loaded the post with its comments, then hits this separate endpoint, it will think all the comments for the post have been deleted (versus just not included in the response).

Currently I've been creating a base serializer for every model, then subclassing and adding relationships for each specific endpoint. This not only gets really tedious, but every time I add a relationship I am also specifying includes, which makes me think the default should be to not load relationships unless include is specified. Finally, it makes it quite hard to support an API like /posts/1?include=comments since I need to conditionally select a serializer that actually includes the requested relationships.

Is the correct solution here to wait for filter, do the subclassing-style, or something else? Thanks!

@beauby
Copy link
Contributor

beauby commented Nov 11, 2015

You could use the fields option to filter out your comments association. The db should not get hit.

@richmolj
Copy link
Contributor Author

This would work, and the upcoming filter would make it even better. My question is, shouldn't this be a default? This would make it much easier to build a ?include=comments-style API that would not hit the database when the relationship is not included.

@richmolj
Copy link
Contributor Author

Also worth noting if you're using the JSONAPI links feature for your relationships (#834, #1282) then putting the relationships in the response is not needed and may be invalid. Maybe that should be the logic - if the relationship has links then don't include relationships in the response?

@beauby
Copy link
Contributor

beauby commented Nov 15, 2015

@richmolj the include parameter of the JSON API spec does only specify which associated resources should be returned in the included part of the document, not which relationship field should appear in the primary resource (for this, as mentioned, the fields option is what you are looking for). So it is expected behavior that the db gets hit, and it is expected behavior that the key appears under relationships, even when the relationship is not included.

@richmolj
Copy link
Contributor Author

@beauby I agree with you. I think I did not communicate the problem with the ?include=-style API well. My point was more that this still makes a default case (an ?include=-style API) pretty difficult. You'd have to parse the included relationships, gather all possible fields from the serializer, then remove fields that are relationship names that are not part of the include array. This gets particularly hard when you have nested includes. Something like self.eager_relationships = false would make this a lot easier.

And not only easier, but possibly more correct. If you're specifying links for your relationships, then my understanding is hitting the DB like the default case is at best excessive and at worst incorrect.

This all indicates to me that it should be easier to avoid always hitting the DB when a serializer has a relationship specified.

@beauby
Copy link
Contributor

beauby commented Nov 16, 2015

I do not understand what you mean by ?include=-style API. Could you elaborate? Also, I do not understand what your eager_relationships option would do.

As far as "links-only" relationships are concerned, this will be dealt with in the "relationship-level links" PR, which will provide a mechanism for not specifying the data attribute, and hance not hitting the DB.

@richmolj
Copy link
Contributor Author

By ?include=-style API I mean what is referenced in the readme. For a URL like:

/posts?include=author.city.state,comments

Lets say I make the request:

/posts?include=author.city

My controller would have to do something like:

ALL_POSSIBLE_INCLUDES = [:comments, :author]

# within #index
includes = params[:include].split(',')
not_included = ALL_POSSIBLE_INCLUDES - includes
# Not totally sure what the API is here once 'filter' is applied
fields = MySerializerClass.all_possible_fields
render json, posts: posts, fields: fields

In addition to just being cumbersome, I'm not sure how this logic would work for nested includes, where author should include city but not city.state...a nested fields would have to be computed and passed to each relationship. This gets trickier when supporting * versus ** as well. The current AMS functionality for included works great here - but relationships kind of subverts it since I'm paying for the DB hit anyway.

One way to make this all easily avoidable is something like an eager_relationships = false flag. In this case we would avoid hitting the DB and adding to relationships unless the include parameter is specified. Or this could be something at the relationship level:

has_many :comments, default: false # Only load this relationship when specifically 'included'

The "relationship-level links" PR could use this same flag, since in that case you would want the same behavior - don't hit the DB for relationship IDs since links are provided. A flag saying we should only process relationship IDs when include'd - would make a basic use case, and a link-style API much easier.

@beauby
Copy link
Contributor

beauby commented Nov 16, 2015

Ok, that's what I thought. Thing is, the JSON API spec does not state that, when explicitly not including a resource, the corresponding relationships should not be present. It is perfectly compliant (and I believe intended) to return resources with relationships referencing (by id / type) an other resource that is not present in the document.

The behavior you are describing is not part of the spec (plus it is not well defined, as you could possibly have different inclusion constraints referencing the same relationship fields, think of a Person with a to-many friends relationships – you might want to include the friends of the requested person, but not the whole connected component of the friendship graph, and you cannot specify different field sets for the same resource depending on whether it is primary or included).

@richmolj
Copy link
Contributor Author

Yep, we totally agree on what the spec does and does not support. AMS is definitely compliant here as far as JSONAPI is concerned. I believe this is more about how the internals of AMS operate. Allowing something like has_many :comments, default: false or the other alternatives above would

  • Make this use case, which is pretty common, much easier to support
  • You could use the same code for relationship links
  • Support a separate, valid JSONAPI use case - relationships MAY be present but AMS is treating them like they MUST be present. IMO I don't think you should have to reach for a separate feature (fields) to accommodate that.

I believe your last point would work with existing AMS functionality: has_many :friends, serializer: Person, include: []. include is already handling these scenarios, I'd just like relationships to be similarly flexible.

@natesholland
Copy link

I just ran into this in my app and it caused a surprise N+1 query for me. I would really like to be able to have the serializer just create the JSON without reaching out to the DB.

@bf4
Copy link
Member

bf4 commented Dec 4, 2015

@richmolj asks:

If I use JSONAPI adapter, and put has_many :comments in my PostSerializer, there will be a DB hit to fill out all the ids for the comment relationships. Even when no include is specified.

and @natesholland asks similarly:

I just ran into this in my app and it caused a surprise N+1 query for me. I would really like to be able to have the serializer just create the JSON without reaching out to the DB.

I answer:

This isn't surprising. If your serializer maps comments, then when the record is serialized, it will serialize comments. The fact that the record hits the DB is a property of the record, not of AMS.

However, I do think we might want to reconsider the behavior of the JSON API adapter to only serialize relationships when included, i.e. to treat serialized attributes differently. That would be a change in how it works now, where relationships are treated (more or less) as attributes, but probably is consistent with the spirit of the JSON API spec. I think I'd support a PR for that.

As @beauby wrote:

You could use the fields option to filter out your comments association. The db should not get hit.

This would work even without any change in how we treat relationships, except that it would be pretty inconvenient to exclude attributes via the URL, since, in practice, that means specifying all of them except the excluded one. Excluding relationships in the controller would require manipulating the include option to render.

However, also with any changes, you could customize a serializer by overwriting Serializer#associations to have a different default 'include_tree'

      DEFAULT_INCLUDE_TREE = ActiveModel::Serializer::IncludeTree.from_string('*')

      # @param [IncludeTree] include_tree (defaults to all associations when not provided)
      # @return [Enumerator<Association>]
      #
      def associations(include_tree = DEFAULT_INCLUDE_TREE)
        return unless object
class PostSerializer
  has_many :comments

+   def associations(include_tree = {})
+     super(include_tree)
+   end
end

(@beauby can correct any details in what I said above. I'm still foggy on what IncludeTree means, does, inputs, or outputs)

@richmolj
Copy link
Contributor Author

richmolj commented Dec 4, 2015

@bf4 thanks for the detailed response. I agree with most of what you've said, except the last bit on include_tree. This is an important distinction - included is pretty easy to modify and adjust. I'm cool with that, and I would just create a mixin if it were that easy. But the DB hit I'm concerned with is caused by relationships, not included.

Please correct me if I'm misunderstanding, but:

included: opt-in, easy to filter
relationships: always gets called, no easy way to opt-out (though possible with fields), causes DB hit as side-effect

However, I do think we might want to reconsider the behavior of the JSON API adapter to only serialize relationships when included

This sounds good, but keep in mind the scenario where you do want to serialize relationship links, you just don't want to include their IDs. This may be an opportunity to two-birds-one-stone with one of the links issues (#1282, #834). Similarly, bringing back filter (#1058) might do the trick depending on the implementation.

@beauby
Copy link
Contributor

beauby commented Jan 24, 2016

@richmolj There's a PR in the pipe (#1454) that supports relationship-level links (as well as skipping the data attribute of a relationship, which avoids hitting the DB).

@richmolj
Copy link
Contributor Author

Looks like that will solve my problem 👍

@phcoliveira
Copy link
Contributor

I admit I am not going to help ayone with my comment because, even if everyone agrees with me, it is too late do change things.

I don't mean to sound ungrateful, as I am using AMS freely, but I am quite unhappy with it. IMHO, there is a paradox concerning how AMS is meant to be used and why AMS is used at all.

The how part regards its conception: AMS was built with the model in mind, so it was modeled like a model, with class methods like "has_one", "attributes" and so forth.

The why part regards its usage: AMS is used to produce representations of a model, which may vary under the circuntances.

The paradox resides in offering an API meant for a whole-thing instead of a particular view of a thing.

The API offered for producing a particular view of such whole-thing is cumbersome and feels difficult to grasp. To name a few: "scope", " instance_options", "scope_name", the recent "if" option for the class method "attribute" and so forth. IMHO, these things make clear that the tools for producing particular views of a whole-thing are secondary, accessory.

The reason why AMS is used should have defined how it is supposed to be used. AMS should have been modeled after controllers, not models, because controllers are responsible for handling out particular views of whole-things. For index actions, a few attributes should suffice. For show actions, most attributes are needed. For editing, private-viewing, even more attributes. Some actions require the relationships of a whole-thing, some don't.

So, instead of a UserSerializer, I would like to have a UsersSerializer, with actions matching a the ones of a controller. In practice, I ended up having a serializer class for action for every controller because modeling the data is too cumbersome.

@bf4
Copy link
Member

bf4 commented Feb 11, 2016

@phcoliveira I'm sort of surprised by your comment. Are you responding to something here? Is there an issue we can address? This history of AMS is around serializing models aka resources. https://github.com/rails-api/active_model_serializers/blob/master/CHANGELOG.md#first-commit-as-rails-serializers-001

Now, if you have a RESTful API, the url represents the resource, and controller handles the action at that url, at some point you need to define how you represent the resource. Since resources in a CRUD app are models/records, it makes sense to base the serialization on the model/record.

Please open a new issue if you'd like to discuss.

@bf4
Copy link
Member

bf4 commented Feb 11, 2016

@richmolj closing as it appears this issue is resolved.

Use something like

has_many :comments do
  object.comments.loaded ? object.comments : Comment.none
end

@bf4 bf4 closed this as completed Feb 11, 2016
@richmolj
Copy link
Contributor Author

Yep think this fixes my issue 👍

@richmolj
Copy link
Contributor Author

richmolj commented Mar 4, 2016

For anyone following this, the suggestion above won't work since the comments key will still be in the JSON response, meaning it looks like the post has no comments versus the comments weren't included in the response. Instead:

has_many :comments, if: :comments_loaded?

def comments_loaded?
  object.comments.loaded?
end

will work

@halorium
Copy link

halorium commented Mar 8, 2016

This doesn't solve it for me as the comments are not loaded at that point until I call object.comments and actually load them. So I guess passing the include option doesn't load them?

@richmolj
Copy link
Contributor Author

richmolj commented Mar 8, 2016

@boobooninja that's why I'm trying to get the include option exposed: #1555

@fogisland
Copy link

fogisland commented May 26, 2016

@richmolj @boobooninja
Check object.comments.loaded? in PostSerializer works for me well, thanks.

And also I found a way to ensure loading associations.

Firstly, transform the include option in request to ActiveRecord::QueryMethods includes format, for example: ?includes='comments' to { :comments }

Then use this includes option to query records, like: Post.includes(:comments).find(1) which ensures comments are side-loaded.

At last , call the serializer.

This proc ensures that associations required by client will always be loaded, also N+1 issue is avoided.

I think in this way, server will have no need to concern client data requirement in different views of the same model(define different serializers of the same model). Client should pass the correct include option.

@codertcet111
Copy link

codertcet111 commented Nov 24, 2020

For anyone following this, the suggestion above won't work since the comments key will still be in the JSON response, meaning it looks like the post has no comments versus the comments weren't included in the response. Instead:

has_many :comments, if: :comments_loaded?

def comments_loaded?
  object.comments.loaded?
end

will work

@richmolj your solution worked for me, Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants