Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change split query implementation to fetch dependents by keys, without reevaluating principal query #12776

Open
smitpatel opened this issue Jul 24, 2018 · 5 comments

Comments

@smitpatel
Copy link
Contributor

smitpatel commented Jul 24, 2018

Currently for collection include queries, we do split queries. In order to select only related data from 2nd query, we do inner join with first query. This works fine for all cases when first query is simple. If the first query has any client evaluation then in second query, we actually end up doing inner join on client. Which causes us to fetch the first table twice. Even in the absence of client eval, if the first query involves multiple joins due to filtering & ordering then we would recompute the same thing on server.

At least for the scenarios with single column FK, an alternative could be to use key values for filtering the related data table. It is a bargain between N + 1 queries (which uses single key value) & 2 queries which uses whole first query for key values.
Since we don't want to do N+1, this involves some buffering of results on client side. To give details with an example, suppose our default for buffering size is 100. Then for Customer-Orders query,

  • We run first query to fetch Customers.
  • We iterate over 100 records and generate Customer objects and buffer them internally.
  • Use key values from those buffered results to run 2nd query on Orders table.
  • Combine results from Orders while iterating (the way we do split query include right now) and give back results to customer.

It gives benefit of reusing same SelectExpression for 2nd query multiple times without missing 2nd level Cache(#12777). We don't run out of memory because we would buffer only a chunk of results. For this example we would run N/100 + 1 queries to get all results.
For the scenarios described in first paragraph, it avoids all the issues they causes.

@ajcvickers
Copy link
Member

Added to #12795

@smitpatel
Copy link
Contributor Author

With #12098 , this may not be relevant for relational providers. This could be useful for cosmos providers when doing cross-collection includes
cc: @AndriySvyryd

@ajcvickers ajcvickers modified the milestones: 3.0.0, Backlog Jan 25, 2019
@smitpatel smitpatel removed their assignment Jan 29, 2019
@AndriySvyryd AndriySvyryd changed the title Query: Use of key Values for Collection include query Query: Batched split collection include query (N/B + 1) Nov 27, 2019
@ajcvickers ajcvickers removed this from the Backlog milestone Feb 8, 2024
@ajcvickers
Copy link
Member

@roji is this still relevant?

@roji
Copy link
Member

roji commented Mar 4, 2024

Yeah, this is still relevant - it's a possible optimization for split query. In a nutshell, instead of embedding the principal query in the dependent query (can be very expensive), this would instead just use the keys fetched from the principal query to fetch the dependents.

Note that this would make it impossible to batch the two queries, which is something we can do with the current approach (#10878). So there's a trade-off here between reevaluating the principal query twice and doing two roundtrips - IMHO reevaluating the principal query is pretty clearly worse than the additional roundtrip (given that it can be arbitrarily heavy, and overloads the database rather than just adding transfer times).

We should probably sit down and think more strategically about where we want to go with related entity loading etc.

Note also that this approach is likely the only one possible with at least some non-relational databases, where a complex subquery may not be possible.

@roji roji changed the title Query: Batched split collection include query (N/B + 1) Change split query implementation to fetch dependents by keys, without reevaluating principal query Mar 4, 2024
@ajcvickers ajcvickers added this to the Backlog milestone Mar 6, 2024
@roji
Copy link
Member

roji commented Mar 31, 2024

Note: an advantage of this approach is that there's no more need to specify fully deterministic ordering, since EF would be fetching the dependents directly by their IDs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants