Change split query implementation to fetch dependents by keys, without reevaluating principal query #12776

smitpatel · 2018-07-24T19:16:54Z

Currently, collection include queries are executed as split queries. To select only the related data in the second query, we inner join it with the first query. This works fine when the first query is simple. If the first query involves any client evaluation, however, the inner join in the second query ends up being performed on the client, which causes us to fetch the first table twice. Even in the absence of client eval, if the first query involves multiple joins for filtering and ordering, the same work is recomputed on the server.

At least for scenarios with a single-column FK, an alternative could be to filter the related table by key values. It is a trade-off between N + 1 queries (each using a single key value) and 2 queries (the second using the whole first query to produce the key values).
Since we don't want to do N + 1, this involves some buffering of results on the client side. To illustrate with an example, suppose our default buffer size is 100. Then for a Customer-Orders query:

1. Run the first query to fetch Customers.
2. Iterate over 100 records, generate Customer objects, and buffer them internally.
3. Use the key values from those buffered results to run the second query on the Orders table.
4. Combine the results from Orders while iterating (the way split query include does right now) and return the results to the user. Then continue from step 2 with the next 100 records until the first query is exhausted.
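The buffering steps above can be sketched roughly as follows. This is only an illustration in Python against an in-memory SQLite database, not EF Core code: the function `load_customers_with_orders`, the helper `batched`, and the table and column names (`Customers.Id`, `Orders.CustomerId`) are all assumptions made up for the example.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable` (hypothetical helper)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def load_customers_with_orders(conn, batch_size=100):
    """Yield (customer_row, [order_rows]) pairs.

    Runs the principal query once, then one dependent query per buffered
    chunk of principals, filtering Orders by the buffered key values.
    """
    # Step 1: the principal query runs exactly once and is streamed.
    principal = conn.execute("SELECT Id, Name FROM Customers ORDER BY Id")

    # Step 2: buffer at most `batch_size` principals at a time.
    for chunk in batched(principal, batch_size):
        keys = [customer[0] for customer in chunk]

        # Step 3: fetch dependents by key. Every full chunk produces the same
        # SQL text; only a final partial chunk differs in placeholder count.
        placeholders = ",".join("?" * len(keys))
        orders = conn.execute(
            "SELECT Id, CustomerId FROM Orders "
            f"WHERE CustomerId IN ({placeholders}) ORDER BY CustomerId, Id",
            keys,
        ).fetchall()

        # Step 4: stitch dependents onto their principals and hand them back.
        orders_by_customer = {}
        for order in orders:
            orders_by_customer.setdefault(order[1], []).append(order)
        for customer in chunk:
            yield customer, orders_by_customer.get(customer[0], [])
```

Only one chunk of principals and its dependents is held in memory at any time, which is the property the proposal relies on.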

This has the benefit of reusing the same SelectExpression for the second query multiple times without missing the second-level cache (#12777). We don't run out of memory because we buffer only a chunk of results at a time. For this example, we would run N/100 + 1 queries to get all results.
For the scenarios described in the first paragraph, it avoids all the issues they cause.
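The N/100 + 1 figure can be sanity-checked with a few lines of Python. This sketch assumes that a final partial chunk still needs its own dependent query, so N/100 is rounded up; `split_query_count` is a hypothetical name used only for illustration.

```python
import math


def split_query_count(principal_rows, batch_size=100):
    """One principal query plus one dependent query per (possibly partial) chunk."""
    return 1 + math.ceil(principal_rows / batch_size)


print(split_query_count(1000))  # 1 principal + 10 dependent queries
print(split_query_count(250))   # 1 principal + 3 dependent queries
```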

ajcvickers · 2018-07-30T18:03:37Z

Added to #12795

smitpatel · 2019-01-25T06:58:36Z

With #12098, this may not be relevant for relational providers. It could be useful for the Cosmos provider when doing cross-collection includes.
cc: @AndriySvyryd

ajcvickers · 2024-02-15T15:06:15Z

@roji is this still relevant?

roji · 2024-03-04T10:08:19Z

Yeah, this is still relevant - it's a possible optimization for split query. In a nutshell, instead of embedding the principal query in the dependent query (can be very expensive), this would instead just use the keys fetched from the principal query to fetch the dependents.

Note that this would make it impossible to batch the two queries, which is something we can do with the current approach (#10878). So there's a trade-off here between evaluating the principal query twice and doing two roundtrips - IMHO reevaluating the principal query is pretty clearly worse than the additional roundtrip (given that it can be arbitrarily heavy, and overloads the database rather than just adding transfer time).

We should probably sit down and think more strategically about where we want to go with related entity loading etc.

Note also that this approach is likely the only one possible with at least some non-relational databases, where a complex subquery may not be possible.
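To make the two shapes of the dependent query concrete, here is a small Python/SQLite sketch. The SQL is a simplified stand-in and not what EF Core actually generates; the `PRINCIPAL` filter, the function names, and the `Customers`/`Orders` schema are assumptions for illustration.

```python
import sqlite3

# Hypothetical principal query; in practice it could be arbitrarily heavy.
PRINCIPAL = "SELECT Id FROM Customers WHERE Name LIKE 'A%'"


def dependents_via_embedded_principal(conn):
    """Current shape: the principal query is embedded in the dependent query,
    so the database evaluates it again as part of the join."""
    sql = (
        "SELECT o.Id, o.CustomerId FROM Orders AS o "
        f"INNER JOIN ({PRINCIPAL}) AS c ON o.CustomerId = c.Id "
        "ORDER BY o.Id"
    )
    return conn.execute(sql).fetchall()


def dependents_via_keys(conn):
    """Proposed shape: take the keys already returned by the principal query
    and use them to filter the dependents in a second roundtrip."""
    keys = [row[0] for row in conn.execute(PRINCIPAL)]
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    sql = (
        "SELECT Id, CustomerId FROM Orders "
        f"WHERE CustomerId IN ({placeholders}) ORDER BY Id"
    )
    return conn.execute(sql, keys).fetchall()
```

Both functions return the same rows. The difference is that the second form needs the principal results on the client before the dependent query can be sent, which is why the two queries can no longer be batched into one roundtrip.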

roji · 2024-03-31T04:03:56Z

Note: an advantage of this approach is that there's no more need to specify fully deterministic ordering, since EF would be fetching the dependents directly by their IDs.

smitpatel added the type-enhancement label Jul 24, 2018

smitpatel mentioned this issue Jul 24, 2018

Warning when query materializes an entity outside the top projection #12667

Closed

ajcvickers added this to the 3.0.0 milestone Jul 30, 2018

ajcvickers assigned smitpatel Jul 30, 2018

ajcvickers mentioned this issue Jul 30, 2018

Outline guiding principles and decide on direction for queries in 3.0 #12795

Closed

smitpatel mentioned this issue Aug 27, 2018

OrderBy Random issue #13082

Closed

smitpatel added the query-design label Sep 1, 2018

smitpatel mentioned this issue Nov 16, 2018

Optimize include queries to not rerun original query #13971

Closed

smitpatel removed the query-design label Jan 25, 2019

ajcvickers modified the milestones: 3.0.0, Backlog Jan 25, 2019

smitpatel removed their assignment Jan 29, 2019

AndriySvyryd mentioned this issue Aug 20, 2019

Cosmos: Add Include support #16920

Open

smitpatel added the area-query label Nov 19, 2019

AndriySvyryd changed the title ~~Query: Use of key Values for Collection include query~~ Query: Batched split collection include query (N/B + 1) Nov 27, 2019

smitpatel added the area-cosmos label Mar 16, 2020

smitpatel mentioned this issue May 9, 2020

Significant Query Slowdown When Using Multiple Joins Due To Changes In 3.0 #18022

Closed

AndriySvyryd added the area-perf label Oct 19, 2022

ajcvickers removed this from the Backlog milestone Feb 8, 2024

roji changed the title ~~Query: Batched split collection include query (N/B + 1)~~ Change split query implementation to fetch dependents by keys, without reevaluating principal query Mar 4, 2024

roji mentioned this issue Mar 4, 2024

Batch split queries #10878

Open

ajcvickers added this to the Backlog milestone Mar 6, 2024

roji mentioned this issue Apr 13, 2024

How to check if LINQ query will be translated to one or multiple SQL queries/statements when using AsSplitQuery in application runtime, before executing query in database #33530

Closed

roji mentioned this issue Aug 7, 2024

Possible optimization of split querys with window functions in included entities #34362

Open
