Outline guiding principles and decide on direction for queries in 3.0 #12795

Closed
ajcvickers opened this issue Jul 25, 2018 · 32 comments

@ajcvickers
Member

ajcvickers commented Jul 25, 2018

There are several query-related efforts being worked on or considered for the 3.0 release. This issue is a single place to track and discuss these efforts at a high level as we make progress and decide on direction.

Initial areas to consider:

@divega
Contributor

divega commented Aug 3, 2018

Including here the seminal client-side evaluation proposal Colin Meek wrote:

Implicit boundaries in LINQ to Entities (client-side evaluation)

Overview

LINQ forces us to blur the boundary between the server and the client. For a provider like LINQ to Entities, this means that a query supplied to the stack is always partly evaluated in the client application and partly in the database server. For instance, the query

int productId = 1;
var q =
    from p in context.DbSet<Product>()
    where p.ProductId == productId
    where DetermineProductPriority(p) == "High"
    select new XElement(
        "Product",
        new XAttribute("Name", p.Name),
        new XAttribute("ID", p.ProductId));

includes a selection evaluated by the server (p.ProductId == @productId), but also client expressions, e.g. the binding of the free variable productId and the materialization of XML nodes in the projection. It also includes a call to a client-side method in a predicate over a server-correlated expression, something that is not supported by LINQ to Entities or by most LINQ to * implementations.

While we have found it convenient to talk about LINQ to Entities as a strict implementation – all server or nothing – and other implementations such as LINQ to SQL as hybrid implementations – splitting the query into client and server expressions – these implementations really exist along a continuum, and it makes sense to examine this continuum in more detail to clarify the current behavior and how we could improve EF.

It is convenient to discuss this continuum with respect to the following expression scopes:

  • Independent sub-expressions: in the above query, the access to the free variable productId is compiled into something like Expression.Field(Expression.Constant(CS$<>8__locals6), fieldof(<>c__DisplayClass5.productId)), which does not depend on the current scope of the query (i.e. it is not correlated to the contents of any table in the database). The process of identifying and evaluating these independent sub-expressions has become known as expression funcletization at Microsoft.
  • Client sources: LINQ queries are typically bootstrapped by IQueryable roots, e.g. context.DbSet<Product>() in the above example.
  • Client projections: while client sources introduce typed iterators as the root or roots for remote queries, client projections close the loop by shaping typed query results. In the above example, entity results are shaped into XML nodes, new XElement…
  • Dependent sub-expressions: certain sub-expressions can only be evaluated by the client, e.g. DetermineProductPriority…, but depend on intermediate results from the server, e.g. the argument p in DetermineProductPriority(p).

Note that these categories are somewhat arbitrary. A client source is a kind of independent sub-expression, and a dependent sub-expression is in some ways just a generalization of client projections. The categories are still intuitively useful, however: independent sub-expressions often map to parameters while client sources mostly map to scans; and client projections are frequently benign while dependent sub-expressions are often cause for concern (consider high-selectivity filters).
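To make the idea of funcletizing an independent sub-expression concrete, here is a minimal sketch using the standard System.Linq.Expressions API. The Product class, the Funcletizer visitor and the FuncletizationDemo class are hypothetical names introduced only for illustration; this is not how LINQ to Entities or EF Core implements it. The visitor only handles the simplest case, a captured local variable, and replaces it with its current value while recording that value as a would-be query parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity type, used only for illustration.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; } = "";
}

// Replaces reads of captured variables (closure fields) with their current
// values and records each value as if it were a query parameter.
public sealed class Funcletizer : ExpressionVisitor
{
    public List<object> Parameters { get; } = new List<object>();

    protected override Expression VisitMember(MemberExpression node)
    {
        // A field read on a constant closure object does not depend on any
        // lambda parameter, so it can be evaluated on the client right away.
        if (node.Expression is ConstantExpression closure && node.Member is FieldInfo field)
        {
            object value = field.GetValue(closure.Value);
            Parameters.Add(value);
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

public static class FuncletizationDemo
{
    public static (Expression Rewritten, List<object> Parameters) Run()
    {
        int productId = 1; // free variable, captured by the lambda below
        Expression<Func<Product, bool>> predicate = p => p.ProductId == productId;

        var funcletizer = new Funcletizer();
        Expression rewritten = funcletizer.Visit(predicate);
        return (rewritten, funcletizer.Parameters);
    }

    public static void Main()
    {
        var (rewritten, parameters) = Run();
        Console.WriteLine(rewritten);        // the predicate with the closure access replaced
        Console.WriteLine(parameters.Count); // number of values extracted as parameters
    }
}
```

Note that p.ProductId is left untouched because it hangs off the lambda parameter p, i.e. it is correlated with the query scope.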

Implicit boundaries are dangerous but also an essential feature of LINQ. Finding the appropriate balance is important. From feedback we have received over the years, we know that users expect magic and may be disappointed if we either throw because we’re overly strict or we end up streaming 1,000,000 rows from the database into the client to find the 10 rows matching a client predicate because we were too loose.

Independent sub-expressions and client sources

Options:

  • Necessary: free variables and constants are unavoidable. We need to allow expressions that represent access to field, property and constant in order to implement a viable LINQ provider.
  • Literals: the LINQ equivalent of value literals. Allow constants (1), primitive type constructors (new DateTime(2008, 5, 28)) and even array initialization patterns ( new int[] {…}).
  • Root construction: currently LINQ to Entities only supports some forms of roots inside the query, e.g. context.Products is recognized, but unfortunately inline construction of query roots is not, e.g. dbContext.DbSet<T>() and context.CreateQuery<T>(string) are not recognized.
  • Server-unsupported expressions that can be converted to server parameters: currently LINQ to Entities will throw if it finds any expression in a query that it cannot evaluate on the server. If we could turn an independent expression into a query parameter and we know that no part of the expression could ever be evaluated by the server, at least with the same semantics, we could funcletize it. There is an interesting challenge in deciding how much of a sub-expression we can funcletize. For instance, consider the expression stringBuilder.ToString().Length. While the stringBuilder.ToString() part can only be evaluated on the client, .Length could either be evaluated on the client alongside the rest of the expression or translated to LEN(@param) in the store. LEN() in SQL Server has subtly different semantics from string.Length in the CLR in that it ignores trailing blanks. We have three options for what we can funcletize:
    a. The minimal sub-expression that cannot be evaluated on the server
    b. The minimal sub-expression that cannot be evaluated on the server plus any expression that can be evaluated on either the client or the server with identical semantics
    c. The maximal sub-expression that can be evaluated on the client
  • Options (a) and (b) guarantee that, at least for a particular server, all occurrences of such expressions have consistent semantics regardless of where they appear in the query. The alternative is option (c), which implies that any sub-expression that can be turned into a parameter is evaluated on the client. This is the most flexible approach, but it means we apply inconsistent semantics to some operators depending on which side of the boundary they find themselves on.
  • Server-unsupported expressions that cannot be turned into server parameters: We should also explore these. There are interesting solutions where you pipe values through the query to the result, which works when the value is never cracked or cracked at need on the client. This greatly increases the cost of the feature for LINQ to Entities which would need to introduce its own intermediate metadata representation for such values.
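The semantic difference behind options (a) and (c) can be illustrated with a small in-memory sketch. SqlServerLen below is a hypothetical stand-in that approximates SQL Server's LEN() by trimming trailing spaces; it does not talk to a database, and the method names are assumptions made for the example.

```csharp
using System;
using System.Text;

public static class LengthSemanticsDemo
{
    // Stand-in for SQL Server LEN(), which ignores trailing blanks.
    // An in-memory approximation, not a call to the database.
    public static int SqlServerLen(string value) => value.TrimEnd(' ').Length;

    // Option (a): only stringBuilder.ToString() runs on the client; the
    // resulting string becomes @param and the server computes LEN(@param).
    public static int MinimalFuncletization(StringBuilder sb)
    {
        string param = sb.ToString();
        return SqlServerLen(param);
    }

    // Option (c): the whole expression runs on the client with CLR semantics.
    public static int MaximalFuncletization(StringBuilder sb)
    {
        return sb.ToString().Length;
    }

    public static void Main()
    {
        var sb = new StringBuilder("abc  "); // three letters, two trailing spaces
        Console.WriteLine(MinimalFuncletization(sb)); // 3
        Console.WriteLine(MaximalFuncletization(sb)); // 5
    }
}
```

The same source expression yields 3 or 5 depending on where the boundary is drawn, which is exactly the inconsistency the options above trade off against flexibility.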

Client projection

Options:

  • Composable: projections that can be composed within a query are supported as client projections. For instance, entities, complex types and “rows” can be projected but arbitrary method calls or constructors cannot.
  • Non-composable: top-level projections could include arbitrary method calls and constructors. For efficiency, reverse funcletization occurs for the client projection, e.g. select ClientMethod1(ClientMethod2(e.X, e.Y), e.Z) becomes select new { e.X, e.Y, e.Z } into f select ClientMethod1(ClientMethod2(f.X, f.Y), f.Z).
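The reverse funcletization rewrite for a non-composable projection can be sketched with LINQ to Objects standing in for the server. The Entity class and the bodies of ClientMethod1 and ClientMethod2 are hypothetical; only the shape of the rewrite (project the needed columns first, then apply the client methods) comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; Unused stands for a column the projection never needs.
public class Entity
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
    public string Unused { get; set; } = "";
}

public static class ReverseFuncletizationDemo
{
    // Hypothetical client-only methods.
    public static int ClientMethod2(int x, int y) => x + y;
    public static int ClientMethod1(int a, int z) => a * z;

    public static List<int> Run(IQueryable<Entity> source)
    {
        // "Server" half: project only the columns the client methods need.
        var serverPart = source.Select(e => new { e.X, e.Y, e.Z });

        // Boundary: AsEnumerable() switches to LINQ to Objects, so the
        // remaining projection runs on the client over the narrow rows.
        return serverPart
            .AsEnumerable()
            .Select(f => ClientMethod1(ClientMethod2(f.X, f.Y), f.Z))
            .ToList();
    }

    public static void Main()
    {
        var data = new[]
        {
            new Entity { X = 1, Y = 2, Z = 3 },
            new Entity { X = 4, Y = 5, Z = 6 },
        }.AsQueryable();

        foreach (int value in Run(data))
        {
            Console.WriteLine(value); // 9, then 54
        }
    }
}
```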

Note that method calls may introduce additional round-trips to the server. Before people start shouting about “nanny state APIs”, consider that users are not complaining that they wanted the round-trips but that we failed to crack the methods to figure out how to avoid them…

Dependent sub-expressions

What happens when a sub-expression cannot be evaluated by the server but depends on intermediate results?

  • Whenever an unsupported expression is encountered, we could simply split the query at that point (modulo the kinds of local optimizations described for client optimization).
  • We should attempt to push as much server logic “down” the tree as possible to minimize the amount of work in the client. This is critical where joins, selections and even some projections are involved.
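The effect of pushing server logic below the client boundary can be sketched as follows, again with an in-memory IQueryable standing in for the database. The body of DetermineProductPriority, the QuerySplitDemo class and the pushDown flag are assumptions for illustration; the counter simply records how many rows cross the boundary into the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity type, used only for illustration.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; } = "";
}

public static class QuerySplitDemo
{
    // Client-only method named in the example query; this body is made up.
    public static string DetermineProductPriority(Product p) =>
        p.Name.StartsWith("A", StringComparison.Ordinal) ? "High" : "Low";

    public static (List<Product> Results, int RowsStreamed) Run(
        IQueryable<Product> products, int productId, bool pushDown)
    {
        int streamed = 0;

        // Server side: with pushDown, the translatable filter stays below the boundary.
        IQueryable<Product> serverQuery =
            pushDown ? products.Where(p => p.ProductId == productId) : products;

        // Boundary: every row that passes this point has been streamed to the client.
        IEnumerable<Product> clientRows =
            serverQuery.AsEnumerable().Select(p => { streamed++; return p; });

        // Client side: the dependent sub-expression can only run here.
        IEnumerable<Product> clientQuery =
            clientRows.Where(p => DetermineProductPriority(p) == "High");
        if (!pushDown)
        {
            // Naive split: everything after the unsupported call also runs on the client.
            clientQuery = clientQuery.Where(p => p.ProductId == productId);
        }

        List<Product> results = clientQuery.ToList();
        return (results, streamed);
    }

    public static void Main()
    {
        var products = Enumerable.Range(1, 100)
            .Select(i => new Product { ProductId = i, Name = "A" + i })
            .AsQueryable();

        Console.WriteLine(Run(products, 1, pushDown: false).RowsStreamed); // 100
        Console.WriteLine(Run(products, 1, pushDown: true).RowsStreamed);  // 1
    }
}
```

Both variants return the same single product; only the number of rows streamed to the client differs.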

Interface considerations

If we implement support for these patterns, we should also consider allowing the user to disable them. The user can exercise whatever level of control they want over the client-server partitioning of the query. In addition, we should make the partitioned plan visible to the user, either by using documented boundary expressions or through a debugger visualizer.

Implementation considerations

We can include a separate pass to identify supported and unsupported expressions in the query tree, similar to other LINQ implementations.
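A separate identification pass of this kind could look roughly like the sketch below, which walks an expression tree and flags method calls that are not on a list of translatable methods. UnsupportedExpressionFinder, the allow-list and the body of DetermineProductPriority are hypothetical; a real provider would classify far more node kinds than method calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity type, used only for illustration.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; } = "";
}

// Collects method calls that the (hypothetical) server cannot translate.
public sealed class UnsupportedExpressionFinder : ExpressionVisitor
{
    private readonly HashSet<MethodInfo> _translatable;

    public List<MethodCallExpression> Unsupported { get; } = new List<MethodCallExpression>();

    public UnsupportedExpressionFinder(IEnumerable<MethodInfo> translatable)
    {
        _translatable = new HashSet<MethodInfo>(translatable);
    }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (!_translatable.Contains(node.Method))
        {
            Unsupported.Add(node);
        }
        // Keep walking so nested calls are classified as well.
        return base.VisitMethodCall(node);
    }
}

public static class IdentificationPassDemo
{
    // Client-only method named in the example query; this body is made up.
    public static string DetermineProductPriority(Product p) =>
        p.ProductId < 10 ? "High" : "Low";

    public static List<string> Run()
    {
        Expression<Func<Product, bool>> predicate =
            p => p.Name.StartsWith("A") && DetermineProductPriority(p) == "High";

        // Assume the server can translate string.StartsWith(string) and nothing else.
        var translatable = new[] { typeof(string).GetMethod("StartsWith", new[] { typeof(string) }) };

        var finder = new UnsupportedExpressionFinder(translatable);
        finder.Visit(predicate);
        return finder.Unsupported.Select(call => call.Method.Name).ToList();
    }

    public static void Main()
    {
        foreach (string name in Run())
        {
            Console.WriteLine(name); // DetermineProductPriority
        }
    }
}
```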

@tuespetre
Contributor

I love the example with new XElement(...) and immediately see the potential translation into FOR XML PATH. 😉

As for architectural changes and managing query bugs, I feel like there is a lot I could say but I don't know how effective I would be at communicating it.

@ajcvickers
Member Author

@tuespetre I'm pretty sure we will want to talk to you about some of this stuff, so stay tuned. :-)

@pmiddleton
Contributor

@ajcvickers - Regarding looking at ReLinq: is #12048 the driving issue behind that, or are there other things driving it as well?

It seems like a major undertaking to remove/replace it given how tightly the query system is architected around it, with a lot of risk of breaking things.

@ajcvickers
Member Author

@pmiddleton Other things too. And yes, I agree that it is risky; that is one of the considerations. Sorry for being a bit ambiguous here. Like I said, stay tuned. Don't be impatient. 😉

@pmiddleton
Contributor

@ajcvickers - I have the open PR for TVF and am currently working on a pivot feature. Both have tie-ins to ReLinq, so the possible change piqued my interest as there might be some rework required on my part. :)

@ajcvickers
Member Author

@pmiddleton Agreed.

@tuespetre
Contributor

@ajcvickers Heyyy, I’m onto you 👀

@divega
Contributor

divega commented Sep 7, 2019

@smitpatel, @ajcvickers any need to keep this open now?

@smitpatel
Contributor

I will write up some overall notes and close it. Or I can add them to the docs and close the issue with a reference to them.

@divega
Contributor

divega commented Sep 7, 2019

Adding a version of this with your notes to a “query architecture” section in the docs sounds great. I guess we can create a docs issue for that and close this anyway.

divega removed their assignment Sep 18, 2019

@jjxtra

jjxtra commented Oct 12, 2019

Even a simple group by is broken in EF 3, why is this? Seems like a group by should translate to a SQL statement just fine...

@linkerro

I second that question and also wonder how this situation makes any sort of sense.
This makes me feel miffed, very miffed.

@smitpatel
Contributor

@jjxtra @linkerro - Did you guys look at #17068?
