Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query : client-side result operators may cause extensive client-side evaluation for queries with GroupJoins (and/or optional navigations) in a subquery #7476

Closed
maumar opened this issue Jan 25, 2017 · 6 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Milestone

Comments

@maumar
Copy link
Contributor

maumar commented Jan 25, 2017

query:

from o in context.Orders
where (
   from od in context.OrderDetails
   where o.Customer.Country == od.Order.Customer.Country
   select od).Distinct().Count() > 0
select o;

produces the following query plan:

TRACKED: True
(QueryContext queryContext) => IEnumerable<Order> _Select(
    source: IEnumerable<TransparentIdentifier<Order, Customer>> _Where(
        source: IEnumerable<TransparentIdentifier<Order, Customer>> _ShapedQuery(
            queryContext: queryContext, 
            shaperCommandContext: SelectExpression: 
                SELECT [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate], [o.Customer].[CustomerID], [o.Customer].[Address], [o.Customer].[City], [o.Customer].[CompanyName], [o.Customer].[ContactName], [o.Customer].[ContactTitle], [o.Customer].[Country], [o.Customer].[Fax], [o.Customer].[Phone], [o.Customer].[PostalCode], [o.Customer].[Region]
                FROM [Orders] AS [o]
                LEFT JOIN [Customers] AS [o.Customer] ON [o].[CustomerID] = [o.Customer].[CustomerID]
                ORDER BY [o].[CustomerID]
            , 
            shaper: CompositeShaper`3
        )
        , 
        predicate: (TransparentIdentifier<Order, Customer> t1) => int Count(
            source: IEnumerable<ValueBuffer> Distinct(
                source: IEnumerable<ValueBuffer> _Select(
                    source: IEnumerable<TransparentIdentifier<TransparentIdentifier<ValueBuffer, ValueBuffer>, IEnumerable<Customer>>> _Where(
                        source: IEnumerable<TransparentIdentifier<TransparentIdentifier<ValueBuffer, ValueBuffer>, IEnumerable<Customer>>> _GroupJoin(
                            queryContext: (RelationalQueryContext) queryContext, 
                            source: IEnumerable<ValueBuffer> _Query(
                                queryContext: queryContext, 
                                shaperCommandContext: SelectExpression: 
                                    SELECT [od.Order.Customer1].[CustomerID], [od.Order.Customer1].[Address], [od.Order.Customer1].[City], [od.Order.Customer1].[CompanyName], [od.Order.Customer1].[ContactName], [od.Order.Customer1].[ContactTitle], [od.Order.Customer1].[Country], [od.Order.Customer1].[Fax], [od.Order.Customer1].[Phone], [od.Order.Customer1].[PostalCode], [od.Order.Customer1].[Region]
                                    FROM [Order Details] AS [od0]
                                    INNER JOIN [Orders] AS [od.Order0] ON [od0].[OrderID] = [od.Order0].[OrderID]
                                    LEFT JOIN [Customers] AS [od.Order.Customer1] ON [od.Order0].[CustomerID] = [od.Order.Customer1].[CustomerID]
                                    ORDER BY [od.Order0].[CustomerID]
                                , 
                                queryIndex: default(System.Nullable`1[System.Int32])
                            )
                            , 
                            outerShaper: CompositeShaper`3, 
                            innerShaper: BufferedOffsetEntityShaper<Customer>, 
                            innerKeySelector: (Customer od.Order.Customer) => string GetValueFromEntity(
                                clrPropertyGetter: ClrPropertyGetter`2, 
                                entity: od.Order.Customer
                            )
                            , 
                            resultSelector: (TransparentIdentifier<ValueBuffer, ValueBuffer> t0 | IEnumerable<Customer> od.Order.Customer_group) => TransparentIdentifier<TransparentIdentifier<ValueBuffer, ValueBuffer>, IEnumerable<Customer>> CreateTransparentIdentifier(
                                outer: t0, 
                                inner: od.Order.Customer_group
                            )
                            , 
                            outerGroupJoinInclude: default(Internal.GroupJoinInclude), 
                            innerGroupJoinInclude: default(Internal.GroupJoinInclude)
                        )
                        , 
                        predicate: (TransparentIdentifier<TransparentIdentifier<ValueBuffer, ValueBuffer>, IEnumerable<Customer>> t1) => (?t1.Inner.Country?) == (?od.Order.Customer.Country?)
                    )
                    , 
                    selector: (TransparentIdentifier<TransparentIdentifier<ValueBuffer, ValueBuffer>, IEnumerable<Customer>> t1) => t1.Outer.Outer
                )
            )
        )
         > 0
    )
    , 
    selector: (TransparentIdentifier<Order, Customer> t1) => t1.Outer
)

However, very similar query (Distinct swapped for Take):

from o in context.Orders
where (
   from od in context.OrderDetails
   where o.Customer.Country == od.Order.Customer.Country
   select od).Take(10).Count() > 0
select o;

produces much more effective query plan:

TRACKED: True
(QueryContext queryContext) => IEnumerable<Order> _Select(
    source: IEnumerable<TransparentIdentifier<Order, Customer>> _ShapedQuery(
        queryContext: queryContext, 
        shaperCommandContext: SelectExpression: 
            SELECT [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate], [o.Customer].[CustomerID], [o.Customer].[Address], [o.Customer].[City], [o.Customer].[CompanyName], [o.Customer].[ContactName], [o.Customer].[ContactTitle], [o.Customer].[Country], [o.Customer].[Fax], [o.Customer].[Phone], [o.Customer].[PostalCode], [o.Customer].[Region]
            FROM [Orders] AS [o]
            LEFT JOIN [Customers] AS [o.Customer] ON [o].[CustomerID] = [o.Customer].[CustomerID]
            WHERE (
                SELECT COUNT(*)
                FROM (
                    SELECT TOP(10) [od.Order.Customer].[CustomerID], [od.Order.Customer].[Address], [od.Order.Customer].[City], [od.Order.Customer].[CompanyName], [od.Order.Customer].[ContactName], [od.Order.Customer].[ContactTitle], [od.Order.Customer].[Country], [od.Order.Customer].[Fax], [od.Order.Customer].[Phone], [od.Order.Customer].[PostalCode], [od.Order.Customer].[Region]
                    FROM [Order Details] AS [od]
                    INNER JOIN [Orders] AS [od.Order] ON [od].[OrderID] = [od.Order].[OrderID]
                    LEFT JOIN [Customers] AS [od.Order.Customer] ON [od.Order].[CustomerID] = [od.Order.Customer].[CustomerID]
                    WHERE ([o.Customer].[Country] = [od.Order.Customer].[Country]) OR ([o.Customer].[Country] IS NULL AND [od.Order.Customer].[Country] IS NULL)
                    ORDER BY [od.Order].[CustomerID]
                ) AS [t]
            ) > 0
            ORDER BY [o].[CustomerID]
        , 
        shaper: CompositeShaper`3
    )
    , 
    selector: (TransparentIdentifier<Order, Customer> t1) => t1.Outer
)
@maumar
Copy link
Contributor Author

maumar commented Jan 25, 2017

Problem here is that when we visit subquery model we optimize GroupJoins out (remove SelectMany-DefaultIfEmpty clauses that follow GroupJoin), however when we encounter client side (result) operator throw all that work away and do a second pass (on the already modified query model). Now, because the shape of query model is different, we fail to recognize the GroupJoin-SelectMany-DefaultIfEmpty pattern, which results in GroupJoin being performed on the client.

This issue was likely present before, but was exacerbated by #7348 (where we forced client evaluation on some result operators for GroupJoin scenarios)

@maumar maumar changed the title Query : client-side result operators may cause extensive client-side evaluation for queries with GroupJoins (and/or optional navigations) Query : client-side result operators may cause extensive client-side evaluation for queries with GroupJoins (and/or optional navigations) in a subquery Jan 25, 2017
maumar added a commit that referenced this issue Jan 27, 2017
…ve client-side evaluation for queries with GroupJoins (and/or optional navigations) in a subquery

When we translate GroupJoin-SelectMany-DefaultIfEmpty we modify query model - we remove additional from clause containing DefaultIfEmpty subquery.
However, if we later discover that the subquery (or a subquery higher in the stack) needs client evaluation - e.g. by having client result operator, or calling a client-side method - we need to re-visit this subquery with client evaluation in mind.

Problem was that now the query model has been modified - in this specific case SelectMany-DefaultIfEmpty clause has been removed, so we can no longer recognize that this pattern can be simplified and we introduce client GroupJoin.

Fix is to save the structure of query model (and all subquery models that it contains) before we visit it - and then, if client evaluation is needed, restore the query model to it's original shape.
maumar added a commit that referenced this issue Jan 27, 2017
…ve client-side evaluation for queries with GroupJoins (and/or optional navigations) in a subquery

When we translate GroupJoin-SelectMany-DefaultIfEmpty we modify query model - we remove additional from clause containing DefaultIfEmpty subquery.
However, if we later discover that the subquery (or a subquery higher in the stack) needs client evaluation - e.g. by having client result operator, or calling a client-side method - we need to re-visit this subquery with client evaluation in mind.

Problem was that now the query model has been modified - in this specific case SelectMany-DefaultIfEmpty clause has been removed, so we can no longer recognize that this pattern can be simplified and we introduce client GroupJoin.

Fix is to save the structure of query model (and all subquery models that it contains) before we visit it - and then, if client evaluation is needed, restore the query model to it's original shape.
maumar added a commit that referenced this issue Jan 27, 2017
…ve client-side evaluation for queries with GroupJoins (and/or optional navigations) in a subquery

When we translate GroupJoin-SelectMany-DefaultIfEmpty we modify query model - we remove additional from clause containing DefaultIfEmpty subquery.
However, if we later discover that the subquery (or a subquery higher in the stack) needs client evaluation - e.g. by having client result operator, or calling a client-side method - we need to re-visit this subquery with client evaluation in mind.

Problem was that now the query model has been modified - in this specific case SelectMany-DefaultIfEmpty clause has been removed, so we can no longer recognize that this pattern can be simplified and we introduce client GroupJoin.

Fix is to save the structure of query model (and all subquery models that it contains) before we visit it - and then, if client evaluation is needed, restore the query model to it's original shape.
@maumar maumar self-assigned this Jan 27, 2017
@maumar
Copy link
Contributor Author

maumar commented Jan 27, 2017

Adding to 1.1.1 since we need to fix this before we can tackle #7290

@maumar maumar modified the milestone: 1.1.1 Jan 27, 2017
@rowanmiller rowanmiller added this to the 1.1.1 milestone Jan 31, 2017
@maumar
Copy link
Contributor Author

maumar commented Jan 31, 2017

@divega
Justification to include this in 1.1.1 release:

Impact: Medium. This is effectively the same issue as #7198, which is a breaking change introduced in 1.1.0 and was reported by a customer. Issue results is inefficient queries for simple cases and compilation errors for complex ones (no data corruption).

Risk: Low. Fix is local, we are undoing modifications to query model that we perform during query processing, if the same query model needs to be processed again (effectively “repairing” query model from its corrupted state)

maumar added a commit that referenced this issue Feb 3, 2017
…ve client-side evaluation for queries with GroupJoins (and/or optional navigations) in a subquery

When we translate GroupJoin-SelectMany-DefaultIfEmpty we modify query model - we remove additional from clause containing DefaultIfEmpty subquery.
However, if we later discover that the subquery (or a subquery higher in the stack) needs client evaluation - e.g. by having client result operator, or calling a client-side method - we need to re-visit this subquery with client evaluation in mind.

Problem was that now the query model has been modified - in this specific case SelectMany-DefaultIfEmpty clause has been removed, so we can no longer recognize that this pattern can be simplified and we introduce client GroupJoin.

Fix is to save the structure of query model (and all subquery models that it contains) before we visit it - and then, if client evaluation is needed, restore the query model to it's original shape.
@maumar
Copy link
Contributor Author

maumar commented Feb 3, 2017

fixed in c8d984e

@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Feb 3, 2017
@Eilon
Copy link
Member

Eilon commented Feb 8, 2017

This patch bug is approved. Please use the normal code review process w/ a PR and make sure the fix is in the correct branch, then close the bug and mark it as done.

@anpete
Copy link
Contributor

anpete commented Feb 23, 2017

✅ Verified

maumar added a commit that referenced this issue Mar 22, 2017
…or complex queries with with subqueries, if the same subquery is present multiple times in the query model

Problem was that when we fixed #7476 we introduced a mechanism to copy and then restore query models to prevent them from being corrupted during (failed) attempt at translating them to SQL.
However we have not accounted for the case where the same subquery would be present in the query model more than once (same reference).

Fix is to add defensive check to the logic that creates a copy of a query model. If we already copied that particular (sub)query model, we don't need to do it twice. It is safe to do, because all occurrences of the query model that share the same reference are guaranteed to have the same body clauses and therefore can later be re-created from the same copy.
maumar added a commit that referenced this issue Mar 23, 2017
…or complex queries with with subqueries, if the same subquery is present multiple times in the query model

Problem was that when we fixed #7476 we introduced a mechanism to copy and then restore query models to prevent them from being corrupted during (failed) attempt at translating them to SQL.
However we have not accounted for the case where the same subquery would be present in the query model more than once (same reference).

Fix is to add defensive check to the logic that creates a copy of a query model. If we already copied that particular (sub)query model, we don't need to do it twice. It is safe to do, because all occurrences of the query model that share the same reference are guaranteed to have the same body clauses and therefore can later be re-created from the same copy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Projects
None yet
Development

No branches or pull requests

5 participants