Query: Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling #6647
Comments
This is due to requiring materialization of both sides in the group join.
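GroupJoin forces materialization because of how it pairs rows. The minimal in-memory sketch below shows GroupJoin semantics in Python. It is not EF Core's implementation. Every inner element is read and buffered by key before any outer element can be paired with its group, and each outer element is returned whole. The `group_join` helper and the customer/order data are hypothetical.

```python
from collections import defaultdict


def group_join(outer, inner, outer_key, inner_key):
    """Pair each outer element with the list of inner elements whose key matches.

    Illustrative sketch of LINQ GroupJoin semantics, not EF Core code.
    """
    # The entire inner sequence is enumerated and buffered up front,
    # so a client-side GroupJoin needs all inner rows materialized.
    lookup = defaultdict(list)
    for item in inner:
        lookup[inner_key(item)].append(item)
    # Every outer element is yielded exactly once, with a possibly empty group.
    for element in outer:
        yield element, lookup.get(outer_key(element), [])


def customer_key(customer):
    return customer["id"]


def order_key(order):
    return order["customer_id"]


# Hypothetical sample data standing in for two entity tables.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}]

groups = list(group_join(customers, orders, customer_key, order_key))
for customer, customer_orders in groups:
    print(customer["name"], [o["id"] for o in customer_orders])
# Alice [10, 11]
# Bob []
```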
@maumar to check if we already have work items covering this.
#7543 covers this.
…ated SQL
- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#5230
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
…ated SQL
- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
…ulting in unnecessary data pulling

The problem was that for GroupJoin we always forced materialization on the participating query sources. We did this because we are not always able to divide the outer elements of the GroupJoin into the correct groups. However, if the GroupJoin clause is followed by a SelectMany clause, the groups don't matter because SelectMany flattens them anyway. The fix is to recognize those scenarios and force materialization only when correct grouping actually matters. We can avoid materialization if the GroupJoin is followed by a SelectMany clause that references the grouping, and the grouping itself does not appear anywhere else in the query. This addresses optional navigations, which are the 80% case. Manually created GroupJoins that do not model left outer joins still require additional materialization, but that can be addressed later because the priority is much lower.

This change also addresses #7722 - Query: error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery. The problem was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is a subquery rather than a QuerySourceReferenceExpression, or qsre). The fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in a later commit (e.g. by improving the binding logic).

Also fixed several minor issues that surfaced once we no longer did extensive materialization.
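The small Python model below checks the reasoning behind the fix. It is an illustration, not EF Core code. The usual LINQ shape for an optional navigation is a GroupJoin that is immediately flattened by SelectMany over DefaultIfEmpty. That shape yields the same rows, in the same order, as a flat SQL LEFT OUTER JOIN, so the groups never have to be rebuilt on the client. The helper names and sample data are hypothetical.

```python
from collections import defaultdict


def group_join(outer, inner, outer_key, inner_key):
    """LINQ-style GroupJoin: (outer element, list of matching inner elements)."""
    lookup = defaultdict(list)
    for item in inner:
        lookup[inner_key(item)].append(item)
    for element in outer:
        yield element, lookup.get(outer_key(element), [])


def select_many_default_if_empty(groups):
    """SelectMany over DefaultIfEmpty: flatten groups, emitting (outer, None) for empty ones."""
    for element, group in groups:
        for match in group or [None]:
            yield element, match


def left_outer_join(outer, inner, outer_key, inner_key):
    """Flat LEFT OUTER JOIN, producing rows the way a relational database would."""
    for element in outer:
        matches = [i for i in inner if inner_key(i) == outer_key(element)]
        for match in matches or [None]:
            yield element, match


def customer_key(customer):
    return customer["id"]


def order_key(order):
    return order["customer_id"]


customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}]

flattened = list(select_many_default_if_empty(
    group_join(customers, orders, customer_key, order_key)))
joined = list(left_outer_join(customers, orders, customer_key, order_key))

# Same rows in the same order: the group boundaries are discarded either way.
print(flattened == joined)  # True
```

The model only works if the grouping is not referenced anywhere else. If the query also used the groups themselves, the flat row set would lose information. That is why the commit keeps materialization in that case.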
…ulting in unnecessary data pulling

The problem was that for GroupJoin we always forced materialization on the participating query sources. We did this because we are not always able to divide the outer elements of the GroupJoin into the correct groups. However, if the GroupJoin clause is followed by a SelectMany clause, the groups don't matter because SelectMany flattens them anyway. The fix is to recognize those scenarios and force materialization only when correct grouping actually matters. We can avoid materialization if the GroupJoin is followed by a SelectMany clause that references the grouping, and the grouping itself does not appear anywhere else in the query. This addresses optional navigations, which are the 80% case. Manually created GroupJoins that do not model left outer joins still require additional materialization, but that can be addressed later because the priority is much lower.

Other bugs fixed alongside the main change:

#7722 - Query: error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery. The problem was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is a subquery rather than a QuerySourceReferenceExpression, or qsre). The fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in a later commit (e.g. by improving the binding logic).

#7497 - Error calling Count() after GroupJoin() (this also removes the temporary fix for #7348 and implements a proper one). This turned out to be a chicken-and-egg problem. There was a discrepancy between the query sources we marked for materialization in the GroupJoin + result operator case and the client-side operation that actually had to be performed. We had to fix the GroupJoin materialization to allow those result operators to be translated. Without also fixing this issue, however, we would generate invalid queries in some cases. The proper fix is to force a client-side result operator only if the GroupJoin cannot be flattened. We now do this for all result operators, not just a subset, because it was possible to generate invalid queries with operators like Count(). This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that surfaced once we no longer did extensive materialization.
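The Python sketch below shows why a result operator such as Count() depends on whether the GroupJoin is flattened. It illustrates the general hazard only and does not reconstruct the exact #7497 failure. Count() applied directly to a GroupJoin counts outer elements, one per group. COUNT(*) over the equivalent flat LEFT JOIN counts joined rows. The two disagree as soon as an outer element has more than one match. The helper names and data are hypothetical.

```python
from collections import defaultdict


def group_join(outer, inner, outer_key, inner_key):
    """LINQ-style GroupJoin: (outer element, list of matching inner elements)."""
    lookup = defaultdict(list)
    for item in inner:
        lookup[inner_key(item)].append(item)
    for element in outer:
        yield element, lookup.get(outer_key(element), [])


def left_outer_join_rows(outer, inner, outer_key, inner_key):
    """Flat LEFT OUTER JOIN rows, as SQL would return them."""
    for element in outer:
        matches = [i for i in inner if inner_key(i) == outer_key(element)]
        for match in matches or [None]:
            yield element, match


def customer_key(customer):
    return customer["id"]


def order_key(order):
    return order["customer_id"]


customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}]

# Count() directly on the (unflattened) GroupJoin: one result per customer.
group_count = sum(1 for _ in group_join(customers, orders, customer_key, order_key))

# COUNT(*) over a flat LEFT JOIN: one result per joined row.
row_count = sum(1 for _ in left_outer_join_rows(customers, orders, customer_key, order_key))

print(group_count, row_count)  # 2 3
```

Naively translating an unflattened GroupJoin followed by Count() into a server-side COUNT over a LEFT JOIN would return 3 here instead of 2. Keeping the operator on the client whenever the GroupJoin cannot be flattened avoids that class of mistake.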
Fixed in 03a990c
Steps to reproduce
The issue
We expect the following query to return values for only two columns, but in reality EF generates a SQL query that returns all columns.
Further technical details
EF Core version: 1.0.1
Operating system: Windows 10
Visual Studio version: VS 2015