Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling #6647

Closed
qmicron opened this issue Sep 30, 2016 · 5 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Milestone

Comments

@qmicron
Copy link

qmicron commented Sep 30, 2016

Steps to reproduce

public class Budget
{
    public Guid Id { get; set; }
    public Guid? DescriptionTextId { get; set; }
    public decimal PostCost { get; set; }
    public string PostNr { get; set; }
    public decimal PostPrice { get; set; }
    public decimal Quantity { get; set; }
    public decimal UnitCost { get; set; }
    public string UnitName { get; set; }
    public decimal UnitPrice { get; set; }
    public decimal VatAmount { get; set; }
    public decimal VatBasis { get; set; }
    public decimal VatFreeAmount { get; set; }
    public decimal VatFreePercent { get; set; }
    public decimal VatIncluded { get; set; }
    public decimal VatRate { get; set; }

    public Text DescriptionText { get; set; }
}

public class Text
{
    public Guid Id { get; set; }
    public string Plain { get; set; }
    public string Rtf { get; set; }

    public ICollection<Budget> BudgetDescriptionText { get; set; } = new List<Budget>();
}

public class DataContext : DbContext
{
    public DataContext(DbContextOptions options)
        : base(options)
    {
    }

    public DbSet<Budget> Budget => Set<Budget>();
    public DbSet<Text> Text => Set<Text>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Budget>(entity =>
        {
            entity.HasKey(e => e.Id);
            entity.Property(e => e.Id).ValueGeneratedNever();
            entity.Property(e => e.DescriptionTextId);
            entity.Property(e => e.PostCost);
            entity.Property(e => e.PostNr).IsRequired().HasMaxLength(1000);
            entity.Property(e => e.PostPrice);
            entity.Property(e => e.Quantity);
            entity.Property(e => e.UnitCost);
            entity.Property(e => e.UnitName).HasMaxLength(100);
            entity.Property(e => e.UnitPrice);
            entity.Property(e => e.VatAmount);
            entity.Property(e => e.VatBasis);
            entity.Property(e => e.VatFreeAmount);
            entity.Property(e => e.VatFreePercent);
            entity.Property(e => e.VatIncluded);
            entity.Property(e => e.VatRate);
            entity.HasOne(e => e.DescriptionText)
                .WithMany(e => e.BudgetDescriptionText).HasForeignKey(e => e.DescriptionTextId);
        });

        modelBuilder.Entity<Text>(entity =>
        {
            entity.HasKey(e => e.Id);
            entity.Property(e => e.Id).ValueGeneratedNever();
            entity.Property(e => e.Plain);
            entity.Property(e => e.Rtf);
        });
    }
}

The issue

We expect that the following query returns values for two columns

var result = context.Budget
    .Select(p => new
    {
        p.PostCost,
        p.DescriptionText.Plain
    })
    .ToArray();

but in reality, EF generates SQL query that returns all columns.

SELECT 
    [p].[Id], 
    [p].[DescriptionTextId], 
    [p].[PostCost], 
    [p].[PostNr], 
    [p].[PostPrice], 
    [p].[Quantity], 
    [p].[UnitCost], 
    [p].[UnitName], 
    [p].[UnitPrice], 
    [p].[VatAmount], 
    [p].[VatBasis], 
    [p].[VatFreeAmount], 
    [p].[VatFreePercent], 
    [p].[VatIncluded], 
    [p].[VatRate], 
    [p.DescriptionText].[Id], 
    [p.DescriptionText].[Plain], 
    [p.DescriptionText].[Rtf]
FROM 
    [Budget] AS [p]
    LEFT JOIN [Text] AS [p.DescriptionText] ON [p].[DescriptionTextId] = [p.DescriptionText].[Id]
ORDER BY 
    [p].[DescriptionTextId]

Further technical details

EF Core version: 1.0.1
Operating system: Windows 10
Visual Studio version: VS 2015

@smitpatel
Copy link
Contributor

This is due to requiring materialization of both the sides in group join.
DescriptionText is optional navigation hence it will generate a group join query. In group join query changes in the outer elements need to be recorded (bug #6360). This was fixed in #6387 by pulling all columns to figure out the changes in outer element.

@rowanmiller
Copy link
Contributor

@maumar to check if we already have work items covering this

@maumar
Copy link
Contributor

maumar commented Oct 5, 2016

Another commit related to this issue: e514b49 (fix to #4858)

@maumar maumar changed the title Left Join causes unnecessary data pulling Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling Oct 5, 2016
@maumar maumar removed their assignment Oct 5, 2016
@rowanmiller rowanmiller modified the milestones: 1.2.0, 1.1.0 Oct 6, 2016
@tuespetre
Copy link
Contributor

#7543 covers this.

tuespetre added a commit to tuespetre/EntityFramework that referenced this issue Feb 6, 2017
…ated

SQL

- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#5230
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
tuespetre added a commit to tuespetre/EntityFramework that referenced this issue Feb 6, 2017
…ated

SQL

- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#5230
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
tuespetre added a commit to tuespetre/EntityFramework that referenced this issue Feb 6, 2017
…ated

SQL

- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
tuespetre added a commit to tuespetre/EntityFramework that referenced this issue Feb 6, 2017
…ated

SQL

- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
tuespetre added a commit to tuespetre/EntityFramework that referenced this issue Feb 7, 2017
…ated

SQL

- Resolves dotnet#2341
- Resolves dotnet#5085
- Resolves dotnet#6618
- Resolves dotnet#6647
- Resolves dotnet#6782
- Resolves dotnet#7080
- Resolves dotnet#7220
- Resolves dotnet#7417
- Resolves dotnet#7497
- Resolves dotnet#7523
- Resolves dotnet#7525
maumar added a commit that referenced this issue Mar 3, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 4, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 10, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 10, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 10, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 10, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 10, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 11, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 14, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 15, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
@maumar maumar assigned maumar and unassigned anpete Mar 15, 2017
@maumar
Copy link
Contributor

maumar commented Mar 15, 2017

Fixed in 03a990c

@maumar maumar closed this as completed Mar 15, 2017
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Mar 15, 2017
@ajcvickers ajcvickers changed the title Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling Query: Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling May 9, 2017
@divega divega added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. and removed closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. labels May 10, 2017
@ajcvickers ajcvickers modified the milestones: 2.0.0-preview1, 2.0.0 Oct 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Projects
None yet
Development

No branches or pull requests

8 participants