Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/8.0] Fix to #32911 - Incorrect apply projection for complex properties #33212

Merged
merged 1 commit into from
Mar 7, 2024

Conversation

maumar
Copy link
Contributor

@maumar maumar commented Mar 1, 2024

Port of #33020 and #33200
Fixes #32911

Description

Some context:
pushdown is process we do during query translation, where a query itself becomes a source of another query above.

SELECT c.Id, c.Name
FROM Customers AS c
WHERE c.Name = 'John'

becomes

SELECT subquery.Name
FROM (
    SELECT c.Id, c.Name
    FROM Customers AS c
    WHERE c.Name = 'John') AS subquery

We need to do this for various operations like Skip/Take/Distinct. Lifting projection is the process of extracting elements in the SELECT from the inner query (SELECT c.Id, c.Name) to the outer (SELECT subquery.Name).

Applying projection is the process of actually populating SELECT with the column values. Initially we generate empty projection (SELECT 1) and try to populate values as late as possible - so we avoid populating with columns that are not needed in the final result.

Problem was that when we lift structural type projection during pushdown, if that projection contains complex types with the same column names (e.g. cross join of same entities - it's perfectly fine to do in a vacuum, we just alias the columns whose names are repeated) we would lift the projection incorrectly. What we do is go through all the properties, apply the corresponding columns to the selectExpression if needed and generate StructuralTypeProjection object if the projection needs to be applied one level up. For complex types we would generate a shaper expression and then run it through the same process, BUT the nested complex properties would be added to a flat structure along with the primitive properties, rather than in separate cache dedicated for complex property shapers. This was wrong and not what we expected to see, when processing this structure one level up (i.e. when applying projection to the outer select)

SELECT [applying_this_projection_was_wrong]
FROM
(
    SELECT c.Id, c.Name, c.ComplexProp, o.ComplexProp as ComplexProp0
    FROM Customers as c
    JOIN Orders as o ON ...
) as s

i.e. applying projection once worked fine, but doing it second time did not. The reason why is that we expected to see information about the complex type shape in the complex property shaper cache, rather than flat structure for primitives, but it wasn't there. So we assumed this is the first time we the projection is being applied, so we conjure up the complex type shaper based on table alias and IColumn metadata. This results in a situation, where complex property that was aliased is never picked. So we end up with:

SELECT s.Id, s.Name, s.ComplexProp -- we would also try to add s.ComplexProp again, instead of s.ComplexProp0 but of course we don't add same thing twice FROM
(
    SELECT c.Id, c.Name, c.ComplexProp, o.ComplexProp as ComplexProp0
    FROM Customers as c
    JOIN Orders as o ON ...
) as s

This leads to bad data - two different objects with distinct data in them are mapped to the same column in the database.

Fix is to properly build a complex type shaper structure when applying projection instead, so the structure we generate matches expectations.

Also modified VisitChildren and MakeNullable methods on StructuralTypeProjectionExpression to process/preserve complex type cache information, which was previously gobbled up/ignored.

Customer impact
Customers using new "complex types" feature in their queries can be affected by data corruption in several different scenarios. The bug involves two or more complex types with the same structure and the query must be nested. For general case, there is no good workaround, apart from significantly rewriting the query. However some scenarios can be worked-around by slightly altering the model (renaming columns on the complex types so that they differ). Multiple customers were hit by this problem.

"Please consider backporting the fix as this bug is a showstopper for our large, long-standing LOB application. This is the last issue related to our EF6 -> EF Core port."

How found
Multiple customer reports on 8.

Regression
No, complex type support is a new feature in EF Core 8.

Testing
Extensive tests added.

Risk
Medium. This fix is definitely riskier than what we usually try to patch for EF queries. There was a design flaw in how we processed complex types, so the design had to be altered. Fix is isolated to scenarios involving complex types. When no complex types are present the code behaves exactly the same as before. However, majority of complex types scenarios are going through the new code path and potentially could be affected by unintended consequences. Added quirk as extra precaution.

@maumar maumar requested a review from roji March 1, 2024 02:27
@AndriySvyryd
Copy link
Member

Add back the examples from the description in #33020 and try to make the description more readable, even if it's longer

Problem was that when we lift structural type projection during pushdown, if that projection contains complex types with the same column names (e.g. cross join of same entities - it's perfectly fine to do in a vacuum, we just alias the columns whose names are repeated) we would lift the projection incorrectly.
What we do is go through all the properties, apply the corresponding columns to the selectExpression if needed and generate StructuralTypeProjection object if the projection needs to be applied one level up. For complex types we would generate a shaper expression and then run it through the same process, BUT the nested complex properties would be added to a flat structure along with the primitive properties, rather than in separate cache dedicated for complex property shapers.
This was wrong and not what we expected to see, when processing this structure one level up (i.e. when applying projection to the outer select)

SELECT [applying_this_projection_was_wrong]
FROM
(
    SELECT c.Id, c.Name, c.ComplexProp, o.ComplexProp as ComplexProp0
    FROM Customers as c
    JOIN Orders as o ON ...
) as s

i.e. applying projection once worked fine, but doing it second time did not. The reason why is that we expected to see information about the complex type shape in the complex property shaper cache, rather than flat structure for primitives, but it wasn't there. So we assumed this is the first time we the projection is being applied, so we conjure up the complex type shaper based on table alias and IColumn metadata. This results in a situation, where complex property that was aliased is never picked. So we end up with:

SELECT s.Id, s.Name, s.ComplexProp -- we would also try to add s.ComplexProp again, instead of s.ComplexProp0 but of course we don't add same thing twice
FROM
(
    SELECT c.Id, c.Name, c.ComplexProp, o.ComplexProp as ComplexProp0
    FROM Customers as c
    JOIN Orders as o ON ...
) as s

This leads to bad data - two different objects with distinct data in them are mapped to the same column in the database.

Fix is to property build a complex type shaper structure when applying projection instead, so the structure we generate matches expectations. Also modified VisitChildren and MakeNullable methods on StructuralTypeProjectionExpression to process/preserve complex type cache information, which was previously gobbled up/ignored.

Fixes #32911
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants