-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query compilation error or data corruption for queries with GroupJoin wrapped in a subquery when the subset of columns is being projected #7290
Comments
@maumar this could be related to another 1.1.1 issue you are working on. |
turns out to be unrelated to #7367 |
Problem here is that our current mechanism to find the groupjoin sources that needs to be materialized only works on top level query. If the query has subqueries containing groupjoins, they will not be marked for materialization and cause the error. The simple fix would be to introduce the visitor that walks the entire QueryModel and finds all the necessary query sources, however the problem with that approach is that we would regress perf in some scenarios - we would bring out even more data (which we currently don't do). Better solution is to be smart about which sources we need - only mark for materialization sources that are part of the client side GroupJoin. If the GroupJoin is eliminated (flattened using SelectMany) - we don't need to materialize the main from clause. This effectively could address #6647. Problem is that there is significant risk associated with doing so, and it might not fit into 1.1.1 |
clearing milestone for re-triage |
Moving to 1.1.1. again since this bug may cause data corruption in some cases |
… with GroupJoin wrapped in a subquery when the subset of columns is being projected Problem was that we were incorrectly reasoning about which query sources need to be materialized in the GroupJoin scenario. Previously, when we encountered GroupJoin, we marked its main from clause and the joined clause for materialization, but we would not do that recursively. This causes problems if the GroupJoin (e.g. optional navigation) is in a subquery - we would not mark it for materialization. This would lead to compilation errors in some cases, and in others (when there was a client-evaluation component) to data corruption. Naive implementation of the fix - marking every query source of the GroupJoin for materialization would result in performance regression (we would start to materialize more unnecessary columns - in many scenarios they are not needed). Therefore we also implemented an enhancement to the logic that chooses whether query source (only MainFromClause for now) needs to be materialized. We don't need to materialize MainFromClause for the following case: ( (if we encounter GroupJoinClause followed by AdditionalFromClause, which references the group join and contains subquery with a DefaultIfEmpty - basically GroupJoin-SelectMany-DefaultIfEmpty pattern that can be flattened to LOJ and client GroupJoin will be removed) AND (reference to the group is NOT present anywhere else in the query model - if the group is referenced then we need to preserve client-side GroupJoin operation, which requires MainFromClause to be materialized) ) AND ( (reference to MainFromClause is NOT projected in the selector - we need to materialize because it might be needed in the parent query model) OR (queryModel is wrapped in a result operator that returns a scalar - e.g. Count, even if we project MainFromClause, the wrapping result operator makes it inaccessible from the parent query, meaning that it's not needed after all) ) Second part of the fix is that in LiftSubquery, when we update query sources, we would sometimes try to rewrite EntityShapers into ValueBufferShapers (if the lifted query itself is not marked for materialization). This however is not safe to do in case of GroupJoin. Fix is to only perform this conversion shapers inside ShapedQuery method.
We are OK with taking a minimal fix to this in 1.1.1. A more complete fix (better perf, cleaner SQL, etc.) will go into 2.0. |
… with GroupJoin wrapped in a subquery when the subset of columns is being projected Problem was that we were incorrectly reasoning about which query sources need to be materialized in the GroupJoin scenario. Previously, when we encountered GroupJoin, we marked its main from clause and the joined clause for materialization, but we would not do that recursively. This causes problems if the GroupJoin (e.g. optional navigation) is in a subquery - we would not mark it for materialization. This would lead to compilation errors in some cases, and in others (when there was a client-evaluation component) to data corruption. Fix is to use a QueryModelVisitor to walk the entire tree which allows all necessary query sources related to GroupJoin to be marked for materialization. This incurs a performance hit for some scenarios (groupjoin/optional navigation inside a subquery) because now we bring more columns from the database. Second part of the fix is that in LiftSubquery, when we update query sources, we would sometimes try to rewrite EntityShapers into ValueBufferShapers (if the lifted query itself is not marked for materialization). This however is not safe to do in case of GroupJoin. Fix is to only perform this conversion shapers inside ShapedQuery method.
@divega Impact: High. Issue was reported by a customer and is a regression introduced in 1.1.0. It can also cause data corruption in specific cases. Risk: Low/Medium. We already have the same logic that applied to GroupJoin clauses, but we were not doing it recursively. However, the fix incurs a performance hit - we will now yield more unnecessary data from database and materialize them as entities for the affected scenarios (groupjoins in a subquery). For simpler cases its just extra columns, more complicated ones can turn into N+1 queries. |
… with GroupJoin wrapped in a subquery when the subset of columns is being projected Problem was that we were incorrectly reasoning about which query sources need to be materialized in the GroupJoin scenario. Previously, when we encountered GroupJoin, we marked its main from clause and the joined clause for materialization, but we would not do that recursively. This causes problems if the GroupJoin (e.g. optional navigation) is in a subquery - we would not mark it for materialization. This would lead to compilation errors in some cases, and in others (when there was a client-evaluation component) to data corruption. Fix is to use a QueryModelVisitor to walk the entire tree which allows all necessary query sources related to GroupJoin to be marked for materialization. This incurs a performance hit for some scenarios (groupjoin/optional navigation inside a subquery) because now we bring more columns from the database. Second part of the fix is that in LiftSubquery, when we update query sources, we would sometimes try to rewrite EntityShapers into ValueBufferShapers (if the lifted query itself is not marked for materialization). This however is not safe to do in case of GroupJoin. Fix is to only perform this conversion shapers inside ShapedQuery method.
… with GroupJoin wrapped in a subquery when the subset of columns is being projected Problem was that we were incorrectly reasoning about which query sources need to be materialized in the GroupJoin scenario. Previously, when we encountered GroupJoin, we marked its main from clause and the joined clause for materialization, but we would not do that recursively. This causes problems if the GroupJoin (e.g. optional navigation) is in a subquery - we would not mark it for materialization. This would lead to compilation errors in some cases, and in others (when there was a client-evaluation component) to data corruption. Fix is to use a QueryModelVisitor to walk the entire tree which allows all necessary query sources related to GroupJoin to be marked for materialization. This incurs a performance hit for some scenarios (groupjoin/optional navigation inside a subquery) because now we bring more columns from the database. Second part of the fix is that in LiftSubquery, when we update query sources, we would sometimes try to rewrite EntityShapers into ValueBufferShapers (if the lifted query itself is not marked for materialization). This however is not safe to do in case of GroupJoin. Fix is to only perform this conversion shapers inside ShapedQuery method.
Fixed in c99151d |
Great, thanks! Any idea when 1.1.1 will be released? |
We don't have exact dates yet, but most likely sometime this month |
This patch bug is approved. Please use the normal code review process w/ a PR and make sure the fix is in the correct branch, then close the bug and mark it as done. |
This is a pretty specific corner case, but I've managed to repro it several times using different tables.
You need two tables, one with a non-mandatory FK to the other, e.g.
Now you write a query to get the IDs of the top 100 Customers whose salesperson is named Fred:
Result: a nasty error with sharp, pointy teeth:
This error goes away if you change any of the following:
It's a perfect storm of a corner case, but it did, nonetheless, show up in my code, and it is a bug. I discovered it in EFCore 1.1.0, but even in the latest prerelease of 1.2.0 it is still a problem.
The text was updated successfully, but these errors were encountered: