-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix to #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling #7772
Conversation
@tuespetre fyi, feel free to chime in as you have significant knowledge in this area |
} | ||
} | ||
|
||
private Expression ExtractSourceExpression(Expression expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has been copied and modified from the EntityQueryModelVisitor.IterateCompositePropertyExpression Might be worthwhile to add a "debug"/"mock" binding to EQMV. Wasn't sure if its worth to add public surface to that class - this approach is sort of a hack, ideally we would fix the binding itself to work for those scenarios. That's why I opted for copying the code into a private method that can be removed without issue once a proper fix is implemented. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much need for this, once inside the BindMemberExpression
/BindMethodCallExpression
lambda, there can only have been a single property, so RemoveConvert()
on node.Expression
or node.Arguments[0]
suffices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call!
@@ -318,6 +318,8 @@ var entityEqualityRewritingExpressionVisitor | |||
_navigationRewritingExpressionVisitorFactory | |||
.Create(this).Rewrite(queryModel, parentQueryModel: null); | |||
|
|||
entityEqualityRewritingExpressionVisitor.Rewrite(queryModel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid rewrite twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try, but IIRC we were missing some cases if the equality rewrite happens only after nav rewrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does nav rewrite depends on equality rewrite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently some scenarios involving navigation entity comparison is broken if entity equality doesn't run before nav rewrite. We could probably refactor that code by adding some intermediate step between nav rewrite and entity comparison, filed #7773 to track this
} | ||
} | ||
|
||
private class RequiresMaterializationCompensatingVisitorState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split this into individual properties and pass them directly. In earlier iterations there was a boolean state that was shared between query model visitor and expression visitor so passing a reference was needed. Now it can be rewritten, but not sure if it's worth - this "nicely" encapsulates all the data/helpers that those visitors need. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to pass the state in -- the QueryModelVisitor and the ExpressionVisitor each can have a public HashSet<IQuerySource> QuerySources { get; }
that they add to. In FindQuerySourcesRequiringMaterialization
, the resulting sets from all three visitors (the original + two compensating) can be combined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, simplifies this code a lot, now that both visitors don't share state
|
||
protected virtual IQuerySource PreProcessQuerySource(IQuerySource querySource) | ||
{ | ||
var newQuerySource = ExtractGroupJoinFromSelectManySubquery(querySource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the same result be achieved by assigning the appropriate QuerySource
to joinExpression
in OptimizeJoinClause
(assign it the joinClause
, and then if flattening a )?DefaultIfEmpty
, assign it the additionalFromClause
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maumar these changes work but keep the 'hairiness' out of TableExpressionBase
:
It should be noted that the extra-hair-tastic bit (the else if
block) is only required because we don't lift all the subqueries we could yet (i.e. lifting _Select
using a ProjectionShaper
) and thus the group joins aren't flattened as they could be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smitpatel @anpete any thoughts on this? - you guys know this area better than me ;) One thing worth pointing out is that we can't rely on ProjectionShaper optimization to simplify the code, since there can always be client method involved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting before I forget: The block itself was only needed to make two tests pass: GroupJoin_on_left_side_being_a_subquery and its twin.
if (sourceExpression is SubQueryExpression subQueryExpression) | ||
{ | ||
var queryModel = subQueryExpression.QueryModel; | ||
if (queryModel.ResultOperators.ToList().OfType<ChoiceResultOperatorBase>().Any()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need to check the result operators -- if it's a subquery bound to a property, you can just promote the selector as a QSRE, using HandleUnderlyingQuerySources
pointing at a MarkForMaterialization
method that adds them to the collection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is selector guaranteed to be QSRE? Might be safer to just delegate to the requiresmaterialization visitor, but I agree that there is no need to check for result operators - (I think) there should always be one if we end up in this situation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's an entity property access, I would think so (or it's another subquery that returns an entity, which HandleUnderlyingQuerySources
should handle) BUT now that you bring it up, what about a subquery that selects an entity via a constructor, like (from c in customers select new Customer() { SingleProp = c.SingleProp }).FirstOrDefault(),SingleProp
? I'm not sure I've seen a test that covers that already.
EDIT: Thought occurred, and that case wouldn't matter -- because it would return a Customer object, not a value buffer.
@maumar I had to confirm all of my suspicions before making comments, here are the results in case it's of any help to you: fix7290...tuespetre:fix7290 |
new version up |
private void CompensateForUnboundPropertyAccess(Expression expression) | ||
{ | ||
var sourceExpression = expression.RemoveConvert(); | ||
if (sourceExpression is SubQueryExpression subQueryExpression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏ : if (expression.RemoveConvert() is SubQueryExpression subQueryExpression)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold up, new developments coming in
@tuespetre the change is dangerously growing in complexity ;) I thinking to keep the hack for now as its very simple and localized and will leave #7722 open. Once this change is in, you could submit a new PR with the proper fix and remove the hack. |
I thought so too, especially the The |
@tuespetre agreed, especially that it's related to this change - we introduced this flag because it was messing with out groupjoins. Will incorporate this into the PR |
@tuespetre after doing some more digging, simply removing
For those we still materialize extra stuff, so distinct needs to happen on the client still.
|
Oh, ok. I suppose |
new version up (adding more tests) |
new version up, rebased on the latest changes and fixed some more bugs found along the way |
/// This API supports the Entity Framework Core infrastructure and is not intended to be used | ||
/// directly from your code. This API may change or be removed in future releases. | ||
/// </summary> | ||
public class RecursiveQueryModelVisitor : ExpressionTransformingQueryModelVisitor<RecursiveQueryModelExpressionVisitor> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why hello there beautiful 💖
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maumar - Is it visitor to visit a visitor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me, we can reuse this in NondeterministicResultCheckingVisitor
61f2b80
to
f20423b
Compare
@maumar I have an idea that might eliminate the need for not only the |
/// This API supports the Entity Framework Core infrastructure and is not intended to be used | ||
/// directly from your code. This API may change or be removed in future releases. | ||
/// </summary> | ||
public class RecursiveQueryModelVisitor : QueryModelVisitorBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yo @maumar would you mind splitting this into its own PR so it can be merged sooner? I've got multiple places in multiple branches where I'd like to use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maumar - We talked about it yesterday only. 😄 I had implemented a visitor which would apply QueryModelVisitor recursively. Lets discuss about it.
@tuespetre - PR #7822 can utilize such recursive visitor & detect all subquery expressions but there is one test which is regressing on that due to how parsed querymodel is generated and order which optimizations are applied. If you want to tackle that issue in that way. I do not want you to block you on that so the current changes in that PR is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7843 is merged so you can simply this. Can you also post reference where this is being used? (I suppose there is derived class for this else this 2 visitor does nothing other than calling each other. So much Love)
@@ -129,6 +137,21 @@ protected override Expression VisitMethodCall(MethodCallExpression node) | |||
return base.VisitMethodCall(node); | |||
} | |||
|
|||
protected override Expression VisitExtension(Expression node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Base VisitExtension calls VisitChildren on the extension node. I know we have been inconsistent with this but I'd like to fix that 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we skip over NullConditionalExpression
in ExpressionVisitorBase
. However we do process NullConditional in DefaultQueryExpressionVisitor
so it can be safely removed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we skip over NullConditionalExpression in ExpressionVisitorBase
Yeah, this needs to change, too. We are breaking the pattern by doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this in a separate PR
propertyExpression, | ||
constantNull); | ||
return addNullCheck | ||
? new NullConditionalExpression(target, target, propertyExpression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this reason, I wonder what is difference between caller
and nullableCaller
in NullConditionalExpression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its relevant for the nested NCEs - nullable caller can be another NCE, whereas caller and access operation will always be normal expressions (methods/member access)
private void MaterializeOptionalNavigationSource(QuerySourceReferenceExpression sourceQsre) | ||
{ | ||
if (sourceQsre != null | ||
&& sourceQsre.ReferencedQuerySource is AdditionalFromClause additionalFromClause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY this conditions with the GroupJoin visitor. You can put it in QuerySourceExtensions if it is used outside of this file. Or private function in this file. This thing basically takes AdditionalFromClause and find the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PreProcessQuerySource
uses this. Make extension method on AdditionalFromClause. Can also used in GroupJoin visitor if you override VisitAdditionalFromClause method.
Name this pattern something good in extension method like IsMaumerJoin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrt GroupJoinMaterializationCompensatingVisitor
I need to process GroupJoin clauses, not AdditionalFrom clauses because there can be a groupjoin without select many, and we would not catch those cases (i.e. wouldn't mark them for materialization)
public override void Optional_navigation_inside_nested_method_call_translated_to_join_keeps_original_nullability_also_for_arguments() | ||
{ | ||
base.Optional_navigation_inside_nested_method_call_translated_to_join_keeps_original_nullability_also_for_arguments(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these.
@@ -71,8 +74,13 @@ protected override Expression VisitSubQuery(SubQueryExpression expression) | |||
|
|||
var queryModelVisitor = (RelationalQueryModelVisitor)CreateQueryModelVisitor(); | |||
|
|||
var queryModelMapping = new Dictionary<QueryModel, QueryModel>(); | |||
expression.QueryModel.PopulateQueryModelMapping(queryModelMapping); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we not require this before and what changes brought this requirement now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an existing bug, we just didn't catch it because for the relevant tests we have, LOJ and IJ were returning the same results.
requiresMaterializationExpressionVisitor); | ||
|
||
groupJoinMaterializationExpressionVisitor.QueryModelVisitor = groupJoinMaterializationQueryModelVistor; | ||
groupJoinMaterializationQueryModelVistor.VisitQueryModel(queryModel); | ||
var unboundPropertyAccessCompensatingVisitor = new UnboundPropertyAccessCompensatingVisitor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
new version up |
@@ -666,7 +653,7 @@ private static Expression HandleSum(HandlerContext handlerContext) | |||
|
|||
if (!(expression.RemoveConvert() is SelectExpression)) | |||
{ | |||
var sumExpression = new SqlFunctionExpression("SUM", expression.Type, new [] { expression }); | |||
var sumExpression = new SqlFunctionExpression("SUM", handlerContext.QueryModel.SelectClause.Selector.Type, new [] { expression }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, problematic scenario is entities.Select(e => (int?)e.Optional.Id)).Sum()
which we now fully translate to sql. When converting single result element to a collection, the expected element TResult is int? but looking at selectExpression, we think its int (because we nuke all converts when translating). Fix is to look at query model instead which has appropriate typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget Min/Max! 🌵
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Wait for @anpete to final sign off on this,
return _tables.Any(te => te.QuerySource == querySource || te.HandlesQuerySource(querySource)); | ||
var processedQuerySource = PreProcessQuerySource(querySource); | ||
|
||
return _tables.Any(te => te.QuerySource == processedQuerySource || te.HandlesQuerySource(processedQuerySource)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be simplified (only needs to check if HandlesQuerySource)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wouldn't hurt to fallback via OrElse to base.HandlesQuerySource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that initially, but some queries deteriorate - CROSS JOIN gets converted to CROSS APPLY, where no correlation is present. Will leave as is.
&& subQueryExpression.QueryModel.ResultOperators.Count == 1 | ||
&& subQueryExpression.QueryModel.ResultOperators[0] is DefaultIfEmptyResultOperator | ||
&& subQueryExpression.QueryModel.MainFromClause.FromExpression is QuerySourceReferenceExpression subqueryQsre | ||
&& subqueryQsre.ReferencedQuerySource is GroupJoinClause groupJoinClause) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably explicitly check if the GroupJoinClause is the immediately preceding body clause in the query model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually requires query model, which we don't have in some contexts. Currently we are doing this check in the caller code, will keep as is.
@anpete new version up |
var inputType = expression.Type; | ||
var outputType = expression.Type; | ||
var inputType = handlerContext.QueryModel.SelectClause.Selector.Type; | ||
var outputType = handlerContext.QueryModel.SelectClause.Selector.Type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputType = inputType?
var groupJoinMaterializationQueryModelVistor = new RequiresMaterializationForGroupJoinQueryModelVisitor( | ||
groupJoinMaterializationExpressionVisitor, | ||
_querySourcesRequiringMaterialization, | ||
var groupJoinCompensatingVisitor = new GroupJoinMaterializationCompensatingVisitor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway to unify all of this inside RequiresMaterializationExpressionVisitor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented them intentionally outside - RequiresMaterializationExpressionVisitor should just find query sources that are reachable from the final projection and/or ones that need to be materialized due to client evals. Those visitors here are just hacks/workarounds for existing issues and once those issues are resolved, they can be removed - without tinkering with the RMEV, which is itself quite complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking issues to remove hacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7787 is already tracking OptionalCollectionNavigationCompensatingVisitor, will file a new issue to track GroupJoinMaterializationCompensatingVisitor once this is checked in (and 7290 gets closed) - to track the remaining work on groupjoin materialization
…ulting in unnecessary data pulling Problem was that for GroupJoin we would always force materialization on participating query sources. We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups. However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway. Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters. We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query. This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high. Other bugs fixed alongside the main change: #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery. Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre). Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic) #7497 - Error calling Count() after GroupJoin() (and removed the temporary fix to #7348 and implemented a proper one) This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed. We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases. Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count(). This is now consistent with the behavior of RequiresMaterializationExpressionVisitor. Also fixed several minor issues that were encountered once no longer do extensive materialization.
Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause then groups don't matter because they are getting flattened by SelectMany anyway.
Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.
This change also addresses #7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.
Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)
Also fixed several minor issues that were uncovered since we no longer do extensive materialization.