Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to force reusing a parameter for inlined queries? #23271

Closed
joakimriedel opened this issue Nov 11, 2020 · 15 comments · Fixed by #23437
Closed

How to force reusing a parameter for inlined queries? #23271

joakimriedel opened this issue Nov 11, 2020 · 15 comments · Fixed by #23437
Assignees
Labels
area-perf area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. community-contribution customer-reported type-enhancement
Milestone

Comments

@joakimriedel
Copy link
Contributor

I've started to inline queries more often to reduce round-trips to the server for fetching IDs that later are used in other queries.

This is partly due to the limitations of using .Contains() on an int-array.

Instead I use something similar to the following (stupid) example, of course my real queries are different in terms of complexity. Still, this demonstrates the issue clearly.

                var offerId = 1;
                var inlineQuery = context.Offers.Where(o => o.Id == offerId);

                var firstQuery = context.OfferActions.Where(oa => inlineQuery.Contains(oa.Offer) && oa.Action == OfferActions.Established);
                var secondQuery = context.OfferActions.Where(oa => inlineQuery.Contains(oa.Offer) && oa.Action != OfferActions.Established);

                var query = firstQuery.Concat(secondQuery);
                await query.ToListAsync();
SELECT [o].[Id], [o].[Action], [o].[OfferId], [o].[TimeCreatedUtc]
      FROM [OfferActions] AS [o]
      INNER JOIN [Offers] AS [o0] ON [o].[OfferId] = [o0].[Id]
      WHERE EXISTS (
          SELECT 1
          FROM [Offers] AS [o1]
          WHERE ([o1].[Id] = @__offerId_0) AND ([o1].[Id] = [o0].[Id])) AND ([o].[Action] = 1)
      UNION ALL
      SELECT [o2].[Id], [o2].[Action], [o2].[OfferId], [o2].[TimeCreatedUtc]
      FROM [OfferActions] AS [o2]
      INNER JOIN [Offers] AS [o3] ON [o2].[OfferId] = [o3].[Id]
      WHERE EXISTS (
          SELECT 1
          FROM [Offers] AS [o4]
          WHERE ([o4].[Id] = @__offerId_1) AND ([o4].[Id] = [o3].[Id])) AND ([o2].[Action] <> 1)

As you can see in the above SQL output, I now have two parameters __offerId_0 and __offerId_1 even though I used the same local variable offerId.

Is there any way to force EF Core to reuse the parameters so that I only get a single parameter __offerId used in both subqueries?

In my real-world example, sometimes I end up with >10 parameters of the exact same value and I think this gives the SQL server a poor chance to optimize the query properly.

Include provider and version information

EF Core version: 5.0 rc2
Database provider: Microsoft.EntityFrameworkCore.SqlServer

@smitpatel
Copy link
Contributor

If you inline the inlineQuery then it would generate one parameter I believe.
The reason it generated different parameters is the way visitation worked for parameterizing.
query causes parameterization firstQuery/secondQuery tree. At this point, inlineQuery is actually a parameter since it is coming from closure. But since this parameter value is IQueryable, we inline it and extract parameters recursively, each of this has no details about the other trees so ends up generating different parameters.
It would also depend on what expression tree we are getting, if the expression for offerId is same in both then may be we can generate one parameter.

@joakimriedel
Copy link
Contributor Author

The inlineQuery is in real life a query that is used for authorization, so it's pretty verbose and would like to benefit from code reuse. Still, for experimenting with this, I tried inlining the inlineQuery, but the result is actually the same - I still get two parameters.

                var firstQuery = context.OfferActions.Where(oa => context.Offers.Where(o => o.Id == offerId).Contains(oa.Offer) && oa.Action == OfferActions.Established);
                var secondQuery = context.OfferActions.Where(oa => context.Offers.Where(o => o.Id == offerId).Contains(oa.Offer) && oa.Action != OfferActions.Established);

I'm not enough versed in the inner workings of EF Core, but could there be a way to have a special constant parameter expressed somehow in the query? Like (pseudocode)

var inlineQuery = context.Offers.Where(o => o.Id == EF.MakeConstantParameter(offerId));

and it would somehow hash the argument and maintain some sort of hashmap to check if the parameter is exactly the same as some other parameter in the tree and then link them together? Or just check by reference if they point to the same memory or something similar?

@joakimriedel
Copy link
Contributor Author

Funny, I even inlined the inlines;

var query = context.OfferActions.Where(oa => context.Offers.Where(o => o.Id == offerId).Contains(oa.Offer) && oa.Action == OfferActions.Established).Concat(context.OfferActions.Where(oa => context.Offers.Where(o => o.Id == offerId).Contains(oa.Offer) && oa.Action != OfferActions.Established));

and still got

SELECT [o].[Id], [o].[Action], [o].[OfferId], [o].[TimeCreatedUtc]
      FROM [OfferActions] AS [o]
      INNER JOIN [Offers] AS [o0] ON [o].[OfferId] = [o0].[Id]
      WHERE EXISTS (
          SELECT 1
          FROM [Offers] AS [o1]
          WHERE ([o1].[Id] = @__offerId_0) AND ([o1].[Id] = [o0].[Id])) AND ([o].[Action] = 1)
      UNION ALL
      SELECT [o2].[Id], [o2].[Action], [o2].[OfferId], [o2].[TimeCreatedUtc]
      FROM [OfferActions] AS [o2]
      INNER JOIN [Offers] AS [o3] ON [o2].[OfferId] = [o3].[Id]
      WHERE EXISTS (
          SELECT 1
          FROM [Offers] AS [o4]
          WHERE ([o4].[Id] = @__offerId_1) AND ([o4].[Id] = [o3].[Id])) AND ([o2].[Action] <> 1)

So my first guess that it was about the inlined IQueryable seems wrong, it seems to be wider than that.

@smitpatel
Copy link
Contributor

Seems like this is happening because Concat argument is not a lambda. so generating a separate tree.

@joakimriedel
Copy link
Contributor Author

Would that be the same for a similar construct to the following, which is more like the one I have in real life where it builds an anonymous dto object;

(the Concat was just for the simple case)

                var query = context.Ads
                .Where(ad => ad.Id == 1)
                .Select(ad => new
                {
                    HasSomeOffers = firstQuery.Where(oa => oa.Offer.AdId == ad.Id).Any(),
                    HasSomeOtherOffers = secondQuery.Where(oa => oa.Offer.AdId == ad.Id).Any()
                });

                var item = await query.SingleOrDefaultAsync();

This also generates two different offerId parameters:

         SELECT 1
              FROM [OfferActions] AS [o]
              INNER JOIN [Offers] AS [o0] ON [o].[OfferId] = [o0].[Id]
              WHERE (EXISTS (
                  SELECT 1
                  FROM [Offers] AS [o1]
                  WHERE ([o1].[Id] = @__offerId_0) AND ([o1].[Id] = [o0].[Id])) AND ([o].[Action] = 1)) AND ([o0].[AdId] = [a].[Id])) THEN CAST(1 AS bit)
          ELSE CAST(0 AS bit)
      END AS [HasSomeOffers], CASE
          WHEN EXISTS (
              SELECT 1
              FROM [OfferActions] AS [o2]
              INNER JOIN [Offers] AS [o3] ON [o2].[OfferId] = [o3].[Id]
              WHERE (EXISTS (
                  SELECT 1
                  FROM [Offers] AS [o4]
                  WHERE ([o4].[Id] = @__offerId_1) AND ([o4].[Id] = [o3].[Id])) AND ([o2].[Action] <> 1)) AND ([o3].[AdId] = [a].[Id])) THEN CAST(1 AS bit)
          ELSE CAST(0 AS bit)
      END AS [HasSomeOtherOffers]
      FROM [Ads] AS [a]
      WHERE [a].[Id] = 1

But still, do you think it would be possible to get around this parameter generation somehow?

@smitpatel
Copy link
Contributor

Let me investigate the expression trees. My understanding is, anytime a queryable object from closure is used, a different scope for parameters starts generating new parameters always. In all above example that is consistent. I will verify and see if it can be changed to generate one parameter only.

@joakimriedel
Copy link
Contributor Author

@smitpatel out of curiosity, started experimenting a bit on main branch.

Changing row 331 to 337 in https://github.com/dotnet/efcore/blob/0b3165096d6b55443fc06ae48404c2b037dd73e7/src/EFCore/Query/Internal/ParameterExtractingExpressionVisitor.cs

            parameterName
                = QueryCompilationContext.QueryParameterPrefix
                + parameterName
                + "_"
                + _parameterValues.ParameterValues.Count;

            _parameterValues.AddParameter(parameterName, parameterValue);

            var parameter = Expression.Parameter(expression.Type, parameterName);

to the following

var parameterPrefix = QueryCompilationContext.QueryParameterPrefix
                + parameterName
                + "_";

            ParameterExpression parameter;

            var matchingParameters = _parameterValues.ParameterValues.Where(p => p.Key.StartsWith(parameterPrefix, StringComparison.InvariantCulture) && p.Value.Equals(parameterValue));
            if (matchingParameters.Any())
            {
                var lastParameter = matchingParameters.Last();

                parameter = Expression.Parameter(expression.Type, lastParameter.Key);
            }
            else
            {
                parameterName
                    = parameterPrefix + _parameterValues.ParameterValues.Count;

                _parameterValues.AddParameter(parameterName, parameterValue);

                parameter = Expression.Parameter(expression.Type, parameterName);
            }

.. will actually resolve this issue.

Also added this test to GearsOfWarQueryTestBase.cs to prove it

        [ConditionalTheory]
        [MemberData(nameof(IsAsyncData))]
        public virtual Task Query_reusing_parameter_with_concat_doesnt_declare_duplicate_parameter(bool async)
        {
            var squadId = 1;

            return AssertQuery(
                async,
                ss =>
                {
                    var innerQuery = ss.Set<Squad>().Where(s => s.Id == squadId);
                    var outerQuery = ss.Set<Gear>().Where(g => innerQuery.Contains(g.Squad));
                    return outerQuery.Concat(outerQuery).OrderBy(g => g.FullName);
                },
                assertOrder: true);
        }

        public override async Task Query_reusing_parameter_with_concat_doesnt_declare_duplicate_parameter(bool async)
        {
            await base.Query_reusing_parameter_with_concat_doesnt_declare_duplicate_parameter(async);

            AssertSql(
                @"@__squadId_0='1'

SELECT [t].[Nickname], [t].[SquadId], [t].[AssignedCityName], [t].[CityOfBirthName], [t].[Discriminator], [t].[FullName], [t].[HasSoulPatch], [t].[LeaderNickname], [t].[LeaderSquadId], [t].[Rank]
FROM (
    SELECT [g].[Nickname], [g].[SquadId], [g].[AssignedCityName], [g].[CityOfBirthName], [g].[Discriminator], [g].[FullName], [g].[HasSoulPatch], [g].[LeaderNickname], [g].[LeaderSquadId], [g].[Rank]
    FROM [Gears] AS [g]
    INNER JOIN [Squads] AS [s] ON [g].[SquadId] = [s].[Id]
    WHERE EXISTS (
        SELECT 1
        FROM [Squads] AS [s0]
        WHERE ([s0].[Id] = @__squadId_0) AND ([s0].[Id] = [s].[Id]))
    UNION ALL
    SELECT [g0].[Nickname], [g0].[SquadId], [g0].[AssignedCityName], [g0].[CityOfBirthName], [g0].[Discriminator], [g0].[FullName], [g0].[HasSoulPatch], [g0].[LeaderNickname], [g0].[LeaderSquadId], [g0].[Rank]
    FROM [Gears] AS [g0]
    INNER JOIN [Squads] AS [s1] ON [g0].[SquadId] = [s1].[Id]
    WHERE EXISTS (
        SELECT 1
        FROM [Squads] AS [s2]
        WHERE ([s2].[Id] = @__squadId_0) AND ([s2].[Id] = [s1].[Id]))
) AS [t]
ORDER BY [t].[FullName]");
        }

And it succeeds, with most other tests except for a failing test due to Swedish culture (see #22901). Only other remaining failing test seems to be Ternary_should_not_evaluate_both_sides but I think it could be made working with just a little bit more effort.

Is this a totally wrong angle to resolve this issue, or should I try pursuing further and perhaps try to wrap it together in a PR?

@smitpatel
Copy link
Contributor

It doesn't look incorrect fix but there is also opposite side of it in #22524 where 2 values are kinda same but should generate different parameters. Parameter extraction is a "best-guess" solution and what and how to reuse the query. Without having any particular example in mind, I feel trying to find existing parameter matching value to be somewhat fragile. A more robust solution should be to identify that both expression trees (member access on closure) are the same, so we don't need multiple parameters for it. I need to inspect actual tree to evaluate how to achieve those.

Perhaps the reentrancy case should not clear out evaluated values?

@smitpatel
Copy link
Contributor

Reentrancy happens here

return ExtractParameters(innerQueryable.Expression);

@joakimriedel
Copy link
Contributor Author

Could experiment a bit further on that track instead. Pursued a bit further on parameter matching solution, but seems like Ternary_should_not_evaluate_both_sides fails due to some optimization possibly in RelationalShapedQueryCompilingExpressionVisitor.ShaperProcessingExpressionVisitor.cs when several fields are mapped to the same parameter, but debugging the expression trees are a bit overwhelming for me. It should produce Data1, Data2 and Data3 all equal to @__p_0, but the last two are dropped.

@joakimriedel
Copy link
Contributor Author

joakimriedel commented Nov 22, 2020

@smitpatel only removing the line where evaluated values are cleared in

will resolve this issue and successfully run my new test for inlined queries. Sometimes removing code is very satisfactory. 😄

But now I get problems with Microsoft.EntityFrameworkCore.Query.NorthwindIncludeQuerySqlServerTest.Include_duplicate_collection where the constants in OFFSET 2 ROWS FETCH NEXT 2 ROWS ONLY suddenly gets translated to OFFSET @__p_0 ROWS FETCH NEXT @__p_0 ROWS ONLY for some reason. @__p_0='2' so it probably won't affect actual results - but it was a bit unexpected side-effect. Is this an acceptable change?

edit: ah, it's probably because there is also a SELECT TOP(@__p_0) in the query so the constant 2 value is probably reused from there.

@smitpatel
Copy link
Contributor

Likely that part shouldn't get reused. Though no functional impact. But bottom line is that we need to clear that dict. Just skip it in reentrancy case.

@joakimriedel
Copy link
Contributor Author

I've changed behaviour to actually clear the dictionary when not leaving a re-entry, but the tests I mentioned still failed since they actually are re-entrant and affected by this new code.

I believe this new implementation is correct and suggest changing the tests. Since the row limiting operation is parameterized for TOP, I think it should also be for CROSS APPLY otherwise a lot of different query plans will be cached in SQL for different row counts.

PR draft linked.

Eager to solve this since I've done perf tests on my solution and queries that heavily inline queries will benefit a query time decrease around 30% by using the same parameter instead of introducing 10+ parameters with the same value.

@smitpatel smitpatel modified the milestone: Backlog Nov 23, 2020
@smitpatel
Copy link
Contributor

@joakimriedel - This won't be released before 6.0 (would certainly be in earlier preview). But 6.0 RTM is next year November timeframe.

@joakimriedel
Copy link
Contributor Author

@smitpatel So I've heard, but I still hold my thumbs for a 5.1 before next summer. :)

smitpatel pushed a commit to joakimriedel/efcore that referenced this issue Feb 22, 2021
smitpatel pushed a commit to joakimriedel/efcore that referenced this issue Feb 23, 2021
smitpatel pushed a commit to joakimriedel/efcore that referenced this issue Feb 23, 2021
smitpatel pushed a commit that referenced this issue Feb 23, 2021
@smitpatel smitpatel self-assigned this Feb 23, 2021
@smitpatel smitpatel removed this from the Backlog milestone Feb 23, 2021
@smitpatel smitpatel added this to the 6.0.0 milestone Feb 23, 2021
@ajcvickers ajcvickers modified the milestones: 6.0.0, 6.0.0-preview3 Mar 5, 2021
@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Apr 28, 2021
@ajcvickers ajcvickers modified the milestones: 6.0.0-preview3, 6.0.0 Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-perf area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. community-contribution customer-reported type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants