Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate embedded IQueryables to common table expressions (CTEs) #29918

Open
willnationsdev opened this issue Dec 22, 2022 · 14 comments
Open

Translate embedded IQueryables to common table expressions (CTEs) #29918

willnationsdev opened this issue Dec 22, 2022 · 14 comments

Comments

@willnationsdev
Copy link

willnationsdev commented Dec 22, 2022

When reusing a separate IQueryable in operations, e.g. .Where(obj => otherQuery.Contains(obj)), EF Core generates a subquery (as expected). However, in these cases, manually reviewing generated SQL in debug logs becomes more difficult to visually/mentally parse and damages the readability of the code, especially when a given IQueryable is injected multiple times and/or the generated SQL is especially large.

I would like to have the ability to request that applicable queries replace all injected subqueries of this nature with common table expressions named after the injected IQueryable's symbol (like otherQuery in the previous example). This would in turn make reading the debug logs of executed SQL statements more readable.

This could just be an opt-in optional behavior specifically for use in debug contexts. One could apply this strategy individually via something like AsNoTracking() or app-wide via the DbContextOptionsBuilder like with EnableDetailedErrors().

Related Issues: #26486 (CTEs via a special LINQ operator, including recursive)

@roji
Copy link
Member

roji commented Dec 22, 2022

@willnationsdev are you suggesting we generate a completely different SQL (with CTE instead of subqueries) specifically for logging SQL, and continue using the current subquery SQL for what's actually executed against the database?

@willnationsdev
Copy link
Author

willnationsdev commented Dec 22, 2022

@roji

Hmmm...as far as I know, CTEs have an equivalent performance impact compared to copy/pasting subqueries, yes? Regardless, differing logged vs. executed queries would defeat the purpose of "improving debug log analysis" in the first place, so I wouldn't advocate for that. What shows up in logs should always be what is actually sent to the database to keep logs accurate, so provided the implementation work involved is the same, I think it would be better to change the query string for both logging and execution.

If however including CTEs actually does have some detrimental impact on query handling for some reason, then perhaps this request should involve logging-specific query string generation on an as-requested, per-query basis. Then users can generate ad-hoc CTE-based queries when they need to review an especially complex query, but not employ the CTEs in the average use case.

@roji
Copy link
Member

roji commented Dec 22, 2022

So you're asking to change EF to replace all subqueries in its SQL with CTEs? That seems like quite a request, especially if the motivation here is finding CTEs a bit more readable.

In fact, since CTEs effectively extract out entire subtrees of the query tree, doing this systematically for all subqueries is very likely to degrade readability; especially since EF would have to mechanically generate the CTE identifiers, so they'd be meaningless (cte1, cte2...). Try figuring out which CTE means what in a complex query with many of these.

Don't get me wrong - I think CTEs are great, and there are some specific situations where they're very helpful (I'm especially thinking of recursive CTEs, which allow expressing queries which are otherwise inexpressible). But the readability of simple subqueries vs. CTE seems largely a matter of personal taste, and I don't see us changing everything here as you suggest.

(and all this before having thought or investigated any potential perf issues resulting from such a switch)

@willnationsdev
Copy link
Author

@roji That makes sense. I was really looking for a way to more clearly distinguish/segregate which parts of a large & complex query were derived from which IQueryable instances, and when I've moved around my own subqueries into CTEs while writing SQL manually, I found it helpful; however, if doing so in EF is a lot of work and/or if the generated SQL doesn't even assign aliases reminiscent of the originally provided queries, then it's definitely not worth the effort.

I just wasn't sure how hard it would be to do, and I knew that the few generated query strings I had looked at thus far included DECLARE @... statements that had names closely matching those of the symbols passed as parameters to the C# Expressions, so I figured the same could be done with the IQueryables via CTEs.

@roji
Copy link
Member

roji commented Dec 23, 2022

[...] the few generated query strings I had looked at thus far included DECLARE @... statements that had names closely matching those of the symbols passed as parameters to the C# Expressions [...]

Yeah, but parameters are very, very different from subqueries.

I think the way forward here is for us to implement explicit support for CTEs, where the user uses a LINQ operator to express SQL WITH (#26486). Once that happens, the query writer has the option to decide exactly what they want as a CTE and what name to give it. That would be generally better than just turning all subqueries into "anonymous" CTE blocks.

I'm going to go ahead and close this, but please feel free to continue posting here and discussing.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Dec 23, 2022
@roji roji added closed-no-further-action The issue is closed and no further action is planned. and removed type-enhancement labels Dec 23, 2022
@stevendarby
Copy link
Contributor

Can you explain how parameters are very very different to sub queries in terms of getting the name?

If I've understood correctly, this request is not about turning all subqueries into a CTE but specifically about turning those generated from IQueryable variables into CTEs. Wouldn't the IQueryable variable name be captured in the expressions and so could be used as the CTE name?

I feel there's quite good symmetry between, in the code/LINQ, defining an IQueryable separately to the main query and potentially using it multiple times in the main query, and, in the SQL, doing the same with a CTE.

@roji
Copy link
Member

roji commented Dec 23, 2022

Can you explain how parameters are very very different to sub queries in terms of getting the name?

Oh, I wasn't referring specifically to the name - they're different in various other ways.

Looking again at the above, I think I missed the fact that the request was specifically about when an external IQueryable variable is integrated into the query, rather than about subqueries in general (sorry). For that case we can indeed know the name and give it to a CTE, similar to how we do that for parameters.

I do kinda like this idea - integrating an external IQueryable is a complex, "independent" query is being integrated into the main LINQ query, so it makes some sense to do the same on the SQL side via CTEs. But I'm still not sure doing this automatically is the right thing - at the very least a serious cross-database investigation would need to make sure the change to CTEs doesn't affect perf etc. And the gain is purely in more readable SQL.

I'll reopen so we can at least discuss this on the team.

@roji roji reopened this Dec 23, 2022
@roji
Copy link
Member

roji commented Dec 23, 2022

[...] potentially using it multiple times in the main query [...]

This is a good point, and would be a possible perf improvement, as the subquery is only evaluated once rather than multiple times.

On the downside, any sort of dynamic query construction (such as the integration of an IQueryable) is basically incompatible with any sort of static analysis, and so won't work with precompiled queries and AOT. Of course, whether the user integrates IQueryable or not is orthogonal to what SQL we produce from it (it's dynamic and AOT-incompatible in any case) - but it might mean it's less interesting to invest efforts into this.

@ajcvickers ajcvickers changed the title Optionally replace subqueries with common table expressions (CTEs) to improve readability of debug logs. Translated embedded IQueryables to common table expressions (CTEs) Jan 5, 2023
@ajcvickers ajcvickers changed the title Translated embedded IQueryables to common table expressions (CTEs) Translate embedded IQueryables to common table expressions (CTEs) Jan 5, 2023
@ajcvickers ajcvickers added type-enhancement blocked area-query and removed closed-no-further-action The issue is closed and no further action is planned. labels Jan 5, 2023
@ajcvickers ajcvickers added this to the Backlog milestone Jan 5, 2023
@joakimriedel
Copy link
Contributor

Until we would have full CTE support, would it be possible to have tagging support in embedded IQueryables? For logging purposes it would be nice if the existing TagWith would output the tag next to the subquery.

.Where(obj => otherQuery.TagWith("subquery for other stuff").Contains(obj))

@roji
Copy link
Member

roji commented Jan 10, 2023

@joakimriedel query tags are implemented at the entire query level, and cannot be attached to one part of the query or another; so unfortunately this isn't possible at this point.

@joakimriedel
Copy link
Contributor

@joakimriedel query tags are implemented at the entire query level, and cannot be attached to one part of the query or another; so unfortunately this isn't possible at this point.

I see, I thought it would be possible to do some rewriting magic since it would always be part of an expression tree. Alas, even stronger vote for CTE, I'm also heavily invested in sub queries like OP.

@Pepsi1x1
Copy link

Pepsi1x1 commented Dec 6, 2024

Strongly in favour of this feature, we have queries on the hot path that would benefit from the perf gain here

@willnationsdev
Copy link
Author

@Pepsi1x1 I may be wrong, but if you take a look at roji's comment...

On the downside, any sort of dynamic query construction (such as the integration of an IQueryable) is basically incompatible with any sort of static analysis, and so won't work with precompiled queries and AOT.

...he seems to insinuate that embedding an IQueryable within another IQueryable makes performance optimizations impossible, if that's what you were looking for. My proposal was more for the sake of improving the readability of generated SQL queries. Or did you mean the SQL-side optimization of just reducing the amount of SQL being parsed for repeatedly used sub-queries?

@roji
Copy link
Member

roji commented Dec 7, 2024

@willnationsdev my comment was only about the incompatibility of IQueryable composition (which is a form of dynamic querying) with NativeAOT/precompilation - that's not something that interests everyone. There's nothing wrong with using this in regular, non-NativeAOT programs.

(in fact, we may be able to support this very specific form of dynamic querying even in NativeAOT/precompilation - but that's not sure).

But in any case, readability and performance align here: when you remove duplicated subqueries by introducing a CTE, you both make the SQL more readable and remove additional work for the database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants