Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substituting table reference for a CTE #30602

Closed
AntonC9018 opened this issue Mar 31, 2023 · 7 comments
Closed

Substituting table reference for a CTE #30602

AntonC9018 opened this issue Mar 31, 2023 · 7 comments
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@AntonC9018
Copy link

Problem description

Say I generate my SQL with EF Core. Is there a way to make it do joins on the CTE rather than the original table? So for example if I do a

context.Owners
   .Select(o => new
   {
       Id = o.Id,
       Projects = o.Projects,
   });

Can I make the join with projects use a CTE, so it looks kinda like this:

context.Owners
   .SubstituteJoins(context.Projects, p => p.Where(p => p.Name.Contains(" ")))
   .Select(o => new
   {
       Id = o.Id,
       Projects = o.Projects,
   })

Generating a query that's similar to this:

WITH view AS (
  SELECT *
  FROM projects
  WHERE name LIKE '% %'
)
SELECT *
FROM owner
LEFT JOIN view ON view.ownerId = owner.id;

To be clear, what I want is for all joins with the projects table to be substituted for a join to a CTE.

Due to the fact that, in my understanding, EF Core cannot generate CTE's, is there a way to just force it to replace joins with the Projects table to view, which I would then append to the SQL query as text? I guess an idea is to replace the strings which reference the Projects table after it generates the SQL, but is there a way to do this at one higher level of abstraction?

Is my idea flawed, or is there a simpler way, you think?

The same can kinda be achieved with nested queries on all nested fields, but my problem with this is that it requires redoing the projections. Thinking of the Projects table as a view feels simpler in this case.

Hack the query after generation without much extra work

In the example above, EF Core by default generates the following query (note that it does not prefix Projects with dbo):

SELECT [p].[Id], [p0].[Id], [p0].[PersonId], [p0].[ProjectName]
FROM [People] AS [p]
LEFT JOIN [Projects] AS [p0] ON [p].[Id] = [p0].[PersonId]
ORDER BY [p].[Id]

Now If I just prepend a CTE definition, effectively shadowing the name of the projects table, making it now refer to that new CTE in the query:

WITH [Projects] AS (SELECT * FROM [dbo].[Projects] AS [x] WHERE [x].[ProjectName] LIKE '% %')

I get a valid query as a result.

Is this approach flawed? Like, does it have problems with security? Are there cases when EF Core renames the tables in the generated queries, or prepends [dbo]? If not, it seems to me like it would always have the result what I want it to.

Now, in order to prepend this string with the CTE definition, I have to define a DbCommandInterceptor. Do you know of any other way?

Let's suppose I went the DbCommandInterceptor route. The documentation provides a few examples of how you can tag your queries with strings to add some context for the interceptor. Since the problem with queries is a bit more dynamic, requiring attaching at least the CTE definition string to the executing query, I could make this approach viable if there was a way to pass some more data than a simple string tag with the context of the query.

More context

I'm actually working with GraphQL and trying to implement filtering of records of certain types at all levels of nesting. Otherwise there appear security issues. The problem is similar to row-level security in DB design.

The fact that I'm using a library (HotChocolate) for projections, rather than doing my own, makes this required filtering impossible, without redoing the projection logic manually. And even if I do it that route, it will result in nested queries rather than CTE's. While I am not convinced CTE's impact the query performance, they are definitely more readable and will result in somewhat less expression rewriting in the projection logic.

Joining CTE's as substitutes of the actual tables feels easier conceptually.

@AntonC9018 AntonC9018 changed the title Prefiltering tables in SQL queries Substituting CTE for table reference Mar 31, 2023
@AntonC9018 AntonC9018 changed the title Substituting CTE for table reference Substituting table reference for a CTE Mar 31, 2023
@roji
Copy link
Member

roji commented Mar 31, 2023

Duplicate of #26486

@roji roji marked this as a duplicate of #26486 Mar 31, 2023
@roji
Copy link
Member

roji commented Mar 31, 2023

EF doesn't currently support CTEs (WITH), #26486 tracks that.

Reading your post, I don't really understand what "rewriting all joins as CTEs" would achieve, and why you're trying to do this... CTEs don't magically make your query faster; they can definitely do that when e.g. the same subquery is referenced twice in different places, but just moving a single-used subquery out to a CTE is very unlikely to improve anything. CTEs also don't always increase readability; sometimes they do, and sometimes they make the query far more complicated to understand (e.g. as otherwise simple inlined subqueries are now moved outside and referenced via aliases).

So aside from the general lack of CTE support in EF, I'd need more context to understand what you're trying to achieve and why, and what this has to do with GraphQL.

Also, doing any sort of complex SQL manipulation in a command interceptor is generally discouraged; aside from very simple changes, you need to actually parse SQL to do things reliably, which is obviously not possible.

@AntonC9018
Copy link
Author

AntonC9018 commented Mar 31, 2023

Reading your post, I don't really understand what "rewriting all joins as CTEs" would achieve, and why you're trying to do this...

As I said, I need to replace joins to table X with a join to a CTE expression based on X, EVERYWHERE that I have it in the query. Effectively making the name of this table refer to the local CTE does that pretty elegantly.

Please, answer my question on whether simply doing that by prepending a CTE definition would entail any security risks. Quoting the relevant part of the post here:

Is this approach flawed? Like, does it have problems with security? Are there cases when EF Core renames the tables in the generated queries, or prepends [dbo]?

Also, please tell me if there's a way to pass context data to the interceptor.

So aside from the general lack of CTE support in EF, I'd need more context to understand what you're trying to achieve and why, and what this has to do with GraphQL.

I'm trying to implement automatic nested filters. Effectively what I want to do is implement a filter on all expressions of a particular type, even in nested expressions. An example could be trying to query companies by countries. The user should only see the companies that belong to them, so a query like this:

context.Countries.Select(c => c.Companies);

Should be rewritten to something like this (current user comes e.g. from the HttpContext):

context.Countries.Select(
    c => c.Companies.Where(
        c1 => c1.OwnerId == currentUser.Id))

Note 1: The filter (Where) has to be applied on projection. It's impossible to do this in a normal where clause. Also, doing this in an Include, and then doing a projection with Select removes the Include logic. Hence, this logic has to be combined with projections, This would force me to reimplement the projection generation logic from the GraphQL library, which I'd rather avoid;

Note 2: While it's trivial to write such projection for any particular query, doing it for a general case, adding the filter at any level of nesting, involves some pretty complicated work with expression trees;

Note 3: Even if I do write that logic, it will produce nested queries instead of CTE's when converted into SQL by EF Core.

Ideally, I want a piece of syntax like this:

context.Countries
    .SubstituteTableForCte<Company>(
        cteExpression: (IEnumerable<Company> companies) => companies.Where(c => c.OwnerId == currentUser.Id))
 // do the query ...

Or at least like this:

context.Countries
    .AddCte($"WITH [Company] AS (SELECT * FROM [dbo].[Company] AS [c] WHERE [c].[ownerId] == {currentUser.Id})")
    // do the query

Which would then get prepended to the query text, effectively replacing references to Company for this CTE.

Also, doing any sort of complex SQL manipulation in a command interceptor is generally discouraged; aside from very simple changes, you need to actually parse SQL to do things reliably, which is obviously not possible.

I don't need to parse anything if I'm just gonna prepend a string. The only thing I need for that is to get the said string passed to the interceptor somehow.

@roji
Copy link
Member

roji commented Mar 31, 2023

Please, answer my question on whether simply doing that by prepending a CTE definition would entail any security risks.

By itself, I can't see how prepending your SQL with a CTE definition would involve any specific security risks, assuming that definition itself is secure. That really isn't a question that's related to EF itself in any way, since you're simply prepending a string to your SQL.

Regardless, the complexity here isn't prepending a CTE definition, it's changing the EF-generated SQL to actually refer to that CTE. You can try to do a string search-and-replace over the SQL, but that's generally very brittle and discouraged as I wrote above.

In general, manipulations of the SQL are better done by inserting an expression visitor into EF's query pipeline. This allows you to visit the query as an expression tree, and change it, e.g. by inserting new nodes. I'm still not sure what inserting a filter everywhere has to do with a CTE, but you seem to know what you're trying to do here.

Also, please tell me if there's a way to pass context data to the interceptor.

It's a bit difficult to answer this kind of question with no details - where would this context data come from? Is it per-query? This example in our docs shows how to use a query tag to pass information that gets picked up by an interceptor.

@AntonC9018
Copy link
Author

In general, manipulations of the SQL are better done by inserting an expression visitor into EF's query pipeline. This allows you to visit the query as an expression tree, and change it, e.g. by inserting new nodes.

But this is pointless for CTE's, right? Because there's no CTE representation currently.

It's a bit difficult to answer this kind of question with no details - where would this context data come from? Is it per-query? This example in our docs shows how to use a query tag to pass information that gets picked up by an interceptor.

As mentioned, I am aware of the ability to pass a tag I'm asking if one can pass more than just a tag. And yes, it should come per query.

@AntonC9018
Copy link
Author

By itself, I can't see how prepending your SQL with a CTE definition would involve any specific security risks, assuming that definition itself is secure. That really isn't a question that's related to EF itself in any way, since you're simply prepending a string to your SQL.

I meant the fact that I shadow a table name with that CTE name. Quoting a part of my question here again:

Are there cases when EF Core renames the tables in the generated queries, or prepends [dbo]?

@roji
Copy link
Member

roji commented Apr 1, 2023

In general, manipulations of the SQL are better done by inserting an expression visitor into EF's query pipeline. This allows you to visit the query as an expression tree, and change it, e.g. by inserting new nodes.

But this is pointless for CTE's, right? Because there's no CTE representation currently.

I was referring to stuff like inserting additional filters into the tree. With CTE you're right that it's not supported yet.

As mentioned, I am aware of the ability to pass a tag I'm asking if one can pass more than just a tag. And yes, it should come per query.

No, nothing like this exists today - it hasn't been requested AFAIK.

Are there cases when EF Core renames the tables in the generated queries, or prepends [dbo]

EF uses aliases in the SQL it generates, so a table [People] would be represented by an alias [p]. If two tables get the same alias (e.g. People and Parties), one gets deduplicated (so [p] and [p0]). You can try out a few queries to see how exactly it works.

I'm still lacking more general context on exactly why you're attempting to do what you're attempting. If the goal is to simply add filters on entities being queried (as you seemed to write above), then manipulating the tree to add Where clauses should be sufficient. This is generally done by users in the model using the query filters feature. If query filters don't work for you, you should be able to use the new EF 7.0 query interceptor to find all roots and Where nodes on top of them. Or you can do this later in the query pipeline by inserting a visitor at some point.

I'm not clear on why just inserting Where in this way isn't sufficient; you wrote below that this would result in "nested queries", but I'm not sure why that's a problem and why CTEs would be better here. In other words, if you just want CTEs for personal preference (because you find them nicer and more readable), that's one thing; it's definitely not likely to be easy to force this on EF, which doesn't support them. Otherwise, if you're looking to inject filters, I'd concentrate on doing that rather than thinking about CTEs.

@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Apr 13, 2023
@ajcvickers ajcvickers added the closed-no-further-action The issue is closed and no further action is planned. label Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

3 participants