Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow standalone set operations (and ValuesExpression) without a wrapping SelectExpression #32890

Open
Tracked by #34524
roji opened this issue Jan 22, 2024 · 3 comments

Comments

@roji
Copy link
Member

roji commented Jan 22, 2024

Our SQL query tree architecture currently represents each set operation as wrapped by a SelectExpression; in SQL generation, if the SelectExpression is detected to be a simple wrapper with no other clauses (code), the SelectExpression isn't outputted.

We should change our SQL tree to exactly match the SQL it will output, allowing bare set operations/ValuesExpressions to appear directly as a table in a select expression (and even as the top-level node in the tree).

Having the SelectExpression wrapper in the tree causes some complications, such as having to manage aliases for SelectExpressions that will never actually appear in the output, etc.

Note that to do this, we'll likely need to empty SelectExpression entirely out of private state (otherwise e.g. UnionExpression will also need to have _projectionMapping, identifiers etc.). This also depends on switching to immutability during translation (at least for this), since applying a set operation would no longer mutate a select (pushing itself down), but rather return a new set operation expression wrapping that select.

@ajcvickers
Copy link
Contributor

we'll likely need to empty SelectExpression entirely out of private state

How the mighty fall...

@Charlieface
Copy link

This essentially comes back to #32927 in that SelectExpression has a huge amount of private state and logic.

This is compounded by the fact that the shaper.QueryExpression is expected to be a SelectExpression all over the place, despite it being declared as Expression, so that all of that logic can be accessed, where it should have been in visitors. (I think this is what this issue tracks, correct?)

So it's impossible to use an arbitrary TableExpressionBase as the top-level SQL expression. You also can't use a custom TableExpressionBase in the list of SelectExpression.Tables (because private state again), and you can't even replace the whole list with a new one (because the Update function sets _mutable = false which breaks everything).

I'm hitting this again and again in my provider extension (where I'm trying to implement recursive CTEs), and it's become a completely blocking issue. I ended up using reflection to just call the private AddTable function to do what I wanted, and just sorting the mess out further down the pipeline.

Essentially SelectExpression should probably be about 100-200 lines of code: just the various expressions, equality and quoting. Everything else goes in the visitors. It shouldn't care what, why or how you create that SQL, it's just a straight representation of what you want that SQL to look like.

@roji
Copy link
Member Author

roji commented Jul 25, 2024

@Charlieface I completely agree with your comments - the current design is problematic in many ways and indeed hampers our ability to build features and have more extensibility. SelectExpression really needs to become a plain immutable expression with no actual logic in it.

This is something I've started working on but it's quite a huge rearchitecting effort that will take a long time; I hope to do substantial progress on this in 10.

BTW CTE support - including recursive - is also something I hope we'll implement for 10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants