-
Notifications
You must be signed in to change notification settings - Fork 740
Enable SELECT from views #795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note This is an automated comment that will be appended during run. 🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 586058f.
🔴 linux-x86_64-release-asan: some tests FAILED for commit 586058f.
|
| const auto readNode = node->ChildPtr(0); | ||
| const auto worldBeforeThisRead = readNode->ChildPtr(0); | ||
|
|
||
| TExprNode::TPtr queryGraph = FindSavedQueryGraph(readNode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I save the query graph as the last child of the Read! from the view node to reuse it for both:
- rewriting of the Left! callable from the Read! from the view node,
- rewriting of the Right! callable from the Read! from the view node.
I need all the generated Read! (and other nodes) of the view graph to be the same ("same" = same object) for both Left! and Right! rewrites, because CheckTx expects all the Read! nodes to be seen on the path from the root node of the YQL statement by the first child.
| CompareResults(etalonResults, selectFromViewResults); | ||
| } | ||
|
|
||
| Y_UNIT_TEST(ReadTestCasesFromFiles) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote my own .sql tests "framework", because it was faster at the moment. Will change them to canonical tests in the next iteration of the development.
3e9cd42 to
712ecfa
Compare
| return nullptr; | ||
| } | ||
|
|
||
| YQL_ENSURE(CheckTopLevelness(*lastReadInTopologicalOrder, queryGraph), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a good idea. This message is for developers, not for the user. However, I would like to check this assumption in the code somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, if I messed up in choosing the correct Read! node in this function, then the user will get an even more cryptic error message:
"Failed to execute callable with name: ResWrite!, you possibly used cross provider/cluster operations or pulled not materialized result in refselect mode"
This message appears whenever KQP failed to substitute some ResWrite! node to a TKiExecDataQuery! node, which usually happens when there are nodes with side-effects (like Read!) not seen on the path by the first child from the root of the query graph (see CheckTx function).
|
briefly pr looks okay to me, btw I'll let peers to take a look a on this, if there are no other comments I will approve request. |
| const TString tablePath = key.GetTablePath(); | ||
| auto& tableDesc = SessionCtx->Tables().GetTable(cluster, tablePath); | ||
| if (key.GetKeyType() == TKikimrKey::Type::Table) { | ||
| if (tableDesc.Metadata->Kind == EKikimrTableKind::External) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will improve readability here. I'm only interested in 2 out of 6 cases of this enum. I don't think switch will be more concise.
| } | ||
|
|
||
| ctx.Step | ||
| .Repeat(TExprStep::ExprEval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we allow expression evaluation inside VIEW query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this line from yt implementation of views.
Do you think expression evaluation inside views should be forbidden? I'm not sure, what it is exactly, but in my opinion the example from the docs:
$now = CurrentUtcDate();
SELECT EvaluateExpr(
DateTime::MakeDate(DateTime::StartOfWeek($now)
)
);looks fine and it is reasonable to allow it inside views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bug in the way VIEWs evaluate expressions, see this ticket. In particular, SELECT EvaluateExpr(CurrentUtcTimestamp()) will be evaluated only once, if saved inside a VIEW.
| }; | ||
|
|
||
| struct TViewPersistedData { | ||
| TString QueryText; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it SQL or AST?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is SQL. Ideally we would like to save the original text that the user specified at the moment of the CREATE VIEW call. However, I didn't find a way to save the original text (with the original formatting). I save the query text that is recovered from the protobuf of the select_stmt message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should save original query text, since you need to preserve user comments - we support optimiser hints via comments.
You can save original user query as following
- remember position of first and last SELECT token
- There is class TTextWalker somewhere in parser code that can help to translate position offsets to byte offset in original query
Also you need to save some parsers settings together with query text. At least ansi_lexer flag (otherwise you can not recompile query correctly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be done in a separate ticket to speed up the development and split the big task into smaller ones for better workflow
| YQL_CLOG(TRACE, ProviderKqp) << "Expression graph of the query stored in the view:\n" | ||
| << NCommon::ExprToPrettyString(ctx, *queryGraph); | ||
|
|
||
| InsertExecutionOrderDependencies(queryGraph, worldBeforeThisRead); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit like a hack. Parser should be able to return you all "root" world dependences, so you don't have to search for the in query graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know of a way to do it. I asked Andrey Neporada (nepal) and he doesn't know about such a function either. He said in a private message that my approach (talking about FindTopLevelRead) seems ok. InsertExecutionOrderDependencies looks even more straight forward to me, so I don't imagine there could be a ready-made function for this in the parser.
| << NCommon::ExprToPrettyString(ctx, *queryGraph); | ||
|
|
||
| InsertExecutionOrderDependencies(queryGraph, worldBeforeThisRead); | ||
| SaveQueryGraph(readNode, ctx, queryGraph); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure why do we need to save view AST to Read? Why not just rewrite the query immediately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explained the motivation in this comment. The idea is that in most cases we use nodes that express Read!s from views in two ways:
- call Left! on the read node,
- call Right! on that same read node.
I noticed that in order for all the nodes with side-effects (like Read! from tables and such) in the rewritten query graph of the select from the view statement to be linearly stacked on top of each other, we need to ensure that the query graph of the view query is exactly the same in both rewrites: a) of the Left! call and b) of the Right! call to the node reading from the view. Otherwise, the rewritten query will fail on CheckTx function.
I didn't find a better way to ensure that we call CompileExpr on the view query only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| return queryGraph; | ||
| } | ||
|
|
||
| const auto topLevelRead = FindTopLevelRead(queryGraph); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should at least combine InsertExecutionOrderDependencies and FindTopLevelRead methods. They serve the same purpose: get world topology roots and leafs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the them (InsertExecutionOrderDependencies) is needed for both rewrites (rewrite of the Left! call to the read node and rewrite of the Right! call to the read node), while the other one (FindTopLevelRead) is needed only for the rewrite of the Left! call the read node. In addition, one of them does change the query graph, while the other is only searching for a node in it. I would like to keep them separated.
| } | ||
|
|
||
| const auto topLevelRead = FindTopLevelRead(queryGraph); | ||
| if (!topLevelRead) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually pretty easy to have a query stored in a view that does not contain any reads. For example,
SELECT 1712ecfa to
586058f
Compare
| }; | ||
|
|
||
| struct TViewPersistedData { | ||
| TString QueryText; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should save original query text, since you need to preserve user comments - we support optimiser hints via comments.
You can save original user query as following
- remember position of first and last SELECT token
- There is class TTextWalker somewhere in parser code that can help to translate position offsets to byte offset in original query
Also you need to save some parsers settings together with query text. At least ansi_lexer flag (otherwise you can not recompile query correctly)
Selecting from a view currently works for the following queries written in the view: - SELECT 1 - SELECT * FROM SomeTable - SELECT a, b FROM SomeTable (you can specify which columns you need) - SELECT * FROM FirstTable JOIN SecondTable ON ... - SELECT * FROM FirstTable UNION SELECT * FROM SecondTable - SELECT * FROM SomeOtherView - SELECT * FROM FirstView JOIN SecondView ON ... The idea of the implementation is the following: whenever we encounter TExprNode corresponding to a Right! read from a view, we change this node to a TExprNode, which corresponds to the compiled query, stored in the view. This is done on RewriteIO stage of the pipeline. The biggest challenge was to meet all the expectations of the CheckTx function applied to the rewritten query.
…= TRUE) option. Add TablePrefixPath pragma test for CREATE VIEW / DROP VIEW commands
Unify the error message in case of the disabled "EnableViews" feature flag with CREATE VIEW command. Slightly better formatting of error messages
- Unit tests for disabled feature flag. - Explicit error in case views haven't been rewritten at later optimizer stage. - Explicit node typing in RewriteReadFromView. - Remove unnecessary check in RewriteReadFromView. - Formatting.
VisitExpr is unconventional. + Better error printout in unit tests of views.
586058f to
504dad8
Compare
|
⚪
|
|
⚪
|
KIKIMR-19567
The approach is the following:
Rewriting a node happens during parsing of the YQL statement. At this moment the statement is expressed as a graph of TExprNode nodes. It is a better version of an AST of the statement. It is actually a DAG: some nodes appear multiple times in it.
Rewriting a read from a view consists of these steps:
a) inject dependency of the first operation in the view graph on the last operation before the Read! from the view. In more technical terms: replace all the pure worlds (i. e. worlds without any dependencies) inside the view graph with the outer world of the Read! from the view node.
b) output the last operation, happening in the view graph, outside the graph to make the outside nodes, whose operations should happen after the read from the view, to be dependent on the last operation of the view. In other words: return the last operation in the view graph as a result of a Left! query from the rewritten Read! from a view node. (The Left! query is a way to get from a node its current state of the world. And a world node is way we express time dependency between operations in our backend functional language (s-expressions).)