Skip to content

Conversation

@jepett0
Copy link
Contributor

@jepett0 jepett0 commented Dec 28, 2023

KIKIMR-19567

The approach is the following:

  • save query text specified in CREATE VIEW statement in SchemeShard,
  • whenever we encounter a Read! operation from a view, rewrite it on the provider side. Such rewriting can only happen on the Kikimr (a. k. a. YDB) side, because YQL itself knows nothing about the view: the query text is stored in YDB.
  • repeat some of the steps of the YQL transformer pipeline, so the query stored in the view can be evaluated. As a prime example: we need to rewrite all the reads from other views that we might encounter inside the query stored in the original view.

Rewriting a node happens during parsing of the YQL statement. At this moment the statement is expressed as a graph of TExprNode nodes. It is a better version of an AST of the statement. It is actually a DAG: some nodes appear multiple times in it.

Rewriting a read from a view consists of these steps:

  1. Load view metadata (it is disguised as a generic table metadata, but with a view specific field present).
  2. Get the query text from the metadata.
  3. Read! from a view is expressed as a callable TExprNode, which we need to rewrite to correspond to a TExprNode, which provides results from the subquery, stored in the view. The rewritten node will replace the Read! from the view node in the execution graph of a SELECT from a view statement. The first step of the rewriting is to build the view query graph.
  4. Then we need to correctly express the execution order dependencies between the operations, happening inside the view graph, and the operations outside the view graph. This consists of two parts:

a) inject dependency of the first operation in the view graph on the last operation before the Read! from the view. In more technical terms: replace all the pure worlds (i. e. worlds without any dependencies) inside the view graph with the outer world of the Read! from the view node.

b) output the last operation, happening in the view graph, outside the graph to make the outside nodes, whose operations should happen after the read from the view, to be dependent on the last operation of the view. In other words: return the last operation in the view graph as a result of a Left! query from the rewritten Read! from a view node. (The Left! query is a way to get from a node its current state of the world. And a world node is way we express time dependency between operations in our backend functional language (s-expressions).)

@github-actions
Copy link

github-actions bot commented Dec 28, 2023

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 586058f.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
59907 50686 0 3 9206 12

🔴 linux-x86_64-release-asan: some tests FAILED for commit 586058f.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
15887 15757 0 14 94 22

const auto readNode = node->ChildPtr(0);
const auto worldBeforeThisRead = readNode->ChildPtr(0);

TExprNode::TPtr queryGraph = FindSavedQueryGraph(readNode);
Copy link
Contributor Author

@jepett0 jepett0 Dec 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I save the query graph as the last child of the Read! from the view node to reuse it for both:

  1. rewriting of the Left! callable from the Read! from the view node,
  2. rewriting of the Right! callable from the Read! from the view node.

I need all the generated Read! (and other nodes) of the view graph to be the same ("same" = same object) for both Left! and Right! rewrites, because CheckTx expects all the Read! nodes to be seen on the path from the root node of the YQL statement by the first child.

CompareResults(etalonResults, selectFromViewResults);
}

Y_UNIT_TEST(ReadTestCasesFromFiles) {
Copy link
Contributor Author

@jepett0 jepett0 Dec 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote my own .sql tests "framework", because it was faster at the moment. Will change them to canonical tests in the next iteration of the development.

@jepett0 jepett0 force-pushed the VIEWs.select_from_view.1 branch 2 times, most recently from 3e9cd42 to 712ecfa Compare January 11, 2024 07:33
return nullptr;
}

YQL_ENSURE(CheckTopLevelness(*lastReadInTopologicalOrder, queryGraph),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a good idea. This message is for developers, not for the user. However, I would like to check this assumption in the code somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, if I messed up in choosing the correct Read! node in this function, then the user will get an even more cryptic error message:

"Failed to execute callable with name: ResWrite!, you possibly used cross provider/cluster operations or pulled not materialized result in refselect mode"

This message appears whenever KQP failed to substitute some ResWrite! node to a TKiExecDataQuery! node, which usually happens when there are nodes with side-effects (like Read!) not seen on the path by the first child from the root of the query graph (see CheckTx function).

@gridnevvvit
Copy link
Member

briefly pr looks okay to me, btw I'll let peers to take a look a on this, if there are no other comments I will approve request.

const TString tablePath = key.GetTablePath();
auto& tableDesc = SessionCtx->Tables().GetTable(cluster, tablePath);
if (key.GetKeyType() == TKikimrKey::Type::Table) {
if (tableDesc.Metadata->Kind == EKikimrTableKind::External) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will improve readability here. I'm only interested in 2 out of 6 cases of this enum. I don't think switch will be more concise.

}

ctx.Step
.Repeat(TExprStep::ExprEval)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow expression evaluation inside VIEW query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this line from yt implementation of views.

Do you think expression evaluation inside views should be forbidden? I'm not sure, what it is exactly, but in my opinion the example from the docs:

$now = CurrentUtcDate();
SELECT EvaluateExpr(
    DateTime::MakeDate(DateTime::StartOfWeek($now)
    )
);

looks fine and it is reasonable to allow it inside views.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug in the way VIEWs evaluate expressions, see this ticket. In particular, SELECT EvaluateExpr(CurrentUtcTimestamp()) will be evaluated only once, if saved inside a VIEW.

};

struct TViewPersistedData {
TString QueryText;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it SQL or AST?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is SQL. Ideally we would like to save the original text that the user specified at the moment of the CREATE VIEW call. However, I didn't find a way to save the original text (with the original formatting). I save the query text that is recovered from the protobuf of the select_stmt message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should save original query text, since you need to preserve user comments - we support optimiser hints via comments.

You can save original user query as following

  1. remember position of first and last SELECT token
  2. There is class TTextWalker somewhere in parser code that can help to translate position offsets to byte offset in original query

Also you need to save some parsers settings together with query text. At least ansi_lexer flag (otherwise you can not recompile query correctly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be done in a separate ticket to speed up the development and split the big task into smaller ones for better workflow

YQL_CLOG(TRACE, ProviderKqp) << "Expression graph of the query stored in the view:\n"
<< NCommon::ExprToPrettyString(ctx, *queryGraph);

InsertExecutionOrderDependencies(queryGraph, worldBeforeThisRead);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit like a hack. Parser should be able to return you all "root" world dependences, so you don't have to search for the in query graph.

Copy link
Contributor Author

@jepett0 jepett0 Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of a way to do it. I asked Andrey Neporada (nepal) and he doesn't know about such a function either. He said in a private message that my approach (talking about FindTopLevelRead) seems ok. InsertExecutionOrderDependencies looks even more straight forward to me, so I don't imagine there could be a ready-made function for this in the parser.

<< NCommon::ExprToPrettyString(ctx, *queryGraph);

InsertExecutionOrderDependencies(queryGraph, worldBeforeThisRead);
SaveQueryGraph(readNode, ctx, queryGraph);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure why do we need to save view AST to Read? Why not just rewrite the query immediately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained the motivation in this comment. The idea is that in most cases we use nodes that express Read!s from views in two ways:

  1. call Left! on the read node,
  2. call Right! on that same read node.

I noticed that in order for all the nodes with side-effects (like Read! from tables and such) in the rewritten query graph of the select from the view statement to be linearly stacked on top of each other, we need to ensure that the query graph of the view query is exactly the same in both rewrites: a) of the Left! call and b) of the Right! call to the node reading from the view. Otherwise, the rewritten query will fail on CheckTx function.

I didn't find a better way to ensure that we call CompileExpr on the view query only once.

Copy link
Collaborator

@nepal nepal Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return queryGraph;
}

const auto topLevelRead = FindTopLevelRead(queryGraph);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should at least combine InsertExecutionOrderDependencies and FindTopLevelRead methods. They serve the same purpose: get world topology roots and leafs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the them (InsertExecutionOrderDependencies) is needed for both rewrites (rewrite of the Left! call to the read node and rewrite of the Right! call to the read node), while the other one (FindTopLevelRead) is needed only for the rewrite of the Left! call the read node. In addition, one of them does change the query graph, while the other is only searching for a node in it. I would like to keep them separated.

}

const auto topLevelRead = FindTopLevelRead(queryGraph);
if (!topLevelRead) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually pretty easy to have a query stored in a view that does not contain any reads. For example,

SELECT 1

@jepett0 jepett0 force-pushed the VIEWs.select_from_view.1 branch from 712ecfa to 586058f Compare January 15, 2024 20:44
};

struct TViewPersistedData {
TString QueryText;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should save original query text, since you need to preserve user comments - we support optimiser hints via comments.

You can save original user query as following

  1. remember position of first and last SELECT token
  2. There is class TTextWalker somewhere in parser code that can help to translate position offsets to byte offset in original query

Also you need to save some parsers settings together with query text. At least ansi_lexer flag (otherwise you can not recompile query correctly)

Selecting from a view currently works for the following queries written in the view:
- SELECT 1
- SELECT * FROM SomeTable
- SELECT a, b FROM SomeTable (you can specify which columns you need)
- SELECT * FROM FirstTable JOIN SecondTable ON ...
- SELECT * FROM FirstTable UNION SELECT * FROM SecondTable
- SELECT * FROM SomeOtherView
- SELECT * FROM FirstView JOIN SecondView ON ...

The idea of the implementation is the following: whenever we encounter TExprNode corresponding to a Right! read from a view, we change this node to a TExprNode, which corresponds to the compiled query, stored in the view. This is done on RewriteIO stage of the pipeline. The biggest challenge was to meet all the expectations of the CheckTx function applied to the rewritten query.
…= TRUE) option. Add TablePrefixPath pragma test for CREATE VIEW / DROP VIEW commands
Unify the error message in case of the disabled "EnableViews" feature flag with CREATE VIEW command. Slightly better formatting of error messages
- Unit tests for disabled feature flag.
- Explicit error in case views haven't been rewritten at later optimizer
stage.
- Explicit node typing in RewriteReadFromView.
- Remove unnecessary check in RewriteReadFromView.
- Formatting.
VisitExpr is unconventional.
+ Better error printout in unit tests of views.
@jepett0 jepett0 force-pushed the VIEWs.select_from_view.1 branch from 586058f to 504dad8 Compare January 18, 2024 05:16
@github-actions
Copy link

github-actions bot commented Jan 18, 2024

2024-01-18 05:19:16 UTC Pre-commit check for 504dad8 has started.
2024-01-18 05:19:18 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-01-18 05:24:15 UTC Build successful.
2024-01-18 05:24:32 UTC Tests are running...
🔴 2024-01-18 07:05:18 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
59969 50734 0 2 9221 12

@github-actions
Copy link

github-actions bot commented Jan 18, 2024

2024-01-18 05:19:33 UTC Pre-commit check for 504dad8 has started.
2024-01-18 05:19:35 UTC Build linux-x86_64-release-asan is running...
🟢 2024-01-18 05:24:58 UTC Build successful.
2024-01-18 05:25:10 UTC Tests are running...
🔴 2024-01-18 06:59:27 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
15908 15823 0 21 43 21

@jepett0 jepett0 requested review from nepal and spuchin January 18, 2024 09:16
@starlinskiy starlinskiy mentioned this pull request Feb 12, 2024
@vitstn vitstn mentioned this pull request Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants