Enable SELECT from views #795

jepett0 · 2023-12-28T19:48:01Z

KIKIMR-19567

The approach is the following:

1. Save the query text specified in the CREATE VIEW statement in SchemeShard.
2. Whenever we encounter a Read! operation from a view, rewrite it on the provider side. Such rewriting can only happen on the Kikimr (a.k.a. YDB) side, because YQL itself knows nothing about the view: the query text is stored in YDB.
3. Repeat some of the steps of the YQL transformer pipeline so that the query stored in the view can be evaluated. A prime example: we need to rewrite all the reads from other views that we might encounter inside the query stored in the original view.

Rewriting a node happens during parsing of the YQL statement. At this point the statement is expressed as a graph of TExprNode nodes, which is a richer representation than a plain AST of the statement. It is actually a DAG: some nodes are referenced from several places in it.

Rewriting a read from a view consists of these steps:

1. Load the view metadata (it is disguised as generic table metadata, but with a view-specific field present).
2. Get the query text from the metadata.
3. Build the view query graph. A Read! from a view is expressed as a callable TExprNode, which we need to rewrite into a TExprNode that provides the results of the subquery stored in the view. The rewritten node replaces the Read! from the view node in the execution graph of the SELECT from a view statement.
4. Correctly express the execution order dependencies between the operations inside the view graph and the operations outside it. This consists of two parts:

a) Inject a dependency of the first operation in the view graph on the last operation before the Read! from the view. In more technical terms: replace all the pure worlds (i.e. worlds without any dependencies) inside the view graph with the outer world of the Read! from the view node.

b) Expose the last operation of the view graph to the outside, so that the outer nodes whose operations should happen after the read from the view depend on it. In other words: return the last operation in the view graph as the result of a Left! query on the rewritten Read! from a view node. (The Left! query is a way to get the current state of the world from a node. A world node is the way we express time dependencies between operations in our backend functional language (s-expressions).)
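
Part (a) can be illustrated with a small self-contained sketch. It does not use the real TExprNode API: `Node`, `Make`, `IsPureWorld` and `ReplacePureWorlds` are hypothetical toy stand-ins, and a "pure world" is modelled simply as a node named `World` with no children.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for TExprNode (assumption): a named node with shared children,
// so the structure is a DAG rather than a tree.
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Make(std::string name, std::vector<NodePtr> children = {}) {
    auto node = std::make_shared<Node>();
    node->name = std::move(name);
    node->children = std::move(children);
    return node;
}

// In this toy model a pure world is a World node without dependencies.
bool IsPureWorld(const Node& node) {
    return node.name == "World" && node.children.empty();
}

// Replaces every pure world below `node` with `outerWorld`.
// Nodes are tracked by address, so shared subgraphs are processed only once,
// and the traversal never descends into the outer graph itself.
void ReplacePureWorlds(const NodePtr& node, const NodePtr& outerWorld,
                       std::unordered_set<const Node*>& visited) {
    assert(node && outerWorld);
    if (!visited.insert(node.get()).second) {
        return;
    }
    for (auto& child : node->children) {
        if (child == outerWorld) {
            continue;  // already rewired
        }
        if (IsPureWorld(*child)) {
            child = outerWorld;
        } else {
            ReplacePureWorlds(child, outerWorld, visited);
        }
    }
}
```

After the call, the first operation of the view graph depends on whatever preceded the Read! from the view, which is the ordering constraint described in (a). Part (b) is not covered by this sketch.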

github-actions · 2023-12-28T21:20:30Z

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 586058f.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
59907	50686	0	3	9206	12

🔴 linux-x86_64-release-asan: some tests FAILED for commit 586058f.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
15887	15757	0	14	94	22

ydb/core/kqp/provider/yql_kikimr_datasource.cpp

jepett0 · 2023-12-29T13:36:11Z

ydb/core/kqp/provider/rewrite_io_utils.cpp

+    const auto readNode = node->ChildPtr(0);
+    const auto worldBeforeThisRead = readNode->ChildPtr(0);
+
+    TExprNode::TPtr queryGraph = FindSavedQueryGraph(readNode);


I save the query graph as the last child of the Read! from the view node to reuse it for both:

rewriting of the Left! callable from the Read! from the view node,

rewriting of the Right! callable from the Read! from the view node.

I need all the generated Read! (and other nodes) of the view graph to be the same ("same" = same object) for both Left! and Right! rewrites, because CheckTx expects all the Read! nodes to be seen on the path from the root node of the YQL statement by the first child.
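
A minimal sketch of this caching idea, assuming a toy node type instead of TExprNode. The `FindSavedQueryGraph` and `SaveQueryGraph` below only borrow the names from the diff; their signatures, the `SavedQueryGraph` wrapper node and `CompileViewQuery` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for TExprNode (assumption).
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Make(std::string name, std::vector<NodePtr> children = {}) {
    auto node = std::make_shared<Node>();
    node->name = std::move(name);
    node->children = std::move(children);
    return node;
}

// Hypothetical compiler: builds a fresh graph on every call and counts the calls.
NodePtr CompileViewQuery(const std::string& queryText, int& compileCalls) {
    ++compileCalls;
    return Make("Right!", {Make("Read!", {Make("World"), Make(queryText)})});
}

const char* const SavedGraphTag = "SavedQueryGraph";

// Returns the graph stored as the last child of the read node, if any.
NodePtr FindSavedQueryGraph(const NodePtr& readNode) {
    if (readNode->children.empty()) {
        return nullptr;
    }
    const NodePtr& last = readNode->children.back();
    if (last->name != SavedGraphTag || last->children.size() != 1) {
        return nullptr;
    }
    return last->children.front();
}

// Appends the graph to the read node, wrapped in a tagged node.
void SaveQueryGraph(const NodePtr& readNode, const NodePtr& graph) {
    readNode->children.push_back(Make(SavedGraphTag, {graph}));
}

// Both the Left! and the Right! rewrite go through this function,
// so they receive the very same node objects.
NodePtr GetViewQueryGraph(const NodePtr& readNode, const std::string& queryText,
                          int& compileCalls) {
    if (NodePtr saved = FindSavedQueryGraph(readNode)) {
        return saved;
    }
    NodePtr graph = CompileViewQuery(queryText, compileCalls);
    SaveQueryGraph(readNode, graph);
    return graph;
}
```

Calling `GetViewQueryGraph` twice on the same read node compiles the query once and returns pointer-equal graphs, which is the "same object" property required above.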

jepett0 · 2023-12-29T13:45:57Z

ydb/core/kqp/ut/view/view_ut.cpp

+        CompareResults(etalonResults, selectFromViewResults);
+    }
+
+    Y_UNIT_TEST(ReadTestCasesFromFiles) {


Wrote my own .sql tests "framework", because it was faster at the moment. Will change them to canonical tests in the next iteration of the development.

jepett0 · 2024-01-11T10:41:13Z

ydb/core/kqp/provider/rewrite_io_utils.cpp

+        return nullptr;
+    }
+
+    YQL_ENSURE(CheckTopLevelness(*lastReadInTopologicalOrder, queryGraph),


Not sure if this is a good idea. This message is for developers, not for the user. However, I would like to check this assumption in the code somehow.

On the other hand, if I messed up in choosing the correct Read! node in this function, then the user will get an even more cryptic error message:

"Failed to execute callable with name: ResWrite!, you possibly used cross provider/cluster operations or pulled not materialized result in refselect mode"

This message appears whenever KQP fails to replace some ResWrite! node with a TKiExecDataQuery! node, which usually happens when there are nodes with side effects (like Read!) that are not seen on the path by the first child from the root of the query graph (see the CheckTx function).
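
The property mentioned here can be sketched on a toy graph. This is not the real CheckTx: `Node`, `Make` and `AllReadsOnFirstChildPath` are hypothetical, and the check is reduced to "every Read! node lies on the chain obtained by following the first child from the root".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for TExprNode (assumption).
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Make(std::string name, std::vector<NodePtr> children = {}) {
    auto node = std::make_shared<Node>();
    node->name = std::move(name);
    node->children = std::move(children);
    return node;
}

// Simplified check: true if every Read! node reachable from the root
// is also a node on the first-child chain starting at the root.
bool AllReadsOnFirstChildPath(const NodePtr& root) {
    assert(root);
    // Collect the nodes on the first-child chain.
    std::unordered_set<const Node*> onPath;
    for (const Node* n = root.get(); n != nullptr;
         n = n->children.empty() ? nullptr : n->children.front().get()) {
        onPath.insert(n);
    }
    // Visit every reachable node once and test the Read! nodes.
    std::unordered_set<const Node*> visited;
    std::vector<const Node*> stack{root.get()};
    while (!stack.empty()) {
        const Node* n = stack.back();
        stack.pop_back();
        if (!visited.insert(n).second) {
            continue;
        }
        if (n->name == "Read!" && onPath.count(n) == 0) {
            return false;
        }
        for (const auto& child : n->children) {
            stack.push_back(child.get());
        }
    }
    return true;
}
```

If the Right! branch refers to a separately built copy of a Read! node instead of the object used in the Left! branch, the copy is off the first-child chain and the check fails. That is the situation the YQL_ENSURE above is meant to catch early.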

gridnevvvit · 2024-01-11T15:34:59Z

Briefly: the PR looks okay to me. I'll let peers take a look at this; if there are no other comments, I will approve the request.

ydb/core/kqp/opt/logical/kqp_opt_log_ranges_predext.cpp

ydb/core/kqp/gateway/kqp_metadata_loader.cpp

spuchin · 2024-01-11T12:45:31Z

ydb/core/kqp/provider/yql_kikimr_datasource.cpp

+        const TString tablePath = key.GetTablePath();
+        auto& tableDesc = SessionCtx->Tables().GetTable(cluster, tablePath);
+        if (key.GetKeyType() == TKikimrKey::Type::Table) {
+            if (tableDesc.Metadata->Kind == EKikimrTableKind::External) {


I don't think it will improve readability here. I'm only interested in 2 out of 6 cases of this enum. I don't think switch will be more concise.

spuchin · 2024-01-11T12:47:01Z

ydb/core/kqp/provider/yql_kikimr_datasource.cpp

+                }
+
+                ctx.Step
+                    .Repeat(TExprStep::ExprEval)


Do we allow expression evaluation inside VIEW query?

I took this line from yt implementation of views.

Do you think expression evaluation inside views should be forbidden? I'm not sure what exactly it is, but in my opinion the example from the docs:

$now = CurrentUtcDate(); SELECT EvaluateExpr( DateTime::MakeDate(DateTime::StartOfWeek($now) ) );

looks fine and it is reasonable to allow it inside views.

There is a bug in the way VIEWs evaluate expressions, see this ticket. In particular, SELECT EvaluateExpr(CurrentUtcTimestamp()) will be evaluated only once, if saved inside a VIEW.

spuchin · 2024-01-11T12:47:56Z

ydb/core/kqp/provider/yql_kikimr_gateway.h

 };

+struct TViewPersistedData {
+    TString QueryText;


Is it SQL or AST?

It is SQL. Ideally we would like to save the original text that the user specified at the moment of the CREATE VIEW call. However, I didn't find a way to save the original text (with the original formatting). I save the query text that is recovered from the protobuf of the select_stmt message.

You should save the original query text, since you need to preserve user comments: we support optimizer hints via comments.

You can save the original user query as follows:

- remember the position of the first and the last SELECT token,
- there is a class TTextWalker somewhere in the parser code that can help translate position offsets into byte offsets in the original query.

You also need to save some parser settings together with the query text, at least the ansi_lexer flag (otherwise you cannot recompile the query correctly).
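
A rough sketch of the suggested approach, under several assumptions: token positions are 1-based (row, column) pairs with columns counted in bytes, the suggestion is read as "the first and the last token of the SELECT statement", and the end position is exclusive. `Position`, `ByteOffset`, `ExtractSelect` and `PersistedView` are hypothetical helpers, not the actual TTextWalker API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// 1-based position of a token in the query text (assumption: columns are bytes).
struct Position {
    std::size_t row;
    std::size_t column;
};

// Hypothetical record: the original text plus the parser setting
// needed to recompile it later.
struct PersistedView {
    std::string queryText;
    bool ansiLexer;
};

// Translates a (row, column) position into a byte offset in `text`.
std::size_t ByteOffset(const std::string& text, Position pos) {
    assert(pos.row >= 1 && pos.column >= 1);
    std::size_t offset = 0;
    for (std::size_t row = 1; row < pos.row; ++row) {
        const std::size_t newline = text.find('\n', offset);
        assert(newline != std::string::npos);
        offset = newline + 1;  // first byte of the next row
    }
    return offset + (pos.column - 1);
}

// Cuts the SELECT statement out of the original text, comments included.
// `end` points one column past the last byte of the last token.
std::string ExtractSelect(const std::string& text, Position begin, Position end) {
    const std::size_t from = ByteOffset(text, begin);
    const std::size_t to = ByteOffset(text, end);
    assert(from <= to && to <= text.size());
    return text.substr(from, to - from);
}
```

Because the statement is sliced out of the original text rather than regenerated from the parse tree, comments inside it, including hint comments, survive unchanged.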

This will be done in a separate ticket to speed up the development and split the big task into smaller ones for better workflow

ydb/core/kqp/provider/rewrite_io_utils.cpp

spuchin · 2024-01-12T10:43:59Z

ydb/core/kqp/provider/rewrite_io_utils.cpp

+        YQL_CLOG(TRACE, ProviderKqp) << "Expression graph of the query stored in the view:\n"
+                                     << NCommon::ExprToPrettyString(ctx, *queryGraph);
+
+        InsertExecutionOrderDependencies(queryGraph, worldBeforeThisRead);


This looks a bit like a hack. The parser should be able to return all the "root" world dependencies, so you don't have to search for them in the query graph.

I don't know of a way to do it. I asked Andrey Neporada (nepal) and he doesn't know about such a function either. He said in a private message that my approach (talking about FindTopLevelRead) seems ok. InsertExecutionOrderDependencies looks even more straightforward to me, so I don't imagine there could be a ready-made function for this in the parser.

spuchin · 2024-01-12T10:44:43Z

ydb/core/kqp/provider/rewrite_io_utils.cpp

+                                     << NCommon::ExprToPrettyString(ctx, *queryGraph);
+
+        InsertExecutionOrderDependencies(queryGraph, worldBeforeThisRead);
+        SaveQueryGraph(readNode, ctx, queryGraph);


I'm not quite sure why we need to save the view AST in the Read! node. Why not just rewrite the query immediately?

I explained the motivation in this comment. The idea is that in most cases we use nodes that express Read!s from views in two ways:

call Left! on the read node,

call Right! on that same read node.

I noticed that in order for all the nodes with side-effects (like Read! from tables and such) in the rewritten query graph of the select from the view statement to be linearly stacked on top of each other, we need to ensure that the query graph of the view query is exactly the same in both rewrites: a) of the Left! call and b) of the Right! call to the node reading from the view. Otherwise, the rewritten query will fail on CheckTx function.

I didn't find a better way to ensure that we call CompileExpr on the view query only once.

spuchin · 2024-01-12T10:46:23Z

ydb/core/kqp/provider/rewrite_io_utils.cpp

+        return queryGraph;
+    }
+
+    const auto topLevelRead = FindTopLevelRead(queryGraph);


I think you should at least combine the InsertExecutionOrderDependencies and FindTopLevelRead methods. They serve the same purpose: getting the roots and leaves of the world topology.

One of them (InsertExecutionOrderDependencies) is needed for both rewrites (the rewrite of the Left! call to the read node and the rewrite of the Right! call to the read node), while the other one (FindTopLevelRead) is needed only for the rewrite of the Left! call to the read node. In addition, one of them changes the query graph, while the other only searches for a node in it. I would like to keep them separate.

spuchin · 2024-01-12T10:46:46Z

ydb/core/kqp/provider/rewrite_io_utils.cpp

+    }
+
+    const auto topLevelRead = FindTopLevelRead(queryGraph);
+    if (!topLevelRead) {


How is that possible?

It is actually pretty easy to have a query stored in a view that does not contain any reads. For example,

SELECT 1
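
To illustrate why the null check is needed, here is a toy version of the lookup. It is an assumption that the top-level read is the first Read! met while following the first child from the root; the real FindTopLevelRead may work differently, and `Node` and `Make` are hypothetical stand-ins for TExprNode.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for TExprNode (assumption).
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Make(std::string name, std::vector<NodePtr> children = {}) {
    auto node = std::make_shared<Node>();
    node->name = std::move(name);
    node->children = std::move(children);
    return node;
}

// Walks down the first-child chain and returns the first Read! node,
// or nullptr if the graph contains no read on that chain.
NodePtr FindTopLevelRead(const NodePtr& root) {
    for (NodePtr n = root; n != nullptr;
         n = n->children.empty() ? NodePtr() : n->children.front()) {
        if (n->name == "Read!") {
            return n;
        }
    }
    return nullptr;
}
```

For a graph built from a query like SELECT 1 there is no Read! node at all, so the function returns nullptr and the caller has to handle that case.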

ydb/core/kqp/provider/rewrite_io_utils.cpp


Selecting from a view currently works for the following queries written in the view:
- SELECT 1
- SELECT * FROM SomeTable
- SELECT a, b FROM SomeTable (you can specify which columns you need)
- SELECT * FROM FirstTable JOIN SecondTable ON ...
- SELECT * FROM FirstTable UNION SELECT * FROM SecondTable
- SELECT * FROM SomeOtherView
- SELECT * FROM FirstView JOIN SecondView ON ...

The idea of the implementation is the following: whenever we encounter a TExprNode corresponding to a Right! read from a view, we change this node to a TExprNode that corresponds to the compiled query stored in the view. This is done at the RewriteIO stage of the pipeline. The biggest challenge was to meet all the expectations of the CheckTx function applied to the rewritten query.


github-actions · 2024-01-18T05:19:16Z

⚪ 2024-01-18 05:19:16 UTC Pre-commit check for 504dad8 has started.
⚪ 2024-01-18 05:19:18 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-01-18 05:24:15 UTC Build successful.
⚪ 2024-01-18 05:24:32 UTC Tests are running...
🔴 2024-01-18 07:05:18 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
59969	50734	0	2	9221	12

github-actions · 2024-01-18T05:19:33Z

⚪ 2024-01-18 05:19:33 UTC Pre-commit check for 504dad8 has started.
⚪ 2024-01-18 05:19:35 UTC Build linux-x86_64-release-asan is running...
🟢 2024-01-18 05:24:58 UTC Build successful.
⚪ 2024-01-18 05:25:10 UTC Tests are running...
🔴 2024-01-18 06:59:27 UTC Some tests failed, follow the links below.

Test history

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
15908	15823	0	21	43	21

jepett0 requested review from dorooleg, gridnevvvit, ijon, spuchin and vitstn December 29, 2023 12:48

jepett0 commented Dec 29, 2023

ydb/core/kqp/provider/yql_kikimr_datasource.cpp

jepett0 commented Dec 29, 2023


jepett0 force-pushed the VIEWs.select_from_view.1 branch 2 times, most recently from 3e9cd42 to 712ecfa Compare January 11, 2024 07:33

jepett0 commented Jan 11, 2024


spuchin requested changes Jan 12, 2024


jepett0 force-pushed the VIEWs.select_from_view.1 branch from 712ecfa to 586058f Compare January 15, 2024 20:44

nepal requested changes Jan 16, 2024


jepett0 added 5 commits January 18, 2024 05:08

Add unit tests for the absence of an explicit WITH (security_invoker …

a5c0949

…= TRUE) option. Add TablePrefixPath pragma test for CREATE VIEW / DROP VIEW commands

Check the feature flag before rewriting a read from a view.

1a3c7bb

Unify the error message in case of the disabled "EnableViews" feature flag with CREATE VIEW command. Slightly better formatting of error messages

Review fixes

03b2738

- Unit tests for disabled feature flag. - Explicit error in case views haven't been rewritten at later optimizer stage. - Explicit node typing in RewriteReadFromView. - Remove unnecessary check in RewriteReadFromView. - Formatting.

Use a better way to replace nodes in RewriteReadFromView function

504dad8

VisitExpr is unconventional. + Better error printout in unit tests of views.

jepett0 force-pushed the VIEWs.select_from_view.1 branch from 586058f to 504dad8 Compare January 18, 2024 05:16

jepett0 requested review from nepal and spuchin January 18, 2024 09:16

nepal approved these changes Jan 18, 2024


spuchin approved these changes Jan 19, 2024


jepett0 merged commit 8e0390c into ydb-platform:main Jan 19, 2024

