From 69698276b59d626aa8d35168b2d987e6bb24dd8a Mon Sep 17 00:00:00 2001 From: emmymiao87 <522274284@qq.com> Date: Fri, 31 Jul 2020 17:42:57 +0800 Subject: [PATCH 1/3] [Docs] Add Materialized view manual Add usage manual of materialized view in Chinese and English Change-Id: Id7dc1a1e0f7b8e13ac12c9524b7b8f3ca5814873 --- docs/.vuepress/sidebar/en.js | 1 + docs/.vuepress/sidebar/zh-CN.js | 7 +- .../administrator-guide/materialized_view.md | 483 +++++++++++++++++ .../administrator-guide/materialized_view.md | 490 ++++++++++++++++++ 4 files changed, 975 insertions(+), 6 deletions(-) create mode 100644 docs/en/administrator-guide/materialized_view.md create mode 100644 docs/zh-CN/administrator-guide/materialized_view.md diff --git a/docs/.vuepress/sidebar/en.js b/docs/.vuepress/sidebar/en.js index 3fb8930200b2ff..140c44dfc555b4 100644 --- a/docs/.vuepress/sidebar/en.js +++ b/docs/.vuepress/sidebar/en.js @@ -69,6 +69,7 @@ module.exports = [ ], sidebarDepth: 2, }, + "materialized_view", { title: "HTTP API", directoryPath: "http-actions/", diff --git a/docs/.vuepress/sidebar/zh-CN.js b/docs/.vuepress/sidebar/zh-CN.js index da69db3ec919d1..e7297c609b8fa9 100644 --- a/docs/.vuepress/sidebar/zh-CN.js +++ b/docs/.vuepress/sidebar/zh-CN.js @@ -70,12 +70,7 @@ module.exports = [ ], sidebarDepth: 2, }, - { - title: "物化视图", - directoryPath: "materialized-view/", - children: [], - sidebarDepth: 2, - }, + "materialized_view", { title: "HTTP API", directoryPath: "http-actions/", diff --git a/docs/en/administrator-guide/materialized_view.md b/docs/en/administrator-guide/materialized_view.md new file mode 100644 index 00000000000000..9db165ff7bd2eb --- /dev/null +++ b/docs/en/administrator-guide/materialized_view.md @@ -0,0 +1,483 @@ +--- +{ + "title": "Materialized view", + "language": "en" +} +--- + + + +# Materialized view +A materialized view is a data set that is pre-calculated (according to a defined SELECT statement) and stored in a special table in Doris. + +The emergence of materialized views is mainly to satisfy users. It can analyze any dimension of the original detailed data, but also can quickly analyze and query fixed dimensions. + +## When to use materialized view + ++ Analyze requirements to cover both detailed data query and fixed-dimensional query. ++ The query only involves a small part of the columns or rows in the table. ++ The query contains some time-consuming processing operations, such as long-time aggregation operations. ++ The query needs to match different prefix indexes. + +## Advantage + ++ For those queries that frequently use the same sub-query results repeatedly, the performance is greatly improved ++ Doris automatically maintains the data of the materialized view, whether it is a new import or delete operation, it can ensure the data consistency of the base table and the materialized view table. No need for any additional labor maintenance costs. ++ When querying, it will automatically match the optimal materialized view and read data directly from the materialized view. + +*Automatic maintenance of materialized view data will cause some maintenance overhead, which will be explained in the limitations of materialized views later.* + +## Materialized View VS Rollup + +Before the materialized view function, users generally used the Rollup function to improve query efficiency through pre-aggregation. However, Rollup has certain limitations. It cannot do pre-aggregation based on the detailed model. + +Materialized views cover the functions of Rollup while also supporting richer aggregate functions. So the materialized view is actually a superset of Rollup. + +In other words, the functions previously supported by the `ALTER TABLE ADD ROLLUP` syntax can now be implemented by `CREATE MATERIALIZED VIEW`. + +## Use materialized views + +The Doris system provides a complete set of DDL syntax for materialized views, including creating, viewing, and deleting. The syntax of DDL is consistent with PostgreSQL and Oracle. + +### Create a materialized view + +Here you must first decide what kind of materialized view to create based on the characteristics of your query statement. This is not to say that your materialized view definition is exactly the same as one of your query statements. There are two principles here: + +1. **Abstract** from the query statement, the grouping and aggregation methods shared by multiple queries are used as the definition of the materialized view. +2. It is not necessary to create materialized views for all dimension combinations. + +First of all, the first point, if a materialized view is abstracted, and multiple queries can be matched to this materialized view. This materialized view works best. Because the maintenance of the materialized view itself also consumes resources. + +If the materialized view only fits a particular query, and other queries do not use this materialized view. As a result, the materialized view is not cost-effective, which not only occupies the storage resources of the cluster, but cannot serve more queries. + +Therefore, users need to combine their own query statements and data dimension information to abstract the definition of some materialized views. + +The second point is that in the actual analysis query, not all dimensional analysis will be covered. Therefore, it is enough to create a materialized view for the commonly used combination of dimensions, so as to achieve a space and time balance. + +The materialized view can be created by the following command. Creating a materialized view is an asynchronous operation, which means that after the user successfully submits the creation task, Doris will calculate the existing data in the background until the creation is successful. + +``` +CREATE MATERIALIZED VIEW +``` + +The specific syntax can be viewed through the following command: + +``` +HELP CREATE MATERIALIZED VIEW +``` + +### Support aggregate functions + +The aggregate functions currently supported by the materialized view function are: + ++ SUM, MIN, MAX (Version 0.12) ++ COUNT, BITMAP\_UNION, HLL\_UNION (Version 0.13) + +### Update strategy + +In order to ensure the data consistency between the materialized view table and the Base table, Doris will import, delete and other operations on the Base table are synchronized to the materialized view table. And through incremental update to improve update efficiency. To ensure atomicity through transaction. + +For example, if the user inserts data into the base table through the INSERT command, this data will be inserted into the materialized view synchronously. When both the base table and the materialized view table are written successfully, the INSERT command will return successfully. To + +### Query automatic matching + +After the materialized view is successfully created, the user's query does not need to be changed, that is, it is still the base table of the query. Doris will automatically select an optimal materialized view based on the current query statement, read data from the materialized view and calculate it. + +Users can use the EXPLAIN command to check whether the current query uses a materialized view. + +The matching relationship between the aggregation in the materialized view and the aggregation in the query: + +| Materialized View Aggregation | Query Aggregation | +| ---------- | -------- | +| sum | sum | +| min | min | +| max | max | +| count | count | +| bitmap\_union | bitmap\_union, bitmap\_union\_count, count( distinct) | +| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv() | + +After the aggregation functions of bitmap and hll match the materialized view in the query, the aggregation operator of the query will be rewritten according to the table structure of the materialized view. See example 2 for details. + +### Query materialized views + +Check what materialized views the current table has, and what their table structure is. Through the following command: + +``` +MySQL [test]> desc mv_test all; ++-----------+---------------+-----------------+----------+------+-------+---------+--------------+ +| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | ++-----------+---------------+-----------------+----------+------+-------+---------+--------------+ +| mv_test | DUP_KEYS | k1 | INT | Yes | true | NULL | | +| | | k2 | BIGINT | Yes | true | NULL | | +| | | k3 | LARGEINT | Yes | true | NULL | | +| | | k4 | SMALLINT | Yes | false | NULL | NONE | +| | | | | | | | | +| mv_2 | AGG_KEYS | k2 | BIGINT | Yes | true | NULL | | +| | | k4 | SMALLINT | Yes | false | NULL | MIN | +| | | k1 | INT | Yes | false | NULL | MAX | +| | | | | | | | | +| mv_3 | AGG_KEYS | k1 | INT | Yes | true | NULL | | +| | | to_bitmap(`k2`) | BITMAP | No | false | | BITMAP_UNION | +| | | | | | | | | +| mv_1 | AGG_KEYS | k4 | SMALLINT | Yes | true | NULL | | +| | | k1 | BIGINT | Yes | false | NULL | SUM | +| | | k3 | LARGEINT | Yes | false | NULL | SUM | +| | | k2 | BIGINT | Yes | false | NULL | MIN | ++-----------+---------------+-----------------+----------+------+-------+---------+--------------+ +``` + +You can see that the current `mv_test` table has three materialized views: mv_1, mv_2 and mv_3, and their table structure. + +### Delete materialized view + +If the user no longer needs the materialized view, you can delete the materialized view with the following command: + +``` +DROP MATERIALIZED VIEW +``` + +The specific syntax can be viewed through the following command: + +``` +HELP DROP MATERIALIZED VIEW +``` + +## Best Practice 1 + +The use of materialized views is generally divided into the following steps: + +1. Create a materialized view +2. Asynchronously check whether the materialized view has been constructed +3. Query and automatically match materialized views + +**First is the first step: Create a materialized view** + +Assume that the user has a sales record list, which stores the transaction id, salesperson, sales store, sales time, and amount of each transaction. The table building statement is: + +``` +create table sales_records(record_id int, seller_id int, store_id int, sale_date date, sale_amt bigint) distributed by hash(record_id) properties("replication_num" = "1"); +``` +The table structure of this `sales_records` is as follows: + +``` +MySQL [test]> desc sales_records; ++-----------+--------+------+-------+---------+--- ----+ +| Field | Type | Null | Key | Default | Extra | ++-----------+--------+------+-------+---------+--- ----+ +| record_id | INT | Yes | true | NULL | | +| seller_id | INT | Yes | true | NULL | | +| store_id | INT | Yes | true | NULL | | +| sale_date | DATE | Yes | false | NULL | NONE | +| sale_amt | BIGINT | Yes | false | NULL | NONE | ++-----------+--------+------+-------+---------+--- ----+ +``` + +At this time, if the user often performs an analysis query on the sales volume of different stores, you can create a materialized view for the `sales_records` table to group the sales stores and sum the sales of the same sales stores. The creation statement is as follows: + +``` +MySQL [test]> create materialized view store_amt as select store_id, sum(sale_amt) from sales_records group by store_id; +``` + +The backend returns to the following figure, indicating that the task of creating a materialized view is submitted successfully. + +``` +Query OK, 0 rows affected (0.012 sec) +``` + +**Step 2: Check whether the materialized view has been built** + +Since the creation of a materialized view is an asynchronous operation, after the user submits the task of creating a materialized view, he needs to asynchronously check whether the materialized view has been constructed through a command. The command is as follows: + +``` +SHOW ALTER TABLE ROLLUP FROM db_name; +``` + +In this command, `db_name` is a parameter, you need to replace it with your real db name. The result of the command is to display all the tasks of creating a materialized view of this db. The results are as follows: + +``` ++-------+---------------+---------------------+--- ------------------+---------------+--------------- --+----------+---------------+-----------+-------- -------------------------------------------------- -------------------------------------------------- -------------+----------+---------+ +| JobId | TableName | CreateTime | FinishedTime | BaseIndexName | RollupIndexName | RollupId | TransactionId | State | Msg | Progress | Timeout | ++-------+---------------+---------------------+--- ------------------+---------------+--------------- --+----------+---------------+-----------+-------- -------------------------------------------------- -------------------------------------------------- -------------+----------+---------+ +| 22036 | sales_records | 2020-07-30 20:04:28 | 2020-07-30 20:04:57 | sales_records | store_amt | 22037 | 5008 | FINISHED | | NULL | 86400 | ++-------+---------------+---------------------+--- ------------------+---------------+--------------- --+----------+---------------+-----------+-------- ---------------------------------------- + +``` + +Among them, TableName refers to which table the data of the materialized view comes from, and RollupIndexName refers to the name of the materialized view. One of the more important indicators is State. + +When the State of the task of creating a materialized view has become FINISHED, it means that the materialized view has been created successfully. This means that it is possible to automatically match this materialized view when querying. + +**Step 3: Query** + +After the materialized view is created, when users query the sales volume of different stores, they will directly read the aggregated data from the materialized view `store_amt` just created. To achieve the effect of improving query efficiency. + +The user's query still specifies the query `sales_records` table, for example: + +``` +SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id; +``` + +The above query will automatically match `store_amt`. The user can use the following command to check whether the current query matches the appropriate materialized view. + +``` +EXPLAIN SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id; ++-----------------------------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------------------------+ +| PLAN FRAGMENT 0 | +| OUTPUT EXPRS: `store_id` | sum(`sale_amt`) | +| PARTITION: UNPARTITIONED | +| | +| RESULT SINK | +| | +| 4:EXCHANGE | +| | +| PLAN FRAGMENT 1 | +| OUTPUT EXPRS: | +| PARTITION: HASH_PARTITIONED: `store_id` | +| | +| STREAM DATA SINK | +| EXCHANGE ID: 04 | +| UNPARTITIONED | +| | +| 3:AGGREGATE (merge finalize) | +| | output: sum( sum(`sale_amt`)) | +| | group by: `store_id` | +| | | +| 2:EXCHANGE | +| | +| PLAN FRAGMENT 2 | +| OUTPUT EXPRS: | +| PARTITION: RANDOM | +| | +| STREAM DATA SINK | +| EXCHANGE ID: 02 | +| HASH_PARTITIONED: `store_id` | +| | +| 1:AGGREGATE (update serialize) | +| | STREAMING | +| | output: sum(`sale_amt`) | +| | group by: `store_id` | +| | | +| 0:OlapScanNode | +| TABLE: sales_records | +| PREAGGREGATION: ON | +| partitions=1/1 | +| rollup: store_amt | +| tabletRatio=10/10 | +| tabletList=22038,22040,22042,22044,22046,22048,22050,22052,22054,22056 | +| cardinality=0 | +| avgRowSize=0.0 | +| numNodes=1 | ++-----------------------------------------------------------------------------+ +45 rows in set (0.006 sec) +``` +The final thing is the rollup attribute in OlapScanNode. You can see that the rollup of the current query shows `store_amt`. That is to say, the query has been correctly matched to the materialized view `store_amt`, and data is read directly from the materialized view. + +## Best Practice 2 PV,UV + +Business scenario: Calculate the UV and PV of advertising + +Assuming that the user's original ad click data is stored in Doris, then for ad PV and UV queries, the query speed can be improved by creating a materialized view of `bitmap_union`. + +Use the following statement to first create a table that stores the details of the advertisement click data, including the click event of each click, what advertisement was clicked, what channel clicked, and who was the user who clicked. + +``` +MySQL [test]> create table advertiser_view_record(time date, advertiser varchar(10), channel varchar(10), user_id int) distributed by hash(time) properties("replication_num" = "1"); +Query OK, 0 rows affected (0.014 sec) +``` +The original ad click data table structure is: + +``` +MySQL [test]> desc advertiser_view_record; ++------------+-------------+------+-------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++------------+-------------+------+-------+---------+-------+ +| time | DATE | Yes | true | NULL | | +| advertiser | VARCHAR(10) | Yes | true | NULL | | +| channel | VARCHAR(10) | Yes | false | NULL | NONE | +| user_id | INT | Yes | false | NULL | NONE | ++------------+-------------+------+-------+---------+-------+ +4 rows in set (0.001 sec) +``` + +1. Create a materialized view + + Since the user wants to query the UV value of the advertisement, that is, a precise de-duplication of users of the same advertisement is required, the user's query is generally: + + ``` + SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; + ``` + + For this kind of UV-seeking scene, we can create a materialized view with `bitmap_union` to achieve a precise deduplication effect in advance. + + In Doris, the result of `count(distinct)` aggregation is exactly the same as the result of `bitmap_union_count` aggregation. And `bitmap_union_count` is equal to the result of `bitmap_union` to calculate count, so if the query ** involves `count(distinct)`, you can speed up the query by creating a materialized view with `bitmap_union` aggregation.** + + For this case, you can create a materialized view that accurately deduplicates `user_id` based on advertising and channel grouping. + + ``` + MySQL [test]> create materialized view advertiser_uv as select advertiser, channel, bitmap_union(to_bitmap(user_id)) from advertiser_view_record group by advertiser, channel; + Query OK, 0 rows affected (0.012 sec) + ``` + + *Note: Because the user\_id itself is an INT type, it is called `bitmap_union` directly in Doris. The fields need to be converted to bitmap type through the function `to_bitmap` first, and then `bitmap_union` can be aggregated. * + + After the creation is complete, the table structure of the advertisement click schedule and the materialized view table is as follows: + + ``` + MySQL [test]> desc advertiser_view_record all; + +------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+ + | IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | + +------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+ + | advertiser_view_record | DUP_KEYS | time | DATE | Yes | true | NULL | | + | | | advertiser | VARCHAR(10) | Yes | true | NULL | | + | | | channel | VARCHAR(10) | Yes | false | NULL | NONE | + | | | user_id | INT | Yes | false | NULL | NONE | + | | | | | | | | | + | advertiser_uv | AGG_KEYS | advertiser | VARCHAR(10) | Yes | true | NULL | | + | | | channel | VARCHAR(10) | Yes | true | NULL | | + | | | to_bitmap(`user_id`) | BITMAP | No | false | | BITMAP_UNION | + +------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+ + ``` + +2. Automatic query matching + + When the materialized view table is created, when querying the advertisement UV, Doris will automatically query the data from the materialized view `advertiser_uv` just created. For example, the original query statement is as follows: + + ``` + SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; + ``` + + After the materialized view is selected, the actual query will be transformed into: + + ``` + SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel; + ``` + + Through the EXPLAIN command, you can check whether Doris matches the materialized view: + + ``` + MySQL [test]> explain SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Explain String | + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | PLAN FRAGMENT 0 | + | OUTPUT EXPRS: `advertiser` | `channel` | bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) | + | PARTITION: UNPARTITIONED | + | | + | RESULT SINK | + | | + | 4:EXCHANGE | + | | + | PLAN FRAGMENT 1 | + | OUTPUT EXPRS: | + | PARTITION: HASH_PARTITIONED: `advertiser`, `channel` | + | | + | STREAM DATA SINK | + | EXCHANGE ID: 04 | + | UNPARTITIONED | + | | + | 3:AGGREGATE (merge finalize) | + | | output: bitmap_union_count( bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`)) | + | | group by: `advertiser`, `channel` | + | | | + | 2:EXCHANGE | + | | + | PLAN FRAGMENT 2 | + | OUTPUT EXPRS: | + | PARTITION: RANDOM | + | | + | STREAM DATA SINK | + | EXCHANGE ID: 02 | + | HASH_PARTITIONED: `advertiser`, `channel` | + | | + | 1:AGGREGATE (update serialize) | + | | STREAMING | + | | output: bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) | + | | group by: `advertiser`, `channel` | + | | | + | 0:OlapScanNode | + | TABLE: advertiser_view_record | + | PREAGGREGATION: ON | + | partitions=1/1 | + | rollup: advertiser_uv | + | tabletRatio=10/10 | + | tabletList=22084,22086,22088,22090,22092,22094,22096,22098,22100,22102 | + | cardinality=0 | + | avgRowSize=0.0 | + | numNodes=1 | + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + 45 rows in set (0.030 sec) + ``` + + In the result of EXPLAIN, you can first see that the rollup attribute value of OlapScanNode is advertiser_uv. In other words, the query directly scans the data of the materialized view. The match is successful. + + Secondly, the calculation of `count(distinct)` for the `user_id` field is rewritten as `bitmap_union_count`. That is to achieve the effect of precise deduplication through bitmap. + + +## Best Practice 3 + +Business scenario: matching a richer prefix index + +The user's original table has three columns (k1, k2, k3). Among them, k1, k2 are prefix index columns. At this time, if the user query condition contains `where k1=a and k2=b`, the query can be accelerated through the index. + +But in some cases, the user's filter conditions cannot match the prefix index, such as `where k3=c`. Then the query speed cannot be improved through the index. + +This problem can be solved by creating a materialized view with k3 as the first column. + +1. Create a materialized view + + ``` + CREATE MATERIALIZED VIEW mv_1 as SELECT k3, k2, k1 FROM tableA ORDER BY k3; + ``` + + After the creation of the above grammar is completed, the complete detail data is retained in the materialized view, and the prefix index of the materialized view is the k3 column. The table structure is as follows: + + ``` + MySQL [test]> desc tableA all; + +-----------+---------------+-------+------+------+-------+---------+-------+ + | IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | + +-----------+---------------+-------+------+------+-------+---------+-------+ + | tableA | DUP_KEYS | k1 | INT | Yes | true | NULL | | + | | | k2 | INT | Yes | true | NULL | | + | | | k3 | INT | Yes | true | NULL | | + | | | | | | | | | + | mv_1 | DUP_KEYS | k3 | INT | Yes | true | NULL | | + | | | k2 | INT | Yes | false | NULL | NONE | + | | | k1 | INT | Yes | false | NULL | NONE | + +-----------+---------------+-------+------+------+-------+---------+-------+ + ``` + +2. Query matching + + At this time, if the user's query has k3 column, the filter condition is, for example: + + ``` + select k1, k2, k3 from table A where k3=1; + ``` + + At this time, the query will read data directly from the mv_1 materialized view just created. The materialized view has a prefix index on k3, and query efficiency will also be improved. + + +## Limitations + +1. The parameter of the aggregate function of the materialized view does not support the expression only supports a single column, for example: sum(a+b) does not support. +2. If the conditional column of the delete statement does not exist in the materialized view, the delete operation cannot be performed. If you must delete data, you need to delete the materialized view before deleting the data. +3. Too many materialized views on a single table will affect the efficiency of importing: When importing data, the materialized view and base table data are updated synchronously. If a table has more than 10 materialized view tables, it may cause the import speed to be very high. slow. This is the same as a single import needs to import 10 tables at the same time. diff --git a/docs/zh-CN/administrator-guide/materialized_view.md b/docs/zh-CN/administrator-guide/materialized_view.md new file mode 100644 index 00000000000000..4ba0835acbe7cb --- /dev/null +++ b/docs/zh-CN/administrator-guide/materialized_view.md @@ -0,0 +1,490 @@ +--- +{ + "title": "物化视图", + "language": "zh-CN" +} +--- + + + +# 物化视图 + +物化视图是将预先计算(根据定义好的 SELECT 语句)好的数据集,存储在 Doris 中的一个特殊的表。 + +物化视图的出现主要是为了满足用户,既能对原始明细数据的任意维度分析,也能快速的对固定维度进行分析查询。 + +## 适用场景 + ++ 分析需求覆盖明细数据查询以及固定维度查询两方面。 ++ 查询仅涉及表中的很小一部分列或行。 ++ 查询包含一些耗时处理操作,比如:时间很久的聚合操作等。 ++ 查询需要匹配不同前缀索引。 + +## 优势 + ++ 对于那些经常重复的使用相同的子查询结果的查询性能大幅提升 ++ Doris自动维护物化视图的数据,无论是新的导入,还是删除操作都能保证base 表和物化视图表的数据一致性。无需任何额外的人工维护成本。 ++ 查询时,会自动匹配到最优物化视图,并直接从物化视图中读取数据。 + +*自动维护物化视图的数据会造成一些维护开销,会在后面的物化视图的局限性中展开说明。* + +## 物化视图 VS Rollup + +在没有物化视图功能之前,用户一般都是使用 Rollup 功能通过预聚合方式提升查询效率的。但是 Rollup 具有一定的局限性,他不能基于明细模型做预聚合。 + +物化视图则在覆盖了 Rollup 的功能的同时,还能支持更丰富的聚合函数。所以物化视图其实是 Rollup 的一个超集。 + +也就是说,之前 `ALTER TABLE ADD ROLLUP` 语法支持的功能现在均可以通过 `CREATE MATERIALIZED VIEW` 实现。 + +## 使用物化视图 + +Doris 系统提供了一整套对物化视图的 DDL 语法,包括创建,查看,删除。DDL 的语法和 PostgreSQL, Oracle都是一致的。 + +### 创建物化视图 + +这里首先你要根据你的查询语句的特点来决定创建一个什么样的物化视图。这里并不是说你的物化视图定义和你的某个查询语句一模一样就最好。这里有两个原则: + +1. 从查询语句中**抽象**出,多个查询共有的分组和聚合方式作为物化视图的定义。 +2. 不需要给所有维度组合都创建物化视图。 + +首先第一个点,一个物化视图如果抽象出来,并且多个查询都可以匹配到这张物化视图。这种物化视图效果最好。因为物化视图的维护本身也需要消耗资源。 + +如果物化视图只和某个特殊的查询很贴合,而其他查询均用不到这个物化视图。则会导致这张物化视图的性价比不高,既占用了集群的存储资源,还不能为更多的查询服务。 + +所以用户需要结合自己的查询语句,以及数据维度信息去抽象出一些物化视图的定义。 + +第二点就是,在实际的分析查询中,并不会覆盖到所有的维度分析。所以给常用的维度组合创建物化视图即可,从而到达一个空间和时间上的平衡。 + +通过下面命令就可以创建物化视图了。创建物化视图是一个异步的操作,也就是说用户成功提交创建任务后,Doris 会在后台对存量的数据进行计算,直到创建成功。 + +``` +CREATE MATERIALIZED VIEW +``` + +具体的语法可以通过下面命令查看: + +``` +HELP CREATE MATERIALIZED VIEW +``` + +### 支持聚合函数 + +目前物化视图功能支持的聚合函数有: + ++ SUM, MIN, MAX (Version 0.12) ++ COUNT, BITMAP\_UNION, HLL\_UNION (Version 0.13) + +### 更新策略 + +为保证物化视图表和 Base 表的数据一致性, Doris 会将导入,删除等对 base 表的操作都同步到物化视图表中。并且通过增量更新的方式来提升更新效率。通过事务方式来保证原子性。 + +比如如果用户通过 INSERT 命令插入数据到 base 表中,则这条数据会同步插入到物化视图中。当 base 表和物化视图表均写入成功后,INSERT 命令才会成功返回。 + +### 查询自动匹配 + +物化视图创建成功后,用户的查询不需要发生任何改变,也就是还是查询的 base 表。Doris 会根据当前查询的语句去自动选择一个最优的物化视图,从物化视图中读取数据并计算。 + +用户可以通过 EXPLAIN 命令来检查当前查询是否使用了物化视图。 + +物化视图中的聚合和查询中聚合的匹配关系: + +| 物化视图聚合 | 查询中聚合 | +| ---------- | -------- | +| sum | sum | +| min | min | +| max | max | +| count | count | +| bitmap\_union | bitmap\_union, bitmap\_union\_count, count(distinct) | +| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv() | + +其中 bitmap 和 hll 的聚合函数在查询匹配到物化视图后,查询的聚合算子会根据物化视图的表结构进行一个改写。详细见实例2。 + +### 查询物化视图 + +查看当前表都有哪些物化视图,以及他们的表结构都是什么样的。通过下面命令: + +``` +MySQL [test]> desc mv_test all; ++-----------+---------------+-----------------+----------+------+-------+---------+--------------+ +| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | ++-----------+---------------+-----------------+----------+------+-------+---------+--------------+ +| mv_test | DUP_KEYS | k1 | INT | Yes | true | NULL | | +| | | k2 | BIGINT | Yes | true | NULL | | +| | | k3 | LARGEINT | Yes | true | NULL | | +| | | k4 | SMALLINT | Yes | false | NULL | NONE | +| | | | | | | | | +| mv_2 | AGG_KEYS | k2 | BIGINT | Yes | true | NULL | | +| | | k4 | SMALLINT | Yes | false | NULL | MIN | +| | | k1 | INT | Yes | false | NULL | MAX | +| | | | | | | | | +| mv_3 | AGG_KEYS | k1 | INT | Yes | true | NULL | | +| | | to_bitmap(`k2`) | BITMAP | No | false | | BITMAP_UNION | +| | | | | | | | | +| mv_1 | AGG_KEYS | k4 | SMALLINT | Yes | true | NULL | | +| | | k1 | BIGINT | Yes | false | NULL | SUM | +| | | k3 | LARGEINT | Yes | false | NULL | SUM | +| | | k2 | BIGINT | Yes | false | NULL | MIN | ++-----------+---------------+-----------------+----------+------+-------+---------+--------------+ +``` + +可以看到当前 `mv_test` 表一共有三张物化视图:mv_1, mv_2 和 mv_3,以及他们的表结构。 + +### 删除物化视图 + +如果用户不再需要物化视图,则可以通过下面命令删除物化视图: + +``` +DROP MATERIALIZED VIEW +``` + +具体的语法可以通过下面命令查看: + +``` +HELP DROP MATERIALIZED VIEW +``` + +## 最佳实践1 + +使用物化视图一般分为一下几个步骤: + +1. 创建物化视图 +2. 异步检查物化视图是否构建完成 +3. 查询并自动匹配物化视图 + +**首先是第一步:创建物化视图** + +假设用户有一张销售记录明细表,存储了每个交易的交易id,销售员,售卖门店,销售时间,以及金额。建表语句为: + +``` +create table sales_records(record_id int, seller_id int, store_id int, sale_date date, sale_amt bigint) distributed by hash(record_id) properties("replication_num" = "1"); +``` +这张 `sales_records` 的表结构如下: + +``` +MySQL [test]> desc sales_records; ++-----------+--------+------+-------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++-----------+--------+------+-------+---------+-------+ +| record_id | INT | Yes | true | NULL | | +| seller_id | INT | Yes | true | NULL | | +| store_id | INT | Yes | true | NULL | | +| sale_date | DATE | Yes | false | NULL | NONE | +| sale_amt | BIGINT | Yes | false | NULL | NONE | ++-----------+--------+------+-------+---------+-------+ +``` + +这时候如果用户经常对不同门店的销售量进行一个分析查询,则可以给这个 `sales_records` 表创建一张以售卖门店分组,对相同售卖门店的销售额求和的一个物化视图。创建语句如下: + +``` +MySQL [test]> create materialized view store_amt as select store_id, sum(sale_amt) from sales_records group by store_id; +``` + +后端返回下图,则说明创建物化视图任务提交成功。 + +``` +Query OK, 0 rows affected (0.012 sec) +``` + +**第二步:检查物化视图是否构建完成** + +由于创建物化视图是一个异步的操作,用户在提交完创建物化视图任务后,需要异步的通过命令检查物化视图是否构建完成。命令如下: + +``` +SHOW ALTER TABLE ROLLUP FROM db_name; +``` + +这个命令中 `db_name` 是一个参数, 你需要替换成自己真实的 db 名称。命令的结果是显示这个 db 的所有创建物化视图的任务。结果如下: + +``` ++-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+ +| JobId | TableName | CreateTime | FinishedTime | BaseIndexName | RollupIndexName | RollupId | TransactionId | State | Msg | Progress | Timeout | ++-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+ +| 22036 | sales_records | 2020-07-30 20:04:28 | 2020-07-30 20:04:57 | sales_records | store_amt | 22037 | 5008 | FINISHED | | NULL | 86400 | ++-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+ +``` + +其中 TableName 指的是物化视图的数据来自于哪个表,RollupIndexName 指的是物化视图的名称叫什么。其中比较重要的指标是 State。 + +当创建物化视图任务的 State 已经变成 FINISHED 后,就说明这个物化视图已经创建成功了。这就意味着,查询的时候有可能自动匹配到这张物化视图了。 + +**第三步:查询** + +当创建完成物化视图后,用户再查询不同门店的销售量时,就会直接从刚才创建的物化视图 `store_amt` 中读取聚合好的数据。达到提升查询效率的效果。 + +用户的查询依旧指定查询 `sales_records` 表,比如: + +``` +SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id; +``` + +上面查询就能自动匹配到 `store_amt`。用户可以通过下面命令,检验当前查询是否匹配到了合适的物化视图。 + +``` +EXPLAIN SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id; ++-----------------------------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------------------------+ +| PLAN FRAGMENT 0 | +| OUTPUT EXPRS: `store_id` | sum(`sale_amt`) | +| PARTITION: UNPARTITIONED | +| | +| RESULT SINK | +| | +| 4:EXCHANGE | +| | +| PLAN FRAGMENT 1 | +| OUTPUT EXPRS: | +| PARTITION: HASH_PARTITIONED: `store_id` | +| | +| STREAM DATA SINK | +| EXCHANGE ID: 04 | +| UNPARTITIONED | +| | +| 3:AGGREGATE (merge finalize) | +| | output: sum( sum(`sale_amt`)) | +| | group by: `store_id` | +| | | +| 2:EXCHANGE | +| | +| PLAN FRAGMENT 2 | +| OUTPUT EXPRS: | +| PARTITION: RANDOM | +| | +| STREAM DATA SINK | +| EXCHANGE ID: 02 | +| HASH_PARTITIONED: `store_id` | +| | +| 1:AGGREGATE (update serialize) | +| | STREAMING | +| | output: sum(`sale_amt`) | +| | group by: `store_id` | +| | | +| 0:OlapScanNode | +| TABLE: sales_records | +| PREAGGREGATION: ON | +| partitions=1/1 | +| rollup: store_amt | +| tabletRatio=10/10 | +| tabletList=22038,22040,22042,22044,22046,22048,22050,22052,22054,22056 | +| cardinality=0 | +| avgRowSize=0.0 | +| numNodes=1 | ++-----------------------------------------------------------------------------+ +45 rows in set (0.006 sec) +``` + +其中最重要的就是 OlapScanNode 中的 rollup 属性。可以看到当前查询的 rollup 显示的是 `store_amt`。也就是说查询已经正确匹配到物化视图 `store_amt`, 并直接从物化视图中读取数据了。 + +## 最佳实践2 PV,UV + +业务场景: 计算广告的 UV,PV + +假设用户的原始广告点击数据存储在 Doris,那么针对广告 PV, UV 查询就可以通过创建 `bitmap_union` 的物化视图来提升查询速度。 + +通过下面语句首先创建一个存储广告点击数据明细的表,包含每条点击的点击事件,点击的是什么广告,通过什么渠道点击,以及点击的用户是谁。 + +``` +MySQL [test]> create table advertiser_view_record(time date, advertiser varchar(10), channel varchar(10), user_id int) distributed by hash(time) properties("replication_num" = "1"); +Query O +K, 0 rows affected (0.014 sec) +``` +原始的广告点击数据表结构为: + +``` +MySQL [test]> desc advertiser_view_record; ++------------+-------------+------+-------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++------------+-------------+------+-------+---------+-------+ +| time | DATE | Yes | true | NULL | | +| advertiser | VARCHAR(10) | Yes | true | NULL | | +| channel | VARCHAR(10) | Yes | false | NULL | NONE | +| user_id | INT | Yes | false | NULL | NONE | ++------------+-------------+------+-------+---------+-------+ +4 rows in set (0.001 sec) +``` + +1. 创建物化视图 + + 由于用户想要查询的是广告的 UV 值,也就是需要对相同广告的用户进行一个精确去重,则查询一般为: + + ``` + SELECT advertiser, channel, count(distinct user_id) FROM  advertiser_view_record GROUP BY advertiser, channel; + ``` + + 针对这种求 UV 的场景,我们就可以创建一个带 `bitmap_union` 的物化视图从而达到一个预先精确去重的效果。 + + 在 Doris 中,`count(distinct)` 聚合的结果和 `bitmap_union_count`聚合的结果是完全一致的。而`bitmap_union_count` 等于 `bitmap_union` 的结果求 count, 所以如果查询中**涉及到 `count(distinct)` 则通过创建带 `bitmap_union` 聚合的物化视图方可加快查询**。 + + 针对这个 case,则可以创建一个根据广告和渠道分组,对 `user_id` 进行精确去重的物化视图。 + + ``` + MySQL [test]> create materialized view advertiser_uv as select advertiser, channel, bitmap_union(to_bitmap(user_id)) from advertiser_view_record group by advertiser, channel; + Query OK, 0 rows affected (0.012 sec) + ``` + + *注意:因为本身 user\_id 是一个 INT 类型,所以在 Doris 中需要先将字段通过函数 `to_bitmap` 转换为 bitmap 类型然后才可以进行 `bitmap_union` 聚合。* + + 创建完成后, 广告点击明细表和物化视图表的表结构如下: + + ``` + MySQL [test]> desc advertiser_view_record all; + +------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+ + | IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | + +------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+ + | advertiser_view_record | DUP_KEYS | time | DATE | Yes | true | NULL | | + | | | advertiser | VARCHAR(10) | Yes | true | NULL | | + | | | channel | VARCHAR(10) | Yes | false | NULL | NONE | + | | | user_id | INT | Yes | false | NULL | NONE | + | | | | | | | | | + | advertiser_uv | AGG_KEYS | advertiser | VARCHAR(10) | Yes | true | NULL | | + | | | channel | VARCHAR(10) | Yes | true | NULL | | + | | | to_bitmap(`user_id`) | BITMAP | No | false | | BITMAP_UNION | + +------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+ + ``` + +2. 查询自动匹配 + + 当物化视图表创建完成后,查询广告 UV 时,Doris就会自动从刚才创建好的物化视图 `advertiser_uv ` 中查询数据。比如原始的查询语句如下: + + ``` + SELECT advertiser, channel, count(distinct user_id) FROM  advertiser_view_record GROUP BY advertiser, channel; + ``` + + 在选中物化视图后,实际的查询会转化为: + + ``` + SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel; + ``` + + 通过 EXPLAIN 命令可以检验到 Doris 是否匹配到了物化视图: + + ``` + MySQL [test]> explain SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Explain String | + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | PLAN FRAGMENT 0 | + | OUTPUT EXPRS: `advertiser` | `channel` | bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) | + | PARTITION: UNPARTITIONED | + | | + | RESULT SINK | + | | + | 4:EXCHANGE | + | | + | PLAN FRAGMENT 1 | + | OUTPUT EXPRS: | + | PARTITION: HASH_PARTITIONED: `advertiser`, `channel` | + | | + | STREAM DATA SINK | + | EXCHANGE ID: 04 | + | UNPARTITIONED | + | | + | 3:AGGREGATE (merge finalize) | + | | output: bitmap_union_count( bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`)) | + | | group by: `advertiser`, `channel` | + | | | + | 2:EXCHANGE | + | | + | PLAN FRAGMENT 2 | + | OUTPUT EXPRS: | + | PARTITION: RANDOM | + | | + | STREAM DATA SINK | + | EXCHANGE ID: 02 | + | HASH_PARTITIONED: `advertiser`, `channel` | + | | + | 1:AGGREGATE (update serialize) | + | | STREAMING | + | | output: bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) | + | | group by: `advertiser`, `channel` | + | | | + | 0:OlapScanNode | + | TABLE: advertiser_view_record | + | PREAGGREGATION: ON | + | partitions=1/1 | + | rollup: advertiser_uv | + | tabletRatio=10/10 | + | tabletList=22084,22086,22088,22090,22092,22094,22096,22098,22100,22102 | + | cardinality=0 | + | avgRowSize=0.0 | + | numNodes=1 | + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + 45 rows in set (0.030 sec) + ``` + + 在 EXPLAIN 的结果中,首先可以看到 OlapScanNode 的 rollup 属性值为 advertiser_uv。也就是说,查询会直接扫描物化视图的数据。说明匹配成功。 + + 其次对于 `user_id` 字段求 `count(distinct)` 被改写为求 `bitmap_union_count(to_bitmap)`。也就是通过 bitmap 的方式来达到精确去重的效果。 + + +## 最佳实践3 + +业务场景:匹配更丰富的前缀索引 + +用户的原始表有 (k1, k2, k3) 三列。其中 k1, k2 为前缀索引列。这时候如果用户查询条件中包含 `where k1=1 and k2=2` 就能通过索引加速查询。 + +但是有些情况下,用户的过滤条件无法匹配到前缀索引,比如 `where k3=3`。则无法通过索引提升查询速度。 + +创建以 k3 作为第一列的物化视图就可以解决这个问题。 + +1. 创建物化视图 + + ``` + CREATE MATERIALIZED VIEW mv_1 as SELECT k3, k2, k1 FROM tableA ORDER BY k3; + ``` + + 通过上面语法创建完成后,物化视图中既保留了完整的明细数据,且物化视图的前缀索引为 k3 列。表结构如下: + + ``` + MySQL [test]> desc tableA all; + +-----------+---------------+-------+------+------+-------+---------+-------+ + | IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | + +-----------+---------------+-------+------+------+-------+---------+-------+ + | tableA | DUP_KEYS | k1 | INT | Yes | true | NULL | | + | | | k2 | INT | Yes | true | NULL | | + | | | k3 | INT | Yes | true | NULL | | + | | | | | | | | | + | mv_1 | DUP_KEYS | k3 | INT | Yes | true | NULL | | + | | | k2 | INT | Yes | false | NULL | NONE | + | | | k1 | INT | Yes | false | NULL | NONE | + +-----------+---------------+-------+------+------+-------+---------+-------+ + ``` + +2. 查询匹配 + + 这时候如果用户的查询存在 k3 列的过滤条件是,比如: + + ``` + select k1, k2, k3 from table A where k3=3; + ``` + + 这时候查询就会直接从刚才创建的 mv_1 物化视图中读取数据。物化视图对 k3 是存在前缀索引的,查询效率也会提升。 + + +## 局限性 + +1. 物化视图的聚合函数的参数不支持表达式仅支持单列,比如: sum(a+b)不支持。 +2. 如果删除语句的条件列,在物化视图中不存在,则不能进行删除操作。如果一定要删除数据,则需要先将物化视图删除,然后方可删除数据。 +3. 单表上过多的物化视图会影响导入的效率:导入数据时,物化视图和 base 表数据是同步更新的,如果一张表的物化视图表超过10张,则有可能导致导入速度很慢。这就像单次导入需要同时导入10张表数据是一样的。 + + + + + From e5b18c92ce20b22051632d14255349a0e3c9a601 Mon Sep 17 00:00:00 2001 From: emmymiao87 <522274284@qq.com> Date: Mon, 3 Aug 2020 11:03:35 +0800 Subject: [PATCH 2/3] Fix Change-Id: I2e9e15fa7aafa263b75d912015386a8f69fa75d4 --- .../administrator-guide/materialized_view.md | 8 +++-- .../administrator-guide/materialized_view.md | 31 +++++++++---------- 2 files changed, 19 insertions(+), 20 deletions(-) diff --git a/docs/en/administrator-guide/materialized_view.md b/docs/en/administrator-guide/materialized_view.md index 9db165ff7bd2eb..38d4d58bb6da9b 100644 --- a/docs/en/administrator-guide/materialized_view.md +++ b/docs/en/administrator-guide/materialized_view.md @@ -111,7 +111,7 @@ The matching relationship between the aggregation in the materialized view and t | max | max | | count | count | | bitmap\_union | bitmap\_union, bitmap\_union\_count, count( distinct) | -| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv() | +| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv, approx_count_distinct | After the aggregation functions of bitmap and hll match the materialized view in the query, the aggregation operator of the query will be rewritten according to the table structure of the materialized view. See example 2 for details. @@ -143,7 +143,7 @@ MySQL [test]> desc mv_test all; +-----------+---------------+-----------------+----------+------+-------+---------+--------------+ ``` -You can see that the current `mv_test` table has three materialized views: mv_1, mv_2 and mv_3, and their table structure. +You can see that the current `mv_test` table has three materialized views: mv\_1, mv\_2 and mv\_3, and their table structure. ### Delete materialized view @@ -206,7 +206,8 @@ Query OK, 0 rows affected (0.012 sec) Since the creation of a materialized view is an asynchronous operation, after the user submits the task of creating a materialized view, he needs to asynchronously check whether the materialized view has been constructed through a command. The command is as follows: ``` -SHOW ALTER TABLE ROLLUP FROM db_name; +SHOW ALTER TABLE ROLLUP FROM db_name; (Version 0.12) +SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name; (Version 0.13) ``` In this command, `db_name` is a parameter, you need to replace it with your real db name. The result of the command is to display all the tasks of creating a materialized view of this db. The results are as follows: @@ -481,3 +482,4 @@ This problem can be solved by creating a materialized view with k3 as the first 1. The parameter of the aggregate function of the materialized view does not support the expression only supports a single column, for example: sum(a+b) does not support. 2. If the conditional column of the delete statement does not exist in the materialized view, the delete operation cannot be performed. If you must delete data, you need to delete the materialized view before deleting the data. 3. Too many materialized views on a single table will affect the efficiency of importing: When importing data, the materialized view and base table data are updated synchronously. If a table has more than 10 materialized view tables, it may cause the import speed to be very high. slow. This is the same as a single import needs to import 10 tables at the same time. +4. The same column with different aggregate functions cannot appear in a materialized view at the same time. For example, select sum(a), min(a) from table are not supported. diff --git a/docs/zh-CN/administrator-guide/materialized_view.md b/docs/zh-CN/administrator-guide/materialized_view.md index 4ba0835acbe7cb..eecdbffd78b2b6 100644 --- a/docs/zh-CN/administrator-guide/materialized_view.md +++ b/docs/zh-CN/administrator-guide/materialized_view.md @@ -39,7 +39,7 @@ under the License. ## 优势 -+ 对于那些经常重复的使用相同的子查询结果的查询性能大幅提升 ++ 对于那些经常重复的使用相同的子查询结果的查询性能大幅提升。 + Doris自动维护物化视图的数据,无论是新的导入,还是删除操作都能保证base 表和物化视图表的数据一致性。无需任何额外的人工维护成本。 + 查询时,会自动匹配到最优物化视图,并直接从物化视图中读取数据。 @@ -86,7 +86,7 @@ HELP CREATE MATERIALIZED VIEW ### 支持聚合函数 -目前物化视图功能支持的聚合函数有: +目前物化视图创建语句支持的聚合函数有: + SUM, MIN, MAX (Version 0.12) + COUNT, BITMAP\_UNION, HLL\_UNION (Version 0.13) @@ -95,7 +95,7 @@ HELP CREATE MATERIALIZED VIEW 为保证物化视图表和 Base 表的数据一致性, Doris 会将导入,删除等对 base 表的操作都同步到物化视图表中。并且通过增量更新的方式来提升更新效率。通过事务方式来保证原子性。 -比如如果用户通过 INSERT 命令插入数据到 base 表中,则这条数据会同步插入到物化视图中。当 base 表和物化视图表均写入成功后,INSERT 命令才会成功返回。 +比如如果用户通过 INSERT 命令插入数据到 base 表中,则这条数据会同步插入到物化视图中。当 base 表和物化视图表均写入成功后,INSERT 命令才会成功返回。 ### 查询自动匹配 @@ -111,8 +111,8 @@ HELP CREATE MATERIALIZED VIEW | min | min | | max | max | | count | count | -| bitmap\_union | bitmap\_union, bitmap\_union\_count, count(distinct) | -| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv() | +| bitmap\_union | bitmap\_union, bitmap\_union\_count, count(distinct) | +| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv, approx_count_distinct | 其中 bitmap 和 hll 的聚合函数在查询匹配到物化视图后,查询的聚合算子会根据物化视图的表结构进行一个改写。详细见实例2。 @@ -144,7 +144,7 @@ MySQL [test]> desc mv_test all; +-----------+---------------+-----------------+----------+------+-------+---------+--------------+ ``` -可以看到当前 `mv_test` 表一共有三张物化视图:mv_1, mv_2 和 mv_3,以及他们的表结构。 +可以看到当前 `mv_test` 表一共有三张物化视图:mv\_1, mv\_2 和 mv\_3,以及他们的表结构。 ### 删除物化视图 @@ -207,7 +207,8 @@ Query OK, 0 rows affected (0.012 sec) 由于创建物化视图是一个异步的操作,用户在提交完创建物化视图任务后,需要异步的通过命令检查物化视图是否构建完成。命令如下: ``` -SHOW ALTER TABLE ROLLUP FROM db_name; +SHOW ALTER TABLE ROLLUP FROM db_name; (Version 0.12) +SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name; (Version 0.13) ``` 这个命令中 `db_name` 是一个参数, 你需要替换成自己真实的 db 名称。命令的结果是显示这个 db 的所有创建物化视图的任务。结果如下: @@ -325,7 +326,7 @@ MySQL [test]> desc advertiser_view_record; 由于用户想要查询的是广告的 UV 值,也就是需要对相同广告的用户进行一个精确去重,则查询一般为: ``` - SELECT advertiser, channel, count(distinct user_id) FROM  advertiser_view_record GROUP BY advertiser, channel; + SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; ``` 针对这种求 UV 的场景,我们就可以创建一个带 `bitmap_union` 的物化视图从而达到一个预先精确去重的效果。 @@ -361,16 +362,16 @@ MySQL [test]> desc advertiser_view_record; 2. 查询自动匹配 - 当物化视图表创建完成后,查询广告 UV 时,Doris就会自动从刚才创建好的物化视图 `advertiser_uv ` 中查询数据。比如原始的查询语句如下: + 当物化视图表创建完成后,查询广告 UV 时,Doris就会自动从刚才创建好的物化视图 `advertiser_uv` 中查询数据。比如原始的查询语句如下: ``` - SELECT advertiser, channel, count(distinct user_id) FROM  advertiser_view_record GROUP BY advertiser, channel; + SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel; ``` 在选中物化视图后,实际的查询会转化为: ``` - SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel; + SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel; ``` 通过 EXPLAIN 命令可以检验到 Doris 是否匹配到了物化视图: @@ -482,9 +483,5 @@ MySQL [test]> desc advertiser_view_record; 1. 物化视图的聚合函数的参数不支持表达式仅支持单列,比如: sum(a+b)不支持。 2. 如果删除语句的条件列,在物化视图中不存在,则不能进行删除操作。如果一定要删除数据,则需要先将物化视图删除,然后方可删除数据。 -3. 单表上过多的物化视图会影响导入的效率:导入数据时,物化视图和 base 表数据是同步更新的,如果一张表的物化视图表超过10张,则有可能导致导入速度很慢。这就像单次导入需要同时导入10张表数据是一样的。 - - - - - +3. 单表上过多的物化视图会影响导入的效率:导入数据时,物化视图和 base 表数据是同步更新的,如果一张表的物化视图表超过10张,则有可能导致导入速度很慢。这就像单次导入需要同时导入10张表数据是一样的。 +4. 相同列,不同聚合函数,不能同时出现在一张物化视图中,比如:select sum(a), min(a) from table 不支持。 From fbb3fd9ba18a1a5d2acc4976fb42a51ddda38ef2 Mon Sep 17 00:00:00 2001 From: emmymiao87 <522274284@qq.com> Date: Mon, 3 Aug 2020 11:07:09 +0800 Subject: [PATCH 3/3] Fix Change-Id: I9c5f3f9436616ddb776bac0c8d4d1cbba9c14c51 --- docs/en/administrator-guide/materialized_view.md | 4 ++-- docs/zh-CN/administrator-guide/materialized_view.md | 2 +- .../Data Definition/CREATE MATERIALIZED VIEW.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/en/administrator-guide/materialized_view.md b/docs/en/administrator-guide/materialized_view.md index 38d4d58bb6da9b..e79f275d0308d0 100644 --- a/docs/en/administrator-guide/materialized_view.md +++ b/docs/en/administrator-guide/materialized_view.md @@ -110,8 +110,8 @@ The matching relationship between the aggregation in the materialized view and t | min | min | | max | max | | count | count | -| bitmap\_union | bitmap\_union, bitmap\_union\_count, count( distinct) | -| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv, approx_count_distinct | +| bitmap\_union | bitmap\_union, bitmap\_union\_count, count(distinct) | +| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv, approx\_count\_distinct | After the aggregation functions of bitmap and hll match the materialized view in the query, the aggregation operator of the query will be rewritten according to the table structure of the materialized view. See example 2 for details. diff --git a/docs/zh-CN/administrator-guide/materialized_view.md b/docs/zh-CN/administrator-guide/materialized_view.md index eecdbffd78b2b6..a0fe43c4d72650 100644 --- a/docs/zh-CN/administrator-guide/materialized_view.md +++ b/docs/zh-CN/administrator-guide/materialized_view.md @@ -112,7 +112,7 @@ HELP CREATE MATERIALIZED VIEW | max | max | | count | count | | bitmap\_union | bitmap\_union, bitmap\_union\_count, count(distinct) | -| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv, approx_count_distinct | +| hll\_union | hll\_raw\_agg, hll\_union\_agg, ndv, approx\_count\_distinct | 其中 bitmap 和 hll 的聚合函数在查询匹配到物化视图后,查询的聚合算子会根据物化视图的表结构进行一个改写。详细见实例2。 diff --git a/docs/zh-CN/sql-reference/sql-statements/Data Definition/CREATE MATERIALIZED VIEW.md b/docs/zh-CN/sql-reference/sql-statements/Data Definition/CREATE MATERIALIZED VIEW.md index a2a68c172157ed..531fdd0b7b4eda 100644 --- a/docs/zh-CN/sql-reference/sql-statements/Data Definition/CREATE MATERIALIZED VIEW.md +++ b/docs/zh-CN/sql-reference/sql-statements/Data Definition/CREATE MATERIALIZED VIEW.md @@ -89,7 +89,7 @@ under the License. ``` - PROPERTIES ("key" = "value", "key" = "value" ...) + PROPERTIES ("key" = "value", "key" = "value" ...) ```