Add bitmap longitudinal cutting udaf #4198

zhbinbin · 2020-07-28T03:21:36Z

Doris's original bitmap aggregate functions perform poorly on intersection and union calculations when the bitmap cardinality exceeds one billion. There are two reasons. First, when the bitmap cardinality is large and the data size exceeds 1 GB, network and disk IO time increases. Second, all data output by the back-end (BE) instances is transferred to the top node for the intersection and union calculation, which puts pressure on that single node and makes it the bottleneck.

My solution is to create a table with a fixed schema that follows the Doris sharding rules and to hash-shard the bitmap by ID range, that is, to cut the ID range vertically into small blocks. The resulting bitmap blocks are smaller and evenly distributed across all BE instances. On top of this table schema, I developed several new high-performance UDAF aggregate functions. All scan nodes take part in the intersection and union calculation, and the top node only summarizes the results.

The design goal is a response time within 5 s for intersection and union calculations at a granularity of 100 dimensions on bitmaps whose cardinality exceeds 10 billion.
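To illustrate why the top node only needs to summarize, here is a minimal sketch of the idea, not the code in this PR. It assumes a `std::set` as a stand-in for the real bitmap type, and the names `split_by_range`, `local_intersect_count`, and `orthogonal_intersect_count` are hypothetical. Because the buckets cover disjoint ID ranges, the global intersection count equals the sum of the per-bucket counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Stand-in for a bitmap (assumption: the real code uses its own bitmap type).
using Bitmap = std::set<uint64_t>;

// Cut a bitmap into num_buckets disjoint pieces by ID range.
// Bucket i holds IDs in [i * range_width, (i + 1) * range_width);
// IDs past the last range fall into the last bucket.
std::vector<Bitmap> split_by_range(const Bitmap& ids, uint64_t range_width,
                                   std::size_t num_buckets) {
    assert(range_width > 0 && num_buckets > 0);
    std::vector<Bitmap> buckets(num_buckets);
    for (uint64_t id : ids) {
        std::size_t b = static_cast<std::size_t>(
            std::min<uint64_t>(id / range_width, num_buckets - 1));
        buckets[b].insert(id);
    }
    return buckets;
}

// Per-bucket (scan node) step: intersect two pieces and keep only the count.
int64_t local_intersect_count(const Bitmap& a, const Bitmap& b) {
    std::vector<uint64_t> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return static_cast<int64_t>(common.size());
}

// Top node step: the buckets are disjoint, so the per-bucket counts
// can simply be added up; no bitmap has to be shipped to the top node.
int64_t orthogonal_intersect_count(const Bitmap& a, const Bitmap& b,
                                   uint64_t range_width,
                                   std::size_t num_buckets) {
    std::vector<Bitmap> pa = split_by_range(a, range_width, num_buckets);
    std::vector<Bitmap> pb = split_by_range(b, range_width, num_buckets);
    int64_t total = 0;
    for (std::size_t i = 0; i < num_buckets; ++i) {
        total += local_intersect_count(pa[i], pb[i]);
    }
    return total;
}
```

For example, with a = {1, 5, 12, 27, 40}, b = {5, 12, 13, 40, 99}, a range width of 10, and 4 buckets, the per-bucket counts are 1, 1, 0, and 1, so the summed result is 3, which matches the direct intersection {5, 12, 40}.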

EmmyMiao87 · 2020-07-28T10:56:44Z

docs/.vuepress/sidebar/en.js

            directoryPath: "contrib/",
-            children:[],       
+            children:[
+                "udaf-bitmap-manual.md",


Please remove the `.md` extension.

EmmyMiao87 · 2020-07-28T10:56:52Z

docs/.vuepress/sidebar/zh-CN.js

            directoryPath: "contrib/",
-            children:[],       
+            children:[
+                "udaf-bitmap-manual.md",


Same as above

EmmyMiao87 · 2020-07-28T10:57:22Z

docs/en/extending-doris/udf/contrib/udaf-bitmap-manual.md

+-->
+
+
+#Bitmap longitudinal cutting udaf


Please add a space after the `#` at the beginning of the title.

EmmyMiao87 · 2020-07-28T10:57:38Z

docs/en/extending-doris/udf/contrib/udaf-bitmap-manual.md

+
+
+
+##Custom udaf


Same as above

EmmyMiao87 · 2020-07-28T11:09:27Z

The name of this UDAF is not accurate. Maybe 'udaf_orthogonal_bitmap' would be better, or something similar?

EmmyMiao87 · 2020-07-28T11:32:57Z

contrib/udf/src/udaf_bitmap/bitmap_value.h

+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_CONTRIB_UDF_SRC_UDAF_BITMAP_BITMAP_VALUE_H


Is this file the same as be/src/util/bitmap_value.h?

It cannot be reused directly, so this is a copy with a small change.

EmmyMiao87 · 2020-07-28T11:35:12Z

contrib/udf/src/udaf_bitmap/custom_bitmap_function.h

+
+namespace doris_udf {
+
+class CustomBitmapFunctions {


The class name should reflect that it deals with orthogonal bitmaps.

EmmyMiao87 · 2020-07-30T03:22:54Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

@@ -0,0 +1,209 @@
+---
+{
+    "title": "BITMAP正交计算UDAF",


正交的BITMAP计算UDAF (that is, "Orthogonal BITMAP computation UDAF")

EmmyMiao87 · 2020-07-30T03:23:08Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+under the License.
+-->
+
+# BITMAP正交计算UDAF


same as above

EmmyMiao87 · 2020-07-30T03:38:24Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

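+The new UDAF's function symbols must be registered when the aggregate function is defined in Doris; the symbols are loaded from a dynamic library (.so).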
+
+### bitmap_orthogonal_intersect 
+


First, there should be an introduction to the function: what does it do, and what is it used for?

EmmyMiao87 · 2020-07-30T03:39:17Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+Intersection function
+  bitmap_orthogonal_intersect(bitmap_column, column_to_filter, filter_values)
+
+Parameters:


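The description of each parameter should explain what the parameter means. For example:
the first parameter is of type bitmap and is the column to be intersected.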

EmmyMiao87 · 2020-07-30T03:42:21Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+
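+Doris's original bitmap aggregate functions are fairly general-purpose, but they perform poorly on intersection and union calculations over bitmaps with cardinality above the hundred-million level. Examining the bitmap aggregate function logic in the BE revealed two main causes. First, when the bitmap cardinality is large, for example when the data size exceeds 1 GB, network and disk IO take a long time. Second, after scanning, the BE instances send all data to the top-level node for the intersection and union operations, which puts pressure on that single node and makes it the bottleneck.
+
+The solution is to add an hid column when creating the table. When data is loaded, the hid column is assigned according to the range of the bitmap column, and the table is bucketed evenly by hid. This way, the aggregated bitmap data, divided by range, is distributed evenly across all BE instances. On top of this table schema, the UDAF aggregate functions are optimized so that all scan nodes take part in the distributed orthogonal computation and the top-level node then summarizes the results, which greatly improves computational efficiency.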


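Start with an overview of the approach. For example: the idea is to divide the values of the bitmap column by range and store values from different ranges in different buckets, which guarantees that the bitmap values in different buckets are orthogonal. Then go into the details, and finally explain why this speeds up queries.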

EmmyMiao87 · 2020-07-30T03:46:59Z

contrib/udf/src/udaf_orthogonal_bitmap/orthogonal_bitmap_function.cpp

+}
+
+void OrthogonalBitmapFunctions::bitmap_count_merge(FunctionContext* context, const StringVal& src, StringVal* dst) {
+    if (dst->len != sizeof(int64_t)) {


Will dst be a bitmap value?

EmmyMiao87 · 2020-07-30T03:49:53Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+```
+Output directory of libudaf_bitmap.so:
+```
+output/contrib/udf/lib/udaf_bitmap/libudaf_bitmap.so


Shouldn't the name be libudaf_orthogonal_bitmap.so?

EmmyMiao87 · 2020-07-30T03:50:47Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+## Source code and compilation
+Source code:
+```
+contrib/udf/src/udaf_bitmap/


Please make the naming consistent, for example use udaf_orthogonal_bitmap everywhere.

EmmyMiao87 · 2020-07-30T03:51:14Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+Definition:
+```
+drop FUNCTION bitmap_orthogonal_intersect(BITMAP,BIGINT,BIGINT, ...);
+CREATE AGGREGATE FUNCTION bitmap_orthogonal_intersect(BITMAP,BIGINT,BIGINT, ...) RETURNS BITMAP INTERMEDIATE varchar(1)


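Please keep the names consistent throughout the document.

Which name does this refer to?

For example, everything earlier uses orthogonal_bitmap, so it would be best to use orthogonal_bitmap here as well.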

EmmyMiao87 · 2020-07-30T03:52:33Z

docs/en/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+contrib/udf/src/udaf_bitmap/
+|-- bitmap_value.h
+|-- CMakeLists.txt
+|-- custom_bitmap_function.cpp


Suggested change

-|-- custom_bitmap_function.cpp
+|-- orthogonal_bitmap_function.cpp

EmmyMiao87 · 2020-08-13T12:02:43Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+Definition:
+```
+drop FUNCTION bitmap_orthogonal_intersect(BITMAP,BIGINT,BIGINT, ...);
+CREATE AGGREGATE FUNCTION bitmap_orthogonal_intersect(BITMAP,BIGINT,BIGINT, ...) RETURNS BITMAP INTERMEDIATE varchar(1)


For example, everything earlier uses orthogonal_bitmap, so it would be best to use orthogonal_bitmap here as well.

EmmyMiao87 · 2020-08-13T12:04:05Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

+
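+Doris's original bitmap aggregate functions are fairly general-purpose, but they perform poorly on intersection and union calculations over bitmaps with cardinality above the hundred-million level. Examining the bitmap aggregate function logic in the BE revealed two main causes. First, when the bitmap cardinality is large, for example when the bitmap size exceeds 1 GB, network and disk IO take a long time. Second, after scanning, the BE instances send all data to the top-level node for the intersection and union operations, which puts pressure on that single node and makes it the bottleneck.
+
+The idea is to divide the values of the bitmap column by range and store values from different ranges in different buckets, which guarantees that the bitmap values in different buckets are orthogonal. The data is then distributed more evenly: a query scans all buckets, aggregates the orthogonal BITMAPs in each bucket, and sends the results to the top-level node for summarization. This greatly improves computational efficiency and removes the single-node bottleneck at the top level.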


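This sentence is a bit ambiguous: "aggregates the orthogonal BITMAPs in each bucket". How about changing it to: "first aggregate the orthogonal bitmaps in each bucket separately; the top-level node then directly merges the aggregated values and outputs the result"?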

EmmyMiao87 · 2020-08-13T12:07:20Z

docs/zh-CN/extending-doris/udf/contrib/udaf-orthogonal-bitmap-manual.md

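+Doris's original bitmap aggregate functions are fairly general-purpose, but they perform poorly on intersection and union calculations over bitmaps with cardinality above the hundred-million level. Examining the bitmap aggregate function logic in the BE revealed two main causes. First, when the bitmap cardinality is large, for example when the bitmap size exceeds 1 GB, network and disk IO take a long time. Second, after scanning, the BE instances send all data to the top-level node for the intersection and union operations, which puts pressure on that single node and makes it the bottleneck.
+
+The idea is to divide the values of the bitmap column by range and store values from different ranges in different buckets, which guarantees that the bitmap values in different buckets are orthogonal. The data is then distributed more evenly: a query scans all buckets, aggregates the orthogonal BITMAPs in each bucket, and sends the results to the top-level node for summarization. This greatly improves computational efficiency and removes the single-node bottleneck at the top level.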
+


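An overview could be added here:

Step 1: create the table. This step is mainly for xxx
Step 2: compile the UDAF, that is, compile xxx, in order to xxx
Step 3: register the UDAF in Doris
Step 4: how to use it

Then describe each item in turn.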

optimize docs and code

EmmyMiao87

LGTM

Doris's original bitmap aggregate functions perform poorly on intersection and union calculations when the bitmap cardinality exceeds one billion. There are two reasons. First, when the bitmap cardinality is large and the data size exceeds 1 GB, network and disk IO time increases. Second, all data output by the back-end (BE) instances is transferred to the top node for the intersection and union calculation, which puts pressure on that single node and makes it the bottleneck. My solution is to create a table with a fixed schema that follows the Doris sharding rules and to hash-shard the bitmap by ID range, that is, to cut the ID range vertically into small blocks. The resulting bitmap blocks are smaller and evenly distributed across all BE instances. On top of this table schema, several new high-performance UDAF aggregate functions are developed. All scan nodes take part in the intersection and union calculation, and the top node only summarizes the results. The design goal is a response time within 5 s for intersection and union calculations at a granularity of 100 dimensions on bitmaps whose cardinality exceeds 10 billion. There are three UDAF functions in this commit: orthogonal_bitmap_intersect_count, orthogonal_bitmap_union_count, and orthogonal_bitmap_intersect.

EmmyMiao87 reviewed Jul 28, 2020

EmmyMiao87 reviewed Jul 30, 2020

zhbinbin force-pushed the udaf_bitmap branch 3 times, most recently from 9f7314f to d348d33, July 31, 2020 09:57

zhbinbin force-pushed the udaf_bitmap branch from d348d33 to d55a578, August 13, 2020 12:04

EmmyMiao87 reviewed Aug 13, 2020

add udaf_orthogonal_bitmap

0168aa3

optimize docs and code

zhbinbin force-pushed the udaf_bitmap branch from d55a578 to 0168aa3, August 14, 2020 03:19

EmmyMiao87 added the area/udf, good first issue, and approved labels Aug 14, 2020

EmmyMiao87 approved these changes Aug 14, 2020

EmmyMiao87 removed the good first issue label Aug 17, 2020

EmmyMiao87 merged commit f924282 into apache:master Aug 19, 2020

EmmyMiao87 mentioned this pull request Sep 1, 2020

Release Notes 0.13.0 #4370

Closed

yangzhg mentioned this pull request Feb 9, 2021

Release Notes 0.14.0 #5374

Closed

eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Aug 8, 2025

[chore](merge)Reslove maxcompute conflict. (apache#4198)

2818dbf

