Support bitmap_intersect #3571
Merged
8 commits
7df59a6 Support bitmap_intersect (EmmyMiao87)
d2bdd1d Update be/src/util/bitmap_value.h (EmmyMiao87)
45ee670 Add docs of bitmap_union and bitmap_intersect (EmmyMiao87)
1981a26 Merge branch 'bitmap_intersect' of https://github.com/EmmyMiao87/incu… (EmmyMiao87)
c150919 Change the init function name of bitmap_intersect (EmmyMiao87)
9e0bf1a Change unit test (EmmyMiao87)
48fddb6 Support null of bitmap_intersect (EmmyMiao87)
e828ea1 Add doc in sidebar (EmmyMiao87)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -302,6 +302,31 @@ void BitmapFunctions::bitmap_union(FunctionContext* ctx, const StringVal& src, S | |
| } | ||
| } | ||
|
|
||
| // the dst value could be null | ||
| void BitmapFunctions::nullable_bitmap_init(FunctionContext* ctx, StringVal* dst) { | ||
| dst->is_null = true; | ||
| } | ||
|
|
||
| void BitmapFunctions::bitmap_intersect(FunctionContext* ctx, const StringVal& src, StringVal* dst) { | ||
| if (src.is_null) { | ||
| return; | ||
| } | ||
| // if dst is null, the src input is the first value | ||
| if (dst->is_null) { | ||
|
Contributor review comment: Would better add a
||
| dst->is_null = false; | ||
| dst->len = sizeof(BitmapValue); | ||
| dst->ptr = (uint8_t*)new BitmapValue((char*) src.ptr); | ||
| return; | ||
| } | ||
| auto dst_bitmap = reinterpret_cast<BitmapValue*>(dst->ptr); | ||
| // zero size means the src input is an agg object | ||
| if (src.len == 0) { | ||
EmmyMiao87 marked this conversation as resolved.
|
||
| (*dst_bitmap) &= *reinterpret_cast<BitmapValue*>(src.ptr); | ||
| } else { | ||
| (*dst_bitmap) &= BitmapValue((char*) src.ptr); | ||
| } | ||
| } | ||
|
|
||
| BigIntVal BitmapFunctions::bitmap_count(FunctionContext* ctx, const StringVal& src) { | ||
| if (src.is_null) { | ||
| return 0; | ||
|
|
@@ -343,12 +368,17 @@ StringVal BitmapFunctions::bitmap_hash(doris_udf::FunctionContext* ctx, const do | |
| } | ||
|
|
||
| StringVal BitmapFunctions::bitmap_serialize(FunctionContext* ctx, const StringVal& src) { | ||
| if (src.is_null) { | ||
| return src; | ||
| } | ||
|
|
||
| auto src_bitmap = reinterpret_cast<BitmapValue*>(src.ptr); | ||
| StringVal result = serialize(ctx, src_bitmap); | ||
| delete src_bitmap; | ||
| return result; | ||
| } | ||
|
|
||
| // This is an init function for intersect_count, not for bitmap_intersect. | ||
| template<typename T, typename ValType> | ||
| void BitmapFunctions::bitmap_intersect_init(FunctionContext* ctx, StringVal* dst) { | ||
| dst->is_null = false; | ||
|
|
@@ -510,6 +540,7 @@ template void BitmapFunctions::bitmap_update_int<IntVal>( | |
| template void BitmapFunctions::bitmap_update_int<BigIntVal>( | ||
| FunctionContext* ctx, const BigIntVal& src, StringVal* dst); | ||
|
|
||
| // This is an init function for intersect_count, not for bitmap_intersect. | ||
| template void BitmapFunctions::bitmap_intersect_init<int8_t, TinyIntVal>( | ||
| FunctionContext* ctx, StringVal* dst); | ||
| template void BitmapFunctions::bitmap_intersect_init<int16_t, SmallIntVal>( | ||
|
|
||
61 changes: 61 additions & 0 deletions
docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_intersect.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| --- | ||
| { | ||
| "title": "bitmap_intersect", | ||
| "language": "en" | ||
EmmyMiao87 marked this conversation as resolved.
|
||
| } | ||
| --- | ||
|
|
||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # bitmap_intersect | ||
| ## description | ||
|
|
||
| Aggregate function that computes the intersection of bitmaps after grouping. A common use case is calculating user retention. | ||
|
|
||
| ### Syntax | ||
|
|
||
| `BITMAP BITMAP_INTERSECT(BITMAP value)` | ||
|
|
||
| Takes a set of bitmap values and returns their intersection. | ||
|
|
||
| ## example | ||
|
|
||
| Table schema | ||
|
|
||
| ``` | ||
| KeysType: AGG_KEY | ||
| Columns: tag varchar, date datetime, user_id bitmap bitmap_union | ||
| ``` | ||
|
|
||
| ``` | ||
| Find the retention of users between 2020-05-18 and 2020-05-19 under different tags. | ||
| mysql> select tag, bitmap_intersect(user_id) from (select tag, date, bitmap_union(user_id) user_id from table where date in ('2020-05-18', '2020-05-19') group by tag, date) a group by tag; | ||
| ``` | ||
|
|
||
| Combine it with the bitmap_to_string function to list the specific members of the intersection: | ||
|
|
||
| ``` | ||
| Who are the users retained under different tags between 2020-05-18 and 2020-05-19? | ||
| mysql> select tag, bitmap_to_string(bitmap_intersect(user_id)) from (select tag, date, bitmap_union(user_id) user_id from table where date in ('2020-05-18', '2020-05-19') group by tag, date) a group by tag; | ||
| ``` | ||
|
|
||
| ## keyword | ||
|
|
||
| BITMAP_INTERSECT, BITMAP | ||
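The retention query in the doc above can be pictured with a minimal C++ sketch. It is not the Doris implementation: `Bitmap` is a hypothetical stand-in built on `std::set`, and `DailyUsers` and `retained_by_tag` are illustrative names that model the inner query's rows and the outer `group by tag` aggregation.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bitmap: an ordered set of user ids.
using Bitmap = std::set<uint32_t>;

// One row of the inner query's result: (tag, date, bitmap_union(user_id)).
struct DailyUsers {
    std::string tag;
    std::string date;
    Bitmap users;
};

// Models `select tag, bitmap_intersect(user_id) ... group by tag`.
std::map<std::string, Bitmap> retained_by_tag(const std::vector<DailyUsers>& rows) {
    std::map<std::string, Bitmap> result;
    for (const auto& row : rows) {
        auto it = result.find(row.tag);
        if (it == result.end()) {
            // First bitmap seen for this tag: adopt it as the running result.
            result.emplace(row.tag, row.users);
            continue;
        }
        // Later bitmaps narrow the running result to the common user ids.
        Bitmap common;
        std::set_intersection(it->second.begin(), it->second.end(),
                              row.users.begin(), row.users.end(),
                              std::inserter(common, common.begin()));
        it->second = std::move(common);
    }
    return result;
}
```

With users {1, 2, 3} on 2020-05-18 and {2, 3, 4} on 2020-05-19 under the same tag, the sketch keeps {2, 3}, which is the set of users active on both days.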
58 changes: 58 additions & 0 deletions
docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_union.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| --- | ||
| { | ||
| "title": "bitmap_union", | ||
| "language": "en" | ||
| } | ||
| --- | ||
|
|
||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # bitmap_union | ||
| ## description | ||
|
|
||
| Aggregate function that computes the union of bitmaps after grouping. Common use cases include calculating PV and UV. | ||
|
|
||
| ### Syntax | ||
|
|
||
| `BITMAP BITMAP_UNION(BITMAP value)` | ||
|
|
||
| Takes a set of bitmap values and returns their union. | ||
|
|
||
| ## example | ||
|
|
||
| ``` | ||
| mysql> select page_id, bitmap_union(user_id) from table group by page_id; | ||
| ``` | ||
|
|
||
| Combined with the bitmap_count function, it yields the UV (number of distinct users) of each web page: | ||
|
|
||
| ``` | ||
| mysql> select page_id, bitmap_count(bitmap_union(user_id)) from table group by page_id; | ||
| ``` | ||
|
|
||
| When the user_id field is an int, the query above is semantically equivalent to: | ||
|
|
||
| ``` | ||
| mysql> select page_id, count(distinct user_id) from table group by page_id; | ||
| ``` | ||
|
|
||
| ## keyword | ||
|
|
||
| BITMAP_UNION, BITMAP |
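The equivalence between `bitmap_count(bitmap_union(user_id))` and `count(distinct user_id)` can be sketched in C++ as follows. This is a model under stated assumptions, not Doris code: `Bitmap` is a hypothetical `std::set` stand-in and `uv_by_page` is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bitmap: an ordered set of user ids.
using Bitmap = std::set<uint32_t>;

// Models `select page_id, bitmap_count(bitmap_union(user_id)) ... group by page_id`
// over rows of (page_id, bitmap of user ids).
std::map<int, std::size_t> uv_by_page(const std::vector<std::pair<int, Bitmap>>& rows) {
    std::map<int, Bitmap> unions;
    for (const auto& [page_id, users] : rows) {
        // Union step: inserting into a set drops duplicate user ids.
        unions[page_id].insert(users.begin(), users.end());
    }
    std::map<int, std::size_t> counts;
    for (const auto& [page_id, users] : unions) {
        // Count step: the size of the union is the number of distinct users.
        counts[page_id] = users.size();
    }
    return counts;
}
```

Because the union discards duplicates, a user id that appears in several rows for the same page is counted once, which is what a distinct count would give.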
62 changes: 62 additions & 0 deletions
docs/zh-CN/sql-reference/sql-functions/bitmap-functions/bitmap_intersect.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| --- | ||
| { | ||
| "title": "bitmap_intersect", | ||
| "language": "zh-CN" | ||
| } | ||
| --- | ||
|
|
||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # bitmap_intersect | ||
| ## description | ||
|
|
||
| Aggregate function that computes the intersection of bitmaps after grouping. A common use case is calculating user retention. | ||
|
|
||
| ### Syntax | ||
|
|
||
| `BITMAP BITMAP_INTERSECT(BITMAP value)` | ||
|
|
||
| Takes a set of bitmap values and returns their intersection. | ||
|
|
||
| ## example | ||
|
|
||
| Table schema | ||
|
|
||
| ``` | ||
| KeysType: AGG_KEY | ||
| Columns: tag varchar, date datetime, user_id bitmap bitmap_union | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
| Find the user retention under different tags between today and yesterday. | ||
| mysql> select tag, bitmap_intersect(user_id) from (select tag, date, bitmap_union(user_id) user_id from table where date in ('2020-05-18', '2020-05-19') group by tag, date) a group by tag; | ||
| ``` | ||
|
|
||
| Combine it with the bitmap_to_string function to list the specific members of the intersection: | ||
|
|
||
| ``` | ||
| Find which users were retained under different tags between today and yesterday. | ||
| mysql> select tag, bitmap_to_string(bitmap_intersect(user_id)) from (select tag, date, bitmap_union(user_id) user_id from table where date in ('2020-05-18', '2020-05-19') group by tag, date) a group by tag; | ||
| ``` | ||
|
|
||
| ## keyword | ||
|
|
||
| BITMAP_INTERSECT, BITMAP |
58 changes: 58 additions & 0 deletions
docs/zh-CN/sql-reference/sql-functions/bitmap-functions/bitmap_union.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| --- | ||
| { | ||
| "title": "bitmap_union", | ||
| "language": "zh-CN" | ||
| } | ||
| --- | ||
|
|
||
| <!-- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # bitmap_union | ||
| ## description | ||
|
|
||
| Aggregate function that computes the union of bitmaps after grouping. Common use cases include calculating PV and UV. | ||
|
|
||
| ### Syntax | ||
|
|
||
| `BITMAP BITMAP_UNION(BITMAP value)` | ||
|
|
||
| Takes a set of bitmap values and returns their union. | ||
|
|
||
| ## example | ||
|
|
||
| ``` | ||
| mysql> select page_id, bitmap_union(user_id) from table group by page_id; | ||
| ``` | ||
|
|
||
| Combined with the bitmap_count function, it yields the UV (number of distinct users) of each web page: | ||
|
|
||
| ``` | ||
| mysql> select page_id, bitmap_count(bitmap_union(user_id)) from table group by page_id; | ||
| ``` | ||
|
|
||
| When the user_id field is an int, the query above is semantically equivalent to: | ||
|
|
||
| ``` | ||
| mysql> select page_id, count(distinct user_id) from table group by page_id; | ||
| ``` | ||
|
|
||
| ## keyword | ||
|
|
||
| BITMAP_UNION, BITMAP |
Why is the initial result bitmap null? It seems that an empty bitmap will be returned when the result bitmap is null.
The initial result bitmap must be null. Otherwise, the intersection between dst and src will be empty.
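The point in this reply can be illustrated with a small C++ sketch. It assumes a hypothetical `Bitmap` alias over `std::set`, and it uses `std::optional` to play the role of `dst->is_null`. The helper names are illustrative and are not part of the PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bitmap: an ordered set of user ids.
using Bitmap = std::set<uint32_t>;

// Intersects `src` into `dst` in place, like `dst &= src`.
void intersect_into(Bitmap& dst, const Bitmap& src) {
    Bitmap common;
    std::set_intersection(dst.begin(), dst.end(), src.begin(), src.end(),
                          std::inserter(common, common.begin()));
    dst = std::move(common);
}

// Null-initialized state: std::nullopt models dst->is_null == true.
std::optional<Bitmap> intersect_from_null(const std::vector<std::optional<Bitmap>>& inputs) {
    std::optional<Bitmap> dst;  // starts as null
    for (const auto& src : inputs) {
        if (!src) continue;  // null input: skip it
        if (!dst) {
            dst = *src;      // first non-null value: adopt it as the result
            continue;
        }
        intersect_into(*dst, *src);
    }
    return dst;
}

// Empty-initialized state: intersecting with an empty set always stays empty.
Bitmap intersect_from_empty(const std::vector<std::optional<Bitmap>>& inputs) {
    Bitmap dst;  // starts as an empty bitmap
    for (const auto& src : inputs) {
        if (!src) continue;
        intersect_into(dst, *src);
    }
    return dst;
}
```

For the inputs {1, 2, 3}, null, {2, 3, 4}, the null-initialized fold yields {2, 3}, while the empty-initialized fold yields an empty set whatever the inputs are. When every input is null, the null-initialized fold stays null.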