Support Bitmap Intersect #3552

@EmmyMiao87

Add the aggregate function bitmap_intersect, which computes the intersection of the bitmaps in grouped data.

bitmap_intersect

Calculates the intersection of bitmap columns and returns a bitmap object.

bitmap_intersect(expr)

Parameters

The expr column type must be bitmap.

Return value

bitmap object

Example

Table schema

create table bitmap_intersect_test (
    tag varchar(20),
    user_id bitmap bitmap_union
) 
AGGREGATE KEY(tag)
DISTRIBUTED BY HASH(tag) BUCKETS 3;

Find the users that have all three tags a, b, and c.

select bitmap_to_string(bitmap_intersect(user_id)) from
(
    select bitmap_union(user_id) user_id from bitmap_intersect_test
    where tag in ('a', 'b', 'c')
    group by tag
) a;
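
To make the semantics of the query above concrete, here is a minimal sketch that mimics it in plain C++. It is an illustration only: `Bitmap` is a `std::set` stand-in rather than the real bitmap type, and `union_by_tag` and `intersect_all` are hypothetical helper names for the inner and outer query.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a bitmap: an ordered set of user ids (illustration only).
using Bitmap = std::set<uint32_t>;
using Row = std::pair<std::string, uint32_t>;  // (tag, user_id)

// Inner query: bitmap_union(user_id) ... where tag in (...) group by tag.
std::map<std::string, Bitmap> union_by_tag(const std::vector<Row>& rows,
                                           const std::set<std::string>& wanted_tags) {
    std::map<std::string, Bitmap> groups;
    for (const Row& row : rows) {
        if (wanted_tags.count(row.first) > 0) {
            groups[row.first].insert(row.second);
        }
    }
    return groups;
}

// Outer query: bitmap_intersect(user_id) over the per-tag bitmaps.
Bitmap intersect_all(const std::map<std::string, Bitmap>& groups) {
    Bitmap result;
    bool first = true;
    for (const auto& entry : groups) {
        const Bitmap& bitmap = entry.second;
        if (first) {
            result = bitmap;  // seed with the first group
            first = false;
            continue;
        }
        Bitmap kept;
        for (uint32_t id : result) {
            if (bitmap.count(id) > 0) kept.insert(id);
        }
        result = std::move(kept);
    }
    return result;
}
```

With rows (a,1), (a,2), (a,3), (b,2), (b,3), (c,3), (c,4), the sketch returns the single user 3. Note that a tag with no rows produces no group at all, so in this sketch (and presumably in the SQL above) the intersection then covers only the tags that are present.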

Design

Semantic analysis

The argument (child expression) of bitmap_intersect must be of type bitmap.

class FunctionCallExpr {

    void analyze() {
        if (fnName.equals("bitmap_intersect")) {
            ...
            // Reject any argument that is not a bitmap column.
            if (!fn.getChild(0).isBitmapType()) {
                throw new AnalysisException("the child type of " + fnName + " must be bitmap");
            }
            ...
        }
    }
}

Function implementation

The functions for each stage of `bitmap_intersect` are declared in the function set.

Function definition

FunctionName: bitmap_intersect,
InputType: bitmap,
OutputType: bitmap,
IntermediateType: varchar

init

Reuse the existing bitmap init function directly.

update
merge

Intersect the bitmaps that are grouped on the current node.

void BitmapFunctions::bitmap_intersect(FunctionContext* ctx, const StringVal& src, StringVal* dst) {
    // Null inputs are skipped and do not take part in the intersection.
    if (src.is_null) {
        return;
    }
    auto dst_bitmap = reinterpret_cast<BitmapValue*>(dst->ptr);
    // A zero length means src points at an in-memory aggregate object (a BitmapValue)
    // rather than at serialized bytes.
    if (src.len == 0) {
        (*dst_bitmap) &= *reinterpret_cast<BitmapValue*>(src.ptr);
    } else {
        // Otherwise, deserialize src before intersecting.
        (*dst_bitmap) &= BitmapValue((char*) src.ptr);
    }
}
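
One detail the snippet above leaves implicit is how the accumulator is seeded. Assuming the reused init function produces an empty bitmap, intersecting every input into it would always yield an empty result, so the first input probably needs to be copied rather than intersected. The sketch below illustrates the difference with a `std::set` stand-in for `BitmapValue`; `IntersectState`, `intersect_update`, `fold_from_empty`, and `fold_seeded` are hypothetical names, not part of the proposed code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Stand-in for BitmapValue, used only for illustration.
using Bitmap = std::set<uint64_t>;

// Plain set intersection, playing the role of BitmapValue's operator&=.
Bitmap intersect(const Bitmap& lhs, const Bitmap& rhs) {
    Bitmap out;
    for (uint64_t v : lhs) {
        if (rhs.count(v) > 0) out.insert(v);
    }
    return out;
}

// Hypothetical accumulator: `seeded` records whether any input has arrived yet.
struct IntersectState {
    bool seeded = false;
    Bitmap bits;
};

// update/merge step: the first input is copied, later inputs are intersected.
void intersect_update(IntersectState* dst, const Bitmap& src) {
    if (!dst->seeded) {
        dst->bits = src;
        dst->seeded = true;
        return;
    }
    dst->bits = intersect(dst->bits, src);
}

// Accumulating from an empty bitmap: the result is always empty.
Bitmap fold_from_empty(const std::vector<Bitmap>& inputs) {
    Bitmap acc;
    for (const Bitmap& b : inputs) acc = intersect(acc, b);
    return acc;
}

// Accumulating with a seeded state: the result is the intersection of all inputs.
Bitmap fold_seeded(const std::vector<Bitmap>& inputs) {
    IntersectState state;
    for (const Bitmap& b : inputs) intersect_update(&state, b);
    return state.bits;
}
```

For the inputs {1,2,3}, {2,3}, and {3,4}, `fold_from_empty` returns an empty set while `fold_seeded` returns {3}. A flag on the state is only one way to express "no input yet"; a nullable intermediate value would serve the same purpose.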

serialize
finalize

Reuse the existing bitmap serialization function directly.

Query plan


mysql> explain select bitmap_intersect(user_id) from (select bitmap_union(user_id) user_id from  bitmap_intersect_test   where tag in ('a', 'b', 'c') group by tag ) a;
+----------------------------------------------------------------------------------------+
| Explain String                                                                         |
+----------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                        |
|  OUTPUT EXPRS:<slot 8>                                                                 |
|   PARTITION: UNPARTITIONED                                                             |
|                                                                                        |
|   RESULT SINK                                                                          |
|                                                                                        |
|   6:AGGREGATE (merge finalize)                                                         |
|   |  output: bitmap_intersect(<slot 7>)                                                |
|   |  group by:                                                                         |
|   |  tuple ids: 5                                                                      |
|   |                                                                                    |
|   5:EXCHANGE                                                                           |
|      tuple ids: 4                                                                      |
|                                                                                        |
| PLAN FRAGMENT 1                                                                        |
|  OUTPUT EXPRS:                                                                         |
|   PARTITION: HASH_PARTITIONED: <slot 2>                                                |
|                                                                                        |
|   STREAM DATA SINK                                                                     |
|     EXCHANGE ID: 05                                                                    |
|     UNPARTITIONED                                                                      |
|                                                                                        |
|   2:AGGREGATE (update serialize)                                                       |
|   |  output: bitmap_intersect(<slot 5>)                                                |
|   |  group by:                                                                         |
|   |  tuple ids: 4                                                                      |
|   |                                                                                    |
|   4:AGGREGATE (merge finalize)                                                         |
|   |  output: bitmap_union(<slot 3>)                                                    |
|   |  group by: <slot 2>                                                                |
|   |  tuple ids: 2                                                                      |
|   |                                                                                    |
|   3:EXCHANGE                                                                           |
|      tuple ids: 1                                                                      |
|                                                                                        |
| PLAN FRAGMENT 2                                                                        |
|  OUTPUT EXPRS:                                                                         |
|   PARTITION: RANDOM                                                                    |
|                                                                                        |
|   STREAM DATA SINK                                                                     |
|     EXCHANGE ID: 03                                                                    |
|     HASH_PARTITIONED: <slot 2>                                                         |
|                                                                                        |
|   1:AGGREGATE (update serialize)                                                       |
|   |  STREAMING                                                                         |
|   |  output: bitmap_union(`user_id`)                                                   |
|   |  group by: `tag`                                                                   |
|   |  tuple ids: 1                                                                      |
|   |                                                                                    |
|   0:OlapScanNode                                                                       |
|      TABLE: bitmap_intersect_test                                                      |
|      PREAGGREGATION: ON                                                                |
|      PREDICATES: `tag` IN ('a', 'b', 'c')                                              |
|      partitions=1/1                                                                    |
|      rollup: bitmap_intersect_test                                                     |
|      tabletRatio=100/100                                                               |
|      numNodes=6                                                                        |
|      tuple ids: 0                                                                      |
+----------------------------------------------------------------------------------------+
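
The plan above splits bitmap_intersect into an update phase (node 2, one partial result per instance) and a merge phase (node 6, after the exchange). The sketch below mimics that split with a `std::set` stand-in for the bitmap type; `Partial`, `absorb`, `update_phase`, and `merge_phase` are hypothetical names, and the handling of an instance that received no groups is an assumption rather than something the plan shows.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Stand-in for the bitmap type (illustration only).
using Bitmap = std::set<uint64_t>;

// Partial result of one instance; has_value is false if it saw no groups.
struct Partial {
    bool has_value = false;
    Bitmap bits;
};

// Fold one bitmap into a partial result: copy the first, intersect the rest.
void absorb(Partial* dst, const Bitmap& src) {
    if (!dst->has_value) {
        dst->bits = src;
        dst->has_value = true;
        return;
    }
    Bitmap kept;
    for (uint64_t v : dst->bits) {
        if (src.count(v) > 0) kept.insert(v);
    }
    dst->bits = kept;
}

// Node 2 (update serialize): intersect the per-tag bitmaps held by one instance.
Partial update_phase(const std::vector<Bitmap>& local_groups) {
    Partial partial;
    for (const Bitmap& group : local_groups) absorb(&partial, group);
    return partial;
}

// Node 6 (merge finalize): intersect the partial results from all instances.
Bitmap merge_phase(const std::vector<Partial>& partials) {
    Partial final_state;
    for (const Partial& p : partials) {
        // An instance that received no groups contributes nothing.
        if (p.has_value) absorb(&final_state, p.bits);
    }
    return final_state.bits;
}
```

Because set intersection is associative and commutative, the result does not depend on how the hash partitioning distributes the tags: with a={1,2,3} and b={2,3} on one instance, c={3,4} on another, and a third instance left empty, the merge yields {3}, the same as intersecting all three bitmaps in one place.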
