Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] where条件带in的子查询过滤出现少数据问题 #29443

Open
3 tasks done
guanghuan96 opened this issue Jan 3, 2024 · 0 comments
Open
3 tasks done

[Bug] where条件带in的子查询过滤出现少数据问题 #29443

guanghuan96 opened this issue Jan 3, 2024 · 0 comments

Comments

@guanghuan96
Copy link

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.0.3

What's Wrong?

完整sql语句如下:

select 
      count(distinct employee_no) as employee_no   
from retail_dim.dim_ehr_org_staff_position_all_1229 
where employee_no in (
    select distinct employee_no  
    from retail_dim.dim_ehr_org_staff_position_all_1229
    where office_start_date = '2023-12-01'
) 

-- 问题描述:
where条件里的子查询查出的结果集条数为:2012条
select count(distinct employee_no)
from retail_dim.dim_ehr_org_staff_position_all_1229
where office_start_date = '2023-12-01'

但是外面整个语句查询数量却少了一半,只有918条
select
count(distinct employee_no)
from retail_dim.dim_ehr_org_staff_position_all_1229
where employee_no in (
select distinct employee_no
from retail_dim.dim_ehr_org_staff_position_all_1229
where office_start_date = '2023-12-01'
)

CREATE TABLE `dim_ehr_org_staff_position_all_1229` (
  `employee_no` char(20) NULL COMMENT '员工工号',
  `organ_new_no` char(20) NULL COMMENT '组织编码',
  `position_no` char(10) NULL COMMENT '岗位编号',
  `office_start_date` date NULL COMMENT '任职开始日期',
  `office_status` tinyint(4) NULL COMMENT '在本组织任职状态(1:在职,0:失效)',
  `employee_name` char(50) NULL COMMENT '员工姓名',
  `parent_employee_no` char(10) NULL COMMENT '上级领导编码',
  `parent_employee_name` varchar(50) NULL COMMENT '上级领导姓名',
  `unit_type` char(20) NULL COMMENT '组织类型',
  `organ_new_name` varchar(200) NULL COMMENT '组织名称',
  `position_name` char(50) NULL COMMENT '岗位名称',
  `position_type` tinyint(4) NULL COMMENT '岗位类型(01:管理线,02:陈列线)',
  `brand_unit_no` char(20) NULL COMMENT '品牌部编码',
  `region_no` char(10) NULL COMMENT '小区编码',
  `region_name` char(30) NULL COMMENT '小区名称',
  `office_end_date` date NULL COMMENT '任职结束日期',
  `update_time` datetime NULL COMMENT '更新时间'
) ENGINE=OLAP
UNIQUE KEY(`employee_no`, `organ_new_no`, `position_no`, `office_start_date`, `office_status`)
COMMENT '员工任职拉链表'
DISTRIBUTED BY HASH(`employee_no`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);

-- 执行计划如下:
PLAN FRAGMENT 0
OUTPUT EXPRS:
employee_no[#46]
PARTITION: UNPARTITIONED

HAS_COLO_PLAN_NODE: false

VRESULT SINK

809:VAGGREGATE (merge finalize)
| output: count(employee_no[#45])[#46]
| group by:
| cardinality=1
|
806:VEXCHANGE
offset: 0

PLAN FRAGMENT 1

PARTITION: HASH_PARTITIONED: employee_no[#21]

HAS_COLO_PLAN_NODE: true

STREAM DATA SINK
EXCHANGE ID: 806
UNPARTITIONED

803:VAGGREGATE (update serialize)
| output: partial_count(employee_no[#44])[#45]
| group by:
| cardinality=1
|
800:VAGGREGATE (merge serialize)
| group by: employee_no[#43]
| cardinality=42,546
|
797:VAGGREGATE (update serialize)
| group by: employee_no[#42]
| cardinality=85,092
|
794:VHASH JOIN
| join op: LEFT SEMI JOIN(COLOCATE[])[]
| equal join conjunct: employee_no[#40] = employee_no[#20]
| runtime filters: RF000[in_or_bloom] <- employee_no#20
| cardinality=170,185
| vec output tuple id: 6
| vIntermediate tuple ids: 5
| hash output slot ids: 40
|
|----791:VAGGREGATE (update finalize)
| | group by: employee_no[#19]
| | cardinality=21,273
| |
| 781:VOlapScanNode
| TABLE: default_cluster:retail_dim.dim_ehr_org_staff_position_all_1229(dim_ehr_org_staff_position_all_1229), PREAGGREGATION: OFF. Reason: DORIS_DELETE_SIGN is used as conjuncts.
| PREDICATES: office_start_date[#3] = '2023-12-01' AND DORIS_DELETE_SIGN[#17] = 0
| partitions=1/1 (dim_ehr_org_staff_position_all_1229), tablets=1/1, tabletList=4534320
| cardinality=170185, avgRowSize=121.58043, numNodes=3
| pushAggOp=NONE
| projections: employee_no[#0]
| project output tuple id: 1
|
774:VOlapScanNode
TABLE: default_cluster:retail_dim.dim_ehr_org_staff_position_all_1229(dim_ehr_org_staff_position_all_1229), PREAGGREGATION: OFF. Reason: DORIS_DELETE_SIGN is used as conjuncts.
PREDICATES: DORIS_DELETE_SIGN[#38] = 0
runtime filters: RF000[in_or_bloom] -> employee_no[#21]
partitions=1/1 (dim_ehr_org_staff_position_all_1229), tablets=1/1, tabletList=4534320
cardinality=170185, avgRowSize=121.58043, numNodes=3
pushAggOp=NONE
projections: employee_no[#21]
project output tuple id: 4

What You Expected?

期望得到正确的结果:
整个sql查询出来的结果应是2012条

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant