Losing data when inserting data into a bitmap table via clickhouse-jdbc #641
@victortony, the SQL you provided is for populating the DWS table. If my understanding is correct, I don't think it's related to the JDBC driver, as the insertion happens on the server without involving the driver at all, provided you're certain there is no data loss in the temp table. ClickHouse 19.16 and 20.3 are no longer supported, so maybe you could try 21.3, or at least 20.8? Moreover, if you can upgrade the JDBC driver to 0.3.0, you no longer need the temp table, because you can insert a RoaringBitmap directly into the AggregateFunction column; please refer to the example here.
Sorry, I wrote the wrong table name in the SQL. I've confirmed that when I insert the bitmap data using the ClickHouse client, the data is complete.
I don't think it's related to the JDBC driver; it's more likely something else in your cluster. The script below worked for me on a standalone 21.3 server. Perhaps you can ask on Telegram/Slack, or create an issue here with detailed information such as steps to reproduce.
drop database if exists testbm;
create database testbm;
CREATE TABLE testbm.dm_user_set_mapping_di (
date UInt32,upsert_key String, uv AggregateFunction(groupBitmap, UInt64)
) ENGINE = MergeTree() partition by date order by (date, upsert_key);
create table testbm.dm_user_set_mapping_di_tmp (
date UInt32, upsert_key String, u_id UInt64
) ENGINE = MergeTree()
partition by date order by (date, upsert_key, intHash64(u_id))
sample by intHash64(u_id);
insert into testbm.dm_user_set_mapping_di_tmp
select 20210423, concat('key_', toString(rand64() % 2333)), number
from system.numbers limit 10000000;
insert into testbm.dm_user_set_mapping_di(date, upsert_key, uv)
select date, upsert_key, groupBitmapState(u_id)
from testbm.dm_user_set_mapping_di_tmp
where date=20210423
group by date, upsert_key;
select a.date, a.upsert_key,
bitmapCardinality(a.uv) - ifnull(b.cnt, 0) as cnt_diff,
b.bm is null ? 0 : bitmapHasAll(a.uv, b.bm) as has_all
from testbm.dm_user_set_mapping_di a
left join (
select date, upsert_key, uniqExact(u_id) as cnt, groupBitmapState(u_id) as bm
from testbm.dm_user_set_mapping_di_tmp
group by date, upsert_key
order by upsert_key
) b on a.date = b.date and a.upsert_key = b.upsert_key
where b.date is null or cnt_diff != 0 or has_all != 1
union all
select date, upsert_key, toInt64(bitmapCardinality(uv)) as cnt_diff, toUInt8(0) as has_all
from testbm.dm_user_set_mapping_di
where (date, upsert_key) not in (select distinct date, upsert_key from testbm.dm_user_set_mapping_di_tmp)
Thank you very much!
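Hi, I upgraded to 21.3.8 and the problem persists. The row count in my tmp table is correct, but when I execute the insert through JDBC, the generated bitmaps are always missing a lot of data, whereas executing the same insert in the ClickHouse client gives exactly the right result, without a single row missing. Code for importing the data:
Code for generating the bitmaps: main method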
Not sure if it's related to #655. Does the latest code on the develop branch work for you?
Hi, has it been merged into any release version?
Not yet. You'll need to check out the code from the develop branch and build a snapshot package yourself.
Hi, I have two tables in my system: dm_user_set_mapping_di and dm_user_set_mapping_di_tmp.
CREATE TABLE profile.dm_user_set_mapping_di (
date UInt32,
userset_key String,
uv AggregateFunction(groupBitmap, UInt64)
) ENGINE = Distributed('perftest_3shards_1replicas', 'profile', 'dm_user_set_mapping_di_local', cityHash64(userset_key))
CREATE TABLE profile.dm_user_set_mapping_di_tmp (
date UInt32,
userset_key String,
u_id UInt64
) ENGINE = Distributed('perftest_3shards_1replicas', 'profile', 'dm_user_set_mapping_di_tmp_local', intHash64(u_id))
First, my system loads data into "dm_user_set_mapping_di_tmp" via clickhouse-jdbc. It then executes the following SQL to load data into "dm_user_set_mapping_di". The problem is that the data in "dm_user_set_mapping_di_tmp" is accurate, but "dm_user_set_mapping_di" loses data when the SQL is run through clickhouse-jdbc. When I execute the same "insert" SQL in the ClickHouse client instead, the data in "dm_user_set_mapping_di" is accurate.
insert into profile.dws_mifi_loan_user_profile_df (date, label_name, label_value, uv) select date, label_name, label_value, groupBitmapState(toUInt64(u_id)) as uv from profile.dws_mifi_loan_user_profile_df_tmp where date=20210423 group by date, label_name, label_value
I suspect the data in the first table may not be fully updated yet right after it has been loaded. Should I flush or otherwise refresh the first table's data before loading the second table, even if I create a new JDBC connection?
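One possibility worth ruling out (an assumption, not something confirmed in this thread): the tmp table is a Distributed table, and by default inserts into a Distributed table are asynchronous, so rows can still be queued on the receiving node when an INSERT ... SELECT issued right afterwards over JDBC starts reading. A rerun by hand in the client later would then see the complete data. The sketch below shows two ways to rule this out, using the table and column names from the DDL above; `SYSTEM FLUSH DISTRIBUTED` and `insert_distributed_sync` are ClickHouse features, but how settings are passed through clickhouse-jdbc 0.2.6 is not covered here.

```sql
-- Sketch only; assumes the rows were loaded through the Distributed table
-- profile.dm_user_set_mapping_di_tmp with the default asynchronous inserts.

-- Option B: enable before the tmp-table load starts, in the session that
-- performs it, so each INSERT into the Distributed table returns only after
-- the rows have been written to the shards. Over JDBC/HTTP this may need to
-- be passed as a connection or query setting rather than a SET statement.
SET insert_distributed_sync = 1;

-- ... the JDBC load into profile.dm_user_set_mapping_di_tmp runs here ...

-- Option A (instead of B): once the load has finished, send any rows still
-- queued on this node to the shards. Pending data is queued per node, so
-- run this on every node that received the inserts.
SYSTEM FLUSH DISTRIBUTED profile.dm_user_set_mapping_di_tmp;

-- Only then build the bitmaps from the tmp table.
INSERT INTO profile.dm_user_set_mapping_di (date, userset_key, uv)
SELECT date, userset_key, groupBitmapState(u_id)
FROM profile.dm_user_set_mapping_di_tmp
WHERE date = 20210423
GROUP BY date, userset_key;
```

If the bitmaps come out complete once the flush (or synchronous insert) is in place, that would point to the asynchronous Distributed inserts rather than the JDBC driver itself.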
ClickHouse client version 19.16.3.6.
clickhouse-jdbc version 0.2.6.