[Featrue](default value) Support bitmap_empty default value #40364

Merged

Yukang-Lian
Collaborator

Proposed changes

Issue Number: close #xxx

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Yukang-Lian
Collaborator Author

run buildall

@Yukang-Lian
Collaborator Author

run buildall

Contributor

github-actions bot commented Sep 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

Contributor

github-actions bot commented Sep 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.85% (9395/25495)
Line Coverage: 28.28% (77470/273942)
Region Coverage: 27.68% (39985/144438)
Branch Coverage: 24.33% (20352/83642)
Coverage Report: http://coverage.selectdb-in.cc/coverage/71fe6708e4570bcaff8d3cdc60c3b830cfac11fb_71fe6708e4570bcaff8d3cdc60c3b830cfac11fb/report/index.html
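Each percentage in these TeamCity reports matches covered/total expressed as a percentage and rounded to two decimals (an inference from the numbers, not something the report states). A small Python sketch that recomputes the first report's figures from its raw counts:

```python
# Covered/total counts copied from the first TeamCity coverage report above.
COUNTS = {
    "Function": (9395, 25495),
    "Line": (77470, 273942),
    "Region": (39985, 144438),
    "Branch": (20352, 83642),
}


def coverage_percent(covered: int, total: int) -> float:
    """Return covered / total as a percentage, rounded to two decimal places."""
    return round(100.0 * covered / total, 2)


for kind, (covered, total) in COUNTS.items():
    print(f"{kind} Coverage: {coverage_percent(covered, total):.2f}% ({covered}/{total})")
```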

@doris-robot

TPC-H: Total hot run time: 38011 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 71fe6708e4570bcaff8d3cdc60c3b830cfac11fb, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17606	4350	4279	4279
q2	2007	182	175	175
q3	12014	960	1140	960
q4	10515	751	697	697
q5	7790	2828	2826	2826
q6	227	142	140	140
q7	951	623	600	600
q8	9334	2087	2070	2070
q9	7058	6509	6559	6509
q10	6981	2168	2249	2168
q11	455	236	238	236
q12	405	232	233	232
q13	17785	3063	3036	3036
q14	282	247	238	238
q15	517	484	485	484
q16	584	526	524	524
q17	971	597	711	597
q18	7365	6893	6820	6820
q19	1394	999	961	961
q20	711	349	331	331
q21	4191	3114	3154	3114
q22	1135	1014	1023	1014
Total cold run time: 110278 ms
Total hot run time: 38011 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4401	4322	4326	4322
q2	382	270	274	270
q3	2883	2663	2600	2600
q4	1934	1658	1654	1654
q5	5585	5655	5709	5655
q6	249	144	141	141
q7	2235	1807	1842	1807
q8	3281	3458	3430	3430
q9	8850	8826	8844	8826
q10	3627	3413	3415	3413
q11	625	528	502	502
q12	831	682	656	656
q13	14406	3167	3327	3167
q14	327	308	298	298
q15	535	498	494	494
q16	597	575	580	575
q17	1826	1568	1527	1527
q18	8098	7831	7940	7831
q19	1745	1645	1694	1645
q20	2184	1893	1957	1893
q21	5821	5436	5443	5436
q22	1126	1034	1082	1034
Total cold run time: 71548 ms
Total hot run time: 57176 ms
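In each TPC-H row the first number appears to be the cold run, the next two the hot runs, and the last the faster of the two hot runs; the reported totals are consistent with summing the first and last columns. This reading is inferred from the numbers, not documented on the page. A minimal Python sketch that checks it against Round 1 of the run above:

```python
# Round 1 rows of the first TPC-H run (commit 71fe670), one query per line:
# query, cold run, hot run 1, hot run 2, best hot run (all in ms).
ROUND_1 = """
q1 17606 4350 4279 4279
q2 2007 182 175 175
q3 12014 960 1140 960
q4 10515 751 697 697
q5 7790 2828 2826 2826
q6 227 142 140 140
q7 951 623 600 600
q8 9334 2087 2070 2070
q9 7058 6509 6559 6509
q10 6981 2168 2249 2168
q11 455 236 238 236
q12 405 232 233 232
q13 17785 3063 3036 3036
q14 282 247 238 238
q15 517 484 485 484
q16 584 526 524 524
q17 971 597 711 597
q18 7365 6893 6820 6820
q19 1394 999 961 961
q20 711 349 331 331
q21 4191 3114 3154 3114
q22 1135 1014 1023 1014
"""


def parse_rows(text: str) -> dict[str, tuple[int, ...]]:
    """Map each query name to its tuple of timings."""
    rows = {}
    for line in text.strip().splitlines():
        name, *timings = line.split()
        rows[name] = tuple(int(t) for t in timings)
    return rows


def totals(rows: dict[str, tuple[int, ...]]) -> tuple[int, int]:
    """Sum the cold-run column and the best-hot-run column."""
    cold = sum(r[0] for r in rows.values())
    hot = sum(r[3] for r in rows.values())
    return cold, hot


rows = parse_rows(ROUND_1)
# Check the inferred meaning of the last column: the faster of the two hot runs.
assert all(r[3] == min(r[1], r[2]) for r in rows.values())
cold, hot = totals(rows)
print(f"Total cold run time: {cold} ms")  # 110278, as reported
print(f"Total hot run time: {hot} ms")  # 38011, as reported
```

The same reading also fits the ClickBench reports, whose hot total is the sum of the faster of the two hot columns.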

@doris-robot

TPC-DS: Total hot run time: 192907 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 71fe6708e4570bcaff8d3cdc60c3b830cfac11fb, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1242	864	875	864
query2	6311	1997	2019	1997
query3	10731	3999	3976	3976
query4	60340	26489	23059	23059
query5	5348	505	506	505
query6	410	171	171	171
query7	5795	293	288	288
query8	283	202	197	197
query9	9059	2483	2484	2483
query10	479	270	255	255
query11	17619	14908	15388	14908
query12	145	104	101	101
query13	1558	429	411	411
query14	10085	7238	7289	7238
query15	249	166	182	166
query16	7620	454	512	454
query17	1114	586	572	572
query18	2028	293	292	292
query19	295	148	145	145
query20	125	107	110	107
query21	206	110	104	104
query22	4781	4756	4811	4756
query23	34279	33240	33389	33240
query24	5975	2851	2901	2851
query25	525	383	405	383
query26	694	162	176	162
query27	1771	279	278	278
query28	3717	2056	2041	2041
query29	692	404	409	404
query30	233	153	144	144
query31	953	765	737	737
query32	83	49	57	49
query33	463	286	276	276
query34	866	475	485	475
query35	831	751	718	718
query36	1043	923	944	923
query37	143	92	87	87
query38	3956	3914	3812	3812
query39	1585	1414	1390	1390
query40	197	113	111	111
query41	46	45	43	43
query42	111	99	94	94
query43	522	462	479	462
query44	1103	740	735	735
query45	205	168	163	163
query46	1082	748	728	728
query47	1935	1859	1843	1843
query48	387	310	301	301
query49	766	441	446	441
query50	800	442	442	442
query51	7025	6811	6758	6758
query52	100	88	86	86
query53	247	178	177	177
query54	576	471	456	456
query55	78	73	73	73
query56	292	268	266	266
query57	1207	1088	1067	1067
query58	226	255	247	247
query59	3080	2717	2800	2717
query60	289	276	277	276
query61	125	123	120	120
query62	735	667	665	665
query63	223	186	185	185
query64	2936	746	748	746
query65	3205	3179	3200	3179
query66	686	348	348	348
query67	15487	15304	15252	15252
query68	3072	596	609	596
query69	421	285	289	285
query70	1202	1164	1093	1093
query71	362	284	278	278
query72	6517	4103	4063	4063
query73	746	325	333	325
query74	9216	8795	8787	8787
query75	3343	2693	2709	2693
query76	1447	1055	1033	1033
query77	520	325	326	325
query78	9853	9364	9177	9177
query79	1071	530	540	530
query80	947	508	503	503
query81	561	240	233	233
query82	239	137	141	137
query83	168	144	146	144
query84	265	75	72	72
query85	938	279	278	278
query86	388	295	292	292
query87	4374	4254	4165	4165
query88	3117	2363	2363	2363
query89	382	288	289	288
query90	1825	193	187	187
query91	121	99	106	99
query92	60	50	48	48
query93	1468	537	545	537
query94	770	288	291	288
query95	343	250	254	250
query96	600	272	279	272
query97	3217	3093	3076	3076
query98	219	196	205	196
query99	1631	1323	1294	1294
Total cold run time: 306706 ms
Total hot run time: 192907 ms

@doris-robot

ClickBench: Total hot run time: 31.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 71fe6708e4570bcaff8d3cdc60c3b830cfac11fb, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.05	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.51	0.49	0.50
query6	1.14	0.74	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.47	0.48
query10	0.53	0.55	0.54
query11	0.15	0.12	0.11
query12	0.14	0.12	0.12
query13	0.60	0.60	0.58
query14	2.05	2.11	2.05
query15	0.89	0.82	0.81
query16	0.37	0.36	0.39
query17	0.99	0.94	1.02
query18	0.21	0.20	0.20
query19	1.90	1.76	1.71
query20	0.02	0.01	0.00
query21	15.39	0.66	0.65
query22	3.64	7.88	1.82
query23	18.29	1.35	1.30
query24	2.12	0.23	0.21
query25	0.14	0.08	0.08
query26	0.27	0.18	0.18
query27	0.07	0.08	0.07
query28	13.29	1.03	1.00
query29	12.61	3.35	3.32
query30	0.24	0.05	0.05
query31	2.88	0.40	0.39
query32	3.25	0.50	0.47
query33	2.96	2.96	3.03
query34	17.13	4.37	4.48
query35	4.49	4.47	4.45
query36	0.65	0.46	0.50
query37	0.18	0.15	0.16
query38	0.15	0.15	0.15
query39	0.05	0.04	0.04
query40	0.15	0.12	0.13
query41	0.09	0.05	0.04
query42	0.06	0.04	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.28 s
Total hot run time: 31.72 s

@Yukang-Lian Yukang-Lian force-pushed the Support-Bitmap-Empty-Default-Value branch from 71fe670 to 7d0d303 September 6, 2024 09:29
@Yukang-Lian
Collaborator Author

run buildall

Contributor

github-actions bot commented Sep 6, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38168 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7d0d303d720bb72df960c8271e09f1fbe1135f37, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17678	4846	4402	4402
q2	2022	190	187	187
q3	11487	977	1063	977
q4	10529	775	700	700
q5	7764	2877	2852	2852
q6	224	136	134	134
q7	967	602	598	598
q8	9494	2106	2044	2044
q9	7338	6641	6671	6641
q10	7066	2214	2220	2214
q11	516	252	253	252
q12	455	224	229	224
q13	19003	3147	3134	3134
q14	283	252	258	252
q15	528	528	503	503
q16	1468	474	420	420
q17	1386	716	771	716
q18	7710	6769	6898	6769
q19	1490	1031	986	986
q20	701	319	348	319
q21	3874	3155	2840	2840
q22	1096	1004	1020	1004
Total cold run time: 113079 ms
Total hot run time: 38168 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4366	4237	4254	4237
q2	395	275	271	271
q3	2877	2644	2637	2637
q4	1948	1684	1741	1684
q5	5696	5696	5682	5682
q6	223	141	136	136
q7	2238	1799	1792	1792
q8	3291	3496	3448	3448
q9	8804	8760	8877	8760
q10	3604	3354	3298	3298
q11	624	518	489	489
q12	852	702	624	624
q13	15346	3246	3242	3242
q14	320	296	287	287
q15	534	467	518	467
q16	558	489	478	478
q17	1832	1544	1502	1502
q18	8060	7754	7987	7754
q19	1714	1663	1539	1539
q20	2145	1856	1898	1856
q21	5831	5619	5515	5515
q22	1095	1048	1012	1012
Total cold run time: 72353 ms
Total hot run time: 56710 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.85% (9381/25455)
Line Coverage: 28.22% (77338/274009)
Region Coverage: 27.63% (39953/144575)
Branch Coverage: 24.27% (20331/83786)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7d0d303d720bb72df960c8271e09f1fbe1135f37_7d0d303d720bb72df960c8271e09f1fbe1135f37/report/index.html

@doris-robot

TPC-DS: Total hot run time: 192320 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7d0d303d720bb72df960c8271e09f1fbe1135f37, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1244	879	910	879
query2	6414	1922	1834	1834
query3	10630	4094	3859	3859
query4	59391	27109	23492	23492
query5	5393	500	494	494
query6	402	170	162	162
query7	5759	311	278	278
query8	309	220	208	208
query9	8918	2461	2437	2437
query10	501	272	263	263
query11	17988	15049	15454	15049
query12	154	101	101	101
query13	1562	386	387	386
query14	11110	7253	7195	7195
query15	264	176	172	172
query16	7442	452	491	452
query17	1113	555	582	555
query18	1380	289	309	289
query19	291	145	154	145
query20	127	119	109	109
query21	204	105	104	104
query22	4663	4440	4643	4440
query23	34255	33325	33571	33325
query24	5921	2920	2802	2802
query25	517	410	429	410
query26	684	149	151	149
query27	1809	272	275	272
query28	3752	2023	2002	2002
query29	635	390	400	390
query30	235	153	144	144
query31	921	731	769	731
query32	86	53	57	53
query33	439	283	282	282
query34	872	460	480	460
query35	833	723	706	706
query36	1053	926	948	926
query37	154	83	84	83
query38	3996	3898	3874	3874
query39	1440	1398	1439	1398
query40	193	114	108	108
query41	45	46	45	45
query42	116	98	95	95
query43	495	462	454	454
query44	1076	767	732	732
query45	195	166	164	164
query46	1099	713	744	713
query47	1919	1756	1811	1756
query48	378	292	286	286
query49	770	454	453	453
query50	833	404	412	404
query51	7023	6950	6813	6813
query52	99	87	88	87
query53	250	175	175	175
query54	564	459	459	459
query55	73	76	74	74
query56	284	265	273	265
query57	1190	1068	1061	1061
query58	233	248	235	235
query59	3061	2901	2830	2830
query60	308	272	278	272
query61	123	122	122	122
query62	752	641	662	641
query63	212	183	188	183
query64	2891	763	755	755
query65	3213	3127	3173	3127
query66	645	347	357	347
query67	15325	15302	15359	15302
query68	3351	578	568	568
query69	412	281	279	279
query70	1184	1096	1075	1075
query71	357	274	277	274
query72	6398	4063	3959	3959
query73	749	315	324	315
query74	9097	8892	8835	8835
query75	3357	2653	2680	2653
query76	1469	1101	991	991
query77	541	329	315	315
query78	9987	9102	8944	8944
query79	1077	528	524	524
query80	834	503	507	503
query81	486	228	222	222
query82	742	137	141	137
query83	169	151	150	150
query84	255	78	74	74
query85	776	287	275	275
query86	314	289	294	289
query87	4364	4365	4340	4340
query88	3009	2290	2278	2278
query89	377	283	283	283
query90	1894	189	182	182
query91	121	99	101	99
query92	63	47	52	47
query93	1296	526	527	526
query94	749	288	290	288
query95	335	253	250	250
query96	585	262	254	254
query97	3249	3117	3057	3057
query98	214	197	194	194
query99	1509	1290	1272	1272
Total cold run time: 305546 ms
Total hot run time: 192320 ms

@doris-robot

ClickBench: Total hot run time: 31.87 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7d0d303d720bb72df960c8271e09f1fbe1135f37, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.08
query5	0.54	0.51	0.50
query6	1.13	0.74	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.48
query10	0.53	0.56	0.53
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.62	0.59	0.58
query14	1.41	1.43	1.44
query15	0.88	0.84	0.84
query16	0.37	0.37	0.38
query17	1.00	1.05	1.06
query18	0.20	0.19	0.20
query19	1.84	1.75	1.85
query20	0.01	0.02	0.01
query21	15.40	0.65	0.65
query22	4.09	6.55	2.12
query23	18.27	1.44	1.33
query24	2.12	0.21	0.22
query25	0.14	0.09	0.08
query26	0.25	0.18	0.18
query27	0.08	0.07	0.08
query28	13.28	1.07	1.00
query29	12.68	3.34	3.36
query30	0.24	0.06	0.05
query31	2.87	0.40	0.40
query32	3.23	0.50	0.49
query33	2.98	3.02	3.04
query34	17.06	4.41	4.46
query35	4.47	4.49	4.50
query36	0.67	0.49	0.50
query37	0.19	0.15	0.15
query38	0.16	0.14	0.14
query39	0.05	0.04	0.04
query40	0.17	0.13	0.14
query41	0.09	0.05	0.05
query42	0.06	0.04	0.04
query43	0.05	0.04	0.05
Total cold run time: 110.07 s
Total hot run time: 31.87 s

@Yukang-Lian
Collaborator Author

run feut

yiguolei pushed a commit that referenced this pull request Sep 9, 2024
…ap_empty default value (#40364)" (#40487)

## Proposed changes

Pick #40364 

starocean999 previously approved these changes Sep 9, 2024
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 9, 2024
Contributor

github-actions bot commented Sep 9, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Sep 9, 2024

PR approved by anyone and no changes requested.

dataroaring previously approved these changes Sep 12, 2024
Contributor

@dataroaring dataroaring left a comment

LGTM

@Yukang-Lian Yukang-Lian force-pushed the Support-Bitmap-Empty-Default-Value branch from 7d0d303 to 4893338 September 14, 2024 07:12
@Yukang-Lian
Collaborator Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Sep 14, 2024
@doris-robot

TPC-H: Total hot run time: 43094 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 489333853650a55d0ad66e84d3647db429733db7, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17605	7393	7283	7283
q2	2066	184	182	182
q3	10470	1281	1399	1281
q4	10163	1026	1037	1026
q5	7744	3253	3178	3178
q6	245	153	149	149
q7	1061	671	630	630
q8	9467	2055	2037	2037
q9	6793	6331	6333	6331
q10	7041	2517	2479	2479
q11	431	252	251	251
q12	410	236	239	236
q13	17757	2996	3053	2996
q14	286	245	254	245
q15	584	530	514	514
q16	527	427	418	418
q17	1014	964	987	964
q18	7457	6776	6840	6776
q19	1379	1251	1237	1237
q20	626	355	341	341
q21	3949	3618	3552	3552
q22	1062	988	989	988
Total cold run time: 108137 ms
Total hot run time: 43094 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7206	7179	7210	7179
q2	350	238	242	238
q3	3089	3132	3052	3052
q4	2086	2191	2043	2043
q5	5694	5629	5797	5629
q6	244	150	149	149
q7	2193	1818	1807	1807
q8	3399	3427	3418	3418
q9	8919	8929	8850	8850
q10	3496	3596	3596	3596
q11	597	474	485	474
q12	807	624	630	624
q13	9408	3243	3206	3206
q14	326	278	290	278
q15	593	533	541	533
q16	526	479	473	473
q17	1829	1772	1737	1737
q18	8675	8045	8035	8035
q19	1812	1772	1763	1763
q20	2141	1905	1867	1867
q21	5941	5741	5710	5710
q22	1075	1031	1041	1031
Total cold run time: 70406 ms
Total hot run time: 61692 ms
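The Round 1 hot total on commit 4893338 (43094 ms) is higher than the 38011 ms measured on commit 71fe670. A small Python sketch for narrowing such a gap to individual queries by comparing best hot-run times. The 20% threshold is an arbitrary choice for illustration, and nothing on this page attributes the difference to this PR: the branch was force-pushed between the two runs, and run-to-run variance is another possible cause.

```python
# Best hot-run times (ms) for q1..q22, Round 1, copied from the two TPC-H reports.
BEFORE = [4279, 175, 960, 697, 2826, 140, 600, 2070, 6509, 2168, 236,
          232, 3036, 238, 484, 524, 597, 6820, 961, 331, 3114, 1014]  # commit 71fe670
AFTER = [7283, 182, 1281, 1026, 3178, 149, 630, 2037, 6331, 2479, 251,
         236, 2996, 245, 514, 418, 964, 6776, 1237, 341, 3552, 988]  # commit 4893338


def flag_slowdowns(before, after, threshold=0.20):
    """Return (query, before_ms, after_ms) for queries slower by more than `threshold`."""
    flagged = []
    for i, (old, new) in enumerate(zip(before, after), start=1):
        if new > old * (1 + threshold):
            flagged.append((f"q{i}", old, new))
    return flagged


print(f"Hot totals: {sum(BEFORE)} ms -> {sum(AFTER)} ms")
for query, old, new in flag_slowdowns(BEFORE, AFTER):
    print(f"{query}: {old} -> {new} ms ({(new - old) / old:+.1%})")
```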

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.92% (9474/25663)
Line Coverage: 28.27% (77877/275495)
Region Coverage: 27.68% (40226/145321)
Branch Coverage: 24.29% (20437/84152)
Coverage Report: http://coverage.selectdb-in.cc/coverage/489333853650a55d0ad66e84d3647db429733db7_489333853650a55d0ad66e84d3647db429733db7/report/index.html

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 42497 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 489333853650a55d0ad66e84d3647db429733db7, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17628	7321	7722	7321
q2	2056	163	160	160
q3	11866	1272	1293	1272
q4	10330	819	938	819
q5	7849	3200	3208	3200
q6	227	152	151	151
q7	1017	642	602	602
q8	9748	2074	2093	2074
q9	6983	6517	6462	6462
q10	7075	2346	2333	2333
q11	425	252	250	250
q12	406	219	219	219
q13	17784	3008	3002	3002
q14	251	216	212	212
q15	579	518	505	505
q16	501	411	430	411
q17	976	943	929	929
q18	7295	6713	6873	6713
q19	1387	1238	1202	1202
q20	578	279	280	279
q21	3948	3396	3489	3396
q22	1119	1036	985	985
Total cold run time: 110028 ms
Total hot run time: 42497 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7132	7105	7172	7105
q2	345	238	237	237
q3	3011	3046	3125	3046
q4	2025	1977	1909	1909
q5	5597	5585	5676	5585
q6	228	142	142	142
q7	2157	1814	1770	1770
q8	3301	3447	3450	3447
q9	8773	8569	8754	8569
q10	3525	3497	3541	3497
q11	591	479	488	479
q12	815	611	575	575
q13	4585	3187	3214	3187
q14	328	293	278	278
q15	569	529	524	524
q16	520	474	473	473
q17	1797	1739	1736	1736
q18	8505	8188	8053	8053
q19	1738	1729	1772	1729
q20	2094	1819	1824	1819
q21	5392	5292	5289	5289
q22	1146	1074	1066	1066
Total cold run time: 64174 ms
Total hot run time: 60515 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.31% (9575/25663)
Line Coverage: 28.69% (79098/275696)
Region Coverage: 28.17% (40951/145368)
Branch Coverage: 24.79% (20872/84184)
Coverage Report: http://coverage.selectdb-in.cc/coverage/489333853650a55d0ad66e84d3647db429733db7_489333853650a55d0ad66e84d3647db429733db7/report/index.html

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 18, 2024
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 979cb01 into apache:master Sep 19, 2024
25 of 30 checks passed
bobhan1 pushed a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
… bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364

@bobhan1 bobhan1 mentioned this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

pick [opt](partial update) Remove unnecessary lock and refactor some code for partial update (apache#40062)

1. apache#34112 made partial update fetch rowsets during RowsetBuilder initialization rather than in the flush phase, so the tablet header lock can be removed.
2. Refactor some partial update code.

fix compile

pick [Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update apache#40272

picks apache#40272

pick [Cherry-pick](branch-2.1) Pick "[Featrue](default value) Support bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364

pick [Feature](partial update) Support flexible partial update in stream load with json files (apache#39756)

This PR adds the ability to update different columns for each row in a single stream load.
Doc: apache/doris-website#1140
```sql
MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL,
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true");
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+
```
test1.json:
```json
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
```

```bash
curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
```

```sql
MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
```
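The final table above is consistent with three rules: JSON lines are applied in file order, a line for an existing key overwrites only the columns it names, and a line for a new key fills the columns it omits with their defaults (k=6 gets v2 = 9876 and v4 = 1234, and v5 is NULL). A minimal Python model of just those rules, which reproduces the result above; it is a sketch under those assumptions, not the Doris implementation, and it ignores sequence columns, delete signs, strict-mode checks, and everything else a real stream load handles.

```python
import json

# Simplified model of the example above; NOT the Doris implementation.
# Value columns of t1 and their defaults from CREATE TABLE (None stands for NULL).
# v3 is NOT NULL without a default; this sketch does not check that constraint.
COLUMNS = ["v1", "v2", "v3", "v4", "v5"]
DEFAULTS = {"v1": None, "v2": 9876, "v3": None, "v4": 1234, "v5": None}


def flexible_partial_update(table: dict, json_lines: str) -> dict:
    """Apply JSON lines in order; each line updates only the columns it names."""
    for line in json_lines.strip().splitlines():
        row = json.loads(line)
        key = row.pop("k")
        if key in table:
            # Existing key: columns absent from this line keep their old values.
            table[key].update(row)
        else:
            # New key: columns absent from this line get their defaults.
            table[key] = {**DEFAULTS, **row}
    return table


# Initial contents after the INSERT ... FROM numbers("number" = "6") statement.
table = {k: {c: k for c in COLUMNS} for k in range(6)}

TEST1_JSON = """
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
"""

flexible_partial_update(table, TEST1_JSON)
for k in sorted(table):
    print(k, *(table[k][c] for c in COLUMNS))
```

In this particular file the repeated keys 1 and 2 touch disjoint columns, so the example alone does not show what happens when two lines for the same key set the same column.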

fix compile

pick [branch-2.1] Picks "[opt](partial update) Allow to only specify key columns in partial update apache#40736" (apache#40863)

picks apache#40736

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 21, 2024
@yiguolei yiguolei mentioned this pull request Nov 6, 2024
sollhui added a commit to sollhui/doris that referenced this pull request Nov 28, 2024
@yiguolei yiguolei mentioned this pull request Jan 19, 2025
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.3-merged reviewed
5 participants