[opt](partial update) Extract some common logic in partial update #39619

Merged


bobhan1
Contributor

@bobhan1 bobhan1 commented Aug 20, 2024

No description provided.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@bobhan1
Contributor Author

bobhan1 commented Aug 20, 2024

run buildall

1 similar comment
@bobhan1
Contributor Author

bobhan1 commented Aug 20, 2024

run buildall

@bobhan1 bobhan1 force-pushed the extract-common-logic-in-partial-update branch from 34728c6 to dcb8aec August 20, 2024 10:52
@bobhan1
Contributor Author

bobhan1 commented Aug 20, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 38477 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dcb8aecea4475c5a37b6051bab4b0c9f94905192, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	18376	4523	4381	4381
q2	2058	229	219	219
q3	11849	949	1196	949
q4	10523	884	739	739
q5	7797	2879	2858	2858
q6	264	159	163	159
q7	1033	668	655	655
q8	9434	2101	2139	2101
q9	7314	6593	6596	6593
q10	7073	2226	2284	2226
q11	513	285	278	278
q12	439	265	270	265
q13	18938	2965	2989	2965
q14	304	259	277	259
q15	580	548	528	528
q16	541	417	424	417
q17	1024	642	757	642
q18	7618	7006	6921	6921
q19	6938	1076	1071	1071
q20	715	366	358	358
q21	4148	2851	3060	2851
q22	1117	1058	1042	1042
Total cold run time: 118596 ms
Total hot run time: 38477 ms
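The totals in these reports can be reproduced from the rows, which suggests how the columns should be read: the first number is the cold run, the next two are hot runs, and the last is the faster of the two hot runs. "Total cold run time" sums the first column and "Total hot run time" sums the best hot run. This column interpretation is inferred from the numbers rather than documented by the robot; the minimal Python sketch below (names `ROUND1`, `best_hot`, `totals` are hypothetical) recomputes the Round 1 totals from the rows above.

```python
# TPC-H sf100, Round 1 rows copied from the report above:
# query -> (cold_ms, hot1_ms, hot2_ms). The 4th reported column is derived.
ROUND1 = {
    "q1": (18376, 4523, 4381),
    "q2": (2058, 229, 219),
    "q3": (11849, 949, 1196),
    "q4": (10523, 884, 739),
    "q5": (7797, 2879, 2858),
    "q6": (264, 159, 163),
    "q7": (1033, 668, 655),
    "q8": (9434, 2101, 2139),
    "q9": (7314, 6593, 6596),
    "q10": (7073, 2226, 2284),
    "q11": (513, 285, 278),
    "q12": (439, 265, 270),
    "q13": (18938, 2965, 2989),
    "q14": (304, 259, 277),
    "q15": (580, 548, 528),
    "q16": (541, 417, 424),
    "q17": (1024, 642, 757),
    "q18": (7618, 7006, 6921),
    "q19": (6938, 1076, 1071),
    "q20": (715, 366, 358),
    "q21": (4148, 2851, 3060),
    "q22": (1117, 1058, 1042),
}


def best_hot(runs):
    """Best hot time for one query: the faster of its two hot runs."""
    _cold, hot1, hot2 = runs
    return min(hot1, hot2)


def totals(rows):
    """Return (total cold ms, total hot ms) for a dict of query -> runs."""
    total_cold = sum(cold for cold, _, _ in rows.values())
    total_hot = sum(best_hot(runs) for runs in rows.values())
    return total_cold, total_hot


if __name__ == "__main__":
    cold, hot = totals(ROUND1)
    print(f"Total cold run time: {cold} ms")
    print(f"Total hot run time: {hot} ms")
```

The same reading appears to hold for Round 2 and for the TPC-DS report; the ClickBench report has only three columns (cold run plus two hot runs), and its hot total likewise matches the sum of the faster hot run per query.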

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	4564	4292	4299	4292
q2	431	313	319	313
q3	2881	2715	2762	2715
q4	2064	1643	1779	1643
q5	5609	5769	5638	5638
q6	244	148	162	148
q7	2187	1761	1755	1755
q8	3335	3514	3482	3482
q9	8823	8757	8756	8756
q10	3578	3328	3289	3289
q11	645	524	531	524
q12	844	680	648	648
q13	17063	3207	3040	3040
q14	324	295	290	290
q15	560	549	543	543
q16	507	478	468	468
q17	1855	1577	1579	1577
q18	8245	7840	7616	7616
q19	9567	1530	1618	1530
q20	3433	1894	1884	1884
q21	12663	5280	5212	5212
q22	1235	1105	1086	1086
Total cold run time: 90657 ms
Total hot run time: 56449 ms

@doris-robot

TPC-DS: Total hot run time: 196044 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dcb8aecea4475c5a37b6051bab4b0c9f94905192, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	1349	906	884	884
query2	6541	1987	1928	1928
query3	10641	3915	3796	3796
query4	55911	24341	23177	23177
query5	5739	708	740	708
query6	550	219	212	212
query7	6387	322	328	322
query8	551	436	437	436
query9	9071	2543	2547	2543
query10	587	343	339	339
query11	18195	15140	15122	15122
query12	229	139	150	139
query13	1698	431	435	431
query14	11771	7329	7458	7329
query15	309	199	199	199
query16	7651	529	523	523
query17	1176	610	608	608
query18	2137	350	335	335
query19	298	171	168	168
query20	147	134	141	134
query21	249	141	163	141
query22	4463	4419	4626	4419
query23	34527	33684	33727	33684
query24	5841	3055	2969	2969
query25	570	437	423	423
query26	714	182	180	180
query27	1816	318	316	316
query28	3925	2238	2175	2175
query29	726	466	446	446
query30	237	187	208	187
query31	1044	849	798	798
query32	105	79	77	77
query33	519	345	344	344
query34	892	497	505	497
query35	868	761	753	753
query36	1092	971	972	971
query37	156	103	105	103
query38	4095	3940	3813	3813
query39	1502	1460	1469	1460
query40	238	154	154	154
query41	140	136	137	136
query42	135	115	120	115
query43	545	486	501	486
query44	1102	777	805	777
query45	227	197	198	197
query46	1122	776	743	743
query47	1915	1805	1856	1805
query48	424	335	367	335
query49	916	593	586	586
query50	876	456	462	456
query51	7244	7223	6969	6969
query52	123	108	108	108
query53	305	222	222	222
query54	608	501	495	495
query55	92	90	88	88
query56	325	317	305	305
query57	1174	1131	1129	1129
query58	299	314	308	308
query59	3040	2790	2950	2790
query60	356	322	333	322
query61	152	148	148	148
query62	801	695	685	685
query63	254	226	237	226
query64	3383	1846	1856	1846
query65	3243	3173	3205	3173
query66	1044	678	693	678
query67	15306	15010	14986	14986
query68	6340	584	584	584
query69	560	360	332	332
query70	1203	1156	1172	1156
query71	533	312	311	311
query72	6780	2397	2058	2058
query73	817	356	359	356
query74	9337	8724	8814	8724
query75	3752	2757	2715	2715
query76	3386	1003	990	990
query77	657	444	432	432
query78	9971	9124	9064	9064
query79	2753	566	563	563
query80	2847	599	603	599
query81	623	261	259	259
query82	933	161	160	160
query83	377	214	212	212
query84	289	94	100	94
query85	1424	354	350	350
query86	499	330	311	311
query87	4355	4284	4177	4177
query88	4273	2459	2568	2459
query89	433	330	326	326
query90	2107	224	228	224
query91	157	126	123	123
query92	92	74	74	74
query93	3792	558	556	556
query94	1048	313	329	313
query95	390	294	300	294
query96	607	289	287	287
query97	3292	3078	3052	3052
query98	248	232	230	230
query99	1649	1319	1302	1302
Total cold run time: 325192 ms
Total hot run time: 196044 ms

@doris-robot

ClickBench: Total hot run time: 31.08 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit dcb8aecea4475c5a37b6051bab4b0c9f94905192, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.09
query5	0.52	0.51	0.50
query6	1.13	0.73	0.73
query7	0.03	0.02	0.02
query8	0.06	0.04	0.05
query9	0.55	0.50	0.50
query10	0.56	0.54	0.55
query11	0.16	0.13	0.12
query12	0.15	0.13	0.13
query13	0.62	0.59	0.59
query14	0.76	0.80	0.79
query15	0.84	0.82	0.82
query16	0.37	0.39	0.40
query17	0.99	0.99	0.96
query18	0.22	0.20	0.22
query19	1.77	1.68	1.70
query20	0.01	0.01	0.01
query21	15.41	0.68	0.67
query22	4.29	6.26	2.06
query23	18.33	1.36	1.27
query24	2.08	0.23	0.22
query25	0.15	0.09	0.08
query26	0.26	0.19	0.19
query27	0.08	0.08	0.08
query28	13.30	1.02	1.00
query29	12.63	3.31	3.24
query30	0.43	0.25	0.24
query31	2.82	0.40	0.41
query32	3.24	0.48	0.47
query33	2.97	3.01	2.98
query34	16.90	4.36	4.35
query35	4.43	4.46	4.43
query36	0.66	0.48	0.47
query37	0.21	0.17	0.17
query38	0.17	0.16	0.18
query39	0.06	0.05	0.05
query40	0.18	0.15	0.15
query41	0.11	0.06	0.07
query42	0.07	0.07	0.06
query43	0.07	0.06	0.05
Total cold run time: 109.62 s
Total hot run time: 31.08 s

@bobhan1 bobhan1 force-pushed the extract-common-logic-in-partial-update branch from 33dcc08 to cb3ac5a August 21, 2024 00:45
@bobhan1
Contributor Author

bobhan1 commented Aug 21, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37895 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cb3ac5a009b4de30a6b5177f4d8bb611c5b32e6d, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	17864	4355	4332	4332
q2	2063	216	209	209
q3	11830	999	1110	999
q4	10522	696	751	696
q5	7763	2852	2840	2840
q6	267	162	161	161
q7	1033	659	648	648
q8	9602	2102	2132	2102
q9	8635	6513	6564	6513
q10	7095	2191	2254	2191
q11	495	277	271	271
q12	429	266	263	263
q13	17771	3007	3008	3007
q14	314	262	266	262
q15	568	538	512	512
q16	518	403	413	403
q17	1007	703	677	677
q18	7292	6712	6728	6712
q19	6977	1052	1129	1052
q20	707	354	342	342
q21	3917	2705	2915	2705
q22	1099	1019	998	998
Total cold run time: 117768 ms
Total hot run time: 37895 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	4531	4353	4249	4249
q2	400	303	287	287
q3	2875	2667	2688	2667
q4	2008	1744	1696	1696
q5	5668	5754	5899	5754
q6	247	153	153	153
q7	2189	1786	1783	1783
q8	3348	3472	3441	3441
q9	8869	8828	8772	8772
q10	3620	3340	3308	3308
q11	633	513	530	513
q12	838	662	657	657
q13	15937	3178	3255	3178
q14	334	297	297	297
q15	548	527	531	527
q16	505	448	452	448
q17	1802	1557	1557	1557
q18	8372	7956	7677	7677
q19	9041	1456	1510	1456
q20	2111	1885	1894	1885
q21	13860	5445	5220	5220
q22	1158	1049	1045	1045
Total cold run time: 88894 ms
Total hot run time: 56570 ms

@doris-robot

TPC-DS: Total hot run time: 197666 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cb3ac5a009b4de30a6b5177f4d8bb611c5b32e6d, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	1346	928	877	877
query2	6592	1973	1969	1969
query3	10640	3787	3769	3769
query4	58034	24927	23649	23649
query5	6015	730	726	726
query6	546	211	211	211
query7	6472	341	336	336
query8	561	450	443	443
query9	9436	2538	2518	2518
query10	600	351	325	325
query11	18592	15108	15510	15108
query12	219	142	141	141
query13	1720	445	455	445
query14	12279	7429	7396	7396
query15	268	195	195	195
query16	7613	506	519	506
query17	1156	626	622	622
query18	2152	347	350	347
query19	319	165	171	165
query20	155	148	140	140
query21	251	141	144	141
query22	4700	4480	4230	4230
query23	34372	34036	33863	33863
query24	5644	3057	2945	2945
query25	585	423	438	423
query26	721	182	184	182
query27	1762	316	323	316
query28	3864	2192	2166	2166
query29	770	458	457	457
query30	236	202	200	200
query31	1039	852	830	830
query32	100	81	81	81
query33	507	356	350	350
query34	934	511	509	509
query35	880	775	771	771
query36	1117	979	973	973
query37	152	101	103	101
query38	4023	3980	3951	3951
query39	1542	1478	1490	1478
query40	232	155	152	152
query41	139	136	138	136
query42	134	118	114	114
query43	548	501	513	501
query44	1111	787	786	786
query45	224	199	195	195
query46	1122	774	776	774
query47	1883	1835	1843	1835
query48	410	337	349	337
query49	916	580	581	580
query50	870	467	468	467
query51	7250	7239	7136	7136
query52	126	110	105	105
query53	301	227	224	224
query54	610	501	520	501
query55	93	88	86	86
query56	341	309	322	309
query57	1189	1122	1124	1122
query58	302	302	298	298
query59	2994	2960	2987	2960
query60	380	320	327	320
query61	157	146	148	146
query62	816	685	703	685
query63	264	227	229	227
query64	3407	1874	1885	1874
query65	3274	3189	3182	3182
query66	1054	723	664	664
query67	15356	14896	15058	14896
query68	6665	591	588	588
query69	620	430	340	340
query70	1216	1205	1149	1149
query71	533	314	320	314
query72	6755	2358	2075	2075
query73	837	355	353	353
query74	9485	8835	8831	8831
query75	3728	2751	2780	2751
query76	3945	1133	1034	1034
query77	844	440	441	440
query78	9912	9264	10165	9264
query79	6026	566	557	557
query80	1459	607	609	607
query81	592	261	260	260
query82	540	156	156	156
query83	362	216	219	216
query84	298	97	102	97
query85	980	376	365	365
query86	406	316	327	316
query87	4414	4215	4293	4215
query88	3728	2475	2467	2467
query89	479	317	319	317
query90	2036	226	226	226
query91	153	128	124	124
query92	86	75	76	75
query93	2446	552	554	552
query94	770	330	326	326
query95	379	299	298	298
query96	602	288	282	282
query97	3248	3067	3088	3067
query98	251	229	222	222
query99	1824	1336	1315	1315
Total cold run time: 329036 ms
Total hot run time: 197666 ms

@doris-robot

ClickBench: Total hot run time: 31.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cb3ac5a009b4de30a6b5177f4d8bb611c5b32e6d, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.05	0.04	0.04
query2	0.09	0.05	0.05
query3	0.23	0.05	0.06
query4	1.67	0.10	0.09
query5	0.54	0.51	0.51
query6	1.14	0.73	0.74
query7	0.02	0.01	0.02
query8	0.06	0.05	0.05
query9	0.56	0.48	0.49
query10	0.54	0.54	0.54
query11	0.16	0.12	0.12
query12	0.16	0.13	0.14
query13	0.67	0.59	0.59
query14	0.77	0.79	0.80
query15	0.84	0.84	0.84
query16	0.38	0.38	0.37
query17	0.99	1.04	1.07
query18	0.21	0.21	0.20
query19	1.81	1.68	1.78
query20	0.02	0.01	0.01
query21	15.42	0.67	0.65
query22	4.09	7.80	2.03
query23	18.28	1.37	1.35
query24	2.09	0.23	0.22
query25	0.15	0.09	0.09
query26	0.28	0.19	0.19
query27	0.08	0.09	0.08
query28	13.22	1.04	1.01
query29	12.67	3.33	3.36
query30	0.43	0.25	0.24
query31	2.82	0.41	0.40
query32	3.24	0.49	0.49
query33	2.92	2.95	2.95
query34	16.99	4.31	4.34
query35	4.41	4.42	4.42
query36	0.67	0.49	0.50
query37	0.22	0.18	0.17
query38	0.18	0.17	0.17
query39	0.06	0.06	0.06
query40	0.18	0.15	0.14
query41	0.12	0.08	0.07
query42	0.08	0.07	0.06
query43	0.06	0.06	0.06
Total cold run time: 109.57 s
Total hot run time: 31.3 s

Contributor

@zhannngchen zhannngchen left a comment

LGTM, nice work. Please add some description for this PR.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 21, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@bobhan1
Contributor Author

bobhan1 commented Aug 21, 2024

run p0

@zhannngchen zhannngchen merged commit 8fa411f into apache:master Aug 21, 2024
27 of 30 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
@bobhan1 bobhan1 mentioned this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

pick [opt](partial update) Remove unnecessary lock and refactor some code for partial update (apache#40062)

1. apache#34112 made partial update fetch rowsets during RowsetBuilder initialization rather than in the flush phase,
so that tablet header lock is no longer needed and can be removed.
2. Refactor some partial update code.

fix compile

pick [Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update apache#40272

picks apache#40272

pick [Cherry-pick](branch-2.1) Pick "[Featrue](default value) Support bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364


pick [Feature](partial update) Support flexible partial update in stream load with json files (apache#39756)

This PR adds the ability to update a different set of columns for each row within a single
stream load.
Doc: apache/doris-website#1140
```sql
MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL,
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true");
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+
```
test1.json:
```json
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
```

```bash
curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
```

```sql
MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
```
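The before/after tables above suggest these per-row semantics: records are applied in file order; for an existing key only the columns named in that record change (so key 1 ends up with `v1` from its first record and `v2`, `v3` from its later one); for a new key (`k = 6`), omitted columns take their `DEFAULT`, or `NULL` when nullable without a default. The minimal Python sketch below models that behavior on an in-memory dict. It is an illustration inferred from the example output, not the Doris implementation; `SCHEMA`, `new_row`, and `apply_flexible_update` are hypothetical names, and rejecting a new row that omits a `NOT NULL` column without a default is an assumption.

```python
import json

# Assumed model of table t1 from the example: value column -> (nullable, default).
# A default of None means the column has no DEFAULT clause.
SCHEMA = {
    "v1": (True, None),
    "v2": (True, 9876),
    "v3": (False, None),
    "v4": (False, 1234),
    "v5": (True, None),
}
KEY = "k"


def new_row(record):
    """Build a full row for a key that is not yet in the table."""
    row = {}
    for col, (nullable, default) in SCHEMA.items():
        if col in record:
            row[col] = record[col]
        elif default is not None:
            row[col] = default   # e.g. v2 -> 9876, v4 -> 1234
        elif nullable:
            row[col] = None      # nullable without default -> NULL
        else:
            # Assumption: a new row missing a NOT NULL column without a default is rejected.
            raise ValueError(f"key {record[KEY]}: missing NOT NULL column {col}")
    return row


def apply_flexible_update(table, json_lines):
    """Apply JSON records in file order; each record updates only the columns it names."""
    for line in json_lines:
        record = json.loads(line)
        key = record[KEY]
        if key in table:
            # Existing key: overwrite only the columns present in this record.
            table[key].update({c: v for c, v in record.items() if c != KEY})
        else:
            table[key] = new_row(record)
    return table


# Initial contents of t1: keys 0..5 with every value column equal to the key.
t1 = {i: {col: i for col in SCHEMA} for i in range(6)}

# Contents of test1.json, one JSON object per line (read_json_by_line:true).
TEST1_JSON = [
    '{"k": 1, "v1": 10}',
    '{"k": 2, "v2": 20, "v5": 25}',
    '{"k": 3, "v3": 30}',
    '{"k": 4, "v4": 20, "v1": 43, "v3": 99}',
    '{"k": 5, "v5": null}',
    '{"k": 6, "v1": 999, "v3": 777}',
    '{"k": 2, "v4": 222}',
    '{"k": 1, "v2": 111, "v3": 111}',
]

apply_flexible_update(t1, TEST1_JSON)
for k in sorted(t1):
    print(k, [t1[k][c] for c in SCHEMA])
```

Printing the rows in key order gives the same values as the final `select * from t1` above, with Python `None` standing in for `<null>`.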

fix compile

pick [branch-2.1] Picks "[opt](partial update) Allow to only specify key columns in partial update apache#40736" (apache#40863)

picks apache#40736

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 21, 2024
Labels
approved (Indicates a PR has been approved by one committer.), dev/3.0.3-merged, reviewed
4 participants