@gnehil
Contributor

@gnehil gnehil commented Dec 25, 2024

What problem does this PR solve?

Issue Number: close #47932

Related PR: #xxx

Problem Summary:

Ingestion Load loads pre-processed data into Doris.

Preprocessing means processing the data according to the partitioning, bucketing, and aggregation rules defined on the Doris table, then writing the results to an external storage system.

An external system performs the preprocessing; the BE then reads the results, converts them into segment files, and stores them.

The basic flow is as follows:
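[ingestion_load flow diagram: https://github.com/apache/doris/assets/30104232/aa468cd4-90bf-4d9d-b69b-0425b66b15f4]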
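The preprocessing side of this flow can be sketched as follows. This is a minimal, hypothetical C++ illustration, not Doris code: it assumes a table with key columns `dt` and `user_id`, one range partition per month on `dt`, hash buckets on `user_id`, and a SUM-aggregated `cost` column. The names `Row`, `partition_of`, `bucket_of` and `preprocess` are invented for illustration, and `std::hash` is only a stand-in for whatever bucketing hash the BE actually expects.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source row: key columns (dt, user_id), value column cost (SUM).
struct Row {
    std::string dt;  // "YYYY-MM-DD"
    int64_t user_id;
    int64_t cost;
};

// One output file per (partition name, bucket id).
using FileKey = std::pair<std::string, uint32_t>;
// Within a file: aggregation key (dt, user_id) -> SUM(cost).
using AggKey = std::pair<std::string, int64_t>;
using Output = std::map<FileKey, std::map<AggKey, int64_t>>;

// Stand-in partition rule: one partition per month, e.g. "p202412".
std::string partition_of(const Row& r) {
    return "p" + r.dt.substr(0, 4) + r.dt.substr(5, 2);
}

// Stand-in bucket rule. std::hash is NOT the hash Doris uses; a real
// preprocessing job must reproduce the table's actual bucketing function.
uint32_t bucket_of(const Row& r, uint32_t num_buckets) {
    assert(num_buckets > 0);
    return static_cast<uint32_t>(std::hash<int64_t>{}(r.user_id) % num_buckets);
}

// Partition, bucket and pre-aggregate rows so that each (partition, bucket)
// group could be written out as a separate file for the BE to read.
Output preprocess(const std::vector<Row>& rows, uint32_t num_buckets) {
    Output out;
    for (const Row& r : rows) {
        FileKey file{partition_of(r), bucket_of(r, num_buckets)};
        out[file][AggKey{r.dt, r.user_id}] += r.cost;  // SUM aggregation
    }
    return out;
}
```

Grouping by (partition, bucket) first mirrors the description above: the external system does all partitioning, bucketing and aggregation, so the BE only has to read each group and convert it into segment files.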

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor or code-format change, and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 25, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@JNSimba
Member

JNSimba commented Dec 25, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 32378 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5d05ad3dbc025f235ec41b42bd5fe0f23bf296e7, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17643	6139	6025	6025
q2	2039	299	170	170
q3	10422	1249	689	689
q4	10222	855	435	435
q5	7517	2190	1964	1964
q6	208	181	144	144
q7	881	739	609	609
q8	9224	1364	1208	1208
q9	5432	4872	4932	4872
q10	6754	2297	1851	1851
q11	464	274	253	253
q12	347	366	230	230
q13	17772	3554	2970	2970
q14	233	232	218	218
q15	546	522	501	501
q16	637	619	607	607
q17	563	861	316	316
q18	7083	6488	6265	6265
q19	1243	984	559	559
q20	310	314	191	191
q21	2800	2145	1982	1982
q22	360	326	319	319
Total cold run time: 102700 ms
Total hot run time: 32378 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	6231	6206	6260	6206
q2	233	331	229	229
q3	2219	2659	2301	2301
q4	1407	1849	1339	1339
q5	4350	4728	4655	4655
q6	183	171	135	135
q7	2017	1820	1762	1762
q8	2468	2725	2595	2595
q9	6931	6865	6895	6865
q10	2973	3238	2680	2680
q11	591	513	493	493
q12	622	715	581	581
q13	3314	3592	2963	2963
q14	274	297	276	276
q15	554	498	496	496
q16	637	688	629	629
q17	1185	1703	1210	1210
q18	7245	7106	7035	7035
q19	820	1136	1004	1004
q20	1929	1980	1819	1819
q21	5348	4935	4786	4786
q22	593	622	597	597
Total cold run time: 52124 ms
Total hot run time: 50656 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.80% (10093/26016)
Line Coverage: 29.79% (85182/285958)
Region Coverage: 28.91% (43501/150453)
Branch Coverage: 25.45% (22172/87136)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5d05ad3dbc025f235ec41b42bd5fe0f23bf296e7_5d05ad3dbc025f235ec41b42bd5fe0f23bf296e7/report/index.html

@doris-robot

TPC-DS: Total hot run time: 189506 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5d05ad3dbc025f235ec41b42bd5fe0f23bf296e7, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	965	369	376	369
query2	6512	2382	2294	2294
query3	6699	208	204	204
query4	33814	23945	23677	23677
query5	4356	646	455	455
query6	281	202	203	202
query7	4622	492	297	297
query8	295	250	240	240
query9	9593	2740	2727	2727
query10	463	303	242	242
query11	18176	15323	15065	15065
query12	168	107	109	107
query13	1683	545	416	416
query14	12185	7105	6685	6685
query15	243	191	190	190
query16	8142	583	443	443
query17	1563	736	569	569
query18	2091	429	283	283
query19	220	183	144	144
query20	137	112	110	110
query21	209	145	102	102
query22	4394	4296	4365	4296
query23	34589	33391	33173	33173
query24	6354	2329	2283	2283
query25	481	448	382	382
query26	907	268	150	150
query27	1989	447	324	324
query28	5454	2460	2463	2460
query29	636	533	406	406
query30	231	183	147	147
query31	971	886	809	809
query32	72	83	57	57
query33	492	347	283	283
query34	733	826	502	502
query35	781	853	732	732
query36	966	1031	904	904
query37	121	98	76	76
query38	4327	4224	4228	4224
query39	1496	1458	1449	1449
query40	210	110	101	101
query41	46	45	45	45
query42	120	102	103	102
query43	502	536	509	509
query44	1285	794	803	794
query45	179	178	173	173
query46	840	1027	654	654
query47	1871	1917	1832	1832
query48	403	400	315	315
query49	726	469	375	375
query50	615	656	386	386
query51	7225	7078	7196	7078
query52	109	103	91	91
query53	218	256	182	182
query54	491	486	396	396
query55	79	75	80	75
query56	257	261	260	260
query57	1205	1150	1130	1130
query58	231	231	222	222
query59	3271	3222	2987	2987
query60	264	259	238	238
query61	114	104	120	104
query62	879	790	748	748
query63	227	190	196	190
query64	3484	1011	673	673
query65	3315	3198	3218	3198
query66	955	416	305	305
query67	16087	15685	15486	15486
query68	9466	765	519	519
query69	475	286	258	258
query70	1204	1118	1112	1112
query71	441	277	263	263
query72	5841	3833	3816	3816
query73	836	746	364	364
query74	10202	9428	8854	8854
query75	4727	3169	2635	2635
query76	5637	1192	757	757
query77	1015	353	272	272
query78	9954	10201	9389	9389
query79	5868	876	578	578
query80	712	521	414	414
query81	473	276	224	224
query82	224	148	115	115
query83	195	164	148	148
query84	287	92	69	69
query85	804	359	355	355
query86	349	321	305	305
query87	4555	4383	4623	4383
query88	3606	2229	2216	2216
query89	438	343	301	301
query90	2069	190	187	187
query91	128	138	107	107
query92	68	56	55	55
query93	3501	889	527	527
query94	662	394	290	290
query95	338	263	250	250
query96	488	609	291	291
query97	2721	2835	2708	2708
query98	225	199	202	199
query99	1663	1559	1418	1418
Total cold run time: 301791 ms
Total hot run time: 189506 ms

@doris-robot

ClickBench: Total hot run time: 31.49 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5d05ad3dbc025f235ec41b42bd5fe0f23bf296e7, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.03	0.04	0.05
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.60	0.11	0.11
query5	0.46	0.41	0.40
query6	1.13	0.65	0.65
query7	0.02	0.01	0.02
query8	0.04	0.04	0.04
query9	0.57	0.54	0.50
query10	0.55	0.58	0.56
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.62	0.60
query14	2.76	2.78	2.84
query15	0.90	0.84	0.81
query16	0.37	0.37	0.37
query17	0.97	1.04	1.02
query18	0.22	0.21	0.21
query19	1.95	1.73	2.03
query20	0.02	0.01	0.01
query21	15.36	0.95	0.60
query22	0.74	0.91	0.69
query23	15.21	1.46	0.59
query24	3.35	1.18	1.76
query25	0.15	0.28	0.11
query26	0.32	0.14	0.14
query27	0.05	0.06	0.04
query28	13.98	1.46	1.05
query29	12.57	3.88	3.23
query30	0.25	0.08	0.06
query31	2.88	0.60	0.38
query32	3.23	0.54	0.46
query33	3.26	3.12	3.10
query34	16.83	5.07	4.49
query35	4.49	4.43	4.50
query36	0.65	0.49	0.48
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.14	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.04
query43	0.04	0.03	0.03
Total cold run time: 106.6 s
Total hot run time: 31.49 s

@gnehil
Contributor Author

gnehil commented Dec 26, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 32528 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dcae68568a3a17c620c4d6c8bf5b81cfdaae79db, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17592	6097	6027	6027
q2	2047	295	166	166
q3	10425	1247	722	722
q4	10197	878	439	439
q5	7526	2210	2052	2052
q6	205	177	147	147
q7	905	751	606	606
q8	9238	1395	1175	1175
q9	5351	4899	4873	4873
q10	6762	2313	1873	1873
q11	469	280	263	263
q12	355	359	217	217
q13	17768	3566	2943	2943
q14	231	234	219	219
q15	582	522	506	506
q16	640	624	601	601
q17	561	848	326	326
q18	6962	6575	6418	6418
q19	1228	966	539	539
q20	303	308	185	185
q21	2789	2191	1930	1930
q22	359	337	301	301
Total cold run time: 102495 ms
Total hot run time: 32528 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	6171	6195	6183	6183
q2	233	337	231	231
q3	2303	2705	2297	2297
q4	1442	1854	1363	1363
q5	4346	4748	4632	4632
q6	179	169	143	143
q7	1982	1876	1736	1736
q8	2486	2714	2581	2581
q9	6986	6908	6880	6880
q10	2947	3205	2676	2676
q11	573	518	487	487
q12	655	728	562	562
q13	3164	3672	3011	3011
q14	265	296	281	281
q15	571	501	495	495
q16	640	682	630	630
q17	1186	1686	1210	1210
q18	7278	7317	7078	7078
q19	761	1070	1024	1024
q20	1933	2004	1815	1815
q21	5405	5120	4830	4830
q22	588	628	546	546
Total cold run time: 52094 ms
Total hot run time: 50691 ms

@doris-robot

TPC-DS: Total hot run time: 190941 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dcae68568a3a17c620c4d6c8bf5b81cfdaae79db, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	979	388	368	368
query2	6534	2400	2341	2341
query3	6708	212	211	211
query4	33221	24112	23359	23359
query5	4977	633	438	438
query6	281	203	178	178
query7	4638	495	315	315
query8	318	249	240	240
query9	9738	2768	2765	2765
query10	504	317	259	259
query11	18612	15385	15564	15385
query12	165	106	105	105
query13	1692	527	417	417
query14	11948	7189	6959	6959
query15	261	186	185	185
query16	8128	592	445	445
query17	1547	762	578	578
query18	2109	429	289	289
query19	213	179	148	148
query20	119	120	113	113
query21	207	126	101	101
query22	4299	4261	4380	4261
query23	34476	33461	33543	33461
query24	5975	2156	2201	2156
query25	490	444	374	374
query26	865	273	152	152
query27	2037	452	323	323
query28	5343	2494	2493	2493
query29	709	529	411	411
query30	230	179	156	156
query31	1027	902	851	851
query32	88	59	60	59
query33	497	346	293	293
query34	760	836	509	509
query35	772	817	724	724
query36	1004	1052	961	961
query37	113	96	78	78
query38	4307	4422	4307	4307
query39	1479	1428	1437	1428
query40	201	113	100	100
query41	46	49	47	47
query42	117	103	107	103
query43	509	521	495	495
query44	1273	812	803	803
query45	183	176	169	169
query46	855	1033	643	643
query47	1890	1923	1850	1850
query48	378	406	355	355
query49	723	474	399	399
query50	612	643	403	403
query51	7044	7212	7164	7164
query52	98	102	91	91
query53	215	248	190	190
query54	479	488	407	407
query55	77	76	75	75
query56	247	253	245	245
query57	1165	1172	1093	1093
query58	233	221	218	218
query59	2969	3077	2974	2974
query60	290	275	252	252
query61	111	111	111	111
query62	911	779	744	744
query63	223	194	181	181
query64	3780	1004	718	718
query65	3260	3173	3234	3173
query66	887	410	309	309
query67	15928	15823	15607	15607
query68	9081	772	531	531
query69	485	290	243	243
query70	1259	1135	1120	1120
query71	443	277	261	261
query72	5911	3856	3742	3742
query73	661	738	352	352
query74	9440	9010	8956	8956
query75	4694	3140	2718	2718
query76	4881	1169	768	768
query77	872	367	270	270
query78	10030	10145	9843	9843
query79	5500	884	578	578
query80	706	526	419	419
query81	476	275	232	232
query82	575	150	119	119
query83	192	169	146	146
query84	285	88	86	86
query85	783	355	310	310
query86	356	374	301	301
query87	4384	4455	4310	4310
query88	3853	2225	2219	2219
query89	408	338	298	298
query90	1928	185	182	182
query91	135	138	109	109
query92	71	56	56	56
query93	2221	889	537	537
query94	665	395	286	286
query95	340	265	244	244
query96	489	607	280	280
query97	2742	2815	2665	2665
query98	226	211	197	197
query99	1692	1554	1448	1448
Total cold run time: 297858 ms
Total hot run time: 190941 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.83% (10103/26020)
Line Coverage: 29.84% (85358/286070)
Region Coverage: 28.97% (43606/150524)
Branch Coverage: 25.51% (22239/87184)
Coverage Report: http://coverage.selectdb-in.cc/coverage/dcae68568a3a17c620c4d6c8bf5b81cfdaae79db_dcae68568a3a17c620c4d6c8bf5b81cfdaae79db/report/index.html

@doris-robot

ClickBench: Total hot run time: 31.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit dcae68568a3a17c620c4d6c8bf5b81cfdaae79db, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.23	0.06	0.07
query4	1.61	0.10	0.10
query5	0.43	0.40	0.40
query6	1.16	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.56	0.52	0.49
query10	0.55	0.61	0.56
query11	0.14	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.62	0.59
query14	2.84	2.77	2.88
query15	0.89	0.83	0.83
query16	0.38	0.39	0.39
query17	1.06	1.06	1.01
query18	0.23	0.21	0.22
query19	1.95	1.83	2.03
query20	0.02	0.01	0.01
query21	15.36	0.93	0.57
query22	0.76	0.85	0.66
query23	15.23	1.46	0.58
query24	2.94	1.19	1.98
query25	0.27	0.08	0.10
query26	0.24	0.16	0.14
query27	0.05	0.05	0.05
query28	14.24	1.49	1.05
query29	12.60	3.90	3.21
query30	0.25	0.10	0.06
query31	2.80	0.62	0.39
query32	3.27	0.53	0.46
query33	3.06	3.12	3.13
query34	16.69	5.10	4.50
query35	4.51	4.48	4.47
query36	0.66	0.51	0.50
query37	0.09	0.07	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.03
query40	0.18	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.03
query43	0.04	0.02	0.03
Total cold run time: 106.41 s
Total hot run time: 31.55 s

@github-actions github-actions bot added the approved label Dec 26, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@morningman morningman merged commit 6580f6b into apache:master Dec 27, 2024
24 of 26 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 27, 2024
### What problem does this PR solve?


Problem Summary:

Ingestion Load loads pre-processed data into Doris.

Preprocessing means processing the data according to the partitioning,
bucketing, and aggregation rules defined on the Doris table, then writing
the results to an external storage system.

An external system performs the preprocessing; the BE then reads the
results, converts them into segment files, and stores them.

The basic flow is as follows:

![ingestion_load](https://github.com/apache/doris/assets/30104232/aa468cd4-90bf-4d9d-b69b-0425b66b15f4)


### Release note
[feature](load) new ingestion load
@gnehil gnehil mentioned this pull request Feb 14, 2025
3 tasks
gnehil added a commit to gnehil/doris that referenced this pull request Jul 14, 2025

(cherry picked from commit 6580f6b)
gnehil added a commit to gnehil/doris that referenced this pull request Jul 16, 2025

(cherry picked from commit 6580f6b)
morningman pushed a commit to morningman/doris that referenced this pull request Jul 19, 2025
morrySnow pushed a commit that referenced this pull request Jul 21, 2025
bp #45937

---------

Co-authored-by: gnehil <liheng@selectdb.com>
morrySnow pushed a commit that referenced this pull request Sep 4, 2025
…55500)

### What problem does this PR solve?
Related PR: #45937

Problem Summary:
Fix the erroneous ingestion load case and the core dump in the Parquet reader.

```
==8898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62f0020603fc at pc 0x55f634e64ded bp 0x7fba0d03c410 sp 0x7fba0d03bbd8
READ of size 4 at 0x62f0020603fc thread T768 (PUSH-9699)
    #0 0x55f634e64dec in __asan_memcpy (/mnt/hdd01/ci/doris-deploy-branch-3.1-local/be/lib/doris_be+0x39a24dec) (BuildId: 9b04e7f7d3075dac)
    #1 0x55f634eca93f in std::char_traits<char>::copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/char_traits.h:409:33
    #2 0x55f634eca93f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_S_copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:351:4
    #3 0x55f634eca93f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_S_copy_chars(char*, char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:398:9
    #4 0x55f634eca93f in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct(char const*, char const*, std::forward_iterator_tag) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:225:6
    #5 0x55f654a4f74d in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct_aux(char const*, char const*, std::__false_type) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:247:11
    #6 0x55f654a4f74d in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct(char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:266:4
    #7 0x55f654a4f74d in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string(char const*, unsigned long, std::allocator<char> const&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:513:9
    #8 0x55f654a4f74d in doris::vectorized::parse_thrift_footer(std::shared_ptr, doris::vectorized::FileMetaData**, unsigned long*, doris::io::IOContext*) /home/zcp/repo_center/doris_branch-3.1/doris/be/src/vec/exec/format/parquet/parquet_thrift_util.h:55:17
```
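The trace shows a 4-byte read past the end of a heap buffer while `parse_thrift_footer` constructs a `std::string`. The sketch below illustrates the general defensive pattern for this kind of bug; it is not the actual change made in #55500, which is not shown here. It checks the Parquet trailer layout (a 4-byte little-endian FileMetaData length followed by the `PAR1` magic, with `PAR1` also at the start of the file) before reading any bytes. `read_footer_length` is a hypothetical name, and it works on an in-memory buffer rather than Doris's file reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// A Parquet file starts with "PAR1" and ends with an 8-byte trailer:
// the FileMetaData length (4-byte little-endian) followed by "PAR1".
constexpr std::size_t kMagicSize = 4;
constexpr std::size_t kTrailerSize = 8;
constexpr char kMagic[kMagicSize] = {'P', 'A', 'R', '1'};

// Hypothetical helper: returns the FileMetaData length if the trailer is
// well formed, or std::nullopt otherwise. All size checks happen before any
// byte is read, so a tiny or truncated file cannot cause an out-of-bounds read.
std::optional<uint32_t> read_footer_length(const std::vector<uint8_t>& file) {
    if (file.size() < kMagicSize + kTrailerSize) {
        return std::nullopt;  // too small to hold both magics and a length
    }
    const uint8_t* trailer = file.data() + file.size() - kTrailerSize;
    if (std::memcmp(trailer + 4, kMagic, kMagicSize) != 0) {
        return std::nullopt;  // trailing magic missing: not Parquet, or truncated
    }
    const uint32_t len = static_cast<uint32_t>(trailer[0]) |
                         (static_cast<uint32_t>(trailer[1]) << 8) |
                         (static_cast<uint32_t>(trailer[2]) << 16) |
                         (static_cast<uint32_t>(trailer[3]) << 24);
    // The metadata must fit between the leading magic and the trailer.
    if (len > file.size() - kMagicSize - kTrailerSize) {
        return std::nullopt;
    }
    return len;
}
```

Returning `std::nullopt` instead of reading optimistically lets the caller turn a malformed file into an ordinary load error rather than a crash.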
hubgeter added a commit to hubgeter/doris that referenced this pull request Sep 11, 2025
…pache#55500)

Related PR: apache#45937

morningman pushed a commit that referenced this pull request Sep 12, 2025
Related PR: #45937
branch-3.1: #55500

Labels

approved, dev/2.1.x-experimental, dev/3.1.0-merged, reviewed

Development

Successfully merging this pull request may close these issues.

[Feature] Ingestion Load

8 participants