Skip to content

Conversation

@amorynan
Copy link
Contributor

@amorynan amorynan commented Dec 25, 2024

What problem does this PR solve?

fix coredump with new_json_reader if we read invalid data will core which source from #43469
image

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@amorynan
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.89% (10121/26022)
Line Coverage: 29.89% (85525/286103)
Region Coverage: 29.03% (43712/150563)
Branch Coverage: 25.57% (22302/87218)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2d330f960282d37d961f07d6c4757c4fcc98156d_2d330f960282d37d961f07d6c4757c4fcc98156d/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 32669 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2d330f960282d37d961f07d6c4757c4fcc98156d, data reload: false

------ Round 1 ----------------------------------
q1	17576	6196	6039	6039
q2	2041	302	168	168
q3	10477	1204	746	746
q4	10201	851	430	430
q5	7525	2183	1966	1966
q6	204	179	144	144
q7	900	765	597	597
q8	9221	1341	1191	1191
q9	5174	4886	4859	4859
q10	6802	2313	1843	1843
q11	491	284	255	255
q12	347	371	227	227
q13	18361	3710	3129	3129
q14	256	241	225	225
q15	572	511	504	504
q16	636	621	595	595
q17	579	850	324	324
q18	6801	6394	6388	6388
q19	1725	951	532	532
q20	316	333	192	192
q21	2849	2237	1997	1997
q22	368	337	318	318
Total cold run time: 103422 ms
Total hot run time: 32669 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6208	6266	6223	6223
q2	245	321	244	244
q3	2231	2621	2319	2319
q4	1353	1822	1399	1399
q5	4351	4795	4802	4795
q6	185	174	140	140
q7	2064	2002	1844	1844
q8	2652	2762	2697	2697
q9	7368	7230	7317	7230
q10	3096	3376	2807	2807
q11	576	508	487	487
q12	639	777	658	658
q13	3403	3768	3166	3166
q14	277	306	278	278
q15	568	540	500	500
q16	651	697	672	672
q17	1263	1709	1263	1263
q18	7647	7428	7360	7360
q19	835	1166	1076	1076
q20	2045	2013	1907	1907
q21	5597	5375	4927	4927
q22	602	637	612	612
Total cold run time: 53856 ms
Total hot run time: 52604 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 197527 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2d330f960282d37d961f07d6c4757c4fcc98156d, data reload: false

query1	1291	982	930	930
query2	6491	2488	2483	2483
query3	11028	4748	4656	4656
query4	33326	23733	23346	23346
query5	4505	617	467	467
query6	282	194	203	194
query7	4629	502	315	315
query8	296	260	234	234
query9	9365	2755	2752	2752
query10	462	314	243	243
query11	17706	15255	15780	15255
query12	172	105	103	103
query13	1647	539	435	435
query14	10520	7134	7764	7134
query15	255	200	198	198
query16	8457	637	491	491
query17	1574	790	602	602
query18	2103	421	333	333
query19	213	195	161	161
query20	120	118	123	118
query21	209	125	107	107
query22	4463	4590	4415	4415
query23	34730	33676	33505	33505
query24	6366	2363	2342	2342
query25	474	484	401	401
query26	796	270	156	156
query27	2050	464	333	333
query28	5678	2508	2455	2455
query29	628	562	438	438
query30	207	184	153	153
query31	1010	918	828	828
query32	75	60	59	59
query33	484	373	294	294
query34	776	867	539	539
query35	790	850	793	793
query36	1044	1056	985	985
query37	115	98	73	73
query38	4406	4306	4297	4297
query39	1590	1503	1486	1486
query40	222	116	106	106
query41	45	68	43	43
query42	122	105	106	105
query43	541	554	519	519
query44	1406	833	819	819
query45	187	175	169	169
query46	899	1052	677	677
query47	1992	1997	1930	1930
query48	388	439	326	326
query49	725	471	392	392
query50	631	671	416	416
query51	7319	7334	7343	7334
query52	107	113	95	95
query53	230	253	186	186
query54	483	508	415	415
query55	82	77	79	77
query56	279	263	255	255
query57	1218	1234	1175	1175
query58	239	225	235	225
query59	3314	3493	3455	3455
query60	279	267	254	254
query61	142	122	108	108
query62	867	834	749	749
query63	224	198	197	197
query64	3120	1028	666	666
query65	3313	3337	3310	3310
query66	792	405	311	311
query67	16619	15824	15501	15501
query68	9543	737	507	507
query69	481	293	250	250
query70	1259	1165	1125	1125
query71	436	280	250	250
query72	6011	3851	3947	3851
query73	674	746	355	355
query74	9897	9150	8986	8986
query75	4644	3158	2681	2681
query76	5115	1172	762	762
query77	962	381	301	301
query78	10128	10504	9434	9434
query79	2839	883	597	597
query80	863	518	446	446
query81	494	260	224	224
query82	661	146	123	123
query83	189	159	147	147
query84	279	82	72	72
query85	799	367	308	308
query86	356	318	279	279
query87	4725	4467	4380	4380
query88	3349	2244	2234	2234
query89	418	331	298	298
query90	1878	190	189	189
query91	135	142	105	105
query92	61	58	50	50
query93	1652	875	515	515
query94	647	399	287	287
query95	380	260	246	246
query96	494	605	282	282
query97	2786	2850	2706	2706
query98	225	201	196	196
query99	1740	1560	1417	1417
Total cold run time: 300238 ms
Total hot run time: 197527 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.77 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2d330f960282d37d961f07d6c4757c4fcc98156d, data reload: false

query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.61	0.10	0.10
query5	0.42	0.41	0.43
query6	1.16	0.66	0.65
query7	0.02	0.02	0.01
query8	0.03	0.03	0.03
query9	0.60	0.51	0.52
query10	0.56	0.59	0.56
query11	0.14	0.10	0.09
query12	0.13	0.11	0.11
query13	0.61	0.62	0.59
query14	2.75	2.84	2.73
query15	0.88	0.82	0.82
query16	0.40	0.40	0.38
query17	1.07	1.05	1.07
query18	0.22	0.21	0.21
query19	1.95	1.85	1.97
query20	0.01	0.00	0.01
query21	15.35	0.93	0.58
query22	0.74	0.91	0.56
query23	15.32	1.40	0.59
query24	3.25	0.47	1.24
query25	0.19	0.19	0.22
query26	0.26	0.15	0.13
query27	0.06	0.05	0.04
query28	13.56	1.52	1.04
query29	12.58	3.97	3.26
query30	0.26	0.09	0.06
query31	2.82	0.61	0.38
query32	3.24	0.54	0.46
query33	3.05	3.12	3.12
query34	16.69	5.05	4.48
query35	4.54	4.43	4.44
query36	0.65	0.48	0.46
query37	0.10	0.06	0.05
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.18	0.14	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 105.98 s
Total hot run time: 30.77 s

} else {
return Status::InternalError("Not support load to complex column.");
}
// we should set nullmap at last to avoid column_nullable nullmap and data column size not same
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it cause problems if it is in the original position?

BTW, use LLM to generate native English comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the original method, we set the nullmap in advance, but if an exception occurs during the logical process of processing data_column below and returns, the size of data_column and nullmap in column_nullable will be inconsistent. Then we catch this exception outter, and call _handle_simdjson_error which calls column pop_back , in this wrong ColumnNullable , data_column which is nullptr would cause core

Copy link
Contributor

@hubgeter hubgeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@amorynan
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.90% (10120/26016)
Line Coverage: 29.90% (85528/286093)
Region Coverage: 29.02% (43709/150612)
Branch Coverage: 25.55% (22298/87262)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b0c489415647eeb0c07a2f88d1b09b18fd7d6bf0_b0c489415647eeb0c07a2f88d1b09b18fd7d6bf0/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 32298 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b0c489415647eeb0c07a2f88d1b09b18fd7d6bf0, data reload: false

------ Round 1 ----------------------------------
q1	17571	6122	6045	6045
q2	2045	303	171	171
q3	10460	1229	718	718
q4	10206	847	422	422
q5	7527	2192	1967	1967
q6	200	179	149	149
q7	897	757	600	600
q8	9251	1350	1142	1142
q9	5142	4904	4911	4904
q10	6738	2305	1852	1852
q11	483	282	265	265
q12	342	358	213	213
q13	17764	3568	2910	2910
q14	248	230	205	205
q15	565	494	494	494
q16	644	604	585	585
q17	571	840	328	328
q18	6865	6495	6327	6327
q19	2732	964	550	550
q20	295	300	175	175
q21	2790	2230	1980	1980
q22	367	332	296	296
Total cold run time: 103703 ms
Total hot run time: 32298 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6284	6242	6287	6242
q2	239	326	236	236
q3	2253	2598	2348	2348
q4	1430	1838	1388	1388
q5	4385	4755	4725	4725
q6	189	171	139	139
q7	2073	1971	1786	1786
q8	2625	2843	2645	2645
q9	7314	7261	7339	7261
q10	3047	3376	2870	2870
q11	574	523	495	495
q12	649	764	603	603
q13	3447	3833	3128	3128
q14	297	297	300	297
q15	571	517	504	504
q16	643	670	646	646
q17	1211	1741	1272	1272
q18	7675	7554	7335	7335
q19	850	867	1150	867
q20	2010	2052	1958	1958
q21	5740	5270	4957	4957
q22	623	613	636	613
Total cold run time: 54129 ms
Total hot run time: 52315 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 197643 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b0c489415647eeb0c07a2f88d1b09b18fd7d6bf0, data reload: false

query1	1292	1019	924	924
query2	6220	2463	2283	2283
query3	11142	4802	4758	4758
query4	33279	23598	23338	23338
query5	4334	610	453	453
query6	288	221	195	195
query7	3989	488	305	305
query8	304	255	257	255
query9	9497	2756	2801	2756
query10	472	333	260	260
query11	17937	15805	15120	15120
query12	161	109	105	105
query13	1750	552	416	416
query14	11928	7729	7447	7447
query15	247	204	200	200
query16	8011	617	440	440
query17	1558	793	602	602
query18	2138	420	318	318
query19	261	192	167	167
query20	118	114	112	112
query21	201	122	108	108
query22	4446	4544	4479	4479
query23	34121	34080	33407	33407
query24	6200	2343	2297	2297
query25	493	481	387	387
query26	866	278	150	150
query27	2127	480	337	337
query28	5678	2505	2510	2505
query29	647	590	424	424
query30	206	182	156	156
query31	964	945	835	835
query32	78	67	81	67
query33	485	364	293	293
query34	772	867	515	515
query35	813	824	732	732
query36	1034	1063	960	960
query37	121	99	80	80
query38	4240	4325	4173	4173
query39	1513	1473	1490	1473
query40	209	125	113	113
query41	46	45	41	41
query42	119	104	102	102
query43	528	553	512	512
query44	1340	828	833	828
query45	201	176	166	166
query46	895	1078	678	678
query47	2026	2000	1944	1944
query48	400	425	328	328
query49	728	484	392	392
query50	668	676	405	405
query51	7224	7249	7197	7197
query52	112	100	94	94
query53	224	257	192	192
query54	495	496	419	419
query55	84	86	82	82
query56	268	265	247	247
query57	1279	1275	1153	1153
query58	245	225	228	225
query59	3359	3279	3141	3141
query60	277	271	255	255
query61	123	113	122	113
query62	856	840	770	770
query63	236	203	191	191
query64	3501	1062	660	660
query65	3358	3265	3256	3256
query66	959	426	360	360
query67	16452	15765	15506	15506
query68	9833	774	533	533
query69	502	297	257	257
query70	1196	1154	1155	1154
query71	458	287	261	261
query72	6208	3915	3772	3772
query73	728	764	377	377
query74	10108	9424	9127	9127
query75	4687	3156	2692	2692
query76	5609	1189	762	762
query77	1043	391	283	283
query78	9974	10122	9968	9968
query79	4026	877	578	578
query80	704	512	420	420
query81	484	270	234	234
query82	269	161	125	125
query83	204	163	148	148
query84	276	86	70	70
query85	739	375	320	320
query86	364	284	298	284
query87	4529	4514	4543	4514
query88	3798	2264	2226	2226
query89	419	340	291	291
query90	2123	187	185	185
query91	132	133	109	109
query92	64	57	50	50
query93	2003	888	542	542
query94	672	374	280	280
query95	332	265	251	251
query96	490	650	282	282
query97	2743	2841	2708	2708
query98	231	204	200	200
query99	1660	1578	1438	1438
Total cold run time: 302751 ms
Total hot run time: 197643 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b0c489415647eeb0c07a2f88d1b09b18fd7d6bf0, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.04	0.04
query3	0.24	0.06	0.07
query4	1.62	0.11	0.11
query5	0.42	0.43	0.39
query6	1.18	0.65	0.65
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.51	0.51
query10	0.55	0.58	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.61	0.59
query14	2.71	2.73	2.74
query15	0.90	0.83	0.82
query16	0.40	0.38	0.38
query17	1.06	0.99	1.03
query18	0.23	0.21	0.20
query19	1.94	1.80	2.02
query20	0.01	0.01	0.02
query21	15.40	0.93	0.58
query22	0.76	0.89	0.66
query23	15.22	1.35	0.58
query24	3.25	1.30	1.74
query25	0.17	0.19	0.06
query26	0.25	0.15	0.14
query27	0.06	0.06	0.06
query28	14.45	1.46	1.06
query29	12.58	3.98	3.21
query30	0.25	0.09	0.06
query31	2.82	0.58	0.38
query32	3.24	0.55	0.46
query33	3.09	3.04	3.08
query34	16.77	5.24	4.54
query35	4.52	4.51	4.58
query36	0.68	0.49	0.50
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.12	0.12
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.03	0.04	0.03
Total cold run time: 106.89 s
Total hot run time: 31.58 s

@amorynan amorynan requested a review from xiaokang December 30, 2024 09:56
Copy link
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 2, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Jan 2, 2025

PR approved by at least one committer and no changes requested.

@eldenmoon eldenmoon merged commit 8cb4830 into apache:master Jan 2, 2025
23 of 24 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 2, 2025
fix coredump with new_json_reader if we read invalid data will core
which source from #43469
eldenmoon pushed a commit that referenced this pull request Jan 2, 2025
…#46253)

Cherry-picked from #45905

Co-authored-by: amory <wangqiannan@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.4-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants