@suxiaogang223
Contributor

bp: #55626

…he#55626)

Problem Summary:
This pull request improves the handling of empty-string null formats and
delimiter properties for Hive external tables, so that a field matching an
empty `NULL DEFINED AS ''` format is read as NULL rather than as an empty
string.

For a Hive text table like this:
```sql
CREATE TABLE test_empty_null_defined_text (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE;

INSERT INTO TABLE test_empty_null_defined_text VALUES
  (1, 'Alice'),
  (2, NULL);
```
Query in Doris:
```sql
select * from test_empty_null_defined_text;
```
Result before this change:
```text
+------+-------+
| id   | name  |
+------+-------+
|    1 | Alice |
|    2 |       |
+------+-------+
```
Result after this change:
```text
+------+-------+
| id   | name  |
+------+-------+
|    1 | Alice |
|    2 | NULL  |
+------+-------+
```
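
The difference between the two results can be pictured with a small sketch. This is not Doris's actual text reader; `parse_text_row` and its parameters are hypothetical names used only for illustration. It assumes a field is treated as NULL exactly when it equals the configured null format, and that Hive's default null format is `\N`.

```python
from typing import List, Optional


def parse_text_row(
    line: str,
    field_delim: str = "\t",
    null_format: str = "\\N",  # Hive's default serialization.null.format
) -> List[Optional[str]]:
    """Hypothetical helper: split one text row and map null-format fields to None."""
    fields = line.rstrip("\n").split(field_delim)
    # A field is NULL only when it matches the null format exactly.
    return [None if field == null_format else field for field in fields]


# The two rows written by the INSERT above, as they would appear in the text file
# when NULL is serialized as an empty string.
rows = ["1\tAlice\n", "2\t\n"]

# With the default null format, the empty field stays an empty string.
with_default_format = [parse_text_row(r) for r in rows]

# With NULL DEFINED AS '', the empty field is read back as NULL (None here).
with_empty_format = [parse_text_row(r, null_format="") for r in rows]

print(with_default_format)
print(with_empty_format)
```

In this sketch, honoring the empty null format is what turns the blank `name` in row 2 into NULL.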
@suxiaogang223
Contributor Author

run buildall

@Thearas
Contributor

Thearas commented Sep 5, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 39953 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3db0265d900d1b1c456feb46d2ddc611230af3f8, data reload: false

------ Round 1 ----------------------------------
q1	17717	6908	6622	6622
q2	2048	198	186	186
q3	10576	1116	1252	1116
q4	10517	737	710	710
q5	7723	2917	2765	2765
q6	212	138	136	136
q7	985	638	614	614
q8	9623	1973	2027	1973
q9	8373	6426	6434	6426
q10	7035	2279	2323	2279
q11	456	268	269	268
q12	400	221	223	221
q13	17777	2983	2992	2983
q14	236	209	219	209
q15	520	461	464	461
q16	471	375	380	375
q17	995	639	504	504
q18	7225	6683	6806	6683
q19	1399	1135	1071	1071
q20	476	196	209	196
q21	3892	3181	3284	3181
q22	1113	998	974	974
Total cold run time: 109769 ms
Total hot run time: 39953 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6610	6580	6535	6535
q2	332	230	240	230
q3	3076	2947	2952	2947
q4	2138	1855	1789	1789
q5	5691	5702	5759	5702
q6	217	128	129	128
q7	2245	1843	1777	1777
q8	3405	3532	3457	3457
q9	8849	8929	8859	8859
q10	3544	3539	3498	3498
q11	592	494	499	494
q12	797	575	613	575
q13	4917	3153	3191	3153
q14	300	276	268	268
q15	521	465	464	464
q16	497	431	451	431
q17	1879	1657	1604	1604
q18	8214	7634	7680	7634
q19	1655	1560	1511	1511
q20	2068	1889	1882	1882
q21	5357	5112	5058	5058
q22	1108	1065	1040	1040
Total cold run time: 64012 ms
Total hot run time: 59036 ms

@doris-robot

TPC-DS: Total hot run time: 192421 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3db0265d900d1b1c456feb46d2ddc611230af3f8, data reload: false

query1	940	403	404	403
query2	6239	1899	1835	1835
query3	8689	207	201	201
query4	33948	23587	23561	23561
query5	3669	455	455	455
query6	307	215	184	184
query7	4210	312	330	312
query8	303	233	230	230
query9	9507	2571	2570	2570
query10	475	265	271	265
query11	17946	15227	14976	14976
query12	163	103	103	103
query13	1558	439	424	424
query14	8616	7281	6894	6894
query15	238	168	185	168
query16	7987	467	467	467
query17	1627	660	588	588
query18	2151	338	331	331
query19	222	166	176	166
query20	138	119	117	117
query21	210	112	114	112
query22	4732	4531	4369	4369
query23	35214	34610	34206	34206
query24	11859	2932	2917	2917
query25	675	406	401	401
query26	1849	176	171	171
query27	3009	348	352	348
query28	7930	2110	2144	2110
query29	1060	435	449	435
query30	266	164	166	164
query31	1068	816	846	816
query32	100	54	56	54
query33	765	320	320	320
query34	1054	507	529	507
query35	862	713	727	713
query36	1123	974	951	951
query37	266	68	70	68
query38	4120	3963	4014	3963
query39	1510	1472	1483	1472
query40	263	100	101	100
query41	53	51	49	49
query42	114	100	107	100
query43	520	479	466	466
query44	1305	834	814	814
query45	194	172	174	172
query46	1155	725	716	716
query47	2064	1927	1902	1902
query48	473	393	385	385
query49	1064	415	399	399
query50	816	432	434	432
query51	7426	7345	7241	7241
query52	101	91	91	91
query53	271	209	186	186
query54	1387	501	473	473
query55	79	78	79	78
query56	273	254	307	254
query57	1311	1218	1193	1193
query58	223	219	213	213
query59	3204	3030	2941	2941
query60	281	250	261	250
query61	110	110	108	108
query62	885	673	698	673
query63	223	193	203	193
query64	5022	647	624	624
query65	3439	3343	3299	3299
query66	1315	295	310	295
query67	16450	15638	15558	15558
query68	4971	568	563	563
query69	425	263	266	263
query70	1204	1118	1131	1118
query71	342	250	256	250
query72	6141	4166	4126	4126
query73	761	350	361	350
query74	10649	9214	9266	9214
query75	3394	2655	2674	2655
query76	2656	1117	1044	1044
query77	371	274	283	274
query78	10538	9595	9601	9595
query79	2495	611	661	611
query80	1109	422	415	415
query81	547	218	223	218
query82	597	87	86	86
query83	232	143	141	141
query84	240	79	77	77
query85	1543	293	293	293
query86	473	291	299	291
query87	4445	4262	4250	4250
query88	4272	2400	2364	2364
query89	412	291	289	289
query90	2087	190	184	184
query91	182	146	153	146
query92	67	49	51	49
query93	2330	549	543	543
query94	935	303	300	300
query95	364	256	258	256
query96	610	281	286	281
query97	3282	3140	3177	3140
query98	222	199	194	194
query99	1501	1289	1339	1289
Total cold run time: 306023 ms
Total hot run time: 192421 ms

@doris-robot

ClickBench: Total hot run time: 30.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3db0265d900d1b1c456feb46d2ddc611230af3f8, data reload: false

query1	0.04	0.03	0.03
query2	0.06	0.03	0.03
query3	0.24	0.06	0.06
query4	1.62	0.10	0.10
query5	0.51	0.51	0.49
query6	1.13	0.73	0.72
query7	0.04	0.01	0.02
query8	0.04	0.04	0.03
query9	0.55	0.50	0.51
query10	0.56	0.54	0.56
query11	0.15	0.10	0.11
query12	0.14	0.11	0.11
query13	0.61	0.61	0.59
query14	0.79	0.78	0.81
query15	0.84	0.83	0.82
query16	0.40	0.41	0.37
query17	1.10	1.05	1.03
query18	0.25	0.22	0.23
query19	1.84	1.77	1.78
query20	0.02	0.01	0.01
query21	15.40	0.59	0.58
query22	2.12	1.90	2.51
query23	17.03	0.85	0.83
query24	2.95	1.30	1.48
query25	0.19	0.24	0.11
query26	0.46	0.14	0.15
query27	0.04	0.04	0.04
query28	9.79	0.52	0.49
query29	12.55	3.22	3.21
query30	0.24	0.07	0.06
query31	2.87	0.40	0.39
query32	3.22	0.47	0.46
query33	3.01	3.00	3.06
query34	16.86	4.52	4.59
query35	4.58	4.55	4.62
query36	0.66	0.48	0.47
query37	0.09	0.06	0.06
query38	0.05	0.03	0.03
query39	0.04	0.03	0.02
query40	0.16	0.12	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.39 s
Total hot run time: 30.66 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run p0

Contributor

@dataroaring left a comment

LGTM

@dataroaring merged commit 32c6680 into apache:branch-3.0 on Sep 13, 2025
24 checks passed
@suxiaogang223 deleted the fix_null_format3.0 branch on September 23, 2025 03:17