@github-actions github-actions bot commented Sep 4, 2025

Cherry-picked from #55626

### What problem does this PR solve?
Problem Summary:
This pull request improves how Doris handles an empty-string null format
(`NULL DEFINED AS ''`) and delimiter properties for Hive external tables, so
that a field matching the table's null format is read as NULL rather than as
an empty string.

For a Hive text table like this:
```sql
CREATE TABLE test_empty_null_defined_text (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE;

INSERT INTO TABLE test_empty_null_defined_text VALUES
  (1, 'Alice'),
  (2, NULL);
```
Query in Doris:
```sql
select * from test_empty_null_defined_text;
```
Result before this change:
```text
+------+-------+
| id   | name  |
+------+-------+
|    1 | Alice |
|    2 |       |
+------+-------+
```
Result after this change:
```text
+------+-------+
| id   | name  |
+------+-------+
|    1 | Alice |
|    2 | NULL  |
+------+-------+
```
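The difference between the two results can be sketched in a few lines of Python. This is only an illustration of the null-format comparison, not the code changed in this PR: `parse_text_row` and its parameters are hypothetical names, and treating the earlier behavior as "compare against Hive's default `\N` marker" is an assumption made to reproduce the table above.

```python
from typing import List, Optional


def parse_text_row(
    line: str,
    field_delim: str = "\t",
    null_format: str = "\\N",  # Hive's default text null marker
) -> List[Optional[str]]:
    """Split one text-format line and map fields equal to null_format to None.

    Hypothetical helper for illustration only; type conversion of the
    columns (e.g. id to INT) is deliberately left out.
    """
    fields = line.rstrip("\n").split(field_delim)
    return [None if field == null_format else field for field in fields]


# Line stored for the row (2, NULL) when the table uses NULL DEFINED AS ''.
row = "2\t\n"

# Assumed earlier behavior: the empty null format is not applied,
# so the second field stays an empty string.
before = parse_text_row(row)

# Behavior described by this PR: the empty string is the null marker,
# so the second field becomes None (NULL).
after = parse_text_row(row, null_format="")
```

With the table's null format applied, an empty field is indistinguishable from NULL by design, which is what `NULL DEFINED AS ''` asks for.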
@github-actions github-actions bot requested a review from morrySnow as a code owner September 4, 2025 06:04
@morrySnow morrySnow closed this Sep 4, 2025
@morrySnow morrySnow reopened this Sep 4, 2025
@morrySnow
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 32757 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 719628b71ad31dcba31e4a0bfb4eaa97f3d5911c, data reload: false

------ Round 1 ----------------------------------
q1	17587	5494	5381	5381
q2	2021	389	274	274
q3	12030	1233	769	769
q4	10251	895	442	442
q5	8418	2392	2141	2141
q6	184	166	134	134
q7	890	748	627	627
q8	9356	1462	1123	1123
q9	5255	4973	4983	4973
q10	6769	2325	1862	1862
q11	545	282	271	271
q12	341	367	211	211
q13	17766	3627	3051	3051
q14	226	220	206	206
q15	528	471	456	456
q16	420	430	367	367
q17	596	852	357	357
q18	6868	6530	6424	6424
q19	1201	946	541	541
q20	315	340	201	201
q21	2789	2210	1973	1973
q22	1041	1045	973	973
Total cold run time: 105397 ms
Total hot run time: 32757 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5540	5527	5491	5491
q2	230	332	230	230
q3	2271	2620	2305	2305
q4	1301	1799	1314	1314
q5	4407	4950	5002	4950
q6	169	163	134	134
q7	2080	1945	1828	1828
q8	2630	2862	2714	2714
q9	7266	7234	7288	7234
q10	2997	3296	2740	2740
q11	569	518	499	499
q12	639	776	614	614
q13	3408	3820	3176	3176
q14	281	299	272	272
q15	518	462	487	462
q16	448	483	443	443
q17	1236	1762	1256	1256
q18	7515	7470	7248	7248
q19	828	1119	1088	1088
q20	2033	2070	1905	1905
q21	5248	4951	4557	4557
q22	1124	1098	1018	1018
Total cold run time: 52738 ms
Total hot run time: 51478 ms

@doris-robot

TPC-DS: Total hot run time: 192440 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 719628b71ad31dcba31e4a0bfb4eaa97f3d5911c, data reload: false

query1	935	410	409	409
query2	6291	1885	1901	1885
query3	8686	201	200	200
query4	33384	23790	23417	23417
query5	3633	599	448	448
query6	291	206	181	181
query7	4218	509	323	323
query8	303	262	233	233
query9	9331	2578	2570	2570
query10	464	310	248	248
query11	18165	15429	15641	15429
query12	161	105	105	105
query13	1541	552	416	416
query14	10347	7305	6818	6818
query15	272	192	185	185
query16	8140	688	477	477
query17	1540	768	595	595
query18	2179	400	320	320
query19	232	179	170	170
query20	128	127	123	123
query21	210	126	103	103
query22	4423	4601	4402	4402
query23	35092	33952	33904	33904
query24	7287	2726	2736	2726
query25	503	468	409	409
query26	846	285	169	169
query27	2076	506	364	364
query28	5371	2221	2187	2187
query29	691	583	445	445
query30	245	195	172	172
query31	999	898	871	871
query32	102	61	63	61
query33	507	372	314	314
query34	736	874	520	520
query35	801	820	768	768
query36	1047	1047	937	937
query37	104	100	70	70
query38	4013	4017	3965	3965
query39	1528	1471	1484	1471
query40	212	122	108	108
query41	57	55	51	51
query42	129	120	110	110
query43	518	518	483	483
query44	1377	850	842	842
query45	192	180	178	178
query46	932	1093	695	695
query47	1965	1974	1892	1892
query48	437	475	364	364
query49	759	505	434	434
query50	689	695	435	435
query51	7302	7430	7268	7268
query52	106	105	101	101
query53	247	268	202	202
query54	592	569	489	489
query55	82	85	82	82
query56	298	305	275	275
query57	1284	1271	1219	1219
query58	258	227	227	227
query59	2946	3162	2956	2956
query60	321	308	305	305
query61	116	117	112	112
query62	797	748	677	677
query63	232	200	194	194
query64	3770	1008	653	653
query65	3396	3317	3340	3317
query66	999	416	323	323
query67	16328	15820	15707	15707
query68	7669	855	551	551
query69	495	305	268	268
query70	1216	1084	1103	1084
query71	400	306	279	279
query72	5865	3896	3855	3855
query73	652	754	352	352
query74	10399	9206	8928	8928
query75	3354	3156	2657	2657
query76	3250	1191	771	771
query77	761	387	280	280
query78	10306	10465	9623	9623
query79	3546	902	601	601
query80	809	543	446	446
query81	496	255	224	224
query82	563	124	92	92
query83	161	161	145	145
query84	237	105	86	86
query85	827	354	294	294
query86	383	315	300	300
query87	4341	4303	4231	4231
query88	5333	2447	2422	2422
query89	410	324	309	309
query90	1768	193	195	193
query91	139	138	108	108
query92	71	57	53	53
query93	2474	922	556	556
query94	653	418	311	311
query95	344	283	273	273
query96	498	621	290	290
query97	3176	3227	3198	3198
query98	222	212	198	198
query99	1562	1405	1353	1353
Total cold run time: 295543 ms
Total hot run time: 192440 ms

@doris-robot

ClickBench: Total hot run time: 28.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 719628b71ad31dcba31e4a0bfb4eaa97f3d5911c, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.63	0.11	0.10
query5	0.56	0.54	0.52
query6	1.14	0.73	0.73
query7	0.02	0.02	0.02
query8	0.05	0.03	0.03
query9	0.58	0.49	0.49
query10	0.54	0.55	0.55
query11	0.14	0.11	0.11
query12	0.14	0.11	0.12
query13	0.66	0.59	0.60
query14	0.80	0.78	0.82
query15	0.85	0.85	0.84
query16	0.37	0.38	0.38
query17	1.04	0.98	1.07
query18	0.24	0.22	0.22
query19	1.87	1.85	1.76
query20	0.01	0.01	0.02
query21	15.39	0.95	0.58
query22	0.74	0.78	0.66
query23	15.15	1.47	0.51
query24	3.27	0.69	0.85
query25	0.27	0.15	0.05
query26	0.34	0.15	0.13
query27	0.04	0.06	0.04
query28	13.12	1.06	0.45
query29	12.55	3.85	3.25
query30	0.25	0.10	0.07
query31	2.80	0.60	0.38
query32	3.22	0.54	0.48
query33	3.01	3.05	3.07
query34	16.83	5.13	4.49
query35	4.64	4.59	4.55
query36	0.64	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.03	0.03	0.02
query40	0.16	0.13	0.12
query41	0.07	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.72 s
Total hot run time: 28.51 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/3) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 45.53% (12746/27996) |
| Line Coverage | 36.38% (113637/312387) |
| Region Coverage | 34.01% (65018/191171) |
| Branch Coverage | 31.05% (34123/109914) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 76.36% (21017/27525) |
| Line Coverage | 69.72% (217056/311338) |
| Region Coverage | 67.68% (129876/191910) |
| Branch Coverage | 61.19% (67559/110410) |

@morrySnow morrySnow merged commit 93cb6f9 into branch-3.1 Sep 5, 2025
21 of 23 checks passed
@github-actions github-actions bot deleted the auto-pick-55626-branch-3.1 branch September 5, 2025 10:21
@morrySnow morrySnow mentioned this pull request Sep 22, 2025