Conversation

@hubgeter
Contributor

@hubgeter hubgeter commented Jul 8, 2025

What problem does this PR solve?

Related PR: #47471

Problem Summary:
This PR supplements #47471.
It adds support for reading Hive tables (Parquet/ORC) in which a timestamp column has been changed to a bigint column. The bigint values are returned with millisecond (ms) precision.
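
As a rough illustration of the conversion described above, here is a minimal sketch that assumes the timestamp is available as whole seconds since the Unix epoch (UTC) plus a microsecond part. `TimestampParts` and `to_bigint_millis` are hypothetical names used only for this example; they are not taken from the Doris code base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a decoded timestamp value: whole seconds since the
// Unix epoch (UTC) plus a sub-second part in microseconds, in [0, 999999].
struct TimestampParts {
    int64_t epoch_seconds;
    int32_t microseconds;
};

// Convert the timestamp to a bigint holding milliseconds since the epoch.
// Digits below one millisecond are dropped.
constexpr int64_t to_bigint_millis(TimestampParts ts) {
    return ts.epoch_seconds * 1000 + ts.microseconds / 1000;
}

// Small usage example; call it from any test or main function.
inline void example_usage() {
    // 1.5 s after the epoch becomes 1500 ms.
    assert(to_bigint_millis(TimestampParts{1, 500000}) == 1500);
    // 0.75 s before the epoch: second -1 plus 250000 us becomes -750 ms.
    assert(to_bigint_millis(TimestampParts{-1, 250000}) == -750);
}
```

Keeping the sub-second part non-negative means values before the epoch need no special rounding in this sketch.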

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jul 8, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

hubgeter commented Jul 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33513 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3cc895a8aaa0a0c8b9628fa39cac92905ee7a996, data reload: false

------ Round 1 ----------------------------------
q1	17591	5245	5094	5094
q2	1934	297	220	220
q3	10270	1348	735	735
q4	10269	1048	525	525
q5	8601	2426	2341	2341
q6	213	161	132	132
q7	905	743	597	597
q8	9327	1312	1082	1082
q9	7302	5188	5073	5073
q10	6969	2411	1977	1977
q11	481	287	274	274
q12	373	373	220	220
q13	17782	3711	3089	3089
q14	222	217	212	212
q15	556	491	475	475
q16	429	419	386	386
q17	613	870	383	383
q18	7388	7178	7163	7163
q19	1205	947	567	567
q20	339	349	226	226
q21	3933	3207	2445	2445
q22	362	325	297	297
Total cold run time: 107064 ms
Total hot run time: 33513 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5218	5168	5105	5105
q2	241	320	216	216
q3	2223	2675	2313	2313
q4	1417	1820	1311	1311
q5	4218	4611	4525	4525
q6	216	165	127	127
q7	2072	1919	1888	1888
q8	2703	2571	2593	2571
q9	7146	7306	7256	7256
q10	3203	3308	2882	2882
q11	603	497	515	497
q12	718	812	652	652
q13	3728	4025	3364	3364
q14	302	310	286	286
q15	526	478	502	478
q16	454	483	439	439
q17	1197	1561	1414	1414
q18	8053	7795	7078	7078
q19	832	818	885	818
q20	1951	1965	1810	1810
q21	4826	4424	4267	4267
q22	628	575	533	533
Total cold run time: 52475 ms
Total hot run time: 49830 ms

@doris-robot

TPC-DS: Total hot run time: 186290 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3cc895a8aaa0a0c8b9628fa39cac92905ee7a996, data reload: false

query1	1020	400	393	393
query2	6539	1794	1823	1794
query3	6739	214	218	214
query4	26764	23297	23498	23297
query5	4362	610	438	438
query6	300	216	214	214
query7	4625	494	301	301
query8	276	229	214	214
query9	8611	2617	2636	2617
query10	449	330	283	283
query11	15427	15042	14900	14900
query12	151	108	106	106
query13	1657	552	406	406
query14	8620	5940	5954	5940
query15	208	204	172	172
query16	7716	438	264	264
query17	1324	706	593	593
query18	2015	410	315	315
query19	201	205	162	162
query20	126	120	114	114
query21	216	129	108	108
query22	4078	4175	4113	4113
query23	34163	33223	32945	32945
query24	8455	2389	2405	2389
query25	537	478	401	401
query26	1230	285	149	149
query27	2739	516	345	345
query28	4357	2133	2125	2125
query29	718	559	446	446
query30	291	220	200	200
query31	898	854	770	770
query32	102	66	62	62
query33	538	328	286	286
query34	800	842	533	533
query35	604	631	562	562
query36	964	995	870	870
query37	120	99	77	77
query38	4168	4085	4067	4067
query39	1482	1436	1396	1396
query40	212	118	107	107
query41	56	55	53	53
query42	123	113	109	109
query43	518	532	490	490
query44	1336	842	838	838
query45	173	190	166	166
query46	828	1012	648	648
query47	1727	1768	1733	1733
query48	379	422	313	313
query49	737	490	400	400
query50	662	708	431	431
query51	4235	4201	4124	4124
query52	108	107	108	107
query53	231	269	186	186
query54	571	564	504	504
query55	83	89	81	81
query56	305	313	318	313
query57	1168	1176	1121	1121
query58	264	253	260	253
query59	2703	2796	2688	2688
query60	325	314	308	308
query61	127	127	123	123
query62	808	694	675	675
query63	227	192	193	192
query64	4404	1227	824	824
query65	4245	4143	4176	4143
query66	1094	398	316	316
query67	15569	15608	15221	15221
query68	8304	890	537	537
query69	511	332	270	270
query70	1217	1099	1106	1099
query71	498	328	302	302
query72	5676	4877	4873	4873
query73	754	665	355	355
query74	9333	9226	8809	8809
query75	3885	3184	2708	2708
query76	3716	1132	711	711
query77	808	382	297	297
query78	11148	10922	10204	10204
query79	2520	837	600	600
query80	630	496	446	446
query81	464	264	225	225
query82	452	131	101	101
query83	277	300	240	240
query84	296	103	92	92
query85	784	358	313	313
query86	339	295	320	295
query87	4367	4399	4294	4294
query88	3314	2323	2306	2306
query89	391	322	294	294
query90	1924	216	218	216
query91	163	155	112	112
query92	79	63	55	55
query93	1484	991	603	603
query94	679	315	199	199
query95	377	297	285	285
query96	493	570	275	275
query97	2772	2749	2644	2644
query98	240	212	211	211
query99	1488	1400	1271	1271
Total cold run time: 276191 ms
Total hot run time: 186290 ms

@doris-robot

ClickBench: Total hot run time: 30.1 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3cc895a8aaa0a0c8b9628fa39cac92905ee7a996, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.24	0.09	0.07
query4	1.61	0.12	0.11
query5	0.42	0.44	0.41
query6	1.16	0.67	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.61	0.51	0.51
query10	0.57	0.57	0.57
query11	0.17	0.12	0.12
query12	0.15	0.12	0.12
query13	0.63	0.61	0.61
query14	0.81	0.82	0.83
query15	0.91	0.87	0.86
query16	0.39	0.39	0.41
query17	1.07	1.09	1.06
query18	0.24	0.21	0.21
query19	2.02	1.91	1.91
query20	0.01	0.01	0.01
query21	15.43	0.91	0.54
query22	0.78	1.19	0.74
query23	14.84	1.35	0.60
query24	7.11	1.08	1.19
query25	0.55	0.15	0.13
query26	0.64	0.18	0.15
query27	0.06	0.06	0.05
query28	8.95	0.88	0.45
query29	12.56	4.03	3.30
query30	0.26	0.09	0.07
query31	2.85	0.62	0.39
query32	3.25	0.56	0.47
query33	3.10	3.12	3.12
query34	16.05	5.36	4.75
query35	4.85	4.86	4.84
query36	0.70	0.50	0.49
query37	0.09	0.07	0.07
query38	0.06	0.05	0.04
query39	0.03	0.03	0.03
query40	0.17	0.15	0.15
query41	0.08	0.02	0.03
query42	0.03	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 103.67 s
Total hot run time: 30.1 s

@hubgeter
Contributor Author

hubgeter commented Jul 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33266 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0e9aaedc56fdb4533f65a5eb3caa46f227fb451e, data reload: false

------ Round 1 ----------------------------------
q1	17590	5182	4985	4985
q2	1919	288	180	180
q3	10330	1318	741	741
q4	10228	1018	526	526
q5	7517	2393	2330	2330
q6	174	168	128	128
q7	890	739	593	593
q8	9312	1305	1082	1082
q9	6875	5069	5085	5069
q10	6894	2388	1971	1971
q11	476	295	275	275
q12	335	347	211	211
q13	17753	3724	3114	3114
q14	233	224	218	218
q15	535	487	481	481
q16	423	426	395	395
q17	589	871	374	374
q18	7456	7171	7148	7148
q19	1212	946	544	544
q20	327	352	221	221
q21	3745	3344	2383	2383
q22	357	329	297	297
Total cold run time: 105170 ms
Total hot run time: 33266 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5118	5083	5107	5083
q2	234	319	221	221
q3	2179	2719	2369	2369
q4	1392	1784	1337	1337
q5	4230	4278	4704	4278
q6	232	166	124	124
q7	2055	1992	1852	1852
q8	2660	2630	2681	2630
q9	7405	7097	7204	7097
q10	3086	3318	2903	2903
q11	585	516	484	484
q12	691	810	666	666
q13	3865	3923	3446	3446
q14	291	287	388	287
q15	624	486	476	476
q16	442	490	425	425
q17	1151	1511	1414	1414
q18	7941	7577	7616	7577
q19	791	820	926	820
q20	2014	2167	1987	1987
q21	5112	4712	4605	4605
q22	664	601	555	555
Total cold run time: 52762 ms
Total hot run time: 50636 ms

@doris-robot

TPC-DS: Total hot run time: 186287 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0e9aaedc56fdb4533f65a5eb3caa46f227fb451e, data reload: false

query1	1011	388	398	388
query2	6561	1804	1755	1755
query3	6753	211	216	211
query4	26414	23944	23252	23252
query5	4347	588	436	436
query6	300	200	187	187
query7	4636	493	287	287
query8	275	214	214	214
query9	8617	2586	2601	2586
query10	473	318	269	269
query11	15274	14996	14868	14868
query12	151	105	103	103
query13	1645	526	419	419
query14	8513	5865	5803	5803
query15	209	189	172	172
query16	7829	436	261	261
query17	1328	698	584	584
query18	2018	401	305	305
query19	192	193	159	159
query20	127	117	118	117
query21	211	124	105	105
query22	4174	4383	4396	4383
query23	34881	33946	33513	33513
query24	8432	2327	2365	2327
query25	514	446	380	380
query26	1222	262	143	143
query27	2735	504	339	339
query28	4289	2120	2107	2107
query29	706	552	428	428
query30	282	216	186	186
query31	909	868	752	752
query32	69	59	61	59
query33	583	321	277	277
query34	796	833	506	506
query35	575	638	578	578
query36	928	974	890	890
query37	107	98	76	76
query38	4126	4159	4135	4135
query39	1476	1415	1373	1373
query40	203	114	103	103
query41	52	54	50	50
query42	121	114	110	110
query43	508	520	488	488
query44	1303	820	819	819
query45	176	164	162	162
query46	824	1000	618	618
query47	1769	1809	1713	1713
query48	390	425	301	301
query49	741	470	389	389
query50	626	680	408	408
query51	4153	4250	4198	4198
query52	114	102	103	102
query53	220	254	184	184
query54	560	564	501	501
query55	81	80	80	80
query56	297	292	291	291
query57	1164	1175	1123	1123
query58	273	246	253	246
query59	2709	2803	2628	2628
query60	330	312	300	300
query61	130	120	122	120
query62	770	746	635	635
query63	218	179	183	179
query64	4280	1167	865	865
query65	4322	4140	4165	4140
query66	1065	396	313	313
query67	15756	15515	15171	15171
query68	8582	895	517	517
query69	488	308	269	269
query70	1191	1118	1122	1118
query71	450	323	293	293
query72	5746	4714	4868	4714
query73	735	642	349	349
query74	9077	8957	9117	8957
query75	3958	3171	2674	2674
query76	3621	1139	703	703
query77	788	400	285	285
query78	10921	11026	10397	10397
query79	2225	831	583	583
query80	665	507	432	432
query81	459	253	221	221
query82	455	123	92	92
query83	271	254	234	234
query84	298	108	77	77
query85	799	360	347	347
query86	332	296	271	271
query87	4536	4379	4263	4263
query88	2802	2238	2241	2238
query89	402	323	284	284
query90	1930	204	200	200
query91	141	144	113	113
query92	72	58	55	55
query93	1513	923	583	583
query94	677	309	198	198
query95	372	292	279	279
query96	480	579	284	284
query97	2768	2885	2610	2610
query98	218	204	204	204
query99	1498	1437	1273	1273
Total cold run time: 275313 ms
Total hot run time: 186287 ms

@doris-robot

ClickBench: Total hot run time: 29.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0e9aaedc56fdb4533f65a5eb3caa46f227fb451e, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.04	0.05
query3	0.25	0.08	0.07
query4	1.61	0.10	0.10
query5	0.42	0.41	0.42
query6	1.19	0.67	0.65
query7	0.03	0.01	0.02
query8	0.05	0.04	0.03
query9	0.60	0.50	0.51
query10	0.56	0.57	0.57
query11	0.16	0.11	0.11
query12	0.15	0.12	0.11
query13	0.62	0.61	0.61
query14	0.78	0.80	0.84
query15	0.89	0.89	0.86
query16	0.41	0.37	0.40
query17	1.05	1.04	1.08
query18	0.22	0.20	0.20
query19	1.96	1.83	1.82
query20	0.02	0.01	0.02
query21	15.40	0.89	0.54
query22	0.74	1.24	0.64
query23	14.92	1.37	0.60
query24	7.52	0.74	0.70
query25	0.46	0.26	0.09
query26	0.58	0.17	0.14
query27	0.06	0.06	0.05
query28	9.06	0.91	0.44
query29	12.56	3.97	3.34
query30	0.25	0.10	0.06
query31	2.84	0.59	0.39
query32	3.23	0.56	0.46
query33	3.06	3.19	3.15
query34	16.03	5.38	4.80
query35	4.82	4.82	4.84
query36	0.68	0.51	0.49
query37	0.10	0.08	0.07
query38	0.05	0.05	0.04
query39	0.03	0.03	0.03
query40	0.18	0.15	0.13
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 103.8 s
Total hot run time: 29.42 s

@doris-robot

BE UT Coverage Report

Increment line coverage 82.81% (53/64) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.33% (15510/27052)
Line Coverage 46.31% (140967/304395)
Region Coverage 45.57% (71292/156459)
Branch Coverage 40.29% (37571/93250)

auto& dst_value = reinterpret_cast<DstCppType&>(data[start_idx + i]);

int64_t ts_s = 0;
if (!src_value.unix_timestamp(&ts_s, cctz::utc_time_zone())) {
Contributor

Do we need to consider the timestamp with timezone?

Contributor Author

No need. Timezone conversion should be handled when converting the Parquet physical type (int64) to the Parquet logical type (timestamp). Ref: struct Int64ToTimestamp : public PhysicalToLogicalConverter
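
The two-stage split described in this reply could be sketched roughly as follows. This is only an illustration under stated assumptions: `TimeUnit`, `physical_to_logical_micros`, and `logical_micros_to_bigint_millis` are hypothetical names, not the actual `Int64ToTimestamp` implementation, and the sketch assumes the logical timestamp is kept as microseconds since the epoch in UTC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical time units for an int64 physical timestamp column.
enum class TimeUnit { kMillis, kMicros, kNanos };

// Stage 1 (physical -> logical): normalize the raw int64 to microseconds
// since the epoch in UTC. Any timezone adjustment would belong to this stage.
constexpr int64_t physical_to_logical_micros(int64_t raw, TimeUnit unit) {
    return unit == TimeUnit::kMillis ? raw * 1000
         : unit == TimeUnit::kMicros ? raw
                                     : raw / 1000;
}

// Stage 2 (logical timestamp -> bigint): plain arithmetic on a UTC instant,
// so no timezone is consulted here. Integer division truncates toward zero.
constexpr int64_t logical_micros_to_bigint_millis(int64_t micros_utc) {
    return micros_utc / 1000;
}

// Small usage example; call it from any test or main function.
inline void example_pipeline() {
    const int64_t logical = physical_to_logical_micros(1500, TimeUnit::kMillis);
    assert(logical == 1500000);
    assert(logical_micros_to_bigint_millis(logical) == 1500);
}
```

Because stage 2 only sees an instant that is already normalized, it stays independent of the session timezone in this sketch.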

Contributor

@kaka11chen kaka11chen left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jul 10, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 2d48f1a into apache:master Jul 10, 2025
29 of 33 checks passed
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 15, 2025
… to bigint. (apache#52954)

Related PR: apache#47471

Problem Summary:
This pr is a supplement to apache#47471.
This pr is used to support reading hive tables that convert timestamp
columns to bigint columns and display them in `ms` precision.
(parquet/orc hive table.)
morrySnow pushed a commit that referenced this pull request Jul 16, 2025
…ables after schema changes. #51341 #52964 #52954 #53055 (#53170)

bp #51341: support read hudi/paimon/iceberg schema change
bp #52964: add hudi orc  reader
bp #52954: support timestamp to bigint
bp #53055: fix paimon docker version

Labels

approved, dev/3.0.7-merged, dev/3.1.0-merged, reviewed

6 participants