Conversation

@Jibing-Li
Contributor

@Jibing-Li Jibing-Li commented Apr 16, 2025

What problem does this PR solve?

Use UTF-8 when converting a string-like literal to double.
String-like columns in Doris are all stored with UTF-8 encoding, so the column statistics min/max values must also be read as UTF-8. Otherwise, Java falls back to the system default encoding, and Doris may read wrong min/max statistics values.
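
The following is a minimal sketch of the failure mode, not the actual Doris code. `prefixToDouble` is a hypothetical helper that packs the leading bytes of a string into a number, which is one plausible way to map a string to a double for min/max comparison. The point it illustrates is that the no-argument `String.getBytes()` uses the JVM default charset, so the same string can produce different bytes (and a different number) on different machines, while passing `StandardCharsets.UTF_8` makes the result deterministic.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringPrefixToDouble {
    // Hypothetical helper (an assumption for illustration, not Doris's API):
    // packs up to the first 7 encoded bytes of the string, big-endian,
    // into a long and widens the result to double.
    static double prefixToDouble(String s, Charset charset) {
        byte[] bytes = s.getBytes(charset); // explicit charset, never the platform default
        long v = 0;
        int len = Math.min(bytes.length, 7);
        for (int pos = 0; pos < len; pos++) {
            v += Byte.toUnsignedLong(bytes[pos]) << ((6 - pos) * 8);
        }
        return (double) v;
    }

    public static void main(String[] args) {
        String nonAscii = "\u00e9"; // "e" with acute accent: 2 bytes in UTF-8, 1 byte in ISO-8859-1
        double utf8 = prefixToDouble(nonAscii, StandardCharsets.UTF_8);
        double latin1 = prefixToDouble(nonAscii, StandardCharsets.ISO_8859_1);
        System.out.println("same value for non-ASCII input: " + (utf8 == latin1));

        // Pure ASCII input encodes identically in both charsets,
        // so the problem only shows up with non-ASCII data.
        double asciiUtf8 = prefixToDouble("abc", StandardCharsets.UTF_8);
        double asciiLatin1 = prefixToDouble("abc", StandardCharsets.ISO_8859_1);
        System.out.println("same value for ASCII input: " + (asciiUtf8 == asciiLatin1));
    }
}
```

Taking the charset as a parameter here is only to make the comparison visible; in a fix of the kind this PR describes, the call site would simply pass `StandardCharsets.UTF_8`.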

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Apr 16, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Jibing-Li Jibing-Li marked this pull request as ready for review April 16, 2025 05:19
@Jibing-Li
Contributor Author

run buildall

@Jibing-Li
Contributor Author

run p0

@Jibing-Li
Contributor Author

run performance

@Jibing-Li
Contributor Author

run p0

@doris-robot

TPC-H: Total hot run time: 33776 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e3bd3315925c5d10acc7e1b2afb79a7d4afdfb52, data reload: false

------ Round 1 ----------------------------------
q1	25849	5041	4992	4992
q2	2057	273	186	186
q3	10393	1276	687	687
q4	10232	998	537	537
q5	7532	2431	2298	2298
q6	181	166	132	132
q7	944	760	628	628
q8	9317	1269	1101	1101
q9	6834	5151	5122	5122
q10	6869	2283	1895	1895
q11	486	284	274	274
q12	350	352	218	218
q13	17779	3678	3067	3067
q14	226	222	218	218
q15	545	481	488	481
q16	446	444	400	400
q17	578	853	368	368
q18	7488	7191	7042	7042
q19	1725	945	560	560
q20	326	329	218	218
q21	3892	3377	2419	2419
q22	1052	1020	933	933
Total cold run time: 115101 ms
Total hot run time: 33776 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5147	5133	5086	5086
q2	233	332	242	242
q3	2153	2620	2282	2282
q4	1380	1782	1384	1384
q5	4430	4391	4402	4391
q6	217	184	123	123
q7	1986	1938	1744	1744
q8	2588	2563	2500	2500
q9	7270	7184	6861	6861
q10	3003	3184	2729	2729
q11	576	497	515	497
q12	683	775	598	598
q13	3560	3874	3242	3242
q14	266	306	266	266
q15	544	467	505	467
q16	456	537	476	476
q17	1125	1578	1340	1340
q18	7630	7518	7377	7377
q19	789	839	904	839
q20	2004	2026	1919	1919
q21	5195	4798	4896	4798
q22	1079	1033	1000	1000
Total cold run time: 52314 ms
Total hot run time: 50161 ms

@doris-robot

TPC-DS: Total hot run time: 191780 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e3bd3315925c5d10acc7e1b2afb79a7d4afdfb52, data reload: false

query1	1424	1076	1084	1076
query2	6214	1840	1852	1840
query3	11001	4580	4448	4448
query4	54652	25745	23001	23001
query5	5041	501	444	444
query6	348	197	183	183
query7	4957	490	285	285
query8	341	250	238	238
query9	6222	2552	2561	2552
query10	443	328	261	261
query11	15098	15083	14815	14815
query12	159	111	102	102
query13	1093	514	389	389
query14	10094	6315	6345	6315
query15	196	206	186	186
query16	7155	665	504	504
query17	1103	775	622	622
query18	1691	419	332	332
query19	216	200	170	170
query20	132	136	125	125
query21	207	134	109	109
query22	4330	4405	4304	4304
query23	33998	33490	33316	33316
query24	6893	2467	2424	2424
query25	457	470	403	403
query26	702	280	153	153
query27	2354	513	338	338
query28	2925	2139	2135	2135
query29	583	565	455	455
query30	269	224	186	186
query31	846	874	780	780
query32	73	61	66	61
query33	440	368	342	342
query34	770	872	525	525
query35	804	837	740	740
query36	943	1014	894	894
query37	111	96	77	77
query38	4171	4299	4183	4183
query39	1495	1458	1427	1427
query40	219	120	107	107
query41	51	53	51	51
query42	128	110	109	109
query43	503	520	511	511
query44	1312	827	805	805
query45	182	170	167	167
query46	858	1020	654	654
query47	1812	1891	1754	1754
query48	388	422	305	305
query49	710	509	414	414
query50	707	708	419	419
query51	4263	4295	4197	4197
query52	118	120	111	111
query53	239	268	188	188
query54	591	582	516	516
query55	84	80	81	80
query56	298	305	307	305
query57	1143	1179	1128	1128
query58	270	257	255	255
query59	2716	2828	2717	2717
query60	347	348	310	310
query61	131	126	121	121
query62	725	751	714	714
query63	245	194	193	193
query64	1845	1024	693	693
query65	4438	4206	4233	4206
query66	716	395	301	301
query67	15904	15805	15352	15352
query68	7368	882	511	511
query69	526	302	262	262
query70	1193	1123	1046	1046
query71	502	311	288	288
query72	5984	4756	4792	4756
query73	1345	652	346	346
query74	9292	8903	8650	8650
query75	3689	3211	2732	2732
query76	4221	1184	773	773
query77	694	377	291	291
query78	9944	10082	9254	9254
query79	2288	840	602	602
query80	604	524	447	447
query81	485	247	223	223
query82	444	130	96	96
query83	381	243	231	231
query84	301	101	86	86
query85	789	400	305	305
query86	374	289	251	251
query87	4446	4535	4327	4327
query88	3530	2200	2177	2177
query89	393	319	283	283
query90	1910	215	218	215
query91	141	141	109	109
query92	72	61	57	57
query93	1693	982	587	587
query94	684	422	299	299
query95	369	291	288	288
query96	484	564	277	277
query97	3149	3241	3123	3123
query98	225	208	207	207
query99	1444	1424	1258	1258
Total cold run time: 299858 ms
Total hot run time: 191780 ms

@doris-robot

ClickBench: Total hot run time: 29.08 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e3bd3315925c5d10acc7e1b2afb79a7d4afdfb52, data reload: false

query1	0.03	0.03	0.03
query2	0.12	0.10	0.11
query3	0.25	0.19	0.19
query4	1.60	0.19	0.11
query5	0.57	0.55	0.56
query6	1.20	0.71	0.70
query7	0.02	0.02	0.01
query8	0.03	0.03	0.04
query9	0.57	0.52	0.51
query10	0.57	0.56	0.56
query11	0.16	0.11	0.11
query12	0.15	0.11	0.12
query13	0.62	0.60	0.60
query14	1.20	1.22	1.19
query15	0.88	0.86	0.86
query16	0.40	0.41	0.40
query17	1.05	1.00	1.03
query18	0.21	0.20	0.20
query19	1.93	1.85	1.75
query20	0.02	0.01	0.01
query21	15.42	0.91	0.53
query22	0.74	1.14	0.70
query23	14.91	1.42	0.64
query24	6.79	1.41	0.48
query25	0.53	0.19	0.09
query26	0.59	0.16	0.15
query27	0.06	0.05	0.04
query28	9.58	0.93	0.44
query29	12.55	4.00	3.32
query30	0.25	0.09	0.07
query31	2.83	0.59	0.38
query32	3.22	0.54	0.47
query33	3.00	3.01	3.09
query34	15.82	5.16	4.47
query35	4.53	4.52	4.48
query36	0.67	0.50	0.48
query37	0.09	0.06	0.05
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 103.57 s
Total hot run time: 29.08 s

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Apr 17, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@Jibing-Li Jibing-Li merged commit dc96600 into apache:master Apr 18, 2025
29 of 30 checks passed
@morrySnow morrySnow added the usercase Important user case type label label Apr 18, 2025
@Jibing-Li Jibing-Li deleted the utf8 branch April 18, 2025 01:58
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Apr 18, 2025
…pache#50085)

Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Apr 18, 2025
…pache#50085)

yiguolei pushed a commit that referenced this pull request Apr 19, 2025
dataroaring pushed a commit that referenced this pull request Apr 22, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…pache#50085)

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.10-merged dev/3.0.6-merged reviewed usercase Important user case type label
