@vinlee19
Contributor

@vinlee19 commented Aug 20, 2025

What problem does this PR solve?

In PR #49623, we implemented conversion from Paimon VARCHAR/CHAR types to Doris VARCHAR/CHAR types. However, there are significant differences in the maximum length constraints between these systems:

Apache Paimon:

  • CHAR: Fixed-length character string declared as CHAR(n), where n is the number of code points. n must be between 1 and 2,147,483,647 (inclusive). Defaults to n=1 if no length is specified.
  • VARCHAR: Variable-length character string declared as VARCHAR(n), where n is the maximum number of code points. n must be between 1 and 2,147,483,647 (inclusive). Defaults to n=1 if no length is specified.
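The Paimon length rules above (default length 1, valid range 1 to 2,147,483,647) can be illustrated with a small Java sketch. `PaimonStringDecl` and `parse` are hypothetical names used only for illustration and are not Paimon's own API. Note that the upper bound equals Java's `Integer.MAX_VALUE`.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Paimon or Doris): parses a CHAR(n) / VARCHAR(n)
 * declaration and applies the Paimon rules quoted above.
 */
final class PaimonStringDecl {
    // Paimon's upper bound on n; it is exactly Integer.MAX_VALUE.
    static final long MAX_LENGTH = Integer.MAX_VALUE;

    // Matches "CHAR", "VARCHAR", "char(10)", "VARCHAR( 42 )", etc.
    private static final Pattern DECL = Pattern.compile(
            "\\s*(VARCHAR|CHAR)\\s*(?:\\(\\s*(\\d+)\\s*\\))?\\s*", Pattern.CASE_INSENSITIVE);

    final String kind;  // "CHAR" or "VARCHAR"
    final long length;  // declared (or defaulted) length in code points

    PaimonStringDecl(String kind, long length) {
        this.kind = kind;
        this.length = length;
    }

    static PaimonStringDecl parse(String text) {
        Matcher m = DECL.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a CHAR/VARCHAR declaration: " + text);
        }
        String kind = m.group(1).toUpperCase(Locale.ROOT);
        // No explicit length means n = 1.
        long n = m.group(2) == null ? 1 : Long.parseLong(m.group(2));
        // Reject lengths outside [1, 2,147,483,647].
        if (n < 1 || n > MAX_LENGTH) {
            throw new IllegalArgumentException("length out of range [1, " + MAX_LENGTH + "]: " + n);
        }
        return new PaimonStringDecl(kind, n);
    }

    public static void main(String[] args) {
        System.out.println(parse("VARCHAR").length);             // 1
        System.out.println(parse("char(255)").length);           // 255
        System.out.println(parse("VARCHAR(2147483647)").length); // 2147483647
    }
}
```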

Apache Doris:

  • CHAR: Maximum length is 255 characters
  • VARCHAR: Maximum length is 65,533 characters

Solution:
This PR resolves the length mismatch by mapping any Paimon VARCHAR/CHAR type whose declared length exceeds the Doris limit to the Doris STRING type:

  • Paimon VARCHAR with length > 65,533 → Doris STRING
  • Paimon CHAR with length > 255 → Doris STRING
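The mapping rule above can be sketched in Java. This is not the code changed in this PR. `PaimonToDorisStringMapping`, the `PaimonKind` enum and `mapStringType` are hypothetical names, and the two limits are taken from the Doris bullets above. A type whose length is at or below the limit keeps its declared type, and any longer type becomes STRING.

```java
/**
 * Hypothetical sketch of the Paimon -> Doris string-type rule (not the actual Doris FE code).
 */
final class PaimonToDorisStringMapping {
    // Doris limits as stated in the PR description.
    static final int DORIS_MAX_CHAR_LENGTH = 255;
    static final int DORIS_MAX_VARCHAR_LENGTH = 65533;

    enum PaimonKind { CHAR, VARCHAR }

    /** Returns the Doris column type for a Paimon CHAR(n) or VARCHAR(n). */
    static String mapStringType(PaimonKind kind, int length) {
        switch (kind) {
            case CHAR:
                // Keep CHAR(n) while it fits in Doris CHAR, otherwise fall back to STRING.
                return length > DORIS_MAX_CHAR_LENGTH ? "STRING" : "CHAR(" + length + ")";
            case VARCHAR:
                // Keep VARCHAR(n) while it fits in Doris VARCHAR, otherwise fall back to STRING.
                return length > DORIS_MAX_VARCHAR_LENGTH ? "STRING" : "VARCHAR(" + length + ")";
            default:
                throw new IllegalArgumentException("unsupported kind: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(mapStringType(PaimonKind.CHAR, 255));                  // CHAR(255)
        System.out.println(mapStringType(PaimonKind.CHAR, 256));                  // STRING
        System.out.println(mapStringType(PaimonKind.VARCHAR, 65533));             // VARCHAR(65533)
        System.out.println(mapStringType(PaimonKind.VARCHAR, Integer.MAX_VALUE)); // STRING
    }
}
```

The comparison uses `>` rather than `>=`, so the boundary lengths 255 and 65,533 stay CHAR/VARCHAR. This matches the "length > limit" wording of the rule.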

This keeps every mapped column within Doris's type limits. Oversized Paimon string columns are exposed as STRING instead of as CHAR/VARCHAR with a length Doris does not support.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@vinlee19
Contributor Author

run buildall

@vinlee19
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34695 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1bd69c4e4a6a1ea2db3f94f435f4453ac30d43c1, data reload: false

------ Round 1 ----------------------------------
q1	14870	5257	5274	5257
q2	1919	273	177	177
q3	9663	1256	705	705
q4	9016	1005	540	540
q5	6694	2368	2339	2339
q6	181	162	130	130
q7	915	747	606	606
q8	8472	1312	1153	1153
q9	6902	5262	5088	5088
q10	6900	2386	1953	1953
q11	481	288	274	274
q12	342	357	229	229
q13	17760	3615	3047	3047
q14	227	240	213	213
q15	544	470	479	470
q16	454	425	374	374
q17	599	865	362	362
q18	7611	6997	7049	6997
q19	933	948	574	574
q20	357	336	214	214
q21	3956	3191	3013	3013
q22	1085	1050	980	980
Total cold run time: 99881 ms
Total hot run time: 34695 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5185	5192	5164	5164
q2	242	319	225	225
q3	2137	2657	2305	2305
q4	1409	1797	1363	1363
q5	4207	4115	4347	4115
q6	216	184	148	148
q7	2084	1945	1819	1819
q8	2615	2689	2627	2627
q9	7293	7294	7346	7294
q10	3071	3313	2928	2928
q11	598	524	527	524
q12	694	800	664	664
q13	3588	3951	3198	3198
q14	293	304	278	278
q15	514	481	472	472
q16	447	479	460	460
q17	1207	1631	1325	1325
q18	7910	7892	7665	7665
q19	865	860	950	860
q20	1916	2036	1939	1939
q21	4814	4369	4441	4369
q22	1099	1059	1016	1016
Total cold run time: 52404 ms
Total hot run time: 50758 ms

@doris-robot

TPC-DS: Total hot run time: 185749 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1bd69c4e4a6a1ea2db3f94f435f4453ac30d43c1, data reload: false

query1	978	380	401	380
query2	6517	1675	1739	1675
query3	6754	219	219	219
query4	27087	23301	23348	23301
query5	4324	593	492	492
query6	302	241	193	193
query7	4624	494	290	290
query8	276	219	219	219
query9	8547	2892	2824	2824
query10	462	339	285	285
query11	15970	15037	14880	14880
query12	162	117	107	107
query13	1653	577	433	433
query14	8642	5736	5687	5687
query15	207	181	195	181
query16	7134	656	462	462
query17	1182	690	597	597
query18	1970	401	316	316
query19	188	200	154	154
query20	123	115	122	115
query21	210	117	103	103
query22	4189	4388	4521	4388
query23	34043	33117	33314	33117
query24	7720	2365	2353	2353
query25	552	483	418	418
query26	1239	271	156	156
query27	2753	497	342	342
query28	4158	2221	2236	2221
query29	767	557	444	444
query30	279	212	184	184
query31	877	768	734	734
query32	78	79	71	71
query33	477	365	342	342
query34	786	829	513	513
query35	804	818	791	791
query36	975	1018	932	932
query37	119	101	81	81
query38	4018	4017	4022	4017
query39	1497	1438	1396	1396
query40	219	121	116	116
query41	60	54	54	54
query42	120	106	113	106
query43	496	500	481	481
query44	1361	856	870	856
query45	171	169	162	162
query46	856	1004	650	650
query47	1772	1816	1718	1718
query48	391	423	306	306
query49	731	464	391	391
query50	662	665	417	417
query51	4120	4104	4082	4082
query52	109	112	105	105
query53	234	273	202	202
query54	592	586	526	526
query55	94	84	83	83
query56	308	310	304	304
query57	1189	1191	1135	1135
query58	278	274	269	269
query59	2674	2657	2597	2597
query60	354	340	348	340
query61	156	146	148	146
query62	853	723	661	661
query63	234	195	200	195
query64	4174	1014	715	715
query65	4257	4219	4188	4188
query66	1140	415	327	327
query67	15715	15257	15281	15257
query68	7960	916	574	574
query69	489	319	278	278
query70	1184	1116	1125	1116
query71	396	321	301	301
query72	5561	4757	4913	4757
query73	707	653	353	353
query74	9006	9009	8971	8971
query75	3412	3064	2613	2613
query76	3550	1139	754	754
query77	567	385	325	325
query78	9677	9960	8828	8828
query79	2157	856	589	589
query80	657	532	525	525
query81	483	255	224	224
query82	213	139	104	104
query83	261	256	239	239
query84	260	106	86	86
query85	773	378	338	338
query86	337	304	303	303
query87	4192	4237	4104	4104
query88	2857	2194	2198	2194
query89	393	317	287	287
query90	1829	229	224	224
query91	140	140	105	105
query92	81	68	70	68
query93	1240	1002	644	644
query94	638	406	314	314
query95	402	315	308	308
query96	482	600	274	274
query97	2597	2650	2611	2611
query98	236	222	214	214
query99	1325	1404	1322	1322
Total cold run time: 270025 ms
Total hot run time: 185749 ms

@doris-robot

ClickBench: Total hot run time: 32.36 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1bd69c4e4a6a1ea2db3f94f435f4453ac30d43c1, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.05	0.04
query3	0.24	0.07	0.07
query4	1.62	0.10	0.11
query5	0.43	0.41	0.40
query6	1.17	0.64	0.65
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.59	0.52	0.51
query10	0.56	0.56	0.58
query11	0.15	0.11	0.10
query12	0.15	0.12	0.12
query13	0.62	0.60	0.61
query14	0.81	0.86	0.81
query15	0.86	0.85	0.87
query16	0.38	0.41	0.38
query17	1.00	1.05	1.05
query18	0.21	0.19	0.19
query19	1.89	1.79	1.83
query20	0.01	0.01	0.01
query21	15.41	0.93	0.57
query22	0.80	1.19	0.79
query23	14.77	1.36	0.61
query24	7.20	0.70	1.24
query25	0.46	0.24	0.09
query26	0.66	0.17	0.13
query27	0.07	0.05	0.05
query28	9.80	0.90	0.42
query29	12.61	3.88	3.23
query30	3.08	3.01	2.92
query31	2.82	0.59	0.39
query32	3.24	0.55	0.48
query33	3.07	3.04	3.15
query34	16.19	5.45	4.86
query35	4.89	4.94	4.96
query36	0.67	0.50	0.49
query37	0.09	0.08	0.07
query38	0.05	0.05	0.05
query39	0.03	0.02	0.02
query40	0.17	0.14	0.14
query41	0.09	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 107.11 s
Total hot run time: 32.36 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.00% (4/8) 🎉
Increment coverage report
Complete coverage report

@vinlee19
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33797 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bc6fad582b4de6cfd67a64a11f3fd45f96f08abd, data reload: false

------ Round 1 ----------------------------------
q1	17583	5271	5072	5072
q2	1925	301	180	180
q3	10294	1273	718	718
q4	10220	973	533	533
q5	7518	2463	2250	2250
q6	177	159	128	128
q7	898	754	602	602
q8	9291	1283	1116	1116
q9	6936	5160	5049	5049
q10	6894	2396	1952	1952
q11	465	301	274	274
q12	348	342	223	223
q13	17779	3659	3019	3019
q14	228	230	215	215
q15	565	483	482	482
q16	413	432	374	374
q17	612	852	362	362
q18	7774	7151	7042	7042
q19	1080	949	547	547
q20	347	343	224	224
q21	4160	3298	2443	2443
q22	1065	997	992	992
Total cold run time: 106572 ms
Total hot run time: 33797 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5172	5102	5085	5085
q2	245	320	225	225
q3	2157	2722	2314	2314
q4	1351	1777	1294	1294
q5	4196	4544	4485	4485
q6	203	161	123	123
q7	2138	1974	1798	1798
q8	2662	2549	2545	2545
q9	7382	7349	7351	7349
q10	3077	3292	2880	2880
q11	575	502	489	489
q12	889	832	628	628
q13	3423	3978	3209	3209
q14	281	308	277	277
q15	515	474	468	468
q16	442	487	474	474
q17	1192	1645	1342	1342
q18	8057	7634	7498	7498
q19	814	910	1053	910
q20	2091	2034	1877	1877
q21	4780	4370	4486	4370
q22	1101	1039	1011	1011
Total cold run time: 52743 ms
Total hot run time: 50651 ms

@doris-robot

TPC-DS: Total hot run time: 185477 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bc6fad582b4de6cfd67a64a11f3fd45f96f08abd, data reload: false

query1	1020	389	406	389
query2	6531	1776	1745	1745
query3	6744	227	216	216
query4	25968	23679	22867	22867
query5	4338	627	511	511
query6	289	205	202	202
query7	4635	495	280	280
query8	267	217	214	214
query9	8580	2876	2891	2876
query10	500	346	287	287
query11	15725	15069	15277	15069
query12	169	112	113	112
query13	1667	548	422	422
query14	9106	5761	5732	5732
query15	207	185	168	168
query16	7226	663	468	468
query17	1194	735	615	615
query18	2010	424	331	331
query19	193	208	168	168
query20	135	121	120	120
query21	220	135	109	109
query22	4256	4214	4093	4093
query23	34329	33395	33307	33307
query24	8213	2345	2360	2345
query25	545	498	389	389
query26	1216	275	159	159
query27	2754	499	352	352
query28	4337	2256	2214	2214
query29	799	621	441	441
query30	292	231	189	189
query31	871	809	719	719
query32	83	73	79	73
query33	554	375	332	332
query34	792	836	510	510
query35	793	863	732	732
query36	960	1009	912	912
query37	125	110	87	87
query38	4086	4069	4060	4060
query39	1485	1424	1408	1408
query40	222	128	114	114
query41	61	56	56	56
query42	116	110	116	110
query43	511	510	473	473
query44	1369	849	849	849
query45	178	173	165	165
query46	885	1008	644	644
query47	1764	1789	1736	1736
query48	380	421	312	312
query49	729	499	386	386
query50	649	701	387	387
query51	4106	4138	4023	4023
query52	114	111	105	105
query53	237	262	194	194
query54	586	585	526	526
query55	90	88	86	86
query56	324	310	311	310
query57	1181	1196	1118	1118
query58	277	267	256	256
query59	2647	2723	2581	2581
query60	338	341	332	332
query61	155	123	129	123
query62	807	721	690	690
query63	233	190	194	190
query64	4421	1023	695	695
query65	4332	4286	4223	4223
query66	1168	406	345	345
query67	15822	15283	15326	15283
query68	8119	945	571	571
query69	470	332	286	286
query70	1254	1075	1081	1075
query71	447	332	327	327
query72	5590	4786	4771	4771
query73	721	630	361	361
query74	9080	9127	8980	8980
query75	3728	3089	2634	2634
query76	3634	1134	734	734
query77	803	435	340	340
query78	9479	9826	8825	8825
query79	2157	850	593	593
query80	810	553	465	465
query81	467	254	216	216
query82	418	139	113	113
query83	280	250	242	242
query84	294	105	89	89
query85	880	380	335	335
query86	343	330	281	281
query87	4280	4291	4235	4235
query88	3156	2203	2192	2192
query89	389	317	284	284
query90	1921	222	239	222
query91	149	142	112	112
query92	84	72	65	65
query93	1164	994	641	641
query94	673	391	306	306
query95	396	322	306	306
query96	492	570	273	273
query97	2619	2682	2586	2586
query98	272	222	214	214
query99	1425	1414	1268	1268
Total cold run time: 272910 ms
Total hot run time: 185477 ms

@doris-robot

ClickBench: Total hot run time: 33.21 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bc6fad582b4de6cfd67a64a11f3fd45f96f08abd, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.26	0.08	0.07
query4	1.61	0.11	0.11
query5	0.44	0.42	0.40
query6	1.16	0.64	0.64
query7	0.02	0.02	0.01
query8	0.05	0.04	0.03
query9	0.61	0.52	0.51
query10	0.57	0.56	0.57
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.62	0.61	0.60
query14	0.80	0.82	0.86
query15	0.87	0.84	0.88
query16	0.38	0.39	0.41
query17	1.06	1.05	1.05
query18	0.21	0.19	0.19
query19	1.96	1.83	1.89
query20	0.01	0.00	0.01
query21	15.41	0.92	0.58
query22	0.76	1.15	0.81
query23	14.81	1.40	0.64
query24	6.68	1.14	1.53
query25	0.48	0.23	0.21
query26	0.72	0.19	0.14
query27	0.06	0.05	0.04
query28	9.60	0.87	0.42
query29	12.56	3.97	3.28
query30	3.05	3.04	2.98
query31	2.82	0.57	0.39
query32	3.24	0.57	0.48
query33	3.17	3.16	3.04
query34	16.04	5.51	4.95
query35	4.96	4.88	4.89
query36	0.67	0.51	0.50
query37	0.09	0.08	0.08
query38	0.05	0.05	0.04
query39	0.03	0.03	0.03
query40	0.17	0.15	0.14
query41	0.08	0.03	0.03
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.58 s
Total hot run time: 33.21 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 50.00% (4/8) 🎉
Increment coverage report
Complete coverage report

Contributor

@suxiaogang223 left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions bot added the approved label Aug 28, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman merged commit 6622f50 into apache:master Aug 28, 2025
30 checks passed
github-actions bot pushed a commit that referenced this pull request Aug 28, 2025
…type mapping (#55051)

vinlee19 pushed a commit to vinlee19/doris that referenced this pull request Sep 1, 2025
…type mapping (apache#55051)

(cherry picked from commit 6622f50)
morningman pushed a commit that referenced this pull request Sep 4, 2025
### What problem does this PR solve?
Related PR: #55051 #55070
#55051 changed the Paimon type from `varchar(2147483647)` to `text`
#55070 added the test case
morrySnow pushed a commit that referenced this pull request Sep 4, 2025
wenzhenghu pushed a commit to wenzhenghu/doris that referenced this pull request Sep 8, 2025
@morrySnow mentioned this pull request Sep 22, 2025

Labels

approved, dev/3.1.1-merged, reviewed

6 participants