[fix](hive) Make doris read hive text table parameters and behavior consistent with hive #37638

Merged
merged 4 commits into from
Jul 12, 2024

suxiaogang223
Contributor

Proposed changes

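When Hive reads a text table, it first tries to parse "field.delim" as a numeric Byte value. If parsing fails, it uses the first character of the string as the delimiter byte. If "field.delim" is not set, it applies the same procedure to "serialization.format"; if neither is set, it falls back to the default separator. This change makes Doris resolve the delimiter the same way.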

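```java
// Hive: take the field separator from field.delim, else serialization.format,
// else the built-in default (DefaultSeparators[0]).
separatorCandidates.add(LazyUtils.getByte(tableProperties.getProperty(serdeConstants.FIELD_DELIM,
        tableProperties.getProperty(serdeConstants.SERIALIZATION_FORMAT)), DefaultSeparators[0]));
...
  public static byte getByte(String altValue, byte defaultVal) {
    if (altValue != null && altValue.length() > 0) {
      try {
        // A numeric string such as "9" is read as the byte value 9.
        return Byte.parseByte(altValue);
      } catch (NumberFormatException e) {
        // Otherwise the first character of the string becomes the delimiter.
        return (byte) altValue.charAt(0);
      }
    }
    return defaultVal;
  }
```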
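
To make the lookup order concrete, here is a minimal, self-contained sketch of the same resolution logic. The class name `HiveDelimiterSketch`, the method `resolveFieldDelimiter`, and the constant `DEFAULT_SEPARATOR` are hypothetical and are not taken from Hive or Doris. The default of byte 1 (Ctrl-A) is an assumption standing in for `DefaultSeparators[0]`; the property keys are the ones quoted in the description above.

```java
import java.util.Properties;

/** Illustrative sketch of Hive-style field delimiter resolution (hypothetical names). */
final class HiveDelimiterSketch {
    // Property keys as quoted in the PR description.
    static final String FIELD_DELIM = "field.delim";
    static final String SERIALIZATION_FORMAT = "serialization.format";
    // Assumption: byte 1 (Ctrl-A) stands in for DefaultSeparators[0].
    static final byte DEFAULT_SEPARATOR = 1;

    /** Same logic as the getByte snippet quoted in the description. */
    static byte getByte(String altValue, byte defaultVal) {
        if (altValue != null && altValue.length() > 0) {
            try {
                return Byte.parseByte(altValue);
            } catch (NumberFormatException e) {
                return (byte) altValue.charAt(0);
            }
        }
        return defaultVal;
    }

    /** Looks up field.delim first, then serialization.format, then the default. */
    static byte resolveFieldDelimiter(Properties props) {
        String raw = props.getProperty(FIELD_DELIM, props.getProperty(SERIALIZATION_FORMAT));
        return getByte(raw, DEFAULT_SEPARATOR);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Neither property set: the default separator is used.
        System.out.println(resolveFieldDelimiter(props));

        // "9" parses as the number 9 (tab), not as the character '9'.
        props.setProperty(SERIALIZATION_FORMAT, "9");
        System.out.println(resolveFieldDelimiter(props));

        // "|" is not numeric, so its first character (code 124) is used.
        props.setProperty(FIELD_DELIM, "|");
        System.out.println(resolveFieldDelimiter(props));

        // "128" is outside the byte range, so parsing fails and '1' (code 49) is used.
        props.setProperty(FIELD_DELIM, "128");
        System.out.println(resolveFieldDelimiter(props));
    }
}
```

The numeric cases are the ones most likely to surprise: a value such as "9" selects byte 9 rather than the character '9', and a value that does not fit in a byte falls back to its first character.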

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39798 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dd791a97d359bb5075bf02493706d6893565d9f2, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot_run1_ms	hot_run2_ms	best_hot_ms
q1	17606	4327	4226	4226
q2	2010	188	185	185
q3	10495	1195	1013	1013
q4	10188	719	771	719
q5	7546	2693	2629	2629
q6	220	140	139	139
q7	954	599	607	599
q8	9206	2073	2051	2051
q9	8830	6549	6593	6549
q10	8836	3750	3814	3750
q11	455	240	241	240
q12	472	224	224	224
q13	18785	2958	3003	2958
q14	284	241	244	241
q15	528	474	482	474
q16	505	380	379	379
q17	968	659	688	659
q18	8046	7507	7516	7507
q19	8431	1418	1392	1392
q20	811	335	325	325
q21	4928	3214	3769	3214
q22	383	331	325	325
Total cold run time: 120487 ms
Total hot run time: 39798 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot_run1_ms	hot_run2_ms	best_hot_ms
q1	4396	4306	4235	4235
q2	356	283	266	266
q3	3169	2950	2931	2931
q4	2007	1802	1729	1729
q5	5553	5513	5614	5513
q6	224	139	140	139
q7	2226	1846	1852	1846
q8	3277	3407	3398	3398
q9	8776	8971	8792	8792
q10	4054	3805	3862	3805
q11	624	513	508	508
q12	836	655	622	622
q13	17166	3182	3197	3182
q14	321	288	296	288
q15	512	483	483	483
q16	479	443	437	437
q17	1819	1541	1506	1506
q18	8103	7855	7872	7855
q19	1727	1580	1599	1580
q20	2233	1857	1878	1857
q21	5002	4833	4949	4833
q22	772	546	565	546
Total cold run time: 73632 ms
Total hot run time: 56351 ms

@doris-robot

TPC-DS: Total hot run time: 175888 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dd791a97d359bb5075bf02493706d6893565d9f2, data reload: false

query	cold_ms	hot_run1_ms	hot_run2_ms	best_hot_ms
query1	908	371	361	361
query2	6357	2439	2320	2320
query3	6655	203	211	203
query4	28123	17636	17177	17177
query5	3545	491	494	491
query6	257	165	172	165
query7	4587	296	295	295
query8	299	283	282	282
query9	8402	2425	2404	2404
query10	441	283	283	283
query11	10582	10179	10187	10179
query12	121	88	84	84
query13	1643	377	382	377
query14	9989	7915	7748	7748
query15	237	188	195	188
query16	7121	327	316	316
query17	1741	561	530	530
query18	1806	276	277	276
query19	194	149	151	149
query20	92	83	82	82
query21	202	127	129	127
query22	4415	4105	4062	4062
query23	33958	33864	33687	33687
query24	10637	2998	2958	2958
query25	609	393	397	393
query26	712	158	153	153
query27	2205	275	278	275
query28	6319	2154	2149	2149
query29	882	637	644	637
query30	262	150	151	150
query31	985	773	775	773
query32	95	57	57	57
query33	648	340	297	297
query34	901	503	506	503
query35	682	627	579	579
query36	1123	1007	1005	1005
query37	151	84	92	84
query38	3017	2861	2818	2818
query39	902	860	824	824
query40	206	122	122	122
query41	54	50	53	50
query42	116	102	98	98
query43	576	558	574	558
query44	1133	748	746	746
query45	194	167	165	165
query46	1096	744	733	733
query47	1853	1768	1777	1768
query48	386	289	297	289
query49	827	419	419	419
query50	790	395	393	393
query51	6874	6817	6812	6812
query52	109	94	96	94
query53	359	297	296	296
query54	868	455	455	455
query55	77	75	74	74
query56	280	274	281	274
query57	1126	1083	1058	1058
query58	249	256	263	256
query59	3268	3198	3144	3144
query60	300	294	295	294
query61	124	118	118	118
query62	799	655	662	655
query63	347	309	308	308
query64	9310	2217	1691	1691
query65	3174	3121	3107	3107
query66	747	329	329	329
query67	15710	15033	15124	15033
query68	4506	541	538	538
query69	657	435	342	342
query70	1158	1202	1162	1162
query71	391	291	288	288
query72	7328	5594	5704	5594
query73	744	327	322	322
query74	6004	5638	5595	5595
query75	3383	2685	2730	2685
query76	2458	1007	1035	1007
query77	661	308	303	303
query78	11966	10231	9047	9047
query79	2964	534	537	534
query80	1047	480	498	480
query81	587	223	220	220
query82	521	137	130	130
query83	312	166	167	166
query84	268	88	84	84
query85	709	335	311	311
query86	468	316	317	316
query87	3321	3141	3118	3118
query88	3734	2388	2362	2362
query89	483	401	373	373
query90	1726	195	198	195
query91	138	104	102	102
query92	59	50	50	50
query93	3277	514	522	514
query94	1081	286	213	213
query95	411	330	317	317
query96	602	284	275	275
query97	3176	3067	3007	3007
query98	224	196	204	196
query99	1559	1262	1300	1262
Total cold run time: 278680 ms
Total hot run time: 175888 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit dd791a97d359bb5075bf02493706d6893565d9f2, data reload: false

query	cold_s	hot_run1_s	hot_run2_s
query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.23	0.06	0.05
query4	1.73	0.08	0.08
query5	0.49	0.48	0.47
query6	1.13	0.72	0.71
query7	0.02	0.01	0.01
query8	0.06	0.04	0.05
query9	0.56	0.49	0.48
query10	0.55	0.55	0.54
query11	0.15	0.12	0.11
query12	0.15	0.12	0.13
query13	0.61	0.59	0.58
query14	0.75	0.80	0.80
query15	0.85	0.82	0.82
query16	0.36	0.37	0.37
query17	1.02	1.02	1.00
query18	0.22	0.21	0.21
query19	1.73	1.71	1.69
query20	0.02	0.01	0.01
query21	15.41	0.77	0.66
query22	4.04	7.51	2.32
query23	18.33	1.37	1.27
query24	2.05	0.22	0.24
query25	0.17	0.09	0.08
query26	0.30	0.22	0.21
query27	0.45	0.24	0.23
query28	13.28	1.01	1.00
query29	12.58	3.27	3.32
query30	0.25	0.05	0.05
query31	2.89	0.38	0.39
query32	3.25	0.49	0.48
query33	2.89	2.91	2.87
query34	17.06	4.40	4.37
query35	4.45	4.48	4.50
query36	0.64	0.48	0.48
query37	0.19	0.15	0.15
query38	0.15	0.15	0.15
query39	0.04	0.03	0.03
query40	0.15	0.13	0.12
query41	0.09	0.05	0.04
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.5 s
Total hot run time: 31.11 s

Contributor

@morningman left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jul 12, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@kaka11chen left a comment

LGTM

@morningman merged commit b554534 into apache:master on Jul 12, 2024
26 of 29 checks passed
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jul 12, 2024
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Jul 15, 2024
yiguolei pushed a commit that referenced this pull request Jul 16, 2024
…and behavior consistent with hive (#37840)

## Proposed changes

pick from master #37638

@suxiaogang223 deleted the hive_text_column_sperator_fix branch on July 16, 2024 15:31
seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Labels: approved, dev/2.1.5-merged, dev/3.0.1-merged, reviewed