@morningman morningman commented Mar 27, 2024

Proposed changes

  1. Fix iceberg catalog bug

    PR #30198 ("[feature](multi-catalog)support hms catalog create and drop table/db") changed the logic of IcebergHMSExternalCatalog.java
    to get locationUrl by calling the Hive metastore's getCatalog() method.
    But this method only exists in Hive 3+, so it fails when Hive 2.x is used.

    I have temporarily removed this logic, because it is only used for Iceberg table writing,
    which is still under development. We will rethink this logic later.

  2. Fix test cases

    Some P2 test cases were missing order_qt. And because the output format of the floating point
    type has changed, some results in the out files need to be regenerated.
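
The incompatibility described in item 1 can be illustrated with a minimal, self-contained sketch. `MetastoreV2`, `MetastoreV3` and the returned location string are hypothetical stand-ins, not the real Hive client classes or a real path. The sketch only shows one way to probe for a method that exists in a newer version instead of calling it unconditionally; this PR itself simply removes the call.

```java
import java.lang.reflect.Method;
import java.util.Optional;

public class CatalogLocationProbe {
    // Hypothetical stand-in for a Hive 2.x client: it has no getCatalog() method.
    public static class MetastoreV2 {
    }

    // Hypothetical stand-in for a Hive 3+ client that exposes getCatalog().
    public static class MetastoreV3 {
        public String getCatalog(String name) {
            return "hdfs://warehouse/" + name; // illustrative location only
        }
    }

    // Returns the catalog location if the client has getCatalog(String), otherwise empty.
    public static Optional<String> locationOf(Object client, String catalogName) {
        try {
            Method m = client.getClass().getMethod("getCatalog", String.class);
            return Optional.ofNullable((String) m.invoke(client, catalogName));
        } catch (NoSuchMethodException e) {
            // Older client: the method is simply not there.
            return Optional.empty();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("getCatalog() failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(locationOf(new MetastoreV2(), "hive").isPresent()); // false
        System.out.println(locationOf(new MetastoreV3(), "hive").isPresent()); // true
    }
}
```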

Further comments

Issue #31442

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman morningman marked this pull request as draft March 27, 2024 08:51
@morningman

run buildall

@morningman morningman marked this pull request as ready for review March 27, 2024 10:11
@doris-robot

TPC-H: Total hot run time: 38411 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b790d4bc7b9cd10d4f57ecf0370b30a0763da59, data reload: false

------ Round 1 ----------------------------------
q1	18032	4939	4315	4315
q2	2489	167	157	157
q3	11581	1159	1243	1159
q4	10800	791	808	791
q5	7757	3098	3037	3037
q6	208	128	125	125
q7	1042	613	570	570
q8	9347	2036	2020	2020
q9	7188	6649	6629	6629
q10	8379	3490	3566	3490
q11	424	223	226	223
q12	373	200	203	200
q13	17823	2866	2860	2860
q14	243	202	221	202
q15	512	456	479	456
q16	479	370	385	370
q17	959	549	631	549
q18	7359	6500	6395	6395
q19	1553	1449	1498	1449
q20	567	294	273	273
q21	3621	2977	2829	2829
q22	361	327	312	312
Total cold run time: 111097 ms
Total hot run time: 38411 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4085	4050	4059	4050
q2	332	236	232	232
q3	3008	2840	2878	2840
q4	1862	1579	1564	1564
q5	5356	5343	5395	5343
q6	192	116	118	116
q7	2279	1876	1871	1871
q8	3206	3285	3297	3285
q9	8818	8743	8673	8673
q10	3784	3795	3781	3781
q11	536	429	435	429
q12	726	538	538	538
q13	16893	2880	2862	2862
q14	279	242	268	242
q15	488	454	462	454
q16	465	430	428	428
q17	1733	1487	1460	1460
q18	7548	7288	7212	7212
q19	1639	1536	1452	1452
q20	1895	1759	1698	1698
q21	4911	4571	4697	4571
q22	540	456	463	456
Total cold run time: 70575 ms
Total hot run time: 53557 ms

@doris-robot

TPC-DS: Total hot run time: 181592 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b790d4bc7b9cd10d4f57ecf0370b30a0763da59, data reload: false

query1	927	388	353	353
query2	6533	2016	1912	1912
query3	6696	210	216	210
query4	31857	21344	21380	21344
query5	4276	396	398	396
query6	276	192	175	175
query7	4638	285	308	285
query8	229	176	180	176
query9	8997	2344	2313	2313
query10	571	253	247	247
query11	17362	14212	14279	14212
query12	136	94	91	91
query13	1631	423	414	414
query14	10262	7089	8052	7089
query15	268	200	195	195
query16	8244	263	265	263
query17	2010	575	550	550
query18	2110	290	285	285
query19	346	155	152	152
query20	97	85	94	85
query21	209	129	133	129
query22	4987	4786	4837	4786
query23	33627	32909	32980	32909
query24	10557	2893	2822	2822
query25	623	407	391	391
query26	1433	156	157	156
query27	2971	351	347	347
query28	7411	1857	1881	1857
query29	925	673	632	632
query30	304	151	153	151
query31	991	746	745	745
query32	95	59	59	59
query33	782	268	245	245
query34	1042	492	497	492
query35	823	617	612	612
query36	1019	894	900	894
query37	127	70	67	67
query38	3560	3454	3418	3418
query39	1487	1433	1428	1428
query40	213	115	114	114
query41	53	49	49	49
query42	106	99	96	96
query43	480	461	458	458
query44	1183	726	727	726
query45	289	268	259	259
query46	1126	684	690	684
query47	1926	1859	1847	1847
query48	457	366	361	361
query49	1111	345	337	337
query50	766	375	370	370
query51	6704	6638	6677	6638
query52	120	95	94	94
query53	353	278	279	278
query54	298	247	253	247
query55	85	83	81	81
query56	258	231	230	230
query57	1229	1140	1155	1140
query58	241	208	208	208
query59	2819	2552	2665	2552
query60	271	245	250	245
query61	116	112	112	112
query62	679	451	506	451
query63	314	284	285	284
query64	5940	3920	4080	3920
query65	3061	3019	3028	3019
query66	1251	367	353	353
query67	15556	15292	15219	15219
query68	7709	518	520	518
query69	641	384	404	384
query70	1286	1135	1131	1131
query71	518	265	269	265
query72	6828	2721	2554	2554
query73	721	315	326	315
query74	8330	6523	6359	6359
query75	3953	2158	2222	2158
query76	5120	904	911	904
query77	686	253	257	253
query78	10919	10295	10037	10037
query79	8382	520	532	520
query80	1422	397	376	376
query81	547	220	221	220
query82	911	93	89	89
query83	210	144	145	144
query84	290	79	79	79
query85	1531	330	317	317
query86	482	328	279	279
query87	3717	3603	3563	3563
query88	4899	2311	2308	2308
query89	519	370	368	368
query90	2010	180	181	180
query91	176	156	138	138
query92	60	50	50	50
query93	6790	493	480	480
query94	1115	180	179	179
query95	440	337	341	337
query96	627	280	267	267
query97	2664	2447	2483	2447
query98	222	228	209	209
query99	1268	905	907	905
Total cold run time: 312437 ms
Total hot run time: 181592 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 3b790d4bc7b9cd10d4f57ecf0370b30a0763da59 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.8 seconds inserted 10000000 Rows, about 724K ops/s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 27, 2024
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@morningman morningman merged commit b29d395 into apache:master Mar 27, 2024
morningman added a commit that referenced this pull request Mar 27, 2024
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <hello-stephen@qq.com>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern is supported for multi-table joins. The supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a Hive table on cosn, Doris returns an empty result even though the table has data.
Iceberg on cosn works fine.
The cause is a misuse of cosn's file system. According to cosn's documentation, fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the Hive metastore's `getCatalog()` method.
    But this method only exists in Hive 3+, so it fails when Hive 2.x is used.

    I have temporarily removed this logic, because it is only used for Iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. And because the output format of the floating point
    type has changed, some results in the `out` files need to be regenerated.
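
Why a missing `order_qt` can make a case unstable is sketched below. The sketch assumes, as the name suggests, that `order_qt` sorts the result rows before comparing them with the `out` file. The class and method names are illustrative and are not part of the Doris regression framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderedResultCheck {
    // Plain comparison: passes only if rows arrive in exactly the expected order.
    public static boolean matchesAsIs(List<String> actual, List<String> expected) {
        return actual.equals(expected);
    }

    // Sorted comparison: the order in which the query returned rows no longer matters.
    public static boolean matchesSorted(List<String> actual, List<String> expected) {
        List<String> a = new ArrayList<>(actual);
        List<String> e = new ArrayList<>(expected);
        Collections.sort(a);
        Collections.sort(e);
        return a.equals(e);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("1\talice", "2\tbob", "3\tcarol");
        // Same rows, returned in a different order (the query has no ORDER BY).
        List<String> actual = Arrays.asList("3\tcarol", "1\talice", "2\tbob");
        System.out.println(matchesAsIs(actual, expected));   // false
        System.out.println(matchesSorted(actual, expected)); // true
    }
}
```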

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol, so connections are usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
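
The relationship between the token cache size and the connection limit can be sketched as follows. This is a simplified illustration under stated assumptions, not the Doris implementation: `TokenCacheSketch` and all of its members are hypothetical names, the cache is a plain access-ordered `LinkedHashMap`, and "unregistering" a connection is modelled by recording its id in a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenCacheSketch {
    // Ids of connections whose token was evicted (stand-in for unregistering a ConnectContext).
    private final List<String> unregistered = new ArrayList<>();
    private final Map<String, String> tokens;

    public TokenCacheSketch(final int tokenCacheSize, int maxConnections) {
        // Mirrors the constraint above: the cache size must be less than half the connection limit.
        if (tokenCacheSize * 2 >= maxConnections) {
            throw new IllegalArgumentException("token cache size must be < max connections / 2");
        }
        // An access-ordered LinkedHashMap behaves as a small LRU cache.
        this.tokens = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                boolean evict = size() > tokenCacheSize;
                if (evict) {
                    unregistered.add(eldest.getValue());
                }
                return evict;
            }
        };
    }

    public void login(String token, String connectionId) {
        tokens.put(token, connectionId);
    }

    // get() also refreshes the token's position in the LRU order.
    public boolean isActive(String token) {
        return tokens.get(token) != null;
    }

    public List<String> unregisteredConnections() {
        return unregistered;
    }

    public static void main(String[] args) {
        TokenCacheSketch cache = new TokenCacheSketch(2, 10);
        cache.login("t1", "c1");
        cache.login("t2", "c2");
        cache.login("t3", "c3"); // exceeds the cache size, so the oldest token is evicted
        System.out.println(cache.unregisteredConnections()); // [c1]
    }
}
```

Because evicted tokens release their connections, keeping the cache smaller than the connection limit means stale sessions are dropped before the limit is reached.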

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When Groovy uses a Flight SQL connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error "AGGREGATE clause must not contain analytic expressions", but executing it in Java with jdbc::arrow-flight-sql works without problems.
Groovy does not support printing the Arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two-phase read.
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support getting the tables in a materialized view when collecting tables in a plan.

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we get the tables from the following plan, we get [table1, table3, table2]; mv1 is expanded into its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
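
The expansion described above can be sketched with a small recursive collector. This is a hedged illustration, not the Nereids implementation: `TableCollectorSketch` and its methods are hypothetical, and a plan is reduced to the list of relation names it references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TableCollectorSketch {
    // Hypothetical catalog: materialized view name -> relations its defining query reads.
    private final Map<String, List<String>> mvDefinitions = new HashMap<>();

    public void defineMaterializedView(String name, List<String> sources) {
        mvDefinitions.put(name, sources);
    }

    // Collects base tables in first-seen order, expanding materialized views recursively.
    public List<String> collectBaseTables(List<String> relationsInPlan) {
        Set<String> result = new LinkedHashSet<>();
        for (String relation : relationsInPlan) {
            expand(relation, result, new HashSet<String>());
        }
        return new ArrayList<>(result);
    }

    private void expand(String relation, Set<String> result, Set<String> visiting) {
        List<String> sources = mvDefinitions.get(relation);
        if (sources == null) {
            result.add(relation); // a base table: keep it as-is
            return;
        }
        if (!visiting.add(relation)) {
            return; // guard against cyclic definitions
        }
        for (String source : sources) {
            expand(source, result, visiting);
        }
        visiting.remove(relation);
    }

    public static void main(String[] args) {
        TableCollectorSketch collector = new TableCollectorSketch();
        collector.defineMaterializedView("mv1", Arrays.asList("table1", "table3"));
        // Relations referenced by the query above: mv1, the joined table2, and table2 in the subquery.
        System.out.println(collector.collectBaseTables(Arrays.asList("mv1", "table2", "table2")));
        // prints [table1, table3, table2]
    }
}
```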

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == when comparing types.
2. The null bitmap is resized to the size of the ref column.
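
The first point reflects a common Java pitfall, sketched below under the assumption that the compared types are ordinary Java objects. `ColumnType` is a hypothetical class used only for illustration, not the Doris type class.

```java
import java.util.Objects;

public class TypeCompareSketch {
    // Hypothetical column type used only for illustration.
    public static final class ColumnType {
        private final String name;
        private final int length;

        public ColumnType(String name, int length) {
            this.name = name;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ColumnType)) {
                return false;
            }
            ColumnType other = (ColumnType) o;
            return length == other.length && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, length);
        }
    }

    // Reference comparison: true only when both variables point to the same object.
    public static boolean sameReference(ColumnType a, ColumnType b) {
        return a == b;
    }

    // Value comparison: true when the two types describe the same thing.
    public static boolean sameType(ColumnType a, ColumnType b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        ColumnType before = new ColumnType("VARCHAR", 32);
        ColumnType after = new ColumnType("VARCHAR", 32);
        System.out.println(sameReference(before, after)); // false
        System.out.println(sameType(before, after));      // true
    }
}
```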

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support show of plsql procedure using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
-  `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
Labels

approved Indicates a PR has been approved by one committer. reviewed
