@eldenmoon
Member

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@eldenmoon
Member Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

variant.insert(field);
}

void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,
Contributor

warning: method 'write_column_to_arrow' can be made static [readability-convert-member-functions-to-static]

Suggested change
void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,
static void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,

be/src/vec/data_types/serde/data_type_object_serde.cpp:99:

-                                                 int end) const {
+                                                 int end) {

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

variant.insert(field);
}

void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,
Contributor

warning: method 'write_column_to_arrow' can be made static [readability-convert-member-functions-to-static]

Suggested change
void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,
static void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,

be/src/vec/data_types/serde/data_type_object_serde.cpp:99:

-                                                 int end, const cctz::time_zone& ctz) const {
+                                                 int end, const cctz::time_zone& ctz) {

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

variant.insert(field);
}

void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,
Contributor

warning: method 'write_column_to_arrow' can be made static [readability-convert-member-functions-to-static]

Suggested change
void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,
static void DataTypeObjectSerDe::write_column_to_arrow(const IColumn& column, const NullMap* null_map,

be/src/vec/data_types/serde/data_type_object_serde.cpp:100:

-                                                 int end, const cctz::time_zone& ctz) const {
+                                                 int end, const cctz::time_zone& ctz) {
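If `write_column_to_arrow` overrides a virtual method of the SerDe base class, as the header excerpt `const cctz::time_zone& ctz) const override;` quoted further down suggests, the clang-tidy suggestion cannot be applied as written: a static member function cannot be virtual or `override`, and the `static` keyword is only allowed on the in-class declaration, never on an out-of-class definition such as `static void DataTypeObjectSerDe::write_column_to_arrow(...)`. The minimal sketch below uses hypothetical stand-in types (`SerDeBase`, `ObjectSerDe`, `serialize`), not the real Doris classes, to show the usual resolution: keep the method non-static and suppress the check locally if it fires.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a SerDe hierarchy; not the actual Doris classes.
struct SerDeBase {
    virtual ~SerDeBase() = default;
    // Dynamic dispatch needs an object, so overrides of this can never be static.
    virtual std::string write_column(const std::string& column) const = 0;
};

struct ObjectSerDe : SerDeBase {
    // The body never touches `this`, but adding `static` would not compile:
    // static member functions cannot be virtual or override anything.
    // NOLINTNEXTLINE(readability-convert-member-functions-to-static)
    std::string write_column(const std::string& column) const override {
        return "variant:" + column;
    }
};

// Callers hold only a base reference and rely on virtual dispatch.
inline std::string serialize(const SerDeBase& serde, const std::string& column) {
    return serde.write_column(column);
}

// Self-check: a call through the base reference reaches the override.
inline void check_dispatch() {
    ObjectSerDe serde;
    assert(serialize(serde, "c1") == "variant:c1");
}
```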

Contributor

@qidaye qidaye left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Mar 26, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

"write_column_to_arrow with type " + column.get_name());
}
const cctz::time_zone& ctz) const override;
void read_column_from_arrow(IColumn& column, const arrow::Array* arrow_array, int start,
Contributor

Do we need to support reading from Arrow? In some Spark scenarios, string values need to be loaded into Doris as variant values.

Member Author

Currently this PR only solves the connector problem for variant. For reading from Arrow, we could use a cast expression such as cast(xxx as variant).

@eldenmoon
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8736/24777)
Line Coverage: 27.04% (71524/264533)
Region Coverage: 26.27% (37097/141204)
Branch Coverage: 23.17% (18969/81882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/65bb1da3d50fd8e77e8f0c9ee0a72708be2984f8_65bb1da3d50fd8e77e8f0c9ee0a72708be2984f8/report/index.html
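Each coverage percentage is covered/total, shown with two decimals; for example, 8736/24777 ≈ 0.352585, reported as 35.26%. A small sketch that reproduces the figures above; the helper name `coverage_percent` is hypothetical and not part of the TeamCity report tooling.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format covered/total as a percentage with two decimals,
// in the same style as the coverage lines above.
std::string coverage_percent(long covered, long total) {
    assert(total > 0 && covered >= 0 && covered <= total);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.2f%%", 100.0 * covered / total);
    return buf;
}
```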

@eldenmoon
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.27% (8738/24777)
Line Coverage: 27.05% (71550/264533)
Region Coverage: 26.29% (37119/141204)
Branch Coverage: 23.18% (18983/81882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2c53792a941906acb40a89ac84d5148cf9c3f0b0_2c53792a941906acb40a89ac84d5148cf9c3f0b0/report/index.html

@eldenmoon eldenmoon requested a review from amorynan March 26, 2024 07:42
@eldenmoon
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.27% (8738/24777)
Line Coverage: 27.05% (71556/264533)
Region Coverage: 26.28% (37115/141204)
Branch Coverage: 23.18% (18980/81882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9d903d5a0d0eba294827a5cacbf1523f2e458ff8_9d903d5a0d0eba294827a5cacbf1523f2e458ff8/report/index.html

@doris-robot

TPC-H: Total hot run time: 37764 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9d903d5a0d0eba294827a5cacbf1523f2e458ff8, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17672	4206	4102	4102
q2	2128	153	148	148
q3	10658	1167	1195	1167
q4	10241	764	786	764
q5	7454	3023	2961	2961
q6	207	126	123	123
q7	1041	586	564	564
q8	9328	1961	1967	1961
q9	7185	6611	6493	6493
q10	8420	3371	3543	3371
q11	430	220	220	220
q12	429	210	201	201
q13	17791	2879	2852	2852
q14	232	195	198	195
q15	509	452	485	452
q16	489	371	367	367
q17	953	638	584	584
q18	7067	6437	6422	6422
q19	1863	1389	1436	1389
q20	544	252	257	252
q21	3496	2894	2867	2867
q22	364	309	311	309
Total cold run time: 108501 ms
Total hot run time: 37764 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4150	4100	4082	4082
q2	322	232	232	232
q3	2973	2831	2862	2831
q4	1860	1581	1588	1581
q5	5300	5377	5305	5305
q6	192	113	116	113
q7	2230	1867	1865	1865
q8	3140	3278	3246	3246
q9	8686	8655	8650	8650
q10	3778	3786	3779	3779
q11	547	452	468	452
q12	726	556	554	554
q13	16913	2836	2875	2836
q14	279	252	263	252
q15	498	455	469	455
q16	465	423	423	423
q17	1740	1491	1437	1437
q18	7558	7192	7072	7072
q19	1604	1569	1497	1497
q20	1931	1704	1731	1704
q21	4781	4679	4796	4679
q22	534	438	467	438
Total cold run time: 70207 ms
Total hot run time: 53483 ms
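The numbers in these tables are consistent with each row being a cold run followed by two hot runs (all in ms), the last column being the faster of the two hot runs, and the totals being column sums: summing Round 1 gives 108501 ms cold and 37764 ms hot, matching the reported totals. This reading is inferred from the numbers, not from the benchmark scripts' documentation. A minimal sketch of that aggregation, with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One query's timings in milliseconds (hypothetical struct for illustration).
struct QueryTiming {
    long cold;  // first (cold) run
    long hot1;  // second run
    long hot2;  // third run
};

struct Totals {
    long cold = 0;  // sum of cold runs
    long hot = 0;   // sum of the best hot run per query
};

// Best hot run = the faster of the two warm runs.
long best_hot(const QueryTiming& t) { return std::min(t.hot1, t.hot2); }

Totals summarize(const std::vector<QueryTiming>& runs) {
    Totals totals;
    for (const auto& t : runs) {
        assert(t.cold >= 0 && t.hot1 >= 0 && t.hot2 >= 0);
        totals.cold += t.cold;
        totals.hot += best_hot(t);
    }
    return totals;
}
```

For instance, feeding the first three Round 1 rows (q1 to q3) gives 30458 ms cold and 5417 ms hot.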

@doris-robot

TPC-DS: Total hot run time: 181126 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9d903d5a0d0eba294827a5cacbf1523f2e458ff8, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	929	365	350	350
query2	6549	2089	1975	1975
query3	6708	211	212	211
query4	31701	21372	21264	21264
query5	4236	392	396	392
query6	264	172	176	172
query7	4636	288	288	288
query8	227	176	180	176
query9	9419	2348	2325	2325
query10	569	242	246	242
query11	17157	14469	14232	14232
query12	145	94	87	87
query13	1618	410	421	410
query14	9774	7445	7843	7445
query15	280	189	193	189
query16	8168	260	253	253
query17	1957	554	540	540
query18	2100	276	273	273
query19	333	149	151	149
query20	90	86	88	86
query21	201	132	125	125
query22	5065	4848	4723	4723
query23	33542	32572	32625	32572
query24	10840	2829	2875	2829
query25	636	396	392	392
query26	1257	157	158	157
query27	2602	346	356	346
query28	7431	1892	1905	1892
query29	890	678	620	620
query30	304	148	152	148
query31	955	743	706	706
query32	101	62	55	55
query33	773	252	256	252
query34	1021	489	494	489
query35	834	623	610	610
query36	1069	884	885	884
query37	127	64	64	64
query38	3564	3469	3434	3434
query39	1480	1434	1406	1406
query40	209	113	118	113
query41	52	47	48	47
query42	106	97	97	97
query43	492	439	438	438
query44	1130	721	710	710
query45	283	256	261	256
query46	1119	697	699	697
query47	1934	1858	1891	1858
query48	455	359	355	355
query49	1129	333	345	333
query50	758	371	372	371
query51	6898	6694	6770	6694
query52	106	94	87	87
query53	350	274	276	274
query54	310	232	243	232
query55	93	78	78	78
query56	252	226	227	226
query57	1214	1139	1143	1139
query58	249	211	213	211
query59	2932	2533	2721	2533
query60	270	251	245	245
query61	117	112	111	111
query62	656	444	443	443
query63	302	275	277	275
query64	5347	3998	4090	3998
query65	3046	2996	3022	2996
query66	879	370	372	370
query67	15377	14783	14664	14664
query68	9040	524	531	524
query69	668	394	386	386
query70	1372	1179	1172	1172
query71	525	269	270	269
query72	6682	2725	2535	2535
query73	1593	325	312	312
query74	8062	6375	6538	6375
query75	3855	2198	2225	2198
query76	5688	901	865	865
query77	657	245	254	245
query78	11001	10267	10159	10159
query79	11675	516	512	512
query80	2053	371	364	364
query81	514	210	212	210
query82	244	86	82	82
query83	226	141	144	141
query84	283	85	81	81
query85	1222	316	310	310
query86	361	273	299	273
query87	3785	3576	3551	3551
query88	5414	2365	2363	2363
query89	476	371	359	359
query90	2029	172	172	172
query91	175	136	135	135
query92	62	48	46	46
query93	5804	492	482	482
query94	1344	181	177	177
query95	430	331	327	327
query96	607	266	268	266
query97	2639	2504	2520	2504
query98	227	212	210	210
query99	1103	907	913	907
Total cold run time: 315433 ms
Total hot run time: 181126 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 9d903d5a0d0eba294827a5cacbf1523f2e458ff8 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       20.2 seconds inserted 10000000 Rows, about 495K ops/s
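The throughput figures are consistent with dividing bytes by seconds and by 2^20, then truncating (2358488459 B / 18 s ≈ 124.96 MiB/s, reported as 124 MB/s), so "MB" here appears to mean MiB; likewise 10000000 rows / 20.2 s ≈ 495049 rows/s, reported as 495K ops/s. A sketch of that arithmetic with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers reproducing the load-test arithmetic.
// Throughput in MiB/s (2^20 bytes per MiB), truncated toward zero.
std::int64_t mib_per_second(std::int64_t bytes, double seconds) {
    assert(seconds > 0);
    return static_cast<std::int64_t>(static_cast<double>(bytes) / seconds / (1024.0 * 1024.0));
}

// Rows per second in thousands, truncated toward zero ("495K ops/s").
std::int64_t kilo_rows_per_second(std::int64_t rows, double seconds) {
    assert(seconds > 0);
    return static_cast<std::int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```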

Contributor

@amorynan amorynan left a comment

LGTM

@eldenmoon eldenmoon merged commit b2eb64a into apache:master Mar 27, 2024
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)
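In `show views like '%t1%'`, `%` matches any (possibly empty) sequence of characters and `_` matches exactly one character, which is why only `t1_view` is returned. A minimal sketch of that matching; the `like_match` helper is hypothetical, case-sensitive, and has no ESCAPE support, so it does not claim to reproduce the server's exact case or escape handling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: greedy wildcard match with backtracking to the last '%'.
// '%' matches any (possibly empty) sequence, '_' matches exactly one character.
bool like_match(const std::string& s, const std::string& p) {
    std::size_t si = 0, pi = 0;
    std::size_t star = std::string::npos;  // position of the last '%' seen in p
    std::size_t resume = 0;                // position in s to retry from after that '%'
    while (si < s.size()) {
        if (pi < p.size() && p[pi] == '%') {
            star = pi++;
            resume = si;
        } else if (pi < p.size() && (p[pi] == '_' || p[pi] == s[si])) {
            ++si;
            ++pi;
        } else if (star != std::string::npos) {
            pi = star + 1;  // let the last '%' absorb one more character
            si = ++resume;
        } else {
            return false;
        }
    }
    while (pi < p.size() && p[pi] == '%') ++pi;  // trailing '%' match the empty string
    return pi == p.size();
}

// Mirrors the `show views like '%t1%'` output above.
inline void check_show_views_example() {
    assert(like_match("t1_view", "%t1%"));
    assert(!like_match("t2_view", "%t1%"));
}
```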

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <hello-stephen@qq.com>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern supports multi-table joins; the supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for variant type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a Hive table on cosn, Doris returns an empty result even though the table has data.
Iceberg on cosn works fine.
The cause is misuse of cosn's file system: according to cosn's documentation, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the Hive metastore's `getCatalog()` method.
    But this method only exists in Hive 3+, so it fails when using Hive 2.x.

    I temporarily removed this logic, because it is only used for Iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. Also, because the output format of floating-point
    types has changed, some results in `out` files needed to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Incorrect stopping of the report worker causes heap-use-after-free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol and connections are usually not actively closed; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which caused connections to be killed frequently after query timeout.

Fix the bearer token eviction log and exception.
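The fe.conf constraint described above can be expressed as a simple check. This is only a sketch: the function name is hypothetical, integer division is assumed for `qe_max_connection/2`, and whether FE rejects or adjusts an out-of-range value is not stated here.

```cpp
#include <cassert>

// Hypothetical check of the relationship described above:
// arrow_flight_token_cache_size < qe_max_connection / 2 (integer division assumed).
bool token_cache_size_is_valid(int arrow_flight_token_cache_size, int qe_max_connection) {
    assert(qe_max_connection > 0);
    return arrow_flight_token_cache_size < qe_max_connection / 2;
}
```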

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH

* [bugfix](cloud) a few variables not initialized (#32868)

Uninitialized variables in ../../cloud/src/recycler/meta_checker.cpp
can cause an uninitialized memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED is needed, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When Groovy uses a Flight SQL connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error "AGGREGATE clause must not contain analytic expressions", but executing the same query from Java with jdbc::arrow-flight-sql works.
Groovy does not support printing the Arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two-phase read.
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

The materialized view is defined as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

When collecting tables from the following plan, we get [table1, table3, table2]: mv1 is expanded into its base tables.

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
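The expansion described above can be sketched as a recursive walk that replaces each materialized view with the tables its definition reads and keeps the first occurrence of each base table. The `MvDefinitions` map and `collect_tables` function are hypothetical illustrations, not Nereids APIs, and the sketch assumes view definitions are not cyclic.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical: materialized view name -> tables referenced by its definition.
using MvDefinitions = std::map<std::string, std::vector<std::string>>;

// Append base tables for `name` to `out`, expanding MVs recursively and
// skipping tables already collected (first-seen order is preserved).
// Assumes MV definitions contain no cycles.
void expand(const std::string& name, const MvDefinitions& mvs, std::vector<std::string>& out) {
    auto it = mvs.find(name);
    if (it != mvs.end()) {
        for (const auto& child : it->second) expand(child, mvs, out);
        return;
    }
    if (std::find(out.begin(), out.end(), name) == out.end()) out.push_back(name);
}

std::vector<std::string> collect_tables(const std::vector<std::string>& plan_tables,
                                        const MvDefinitions& mvs) {
    std::vector<std::string> out;
    for (const auto& t : plan_tables) expand(t, mvs, out);
    return out;
}

// Mirrors the example: mv1 reads table1 and table3; the query reads mv1 and table2 (twice).
inline void check_example() {
    MvDefinitions mvs = {{"mv1", {"table1", "table3"}}};
    auto tables = collect_tables({"mv1", "table2", "table2"}, mvs);
    assert((tables == std::vector<std::string>{"table1", "table3", "table2"}));
}
```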

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In FE:
Get: if FE is in cloud mode, get the table version from the meta service.
Update: on drop/replace temp partition and on transaction commit.

In meta service:
Add: created together with the index; the initial value is 1.
Remove: by the recycler.
Update: on commit/drop partition RPC and commit txn RPC, incremented atomically.
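On the meta-service side, the scheme above amounts to a per-table counter that starts at 1 and is atomically incremented by each committing RPC. A minimal in-process sketch with a hypothetical `TableVersion` class; it only illustrates the counter semantics, not how or where the meta service actually stores the value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical in-process stand-in for the per-table version counter.
class TableVersion {
public:
    // Read path, e.g. what FE asks the meta service for in cloud mode.
    std::int64_t get() const { return version_.load(); }

    // Commit/drop partition or commit txn: atomic increment, returns the new version.
    std::int64_t bump() { return ++version_; }

private:
    std::atomic<std::int64_t> version_{1};  // initial value is 1 on creation
};
```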

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparison.
2. Resize the null bitmap to the size of the ref column.
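When a NOT NULL column becomes nullable, its new null map needs exactly one entry per row of the reference column, all marked not-null. A minimal sketch using a plain `std::vector<uint8_t>` as a hypothetical stand-in for the BE null map, assuming the convention 1 = NULL and 0 = not NULL:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a null map: 1 = NULL, 0 = not NULL (assumed convention).
using NullMapSketch = std::vector<std::uint8_t>;

// Build the null map for a column converted from NOT NULL to NULL.
// It is sized from the reference column so both stay row-aligned.
NullMapSketch make_null_map_for(std::size_t ref_column_rows) {
    NullMapSketch null_map;
    null_map.resize(ref_column_rows, 0);  // every existing value is non-null
    return null_map;
}
```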

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy) fix row policy already exists (#32880)

* [fix](pipeline) fix use of wrong row desc when origin block is cleared (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support showing PL/SQL procedures with select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate Utils class.

* [exec](column) make move operations of some complex columns noexcept (#32954)
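Marking move operations `noexcept` matters because `std::vector` only moves elements when it reallocates if the move constructor cannot throw (or no copy constructor exists); otherwise it copies them to keep its strong exception guarantee. The sketch below uses a hypothetical `ColumnSketch` type with a copy counter, not a real Doris column class, to show that a `noexcept` move avoids copies during growth.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Counts element copies so reallocation behavior can be observed.
inline int g_copies = 0;

// Hypothetical column-like type owning a buffer.
struct ColumnSketch {
    std::vector<int> data;

    ColumnSketch() = default;
    ColumnSketch(const ColumnSketch& other) : data(other.data) { ++g_copies; }
    // noexcept move: std::vector relocates elements by moving instead of copying.
    ColumnSketch(ColumnSketch&& other) noexcept : data(std::move(other.data)) {}
    ColumnSketch& operator=(const ColumnSketch&) = default;
    ColumnSketch& operator=(ColumnSketch&&) noexcept = default;
};

static_assert(std::is_nothrow_move_constructible_v<ColumnSketch>);

// Grow a vector past its capacity several times and report how many copies happened.
inline int copies_during_growth(int n) {
    assert(n >= 0);
    g_copies = 0;
    std::vector<ColumnSketch> columns;
    for (int i = 0; i < n; ++i) columns.emplace_back();
    return g_copies;
}
```

Dropping `noexcept` from the move constructor in this sketch would make the same growth loop copy elements on each reallocation.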

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column names in different indexes. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)
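Capping each string value before it is used for column statistics bounds the memory held per value. The commit title does not say where the cap is applied, so this hypothetical helper only shows the truncation itself; it cuts by bytes, so a multi-byte UTF-8 character at the boundary could be split.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxStatsStringLength = 1024;  // limit from the commit title

// Hypothetical helper: keep at most kMaxStatsStringLength bytes of a value
// before it is used for column statistics (byte-based, not UTF-8 aware).
std::string truncate_for_stats(const std::string& value) {
    return value.size() <= kMaxStatsStringLength ? value
                                                  : value.substr(0, kMaxStatsStringLength);
}
```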

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
- Use `grant resource %` instead of `grant resource *`

Before the change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
After the change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.1-merged reviewed

4 participants