Release Note 2.0.3 #27909

xiaokang · 2023-12-03T04:02:35Z

Thanks to our community users and developers, about 1000 improvements and bug fixes have been made in Doris 2.0.3, covering optimizer statistics, the inverted index, complex data types, the data lake, and replica management.

1 Behavior change

The output format of the complex data types ARRAY/MAP/STRUCT has been changed to be consistent with the input format and the JSON specification. The main changes from the previous version are that DATE/DATETIME and STRING/VARCHAR values are enclosed in double quotes, and null values inside ARRAY/MAP are displayed as null instead of NULL.
- [Fix](Serde) Fix content displayed by complex types in MySQL Client #25946
The SHOW_VIEW permission is now supported. Users with only the SELECT or LOAD permission can no longer execute the SHOW CREATE VIEW statement; they must be granted the SHOW_VIEW permission separately.
- [improvement](auth) support show view priv #25370
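To make the output format change for complex types more concrete, here is a minimal Python sketch that renders a nested value as JSON text, with double-quoted strings and dates and a lowercase null. The helper name `render_complex` is hypothetical and the exact whitespace Doris prints is an assumption; only the quoting rule and the `null` spelling come from the note.

```python
import json
from datetime import date

def render_complex(value):
    """Render a nested value as JSON text: strings and dates in double
    quotes, missing values as lowercase null (helper name is hypothetical)."""
    # date objects are not JSON-serializable by default, so emit ISO strings
    return json.dumps(value, default=lambda d: d.isoformat())

arr = [date(2023, 12, 3), None, "doris"]
text = render_complex(arr)
print(text)

# Because the rendered text is valid JSON, it can be parsed back as input.
assert json.loads(text) == ["2023-12-03", None, "doris"]
```

Since the displayed text is valid JSON, a client can copy a displayed value and feed it back as input, which is the consistency the note refers to.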

2 New features

2.1 Support collecting statistics for optimizer automatically

Collecting statistics helps the optimizer understand the data distribution characteristics and choose a better plan to greatly improve query performance. It is officially supported starting from version 2.0.3 and is enabled all day by default.

See more: https://doris.apache.org/docs/query-acceleration/statistics/
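As a rough illustration of why statistics help the optimizer, the sketch below uses a textbook selectivity estimate to decide which side of a hash join should be the build side. It is not Doris's cost model; `ColumnStats`, `estimate_equal_filter_rows`, `choose_build_side`, and the table sizes are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    row_count: int  # rows in the table
    ndv: int        # number of distinct values in the filtered column

def estimate_equal_filter_rows(stats: ColumnStats) -> float:
    """Textbook estimate for `col = constant`: rows / distinct values."""
    return stats.row_count / max(stats.ndv, 1)

def choose_build_side(left_rows: float, right_rows: float) -> str:
    """A hash join usually builds its hash table on the smaller input."""
    return "left" if left_rows <= right_rows else "right"

orders = ColumnStats(row_count=10_000_000, ndv=1_000_000)  # filter on a customer id
users = ColumnStats(row_count=50_000, ndv=50)              # filter on a country

# Without column statistics, only raw table sizes are available.
print(choose_build_side(orders.row_count, users.row_count))

# With statistics, the filtered sizes are compared instead.
print(choose_build_side(estimate_equal_filter_rows(orders),
                        estimate_equal_filter_rows(users)))
```

In this made-up case the two decisions differ: the large table becomes the cheaper build side once its highly selective filter is taken into account.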

2.2 Support complex datatypes for more datalake source

Support complex datatypes for JAVA UDF, JDBC and Hudi MOR
- [feature](jni) support complex types in jni framework #24810
- [feature](tvf)(jni-avro)jni-avro scanner add complex data types #26236
Support complex datatypes for Paimon
- [feature](paimon)paimon catalog supports complex types #25364
Support Paimon version 0.5
- [improvement](catalog)compatible with paimon 0.5 #24985

2.3 Add more builtin functions

Support the BitmapAgg function in new optimizer
- [feature](fe) add function 'BitmapAgg' in nereids #25508
Support the SHA family of digest functions
- [feature](function) Support SHA family functions #24342
Support the BITMAP datatype in the aggregate functions min_by and max_by
- [feature](function) support bitmap type in min/max_by agg function #25430
Add milliseconds/microseconds_add/sub/diff functions
- [feature](datetime-func)support milliseconds_add/sub/diff and microseconds_diff #24114
Add some json functions: json_insert, json_replace, json_set
- [feature](json-function) add json_insert, json_replace, json_set functions #24384
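The three JSON functions listed above differ only in how they treat keys that already exist. The Python sketch below models that difference on a flat dictionary. It assumes MySQL-compatible semantics (insert only adds, replace only overwrites, set does both) and uses plain keys instead of JSON paths such as `$.k`; both are simplifications and not taken from the note.

```python
import copy

def _apply(doc, key, value, allow_create, allow_overwrite):
    """Return a copy of doc with key set, subject to the two switches."""
    out = copy.deepcopy(doc)
    exists = key in out
    if (exists and allow_overwrite) or (not exists and allow_create):
        out[key] = value
    return out

def json_insert(doc, key, value):
    # Adds the key only when it is absent; existing values are kept.
    return _apply(doc, key, value, allow_create=True, allow_overwrite=False)

def json_replace(doc, key, value):
    # Overwrites the key only when it is present; never adds new keys.
    return _apply(doc, key, value, allow_create=False, allow_overwrite=True)

def json_set(doc, key, value):
    # Adds or overwrites, whichever applies.
    return _apply(doc, key, value, allow_create=True, allow_overwrite=True)

doc = {"a": 1}
print(json_insert(doc, "a", 2), json_insert(doc, "b", 2))
print(json_replace(doc, "a", 2), json_replace(doc, "b", 2))
print(json_set(doc, "a", 2), json_set(doc, "b", 2))
```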

3 Improvement and optimizations

3.1 Performance optimizations

When an inverted index MATCH WHERE condition with a high filter rate is combined with an ordinary WHERE condition with a low filter rate, the I/O on the index column is greatly reduced.
Optimize the efficiency of random data access after the WHERE filter.
Improve the performance of the old get_json_xx functions on the JSON data type by 2~4x.
Support a configuration to lower the priority of data read threads, reserving CPU resources for real-time writes.
Add the uuid-numeric function, which returns LARGEINT and is 20 times faster than the uuid function, which returns a string.
Improve the performance of CASE WHEN by 3x.
Cut out unnecessary predicate calculations in storage engine execution.
Accelerate count by pushing the count operator down to the storage tier.
Optimize the computation performance of nullable types in AND/OR expressions.
Support rewriting the limit operator to run before join in more scenarios to improve query performance.
Eliminate useless ORDER BY operators from inline views to improve query performance.
Improve the accuracy of cardinality estimates and cost models in some cases.
Optimize the JDBC catalog predicate pushdown logic.
Optimize the read efficiency of the file cache when it is enabled for the first time.
Optimize the Hive table SQL cache policy by using the partition update time stored in HMS to improve the cache hit ratio.
Optimize MOW compaction efficiency.
Optimize the thread allocation logic for external table queries to reduce memory usage.
Optimize memory usage of the column reader.
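For the uuid-numeric item in the list above, the following Python sketch shows the difference between a UUID in text form and the same kind of identifier as a single 128-bit integer. The Python function names are stand-ins, not Doris functions, and how Doris maps the 128 bits onto its LARGEINT type is not shown. The 20x figure comes from the note and is not measured here.

```python
import uuid

def uuid_string() -> str:
    """UUID in its canonical 36-character text form."""
    return str(uuid.uuid4())

def uuid_numeric() -> int:
    """The same 128 bits exposed as one integer, with no text formatting."""
    return uuid.uuid4().int

s = uuid_string()
n = uuid_numeric()
print(len(s), n.bit_length() <= 128)
```

An integer identifier avoids building and later comparing a 36-character string, which is one plausible reason the numeric variant is cheaper.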

3.2 Distributed replica management improvements

Distributed replica management improvements cover skipping partition deletion, colocate group deletion, balance failures caused by continuous writes, and balancing of tables with hot and cold separation.

3.3 Security enhancement

The audit log plug-in uses a token instead of a plaintext password to enhance security
- [Feature](auditloader) Plugin auditloader use auth token to avoid using cleartext passwords in config #26278
Enhance the security of the log4j configuration
- [Enhancement](log) Improve Safety and Robustness of Log4j Configuration #24861
Sensitive user information is not displayed in logs
- [improvement](log) log desensitization without displaying user info #26912

4 Bugfix and stability

4.1 Complex datatypes

Fix the issue that fixed-length CHAR(n) was not truncated correctly in map/struct.
- [FIX](collectiontype) fix shrink char column in map/struct #25725
Fix write failure when a struct is nested with map/array
- [FIX](complextype)fix struct nested complex collection type and and regresstest #26973
Fix the issue that count distinct did not support array/map/struct
- [FIX](func) fix count distinct do not support arr/map/struct #25483
Fix BE crash during upgrade to 2.0.3 after a delete on a complex type column has appeared in a query
- [FIX](upgrade)fix upgrade for predict column delete collection type will make core #26006
Fix be crash when JSON datatype is in WHERE clause.
- [FIX](jsonb)fix jsonb is not in predict column #27325
Fix be crash when ARRAY datatype is in OUTER JOIN clause.
- [FIX](resize) fix array and map offsets resize with default value #25669
Fix reading incorrect result for DECIMAL datatype in ORC format.

4.2 Inverted index

Fix incorrect results for OR NOT combinations in the WHERE clause when inverted index query is disabled.
- [Fix](inverted index) fix compound query result error when disable inverted_index_query session variable #26327
Fix BE crash when writing an empty array with an inverted index
- [Fix](inverted index) fix empty array index writer bug #25984
Fix be crash in index compaction when the output of compaction is empty.
- [opt](index compaction) optimize checks before index compaction #25486
Fix BE crash when BUILD INDEX is run after ADD COLUMN and no new data has been written to the new column.
- [fix](build index) fix core when build index for a new column which without data #27276
Fix missing and leaked hardlinks for inverted index files.
- [fix](build index) Fix inverted index hardlink leak and missing problem #26903
Fix index file corruption when the disk is temporarily full
- [Fix](inverted index) fix compound directory flush buffer error #28191
Fix incorrect results caused by the optimization that skips reading the index column
- [Fix](inverted index) fix need read data optimize problem #28104

4.3 Materialized View

Fix BE crash when there are duplicate expressions in the GROUP BY statement.
- [Bug](materialized-view) add limitation for duplicate expr on materialized view #27523
Disable the float/double types in the GROUP BY clause when a materialized view is created.
- [Bug](materialized-view) add limit for group by with float/double on create mv #25823
Improve the matching of SELECT queries to materialized views
- [Bug](materialized-view) enable rewrite on select materialized index with aggregate mode #24691
Fix an issue that materialized views could not be matched when a table alias was used
- [Bug](materialized-view) fix not match mv when some alias on agg #25321
Fix problems with using percentile_approx when creating materialized views
- [Bug](materialized-view) fix some bugs on create mv with percentile_approx #26528
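The note does not say why float/double keys are disallowed in the GROUP BY clause of a materialized view. One plausible motivation, offered here as an assumption, is that floating-point values that look equal may differ in their binary representation and so land in different groups. The Python sketch below shows the effect; `group_sum` is a hypothetical helper.

```python
from collections import defaultdict

def group_sum(rows):
    """Sum values per exact key, as a GROUP BY on that key would."""
    groups = defaultdict(int)
    for key, value in rows:
        groups[key] += value
    return dict(groups)

# 0.1 + 0.2 and 0.3 are different doubles, so they form two groups.
rows = [(0.1 + 0.2, 10), (0.3, 5)]
print(len(group_sum(rows)))
```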

4.4 Table sample

Fix the problem that table sample queries did not work on partitioned tables.
- [fix](planner) Fix sample partition table #25912
Fix the problem that table sample queries did not work when a tablet was specified.
- [fix](planner) Fix select table tablet not effective #25378

Others

Fix BE crash when the cluster is upgraded to 2.0.3 after the column order of a table has been changed.
- [Fix](schema change) disable convert light schema change #28205

See the complete list of improvements and bug fixes on GitHub: dev/2.0.3-merged.

Big Thanks

Thanks to all who contributed to this release:

@adonis0147
@airborne12
@amorynan
@AshinGau
@BePPPower
@bigben0204
@BiteTheDDDDt
@bobhan1
@ByteYue
@CalvinKirs
@CanGuan
@caoliang-web
@catpineapple
@csun5285
@dataroaring
@deadlinefen
@deardeng
@DongLiang-0
@Doris-Extras
@dutyu
@eldenmoon
@englefly
@freemandealer
@fsilent
@Gabriel39
@GoGoWen
@HappenLee
@hello-stephen
@hf200012
@HHoflittlefish777
@HowardQin
@hubgeter
@hust-hhb
@JackDrogon
@jacktengg
@jackwener
@jeffreys-cat
@Jibing-Li
@JingDas
@kaijchen
@kaka11chen
@KassieZ
@Kikyou1997
@Lchangliang
@LemonLiTree
@liaoxin01
@LiBinfeng-01
@liugddx
@liutang123
@lsy3993
@luozenglin
@luwei16
@mongo360
@morningman
@morrySnow
@mrhhsg
@Mryange
@mymeiyi
@neuyilan
@nextdreamblue
@Nitin-Kashyap
@pingchunzhang
@platoneko
@qidaye
@ryanzryu
@seawinde
@shuke987
@sohardforaname
@starocean999
@SWJTU-ZhangLei
@TangSiyang2001
@Tanya-W
@TsukiokaKogane
@vinlee19
@w41ter
@wangbo
@whutpencil
@WinkerDu
@wsjz
@wuwenchi
@Xiaoccer
@xiaokang
@xiedeyantu
@XieJiann
@xinyiZzz
@xuefengze
@XuJianxu
@xy720
@xzj7019
@yagagagaga
@yiguolei
@yujun777
@Yukang-Lian
@Yulei-Yang
@zclllyybb
@zddr
@zfr9527
@zgxme
@zhangguoqiang666
@zhangstar333
@zhangy5
@zhannngchen
@zhiqiang-hhhh
@zy-kkk
@zzzxl1993
@zzzzzzzs


bingwill · 2023-12-04T09:08:56Z

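We ran into the following problems when using the partial column update feature (on version 2.02):
1. With partial column update enabled, a batch INSERT with multiple VALUES rows fails, while inserting a single row works.
2. After data is deleted, writing it again fails; the write succeeds once the primary key value is changed.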

wj215318 · 2023-12-05T08:50:43Z

When will it be officially released?

hawkingrei · 2023-12-06T14:17:14Z

I'm interested in Automatic Collection and statistics. Are there any development plans in the near future?

EmmyMiao87 · 2023-12-07T02:17:26Z

@morningman
@hawkingrei would like to take part in the development of Doris statistics; he is very experienced in this area ~

bobhan1 · 2023-12-07T05:41:02Z

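We ran into the following problems when using the partial column update feature (on version 2.02): 1. With partial column update enabled, a batch INSERT with multiple VALUES rows fails, while inserting a single row works. 2. After data is deleted, writing it again fails; the write succeeds once the primary key value is changed.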

You can describe your problems more explicitly in https://github.com/apache/doris/issues or https://github.com/apache/doris/discussions.

morningman · 2023-12-07T08:04:51Z

I'm interested in Automatic Collection and statistics. Are there any development plans in the near future?

Welcome, I will contact you later

xiaokang · 2023-12-07T08:34:54Z

When will it be officially released?

This week. Thanks for your attention.

xiaokang · 2023-12-07T08:36:23Z

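Chinese version of the Release Note

Thanks to the more than 100 developers and users in the Doris community who took part in version 2.0.3. This version contains nearly 1000 improvements and fixes, covering statistics, the inverted index, complex data types, the data lake, and distributed replica management.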

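1 Behavior change

The output format of the complex data types array/map/struct has been changed to be consistent with the input format and the JSON specification. The main changes from previous versions are that dates and strings are enclosed in double quotes, and null values inside ARRAY/MAP are displayed as null instead of NULL.
- [Fix](Serde) Fix content displayed by complex types in MySQL Client #25946
By default, when the user property resource_tags.location is not set, only nodes in the default resource group can be used, whereas in previous versions any node could be accessed.
- [improvement](resource-tag) limit the default user's resource tag to 'default' #25331
The SHOW_VIEW permission is supported. Users with the SELECT or LOAD permission will no longer be able to execute the SHOW CREATE VIEW statement and must be granted the SHOW_VIEW permission separately.
- [improvement](auth) support show view priv #25370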

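2 New features

2.1 Support collecting statistics for optimizer automatically

Collecting statistics helps the optimizer understand the data distribution characteristics and choose a better plan, greatly improving query efficiency. It is officially supported starting from version 2.0.3 and is enabled all day by default.

For more information, see: https://doris.apache.org/docs/query-acceleration/statistics/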

2.2 Support complex datatypes for more datalake sources

Support complex data types for JAVA UDF, JDBC, and Hudi MOR tables
- [feature](jni) support complex types in jni framework #24810
- [feature](tvf)(jni-avro)jni-avro scanner add complex data types #26236
Paimon catalog supports complex data types
- [feature](paimon)paimon catalog supports complex types #25364
Paimon catalog supports Paimon version 0.5
- [improvement](catalog)compatible with paimon 0.5 #24985

2.3 Add more builtin functions

Support the BitmapAgg function in the new optimizer
- [feature](fe) add function 'BitmapAgg' in nereids #25508
Support the SHA family of digest functions
- [feature](function) Support SHA family functions #24342
Support the bitmap data type in the aggregate functions min_by and max_by
- [feature](function) support bitmap type in min/max_by agg function #25430
Add milliseconds/microseconds_add/sub/diff functions
- [feature](datetime-func)support milliseconds_add/sub/diff and microseconds_diff #24114
Add the JSON functions json_insert, json_replace, and json_set
- [feature](json-function) add json_insert, json_replace, json_set functions #24384

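3 Improvement and optimizations

3.1 Performance optimizations

When an inverted index MATCH WHERE condition with a high filter rate is combined with an ordinary WHERE condition with a low filter rate, the I/O of the index column is greatly reduced.
Optimize the efficiency of random data reads after WHERE condition filtering.
Optimize the performance of the old get_json_xx functions on the JSON data type, a 2-4x improvement.
Support a configuration to lower the priority of data read threads, guaranteeing CPU resources and timeliness for writes.
Add the uuid-numeric function, which returns largeint and is 20 times faster than the uuid function, which returns string.
Optimize the performance of case when, a 3x improvement.
Prune unnecessary predicate calculations in storage engine execution.
Support pushing the count operator down to the storage layer.
Optimize the computation performance of and/or expressions that contain nullable types.
Support rewriting the limit operator to run before join in more scenarios, to improve execution efficiency.
Eliminate useless order by operators in inline views, to improve execution efficiency.
Improve the accuracy of cardinality estimates and cost models in some cases, to improve execution efficiency.
Optimize the predicate pushdown logic and case-sensitivity logic of the jdbc catalog.
Optimize the read efficiency of the file cache after it is enabled for the first time.
Optimize the hive table sql cache policy: the partition update time stored in HMS is used to decide whether the cache is invalid, improving the cache hit ratio.
Optimize mow compaction efficiency.
Optimize the thread allocation logic for external table queries to reduce memory usage.
Optimize the memory usage of the column reader.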

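3.2 Distributed replica management improvements

These include skipping partition deletion, colocate groups, balance failures during continuous writes, and hot/cold tiered tables that could not be balanced.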

3.3 Security enhancement

The audit log plug-in configuration uses a token instead of a plaintext password to enhance security
- [Feature](auditloader) Plugin auditloader use auth token to avoid using cleartext passwords in config #26278
Security enhancement of the log4j configuration
- [Enhancement](log) Improve Safety and Robustness of Log4j Configuration #24861
Sensitive user information is not displayed in logs
- [improvement](log) log desensitization without displaying user info #26912

4 Bugfix and stability

4.1 Complex datatypes

Fix the issue that fixed-length CHAR(n) was not truncated correctly in map/struct
- [FIX](collectiontype) fix shrink char column in map/struct #25725
Fix write failure when a struct is nested with map/array
- [FIX](complextype)fix struct nested complex collection type and and regresstest #26973
Fix the issue that count distinct did not support array/map/struct
- [FIX](func) fix count distinct do not support arr/map/struct #25483
Fix BE crash during upgrade to 2.0.3 after a delete on a complex type has appeared in a query
- [FIX](upgrade)fix upgrade for predict column delete collection type will make core #26006
Fix BE crash when jsonb is used in a WHERE condition
- [FIX](jsonb)fix jsonb is not in predict column #27325
Fix BE crash when an array type appears in an outer join
- [FIX](resize) fix array and map offsets resize with default value #25669
Fix incorrect reads of the decimal type in ORC format

4.2 Inverted index

Fix incorrect results for OR NOT combinations in WHERE conditions when inverted index query is disabled
- [Fix](inverted index) fix compound query result error when disable inverted_index_query session variable #26327
Fix BE crash when writing the inverted index of an empty array
- [Fix](inverted index) fix empty array index writer bug #25984
Fix BE crash in index compaction when the output is empty
- [opt](index compaction) optimize checks before index compaction #25486
Fix BE crash when adding an inverted index to a newly added column that has no data written
- [fix](build index) fix core when build index for a new column which without data #27276
Fix missing and leaked inverted index hardlinks in cases such as upgrading to 2.0 after an inverted index was mistakenly created on 1.2
- [fix](build index) Fix inverted index hardlink leak and missing problem #26903

4.3 Materialized View

Fix BE crash caused by duplicate expressions in the group by statement
- [Bug](materialized-view) add limitation for duplicate expr on materialized view #27523
Disallow the float/double types in the group by clause when a view is created
- [Bug](materialized-view) add limit for group by with float/double on create mv #25823
Improve support for select queries hitting materialized views
- [Bug](materialized-view) enable rewrite on select materialized index with aggregate mode #24691
Fix the issue that materialized views could not be hit when a table alias was used
- [Bug](materialized-view) fix not match mv when some alias on agg #25321
Fix problems with using percentile_approx when creating materialized views
- [Bug](materialized-view) fix some bugs on create mv with percentile_approx #26528

4.4 Table sample

Fix the issue that table sample did not work on partitioned tables
- [fix](planner) Fix sample partition table #25912
Fix the issue that table sample did not work when a tablet was specified
- [fix](planner) Fix select table tablet not effective #25378

4.5 Unique with merge on write

Fix the null pointer exception when updating by a primary key condition
- [fix](partial update) Fix NPE when the query statement of an update statement is a point query in OriginPlanner #26881
Fix the column name case issue in partial column update
- [fix](partial update) keep case insensitivity and use the columns' origin names in partialUpdateCols in origin planner #27223
Fix the issue that duplicate keys could appear in mow tables during schema change
- [fix](merge-on-write) fix duplicate key in schema change #25705

4.6 Load and compaction

Fix the unknown slot descriptor error when routine load writes one stream to multiple tables
- [fix](multi-table) fix unknown source slot descriptor when load multi table #25762
Fix BE crash caused by concurrent access to memory statistics
- [fix](load) add lock in active_memtable_mem_consumption #27101
Fix BE crash caused by cancelling a load repeatedly
- [fix](load) skip cancel already cancelled channels #27111
Fix broker connection errors during broker load
- [fix](broker-read) refactor broker reading process to avoid null broker connection #26050
Fix potentially incorrect query results caused by delete predicates when compaction and scan run concurrently
- [Bug](ScanNode) Fix potential incorrect query result caused by concurrent NewOlapScanNode initialization and Compaction #24638
Fix the issue that a large number of stack trace logs were printed when the compaction task already exists
- [chore](compaction) Do not print the stack trace when the compaction task already exists #25597

4.7 Data Lake compatibility

Fix query failure when an iceberg table contains special characters
- [fix](iceberg) iceberg use custom method to encode special characters in column name #27108
Fix compatibility issues across different hive metastore versions
- [fix](hms) fix compatibility issue of hive metastore client #27327
Fix errors when reading max compute partitioned tables
- [fix](multi-catalog)fix maxcompute partition filter and session creation #24911
Fix backup failure to object storage
- [fix](backup) fix backup fail on s3 #25496
- [fix](backup) missing use_path_style properties for minio #25803

4.8 JDBC external table compatibility

Fix the date format error when the jdbc catalog handles Oracle date types
- [fix](jdbc catalog) fix handle oracle date format #25487
Fix the exception when the jdbc catalog reads MySQL 0000-00-00 dates
- [fix](jdbc catalog) fix mysql zero date #26569
Fix the null pointer exception when reading from Mariadb a time-type column whose default value is current_timestamp
- [fix](multicatalog)fix jdbc catalog current_timestamp default #25016
Fix BE crash when the jdbc catalog handles the bitmap type
- [fix](jdbc catalog) fix jdbc catalog read bitmap data crash #25034
- [BugFix](JDBC Catalog) fix jdbc catalog query bitmap may cause be core sometimes #26933

4.9 SQL Planner and Optimizer

Fix incorrect partition pruning in some scenarios
Fix incorrect subquery handling in some scenarios
Fix some semantic analysis errors
- [fix](Nereids) should not replace slot by Alias when do NormalizeSlot #24928
- [fix](nereids)fix bug of duplicate name of inline view #25627
Fix possible data loss with right outer/anti join
- [fix](Nereids) ban right outer, right anti, full outer with bucket shuffle #26529
Fix the issue that predicates were incorrectly pushed down through aggregate operators
- [fix](Nereids) non-slot filter should not be push through aggregate #25525
Fix incorrect result headers returned in some cases
- [opt](Nereids) use correct column label when execute query in FE #25372
When a nullsafeEquals expression (<=>) is used as a join condition, a hash join can now be planned correctly
- [fix](Nereids): NullSafeEqual should be in HashJoinCondition #27127
Fix the issue that column pruning did not work correctly under set operation operators
- [fix](Nereids) column pruning under union broken unexpectedly #26884
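To show why a null-safe equality (<=>) can serve as a hash join condition, the Python sketch below treats NULL (modeled as `None`) as an ordinary hash key, so that NULL matches NULL. This is a generic illustration, not the Nereids implementation; `null_safe_equal` and `hash_join_null_safe` are hypothetical names.

```python
def null_safe_equal(a, b) -> bool:
    """<=> semantics: NULL <=> NULL is true, NULL <=> x is false."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

def hash_join_null_safe(left, right):
    """Inner hash join on the first tuple field using <=> semantics.

    None is hashable, so NULL keys go into the hash table like any other
    key and match only other NULL keys.
    """
    table = {}
    for key, payload in right:          # build phase
        table.setdefault(key, []).append(payload)
    return [(key, lp, rp)               # probe phase
            for key, lp in left
            for rp in table.get(key, [])]

left = [(1, "a"), (None, "b")]
right = [(None, "x"), (1, "y"), (2, "z")]
print(hash_join_null_safe(left, right))
```

With ordinary `=` semantics the row with the NULL key would be dropped, because NULL = NULL does not evaluate to true.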

DA1OOO · 2023-12-13T01:56:59Z

Has it been released? Why hasn't the official website been updated yet?

xiaokang · 2023-12-13T09:56:41Z

Has it been released? Why hasn't the official website been updated yet?

It's in the voting process and will be released tomorrow if the vote passes.

myokok · 2023-12-19T02:12:02Z

Our development language is .NET and we connect to Doris through ODBC. Will Doris stop maintaining ODBC in the future?

xiaokang changed the title ~~Release Note 2.0.3~~ [DRAFT] Release Note 2.0.3 Dec 3, 2023

morningman added the release notes label Dec 7, 2023

xiaokang changed the title ~~[DRAFT] Release Note 2.0.3~~ Release Note 2.0.3 Dec 7, 2023

luzhijing pinned this issue Dec 11, 2023

xiaokang mentioned this issue Jan 12, 2024

Release Note 2.0.4 #29906

Open

luzhijing unpinned this issue Jan 17, 2024
