Release Note 2.0.3 #27909

Open
xiaokang opened this issue Dec 3, 2023 · 11 comments

Comments

@xiaokang
Contributor

xiaokang commented Dec 3, 2023

Previous Release Note 2.0.2

Thanks to our community users and developers, about 1,000 improvements and bug fixes have been made in Doris 2.0.3, covering optimizer statistics, inverted indexes, complex data types, data lakes, and replica management.

1 Behavior change

  • The output format of the complex data types ARRAY/MAP/STRUCT has been changed to be consistent with the input format and the JSON specification. The main changes from the previous version are that DATE/DATETIME and STRING/VARCHAR values are enclosed in double quotes, and null values inside ARRAY/MAP are displayed as null instead of NULL.
  • The SHOW_VIEW permission is now supported. Users with SELECT or LOAD permission can no longer execute the SHOW CREATE VIEW statement and must be granted the SHOW_VIEW permission separately.
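To make the first behavior change concrete, here is a minimal Python sketch that renders nested ARRAY (list) and MAP (dict) values following the rules stated above: DATE/DATETIME and STRING/VARCHAR values in double quotes, and nulls inside collections printed as null. It is an illustration under assumptions, not Doris's actual serializer: the function name format_complex, the ", " separator, the key:value layout for maps, and JSON-style escaping via json.dumps are all assumed here.

```python
import datetime
import json


def format_complex(value):
    """Render an ARRAY (list) or MAP (dict) value using the rules described above.

    Hypothetical sketch: the ", " separator, the key:value layout for maps and
    JSON-style string escaping are assumptions, not Doris's exact serializer.
    """
    if value is None:
        return "null"  # null inside ARRAY/MAP prints as lowercase null, not NULL
    if isinstance(value, datetime.datetime):  # check before date: datetime subclasses date
        return json.dumps(value.strftime("%Y-%m-%d %H:%M:%S"))  # DATETIME: quoted
    if isinstance(value, datetime.date):
        return json.dumps(value.isoformat())  # DATE: quoted
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)  # STRING/VARCHAR: quoted
    if isinstance(value, list):
        return "[" + ", ".join(format_complex(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = (f"{format_complex(k)}:{format_complex(v)}" for k, v in value.items())
        return "{" + ", ".join(pairs) + "}"
    return str(value)  # numeric values stay unquoted
```

For example, format_complex(["doris", None, "2.0.3"]) yields ["doris", null, "2.0.3"], and a map holding a DATE yields {"day":"2023-12-03", "n":null}.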

2 New features

2.1 Automatic collection of optimizer statistics

Collecting statistics helps the optimizer understand the data distribution characteristics and choose better plans, which can greatly improve query performance. Automatic collection is officially supported starting from version 2.0.3 and is enabled by default, running throughout the day.

See more: https://doris.apache.org/docs/query-acceleration/statistics/
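As a rough illustration of why statistics matter, the sketch below shows the textbook way an optimizer can turn collected column statistics (row count, number of distinct values, null count, min/max) into cardinality estimates for simple predicates, assuming uniformly distributed values. The ColumnStats fields and the estimate_* functions are hypothetical names; this is a generic technique, not Doris's cost model.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics an optimizer might collect."""
    row_count: int
    ndv: int  # number of distinct non-null values
    null_count: int
    min_value: float
    max_value: float


def estimate_equal(stats: ColumnStats) -> float:
    """Rows matching `col = constant`, assuming rows spread evenly over the NDV."""
    if stats.ndv == 0:
        return 0.0
    return (stats.row_count - stats.null_count) / stats.ndv


def estimate_less_than(stats: ColumnStats, bound: float) -> float:
    """Rows matching `col < bound`, assuming values are uniform over [min, max]."""
    non_null = stats.row_count - stats.null_count
    if bound <= stats.min_value:
        return 0.0  # nothing lies below the minimum
    if bound > stats.max_value:
        return float(non_null)  # every non-null value qualifies
    span = stats.max_value - stats.min_value  # non-zero here, since min < bound <= max
    return non_null * (bound - stats.min_value) / span
```

With 1,000,000 rows, 1,000 distinct values and a range of 0 to 100, an equality predicate is estimated at 1,000 rows and `col < 25` at 250,000 rows. Without collected statistics, an optimizer has to fall back on fixed default guesses for such selectivities.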

2.2 Complex data type support for more data lake sources

2.3 More built-in functions

3 Improvements and optimizations

3.1 Performance optimizations

  • When an inverted index MATCH WHERE condition with a high filter rate is combined with a common WHERE condition with a low filter rate, I/O on the index column is greatly reduced.
  • Optimized the efficiency of random data access after WHERE filtering.
  • Improved the performance of the old get_json_xx functions on the JSON data type by 2-4x.
  • Added a configuration option to lower the priority of data-read threads, ensuring CPU resources for real-time writes.
  • Added the uuid-numeric function, which returns LARGEINT and is 20 times faster than the uuid function, which returns a string.
  • Improved the performance of CASE WHEN by 3x.
  • Pruned unnecessary predicate evaluation during storage engine execution.
  • Accelerated COUNT by pushing the count operator down to the storage layer.
  • Optimized the computation performance of nullable types in AND/OR expressions.
  • Supported rewriting the LIMIT operator to run before JOIN in more scenarios to improve query performance.
  • Eliminated useless ORDER BY operators from inline views to improve query performance.
  • Improved the accuracy of cardinality estimation and the cost model in some cases.
  • Optimized the predicate pushdown logic of the JDBC catalog.
  • Optimized the read efficiency of the file cache when it is enabled for the first time.
  • Optimized the SQL cache policy for Hive tables, using the partition update time stored in HMS to improve the cache hit ratio.
  • Optimized merge-on-write (MoW) compaction efficiency.
  • Optimized the thread allocation logic for external table queries to reduce memory usage.
  • Optimized memory usage of the column reader.
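The COUNT pushdown item can be pictured with the following sketch, which assumes (hypothetically) that each storage segment records its row count in its metadata when it is written. A COUNT(*) with no filter can then be answered by summing that metadata instead of reading every row and counting it above the storage layer. Segment, make_segment, count_by_scan and count_pushed_down are illustrative names, not Doris internals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical storage segment holding rows plus a row count in its metadata."""
    rows: List[tuple]
    num_rows: int  # assumed to be recorded when the segment is written


def make_segment(rows: List[tuple]) -> Segment:
    return Segment(rows=rows, num_rows=len(rows))


def count_by_scan(segments: List[Segment]) -> Tuple[int, int]:
    """No pushdown: every row is read and counted above the storage layer.

    Returns (count, rows_read).
    """
    rows_read = 0
    for seg in segments:
        for _row in seg.rows:
            rows_read += 1
    return rows_read, rows_read


def count_pushed_down(segments: List[Segment]) -> Tuple[int, int]:
    """COUNT(*) pushed down: answered from segment metadata without reading rows.

    Only valid when the query has no filter and no pending deletes apply.
    Returns (count, rows_read).
    """
    return sum(seg.num_rows for seg in segments), 0
```

Both functions return the same count, but the pushed-down version reads zero rows. The shortcut only holds when nothing (a WHERE clause, a delete marker) can make the stored row count differ from the visible row count.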

3.2 Distributed replica management improvements

Distributed replica management improvements include skipping deleted partitions, colocate group deletion, balance failures caused by continuous writes, and balancing of tables with hot-cold separation.

3.3 Security enhancement

4 Bug fixes and stability

4.1 Complex datatypes

4.2 Inverted index

4.3 Materialized View

4.4 Table sample

4.5 Unique with merge on write

4.6 Load and compaction

4.7 Data Lake compatibility

4.8 JDBC external table compatibility

4.9 SQL Planner and Optimizer

Others

See the complete list of improvements and bug fixes on GitHub (dev/2.0.3-merged).

Big Thanks

Thanks to everyone who contributed to this release:

@adonis0147
@airborne12
@amorynan
@AshinGau
@BePPPower
@bigben0204
@BiteTheDDDDt
@bobhan1
@ByteYue
@CalvinKirs
@CanGuan
@caoliang-web
@catpineapple
@csun5285
@dataroaring
@deadlinefen
@deardeng
@DongLiang-0
@Doris-Extras
@dutyu
@eldenmoon
@englefly
@freemandealer
@fsilent
@Gabriel39
@GoGoWen
@HappenLee
@hello-stephen
@hf200012
@HHoflittlefish777
@HowardQin
@hubgeter
@hust-hhb
@JackDrogon
@jacktengg
@jackwener
@jeffreys-cat
@Jibing-Li
@JingDas
@kaijchen
@kaka11chen
@KassieZ
@Kikyou1997
@Lchangliang
@LemonLiTree
@liaoxin01
@LiBinfeng-01
@liugddx
@liutang123
@lsy3993
@luozenglin
@luwei16
@mongo360
@morningman
@morrySnow
@mrhhsg
@Mryange
@mymeiyi
@neuyilan
@nextdreamblue
@Nitin-Kashyap
@pingchunzhang
@platoneko
@qidaye
@ryanzryu
@seawinde
@shuke987
@sohardforaname
@starocean999
@SWJTU-ZhangLei
@TangSiyang2001
@Tanya-W
@TsukiokaKogane
@vinlee19
@w41ter
@wangbo
@whutpencil
@WinkerDu
@wsjz
@wuwenchi
@Xiaoccer
@xiaokang
@xiedeyantu
@XieJiann
@xinyiZzz
@xuefengze
@XuJianxu
@xy720
@xzj7019
@yagagagaga
@yiguolei
@yujun777
@Yukang-Lian
@Yulei-Yang
@zclllyybb
@zddr
@zfr9527
@zgxme
@zhangguoqiang666
@zhangstar333
@zhangy5
@zhannngchen
@zhiqiang-hhhh
@zy-kkk
@zzzxl1993
@zzzzzzzs

@xiaokang xiaokang changed the title Release Note 2.0.3 [DRAFT] Release Note 2.0.3 Dec 3, 2023
@bingwill

bingwill commented Dec 4, 2023

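We ran into the following problems when using the partial column update feature (on version 2.0.2):
1. With partial column update enabled, batch INSERT statements with multiple VALUES rows fail, while single-row inserts work fine.
2. After data is deleted, writing it again does not succeed; it can only be written after changing the primary key value.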

@wj215318

wj215318 commented Dec 5, 2023

When will it be officially released?

@hawkingrei

I'm interested in Automatic Collection and statistics. Are there any development plans for them in the near future?

@EmmyMiao87
Contributor

@morningman
@hawkingrei would like to take part in developing statistics for Doris; he has a lot of experience in this area ~

@bobhan1
Contributor

bobhan1 commented Dec 7, 2023

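We ran into the following problems when using the partial column update feature (on version 2.0.2): 1. With partial column update enabled, batch INSERT statements with multiple VALUES rows fail, while single-row inserts work fine. 2. After data is deleted, writing it again does not succeed; it can only be written after changing the primary key value.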

You can describe your problems more explicitly in https://github.com/apache/doris/issues or https://github.com/apache/doris/discussions.

@morningman
Contributor

I'm interested in Automatic Collection and statistics. Are there any development plans for them in the near future?

Welcome, I will contact you later

@xiaokang
Contributor Author

xiaokang commented Dec 7, 2023

When will it be officially released?

This week. Thanks for your attention.

@xiaokang
Contributor Author

xiaokang commented Dec 7, 2023

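Chinese version of the Release Note

Thanks to the more than 100 developers and users in the Doris community who took part in version 2.0.3. This release contains nearly 1,000 improvements and fixes covering statistics, inverted indexes, complex data types, data lakes, distributed replica management, and more.

1 Behavior changes

2 New features

2.1 Automatic statistics collection

Collecting statistics helps the optimizer understand data distribution characteristics and choose better plans, greatly improving query efficiency. It is officially supported starting from version 2.0.3 and is enabled all day by default.

For more information, see: https://doris.apache.org/docs/query-acceleration/statistics/

2.2 Complex data type support for more data lake systems

2.3 More built-in functions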

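3 Improvements and optimizations

3.1 Performance optimizations

  • When an inverted index MATCH WHERE condition with a high filter rate is combined with a regular WHERE condition with a low filter rate, index column I/O is greatly reduced
  • Optimized the efficiency of random data reads after WHERE filtering
  • Optimized the performance of the old get_json_xx functions on the JSON data type, a 2-4x improvement
  • Supported configuring a lower priority for data-read threads, guaranteeing CPU resources and real-time performance for writes
  • Added a uuid-numeric function that returns LARGEINT and is 20 times faster than the uuid function that returns a string
  • Optimized CASE WHEN performance, a 3x improvement
  • Pruned unnecessary predicate evaluation during storage engine execution
  • Supported pushing the count operator down to the storage layer
  • Optimized computation performance for AND/OR expressions that contain nullable types
  • Supported rewriting LIMIT to run before JOIN in more scenarios, improving execution efficiency
  • Added elimination of useless ORDER BY operators in inline views, improving execution efficiency
  • Improved the accuracy of cardinality estimation and the cost model in some cases, improving execution efficiency
  • Optimized the predicate pushdown logic and case-sensitivity logic of the JDBC catalog
  • Optimized the read efficiency of the file cache after it is enabled for the first time
  • Optimized the SQL cache policy for Hive tables, using the partition update time stored in HMS to decide whether the cache is invalid and improving the cache hit ratio
  • Optimized MoW compaction efficiency
  • Optimized the thread allocation logic for external table queries, reducing memory usage
  • Optimized memory usage of the column reader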
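One of the listed optimizations moves a LIMIT below a join. The sketch below shows why this is safe in a classic case, a LEFT OUTER JOIN without ORDER BY: every left row yields at least one output row, so the first n output rows can only come from the first n left rows. It uses a naive nested-loop join over Python lists; the function names are hypothetical, and which join types and scenarios Doris actually rewrites is not specified in this note.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]


def left_outer_join(left: List[Row], right: List[Row]) -> List[Tuple[Row, Optional[Row]]]:
    """Naive nested-loop LEFT OUTER JOIN on the first column, preserving left order."""
    out = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[0] == lrow[0]]
        if matches:
            out.extend((lrow, rrow) for rrow in matches)
        else:
            out.append((lrow, None))  # unmatched left rows are kept, padded with None
    return out


def join_then_limit(left: List[Row], right: List[Row], n: int):
    """Original plan: join all rows, then apply LIMIT n."""
    return left_outer_join(left, right)[:n]


def limit_pushed_below_join(left: List[Row], right: List[Row], n: int):
    """Rewritten plan: LIMIT n on the left input, join, then LIMIT n again.

    The outer LIMIT stays because one left row may match several right rows.
    """
    return left_outer_join(left[:n], right)[:n]
```

Because SQL without ORDER BY may return any n rows, the real rewrite only needs to produce some valid n rows; the sketch uses a fixed order so the two plans can be compared directly.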

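3.2 Distributed replica management improvements

Includes skipping deleted partitions, colocate groups, balance failures during continuous writes, hot-cold tiered tables failing to balance, and more

3.3 Security enhancements

4 Bug fixes and stability improvements

4.1 Complex data types

4.2 Inverted index

4.3 Materialized views

4.4 Sampling queries

4.5 Primary key tables

4.6 Load and compaction

4.7 Data lake compatibility

4.8 JDBC external table compatibility

4.9 SQL planning and optimization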

@xiaokang xiaokang changed the title [DRAFT] Release Note 2.0.3 Release Note 2.0.3 Dec 7, 2023
@luzhijing luzhijing pinned this issue Dec 11, 2023
@DA1OOO

DA1OOO commented Dec 13, 2023

Has it been released? Why hasn't the official website been updated yet?

@xiaokang
Contributor Author

Has it been released? Why hasn't the official website been updated yet?

It's in the voting process and will be released tomorrow if the vote passes.

@myokok

myokok commented Dec 19, 2023

Our development language is .NET, and we connect to Doris via ODBC. Will Doris stop maintaining ODBC in the future?

@luzhijing luzhijing unpinned this issue Jan 17, 2024