Description
New Feature
Query spill to disk
In version 0.13, Doris supports spilling query data to disk in sort and window function operators. When enable_spilling is true and a query reaches its memory limit, the query spills intermediate data to disk instead of failing because of the memory bottleneck.
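A minimal sketch of the general idea behind spilling a sort, written as an external merge sort. It is not Doris's implementation. Rows are plain integers, the memory limit is modeled as a row count (max_rows_in_memory), and enable_spilling is an ordinary function argument standing in for the setting named above. spill_sort, _write_run and _read_run are hypothetical helpers.

```python
import heapq
import os
import pickle
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_rows: List[int], directory: str) -> str:
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for row in sorted_rows:
            pickle.dump(row, f)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream the rows of a spilled run back one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def spill_sort(rows: Iterable[int], max_rows_in_memory: int,
               enable_spilling: bool = True) -> Iterator[int]:
    """External merge sort: sort bounded chunks, spill them to disk, then k-way merge."""
    with tempfile.TemporaryDirectory() as spill_dir:
        buffer: List[int] = []
        run_paths: List[str] = []
        for row in rows:
            buffer.append(row)
            if len(buffer) >= max_rows_in_memory:  # the modeled "memory limit"
                if not enable_spilling:
                    raise MemoryError("memory limit reached and spilling is disabled")
                buffer.sort()
                run_paths.append(_write_run(buffer, spill_dir))
                buffer = []
        buffer.sort()  # the last partial chunk stays in memory
        runs = [_read_run(p) for p in run_paths] + [iter(buffer)]
        yield from heapq.merge(*runs)
```

For example, list(spill_sort([5, 3, 9, 1], max_rows_in_memory=2)) sorts the input through two spilled runs and returns [1, 3, 5, 9]. During the merge, heapq.merge holds only the current head row of each run in memory.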
Support bitmap_union, hll_union and count in materialized view
Materialized views now support richer aggregate functions: bitmap_union, hll_union and count. In order-analysis scenarios, users need to count orders across different dimensions. Bitmap and HLL functions can also be pre-computed for deduplication scenarios, such as analyzing PV and UV data in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed it up.
[#3651] [#3677] [#3705] [#3873] [#4014]
Spark load
Spark load preprocesses imported data using external Spark resources, which improves import performance for large data volumes and saves computing resources on the Doris cluster. It is mainly intended for importing large amounts of data into Doris, for example during an initial migration.
[#3418] [#3712] [#3715] [#3716]
Support loading JSON data into Doris through RoutineLoad or StreamLoad
RoutineLoad and StreamLoad support a new data format: JSON. JSON data is converted according to the transform rules in the load statement and then imported into Doris. This is especially useful for log services whose raw data is in JSON, because users no longer need to convert the data to CSV beforehand.
[#3553]
Modify routine load
Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the ALTER ROUTINE LOAD statement. Only jobs in the PAUSED state can be modified. After a job is modified, the new properties are used to plan its tasks the next time they are scheduled.
[#4158]
Support fetching _id from ES and creating tables on ES aliases or wildcard indices
The _id field of a native ES document is the primary key of the ES index, and Doris on ES can now fetch it. Doris also supports creating external tables on index aliases or wildcard patterns such as log_*, so users can easily query all matching indices through one table.
Logstash Doris output plugin
The Logstash output plugin sends data from Logstash to Doris. It interacts with the Doris FE HTTP interface over HTTP and loads the data through Doris's Stream Load.
[#3800]
Support SELECT INTO OUTFILE
Doris now supports exporting query results to third-party file systems such as HDFS, S3 and BOS. The syntax is based on MySQL's, and the export format is CSV. The exported results can be downloaded by other users or processed further by other systems. This is especially useful when a result set is too large to transfer over the MySQL protocol, for example a large number of IDs produced by bitmap_to_string.
[#3584]
Support IN predicate in DELETE statement
The DELETE statement now supports IN and NOT IN predicates in its conditions, so users can delete rows matching any of several values with one statement.
[#4006]
Enhancement
Compaction rules optimization
This optimization updates the strategy for triggering compaction. The new version-merging strategy balances write amplification, space amplification and read performance, and tends to merge files of adjacent sizes. For the same number of versions, it performs fewer merges and leaves fewer files in total.
[#4212]
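The notes describe the new policy only as tending to merge files of adjacent sizes. The toy size-tiered picker below illustrates that idea under stated assumptions: file sizes are listed in version order, a "level" is a power-of-ratio size bucket, and a run of at least min_files consecutive versions in the same level is merged. The functions and parameters are hypothetical and do not reflect Doris's actual compaction scoring or configs.

```python
from typing import List, Optional, Tuple


def size_level(size_bytes: int, ratio: int = 4) -> int:
    """Level k holds sizes in [ratio**k, ratio**(k+1)); sizes below ratio are level 0."""
    level = 0
    while size_bytes >= ratio:
        size_bytes //= ratio
        level += 1
    return level


def pick_compaction(sizes: List[int], min_files: int = 3,
                    ratio: int = 4) -> Optional[Tuple[int, int]]:
    """Return [start, end) of the first run of >= min_files consecutive
    versions whose sizes fall in the same level, or None."""
    start = 0
    for i in range(1, len(sizes) + 1):
        if i == len(sizes) or size_level(sizes[i], ratio) != size_level(sizes[start], ratio):
            if i - start >= min_files:
                return start, i
            start = i
    return None


def compact_once(sizes: List[int], min_files: int = 3, ratio: int = 4) -> List[int]:
    """Merge the picked run into one file (assumes merging removes no data)."""
    picked = pick_compaction(sizes, min_files, ratio)
    if picked is None:
        return list(sizes)
    start, end = picked
    return sizes[:start] + [sum(sizes[start:end])] + sizes[end:]
```

With sizes [1000, 10, 12, 9, 11, 1], the four similarly sized small files are merged into one and the result is [1000, 42, 1]. The large file is not rewritten, which keeps write amplification down while still reducing the file count.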
Simplify the delete process to make it faster
The polling load checker used during deletion has been removed and replaced with a transaction callback, which reduces the response time of the DELETE command to milliseconds.
[#3191]
Support simple transitivity on join predicate pushdown
When a column in a query's filter predicate is also a column in the join condition, the predicate can be propagated through the join condition to the corresponding column of the other table. That table is then filtered as well, which reduces the amount of data processed and speeds up the query.
[#3453]
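A minimal sketch of single-step predicate transitivity over equi-join conditions. Pred, infer_transitive and the "table.column" naming are hypothetical. The sketch only copies a filter across an a = b join condition and does not check the join type, which a real optimizer must do before propagating a predicate.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Pred:
    column: str   # qualified column name, e.g. "t1.id"
    op: str       # comparison operator, e.g. ">"
    value: object


def infer_transitive(filters: List[Pred], join_eqs: List[Tuple[str, str]]) -> List[Pred]:
    """For each equi-join condition a = b, copy filters on a to b and vice versa."""
    existing: Set[Pred] = set(filters)
    inferred: List[Pred] = []
    for left, right in join_eqs:
        for p in filters:
            for src, dst in ((left, right), (right, left)):
                if p.column == src:
                    derived = Pred(dst, p.op, p.value)
                    if derived not in existing:  # skip predicates that already exist
                        existing.add(derived)
                        inferred.append(derived)
    return inferred
```

For WHERE t1.id > 10 with the join condition t1.id = t2.id, infer_transitive returns [Pred("t2.id", ">", 10)], so t2 can also be filtered during its own scan.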
Non-blocking OlapTableSink
With this optimization, OlapTableSink sends data and adds rows concurrently, which improves load performance. In a test with a 56 GB broker load, the original version took 4 hours, and the new version roughly halved that time.
[#3143]
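Adding rows and sending data at the same time, as described above, can be sketched as a bounded producer/consumer pipeline. This is an illustration only: load_rows, batch_size and send_batch are hypothetical names, and the sketch is not meant to mirror OlapTableSink's actual code.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel telling the sender thread to stop


def load_rows(rows: Iterable[str], batch_size: int,
              send_batch: Callable[[List[str]], None]) -> None:
    """Add rows to batches on the calling thread while a sender thread ships full batches."""
    pending: queue.Queue = queue.Queue(maxsize=4)  # bounded: applies backpressure
    errors: List[Exception] = []

    def sender() -> None:
        while True:
            item = pending.get()
            if item is _DONE:
                return
            if errors:
                continue  # keep draining so the producer never blocks on a full queue
            try:
                send_batch(item)
            except Exception as exc:  # re-raised to the caller after join()
                errors.append(exc)

    worker = threading.Thread(target=sender)
    worker.start()
    batch: List[str] = []
    try:
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                pending.put(batch)  # hand off and keep adding rows without waiting for the send
                batch = []
        if batch:
            pending.put(batch)
    finally:
        pending.put(_DONE)
        worker.join()
    if errors:
        raise errors[0]
```

The bounded queue applies backpressure: if sending falls behind, the row-adding loop waits instead of buffering without limit.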
Support transaction management at the DB level and use ArrayDeque to improve transaction task performance
Transaction management is now divided by database, so transactions in different databases do not block each other, which improves the execution efficiency of transaction tasks.
[#3369]
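A minimal sketch of database-level transaction management: each database gets its own lock, so work in one database does not block transactions in another. The class and method names are hypothetical, and Python's collections.deque stands in for Java's ArrayDeque as the queue of finished transactions.

```python
import threading
from collections import deque
from typing import Deque, Dict


class DbTransactionMgr:
    """Transaction state for one database, guarded by its own lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.running: Dict[int, str] = {}    # txn_id -> load label
        self.finished: Deque[int] = deque()  # finished txn ids, oldest first


class GlobalTransactionMgr:
    """Routes each transaction to the manager of its database."""

    def __init__(self) -> None:
        self._dbs: Dict[int, DbTransactionMgr] = {}
        self._dbs_lock = threading.Lock()  # guards only the db_id -> manager map
        self._id_lock = threading.Lock()   # guards only the txn id counter
        self._last_id = 0

    def db_mgr(self, db_id: int) -> DbTransactionMgr:
        with self._dbs_lock:
            if db_id not in self._dbs:
                self._dbs[db_id] = DbTransactionMgr()
            return self._dbs[db_id]

    def _next_txn_id(self) -> int:
        with self._id_lock:
            self._last_id += 1
            return self._last_id

    def begin(self, db_id: int, label: str) -> int:
        mgr = self.db_mgr(db_id)
        txn_id = self._next_txn_id()
        with mgr.lock:  # contends only with transactions of the same database
            mgr.running[txn_id] = label
        return txn_id

    def finish(self, db_id: int, txn_id: int) -> None:
        mgr = self.db_mgr(db_id)
        with mgr.lock:
            del mgr.running[txn_id]
            mgr.finished.append(txn_id)
```

For example, while one thread holds the lock of database 1, begin(2, "load_b") still completes immediately, because it takes only database 2's lock and the two short-lived global locks.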
Improve the performance of query with IN predicate
A new config, max_pushdown_conditions_per_column, limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the previous configuration, which controls how scan keys are split. Its default value is 1024. After the two configurations were separated, Doris's QPS improved and its CPU usage decreased.
[#3694]
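A minimal sketch of how a per-column limit like max_pushdown_conditions_per_column can decide where an IN predicate is evaluated. Only the config name and its default of 1024 come from the notes above. The function name, the dict-based representation, the <= comparison at the boundary, and the assumption that over-limit predicates are evaluated as ordinary filters after rows are read are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Default quoted in the release notes; the <= comparison below is an assumption.
MAX_PUSHDOWN_CONDITIONS_PER_COLUMN = 1024


def split_in_predicates(
    preds: Dict[str, Sequence[object]],
    limit: int = MAX_PUSHDOWN_CONDITIONS_PER_COLUMN,
) -> Tuple[Dict[str, List[object]], Dict[str, List[object]]]:
    """Split per-column IN lists into those pushed to storage and those filtered after the scan."""
    pushed: Dict[str, List[object]] = {}
    residual: Dict[str, List[object]] = {}
    for column, values in preds.items():
        target = pushed if len(values) <= limit else residual
        target[column] = list(values)
    return pushed, residual
```

Keeping very long IN lists out of the storage layer avoids building a large number of per-column conditions there, which is one plausible reason a separate limit can lower CPU usage.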
Optimize the speed of reading Parquet files
The broker read path now keeps a cache buffer array when reading Parquet files. When the broker is about to seek to a position and read data from a remote Parquet file, it first tries to serve that position from the cache buffer array. If the requested data is already in the cache, the remote read is skipped. In testing, this roughly halved the load time of Parquet files in broker load and Spark load.
[#3878]
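A minimal sketch of the read-through buffer cache described above. CachedRemoteReader, read_remote, block_size and max_buffers are hypothetical. The sketch assumes each cache miss fetches one block of at least block_size bytes starting at the requested offset, and evicts the oldest buffer when the array is full.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class CachedRemoteReader:
    def __init__(self, read_remote: Callable[[int, int], bytes],
                 block_size: int = 1 << 20, max_buffers: int = 4) -> None:
        self._read_remote = read_remote  # read_remote(offset, length) -> bytes
        self._block_size = block_size
        # (start_offset, data) pairs; deque(maxlen=...) drops the oldest when full
        self._buffers: Deque[Tuple[int, bytes]] = deque(maxlen=max_buffers)
        self.remote_reads = 0

    def read(self, offset: int, length: int) -> bytes:
        # Serve the request from a cached buffer if it fully covers [offset, offset + length).
        for start, data in self._buffers:
            if start <= offset and offset + length <= start + len(data):
                return data[offset - start:offset - start + length]
        # Cache miss: fetch a whole block from the remote file and remember it.
        data = self._read_remote(offset, max(length, self._block_size))
        self.remote_reads += 1
        self._buffers.append((offset, data))
        return data[:length]  # may be shorter than length near end of file
```

When successive reads fall inside an already fetched block, they are served without another remote request, which is where the savings come from.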
New Built-in Functions
- bitmap_intersect (Support bitmap_intersect #3571)
- orthogonal_bitmap_intersect in UDAF (Add bitmap longitudinal cutting udaf #4198)
- orthogonal_bitmap_intersect_count in UDAF (Add bitmap longitudinal cutting udaf #4198)
- orthogonal_bitmap_union_count in UDAF (Add bitmap longitudinal cutting udaf #4198)
Other
- Support modifying BE configs at runtime without restarting ([config] Support to modify configs when BE is running without restarting #3264)
- Support setting replica quota in db level (Support setting replica quota in db level #3283)
- [Doris On ES][Bug-fix] Solve the problem of time format processing. ([Doris On ES][Bug-fix] Solve the problem of time format processing #3941)
- [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode. ([Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode #3751)
- [Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes ([Doris On ES][Bug] ES queries always route at same 3 BE nodes #4351) ([Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351) #4352)
- [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type ([Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type #4300)
- [CodeRefactor] Modify FE modules ([CodeRefactor] Modify FE modules #4146)
- [CodeRefactor] Generate Java files using Maven (generate jave files using maven #4133)
- [Compaction] Add delayed deletion of rowsets function, fix -230 error. (Add delayed deletion of rowsets function, fix -230 error. #4039)
- [DOCS] documents rebuild with Vuepress ([Enhancement] documents rebuild with Vuepress (#3408) #3414)
- [Webserver] Make BE webserver more pretty ([webserver] Make BE webserver more pretty #4050)
- [Webserver] Introduce mustache to simplify BE's website render ([webserver] Introduce mustache to simplify BE's website render #4062)
- [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and enable doc_values scan default ([Doris On ES] Add docvalue limitation for doc_values scan and enable doc_values scan default #4055)
- [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. ([Doris On ES][Refactor] refactor and enchanment ES sync meta logic #4012)
- [Doris On ES][Enhancement] Ignore _total node for efficiency and fully trusted document count ([Doris On ES][Optimization] Ignore _total node for efficiency and fully trusted document count #3932)
- [ColocateJoin] Support table join itself by colocate join ([ColocateJoin] ColocateJoin support table join itself (#4230) #4231)
- [Load] Support import true or false as boolean value (Support import true or false as boolean value #3898)
- [Meta tool] Add segment v2 footer meta viewer (Add segment v2 footer meta viewer #3822)
API Change
- [DynamicPartition] Optimize the rule of creating dynamic partition ([DynamicPartition] Optimize the rule of creating dynamic partition #3679)
- [SegmentV2] Change the default storage format to SegmentV2 ([SegmentV2] Change the default storage format to SegmentV2 #4387)
- [License] Organize and modify the license of the code ([License] Organize and modify the license of the code #4371)
- [UDF] Fix large string val allocation failure (Fix large string val allocation failure #3724)
- Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad #3638)
Credits
@ZhangYu0123
@wfjcmcb
@Fullstop000
@sduzh
@stalary
@worker24h
@chaoyli
@vagetablechicken
@jmk1011
@funyeah
@wutiangan
@gengjun-git
@xinghuayu007
@EmmyMiao87
@songenjie
@acelyc111
@yangzhg
@Seaven
@hexian55
@ChenXiaoFei
@WingsGo
@kangpinghuang
@wangbo
@weizuo93
@sdgshawn
@skyduy
@wyb
@gaodayue
@HappenLee
@kangkaisen
@wuyunfeng
@HangyuanLiu
@xy720
@liutang123
@caiconghui
@liyuance
@spaces-X
@hffariel
@decster
@blackfox1983
@Astralidea
@morningman
@hf200012
@xbyang18
@Youngwb
@imay
@marising
@caoyang10