Release Notes 0.13.0 #4370

@EmmyMiao87

New Feature

Query spill to disk

Doris 0.13 supports spilling queries to disk for sorting and window functions. When enable_spilling is set to true and the memory limit is reached, the query spills intermediate data to disk instead of failing because of the memory bottleneck.

[#3820] [#4151] [#4152]
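The general idea behind spilling a sort can be sketched as an external merge sort. This is a minimal illustration in Python, not Doris's implementation: the function name `external_sort` and the row-count limit `max_rows_in_memory` are hypothetical stand-ins for a real memory limit.

```python
import heapq
import os
import pickle
import tempfile


def external_sort(rows, max_rows_in_memory):
    """Yield rows in sorted order, spilling sorted runs to temp files
    whenever the in-memory buffer reaches max_rows_in_memory rows."""
    run_paths = []
    buffer = []

    def spill():
        # Sort the buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "wb") as f:
            for row in buffer:
                pickle.dump(row, f)
        run_paths.append(path)
        buffer.clear()

    def read_run(path):
        # Stream one spilled run back from disk, row by row.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    try:
        for row in rows:
            buffer.append(row)
            if len(buffer) >= max_rows_in_memory:
                spill()
        buffer.sort()
        # Merge the on-disk runs with whatever is still in memory.
        runs = [read_run(p) for p in run_paths] + [iter(buffer)]
        yield from heapq.merge(*runs)
    finally:
        for p in run_paths:
            os.remove(p)
```

For example, `list(external_sort([5, 3, 9, 1, 4, 8, 2], 3))` writes two runs of three rows each to disk and merges them with the one row left in memory.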

Support bitmap_union, hll_union and count in materialized view

Materialized views now support more aggregate functions: bitmap_union, hll_union and count. In an order-analysis scenario, users can count orders along different dimensions. bitmap and hll results can also be precomputed for deduplication scenarios such as analyzing PV and UV in website traffic. Doris automatically matches a user's query to the optimal materialized view to speed up the query.

[#3651] [#3677] [#3705] [#3873] [#4014]

Spark load

Spark load preprocesses the data to be imported using external Spark resources. This improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly intended for the initial migration, when a large amount of data is imported into Doris.

[#3418] [#3712] [#3715] [#3716]

Support loading JSON data into Doris by RoutineLoad or StreamLoad

RoutineLoad and StreamLoad support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose raw data is already in JSON, since users no longer need to convert the data to CSV beforehand.

[#3553]
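As a rough sketch, a JSON stream load request could be assembled as below. The header names (`format`, `jsonpaths`, `strip_outer_array`, `columns`) and the `_stream_load` URL shape follow the stream load HTTP interface as I understand it and should be checked against the documentation for your Doris version. The helper name and all host, database, table and column values are hypothetical.

```python
import base64
import json


def build_json_stream_load(fe_host, http_port, db, table,
                           user, password, rows, jsonpaths, columns):
    """Return (url, headers, body) for a JSON stream load request.

    Nothing is sent here; pass the result to any HTTP client as a PUT.
    """
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # Assumed header names; verify against your Doris version.
        "format": "json",
        "strip_outer_array": "true",          # body is a JSON array of rows
        "jsonpaths": json.dumps(jsonpaths),   # where to find each value
        "columns": ",".join(columns),         # target columns, in the same order
    }
    body = json.dumps(rows).encode("utf-8")
    return url, headers, body
```

The `jsonpaths` and `columns` values together play the role of the transform rules: each JSON path picks one value out of a document, and the matching column name says where it lands in the table.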

Modify routine load

Properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the ALTER ROUTINE LOAD statement. Only jobs in the PAUSED state can be modified. The new properties take effect the next time the job's tasks are scheduled.

[#4158]

Support fetching _id from ES and creating tables with wildcard or alias indexes of ES

A native ES document has an _id field, which is the primary key of the ES index. Doris on ES can now fetch this field. Doris also supports creating an external table on an alias or a wildcard index such as log_*, so users can query all matching indexes at once.

[#3900] [#3968]

Logstash Doris output plugin

The Logstash Doris output plugin sends data from Logstash to Doris. It talks to the Doris FE HTTP interface and loads data through Doris's stream load.

[#3800]

Support SELECT INTO OUTFILE

Doris now supports exporting query results to a third-party file system such as HDFS, S3, or BOS. The syntax follows the MySQL manual. The export format is CSV. Exported results can be downloaded by other users or processed further by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of ids produced by bitmap_to_string.

[#3584]

Support IN predicate in delete statement

The delete statement supports IN and NOT IN predicates in its conditions, so a single statement can delete rows that match (or do not match) a list of values.

[#4006]

Enhancement

Compaction rules optimization

This optimization updates the strategy for triggering compaction. The new version-merging strategy balances write amplification, space amplification, and read performance, and tends to merge files of similar size. For the same number of versions, it reduces both the number of merges and the total number of files.

[#4212]

Simplify the delete process to make it fast

The polling load checker used during deletion is removed and replaced by a txn callback, which reduces the response time of the delete command to the millisecond level.

[#3191]

Support simple transitivity on join predicate pushdown

When a column used in a filter predicate is also used in a join condition, the filter predicate can be propagated through the join equality and applied to the other table in the join as well. This reduces the amount of data read and speeds up the query.

[#3453]
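The rule can be illustrated with a small sketch. This is not the Doris planner's code: the function name and the tuple representation of columns and predicates are hypothetical, and only single-column filters over equality joins are handled.

```python
def derive_transitive_filters(join_equalities, filters):
    """Return extra filters implied by equality join conditions.

    join_equalities: list of ((table, col), (table, col)) pairs,
                     one per `a.x = b.y` join condition.
    filters:         list of ((table, col), op, value) predicates.
    """
    derived = []
    for left, right in join_equalities:
        for column, op, value in filters:
            if column == left:
                candidate = (right, op, value)
            elif column == right:
                candidate = (left, op, value)
            else:
                continue  # filter column does not take part in this join
            if candidate not in filters and candidate not in derived:
                derived.append(candidate)
    return derived
```

Given the join condition `orders.user_id = users.id` and the filter `orders.user_id > 100`, the sketch derives `users.id > 100`, which can then be pushed down to the scan of `users`. A filter on a column that is not part of the join, such as `orders.status = 'paid'`, produces nothing.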

Non blocking OlapTableSink

In this optimization, OlapTableSink sends data and adds rows concurrently, which improves load performance. In a test with a 56 GB broker load, the original version ran for 4 hours, and the new version halved that time.

[#3143]
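The pattern of overlapping row batching with sending can be sketched with a bounded queue and a sender thread. This is a simplified model, not the OlapTableSink code: the class name, `send_batch` callback, and the `batch_size` and `max_pending` parameters are all hypothetical.

```python
import queue
import threading


class NonBlockingSink:
    """Batch rows on the caller's thread while a background thread sends them."""

    def __init__(self, send_batch, batch_size=2, max_pending=4):
        self._send_batch = send_batch
        self._batch_size = batch_size
        # Bounded queue: add_row only blocks when the sender falls far behind.
        self._pending = queue.Queue(maxsize=max_pending)
        self._current = []
        self._sender = threading.Thread(target=self._send_loop)
        self._sender.start()

    def _send_loop(self):
        while True:
            batch = self._pending.get()
            if batch is None:  # sentinel put by close()
                return
            self._send_batch(batch)

    def add_row(self, row):
        self._current.append(row)
        if len(self._current) >= self._batch_size:
            # Hand the full batch to the sender and keep adding rows.
            self._pending.put(self._current)
            self._current = []

    def close(self):
        if self._current:
            self._pending.put(self._current)
            self._current = []
        self._pending.put(None)
        self._sender.join()
```

A single sender thread keeps batches in order. The sketch leaves out error handling: if `send_batch` raises, the sender thread exits and later calls may block on a full queue.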

Support txn management in db level and use ArrayDeque to improve txn task performance

Transaction management is now divided by database. Databases do not block each other, which improves the execution efficiency of transaction tasks.

[#3369]
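One way to picture per-database transaction management is a registry that gives each database its own manager and lock. The sketch below is only an illustration under that assumption: the class names are hypothetical, and `collections.deque` stands in for the ArrayDeque mentioned in the title.

```python
import threading
from collections import deque


class DbTxnManager:
    """Transaction state for a single database, guarded by its own lock."""

    def __init__(self, db_id):
        self.db_id = db_id
        self.lock = threading.Lock()
        self.running = {}        # txn_id -> label
        self.finished = deque()  # finished txn ids, oldest first

    def begin(self, txn_id, label):
        with self.lock:
            self.running[txn_id] = label

    def commit(self, txn_id):
        with self.lock:
            self.running.pop(txn_id)
            self.finished.append(txn_id)


class GlobalTxnManager:
    """Hands out one DbTxnManager per database id."""

    def __init__(self):
        self._managers = {}
        self._managers_lock = threading.Lock()

    def for_db(self, db_id):
        # This lock only protects the registry, not the transactions themselves.
        with self._managers_lock:
            if db_id not in self._managers:
                self._managers[db_id] = DbTxnManager(db_id)
            return self._managers[db_id]
```

Because each database has its own lock, a long-running operation on one database does not delay `begin` or `commit` calls on another.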

Improve the performance of query with IN predicate

A new config, max_pushdown_conditions_per_column, limits the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the existing configuration that controls how scan keys are split. Its default value is 1024. After the two configurations were separated, the QPS of Doris improved and CPU usage decreased.

[#3694]

Optimized the speed of reading parquet files

The broker keeps a cache buffer array when reading a parquet file. When the broker is about to seek to a position and fetch data from the remote parquet file, it first tries to read that position from the cache buffer array. If the expected data is in the cache, the remote read is skipped. In tests, this halved the load time of parquet files in broker load and spark load.

[#3878]
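The read-through idea can be sketched as a small block cache in front of a remote read function. This is not the broker's implementation: the class name, the `fetch(offset, length)` callback, the fixed block size, and the oldest-first eviction are all assumptions made for the example.

```python
from collections import OrderedDict


class CachedRemoteReader:
    """Serve reads from cached fixed-size blocks, fetching only on a miss."""

    def __init__(self, fetch, block_size=1024, max_blocks=8):
        self._fetch = fetch            # fetch(offset, length) -> bytes
        self._block_size = block_size
        self._max_blocks = max_blocks
        self._blocks = OrderedDict()   # block index -> bytes
        self.remote_reads = 0

    def _block(self, index):
        if index in self._blocks:
            return self._blocks[index]  # cache hit: no remote read
        data = self._fetch(index * self._block_size, self._block_size)
        self.remote_reads += 1
        self._blocks[index] = data
        if len(self._blocks) > self._max_blocks:
            self._blocks.popitem(last=False)  # evict the oldest block
        return data

    def read(self, offset, length):
        if length <= 0:
            return b""
        first = offset // self._block_size
        last = (offset + length - 1) // self._block_size
        out = bytearray()
        for index in range(first, last + 1):
            out += self._block(index)
        start = offset - first * self._block_size
        return bytes(out[start:start + length])
```

Parquet readers tend to seek back and forth between the footer, column metadata, and nearby pages, so repeated reads in the same region are served from the cache instead of the remote file.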

New Built-in Functions

Other

API Change

Credits

@ZhangYu0123
@wfjcmcb
@Fullstop000
@sduzh
@stalary
@worker24h
@chaoyli
@vagetablechicken
@jmk1011
@funyeah
@wutiangan
@gengjun-git
@xinghuayu007
@EmmyMiao87
@songenjie
@acelyc111
@yangzhg
@Seaven
@hexian55
@ChenXiaoFei
@WingsGo
@kangpinghuang
@wangbo
@weizuo93
@sdgshawn
@skyduy
@wyb
@gaodayue
@HappenLee
@kangkaisen
@wuyunfeng
@HangyuanLiu
@xy720
@liutang123
@caiconghui
@liyuance
@spaces-X
@hffariel
@decster
@blackfox1983
@Astralidea
@morningman
@hf200012
@xbyang18
@Youngwb
@imay
@marising
@caoyang10
