Release Note 1.1.0 #9949

morningman · 2022-06-02T12:16:37Z

Release Note 1.1.0

Upgrade Notes

The `enable_vectorized_engine` session variable is set to `true` by default, so all queries are executed by the vectorized query engine.

The BE binary file is renamed to `doris_be`. If you previously relied on the process name for cluster management or other operations, update the relevant scripts.

The Segment V1 format will no longer be supported in the next version. Please complete the data conversion in version 1.1.
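For scripts that identify the BE process by name, a minimal sketch of a check that tolerates both binaries during a rolling upgrade might look like this. The old name `palo_be` is an assumption (the note only gives the new name), and `is_be_process` is a hypothetical helper, not part of Doris.

```python
import os

# New name comes from the release note.
NEW_BE_NAME = "doris_be"
# Assumption: the pre-1.1 binary was called "palo_be"; verify on your own hosts.
OLD_BE_NAME = "palo_be"


def is_be_process(command_line: str) -> bool:
    """Return True if the executable of a command line is a BE binary."""
    fields = command_line.split()
    if not fields:
        return False
    # Compare only the executable's base name, not its arguments.
    return os.path.basename(fields[0]) in (NEW_BE_NAME, OLD_BE_NAME)
```

Matching on the executable's base name rather than a substring avoids false positives from unrelated commands that merely mention `doris_be` in their arguments.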

Features

1. Support Random Distribution (Experimental)

This feature is suitable for scenarios such as logs. With this distribution method, the data in a load task is randomly
written into a single tablet, which reduces data fanout during loading, lowers resource overhead, and improves
load stability.

2. Support for creating Iceberg external table

Supports creating Iceberg external tables and querying their data.
Supports automatic synchronization of all Iceberg tables in a database.
http://doris.apache.org/docs/ecosystem/external-table/iceberg-of-doris.html

3. Support specifying the compression method of a table

The default compression method for a Doris table is LZ4F.
You can optionally specify ZSTD as the compression method for a higher compression ratio.

Improvements

1. More comprehensive vectorization engine support

Support vectorized implementation of all built-in functions.
The storage layer is vectorized and supports dictionary optimization for low-cardinality string columns.
Optimize and fix a large number of performance and stability issues of vectorization engines.

Compared with versions 0.15 and 1.0, performance improves significantly:
http://doris.apache.org/docs/benchmark/ssb.html

2. Optimize the compaction logic

Optimize the rowset selection strategy, quickly merge the newly imported data versions, greatly reduce the number of data versions, and improve query performance.

With 20 concurrent load tasks, 5000 rows per job, and a 1s interval, the compaction score stays stable below 50.
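To put that load scenario in numbers, here is a back-of-the-envelope sketch. It assumes one reading of the figures: each of the 20 concurrent loaders submits one 5000-row job per second, and every job commits one new data version. Neither assumption is stated in the release note, and `load_pressure` is an illustrative name.

```python
def load_pressure(concurrency: int, rows_per_job: int, interval_s: float) -> dict:
    """Estimate the ingest rate of a periodic, concurrent load workload.

    Assumptions (not from the release note): each loader submits one job
    per interval, and each job commits exactly one new data version.
    """
    jobs_per_second = concurrency / interval_s
    return {
        "rows_per_second": jobs_per_second * rows_per_job,
        "new_versions_per_second": jobs_per_second,
    }


stats = load_pressure(concurrency=20, rows_per_job=5000, interval_s=1.0)
print(stats)
```

Under that reading the workload produces on the order of 20 new versions per second, so a compaction score that stays below 50 would suggest that merging keeps pace with ingestion.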

3. Optimize the read speed of Parquet and ORC files

With multi-threaded prefetching, the read speed increases by 5x.

4. Safer metadata checkpoint

Image files generated by a metadata checkpoint are now double-checked, and historical image files are retained.
Together, these changes solve the problem of metadata corruption caused by image file errors.

Bug Fix

1. Fix the problem that data cannot be queried due to a missing data version (Serious)

This issue was introduced in version 1.0 and may result in the loss of data versions for multiple replicas.
If you encounter this problem, you can try to fix it with #9266.

2. Fix the problem that resource isolation does not take effect for the resource usage limits of load tasks (Moderate)

In 1.1, broker load and routine load use Backends with the specified resource tags to perform the load.

3. Use HTTP BRPC to transfer network data packets over 2GB (Moderate)

In previous versions, transmitting more than 2GB of data between Backends through BRPC
could cause data transmission errors.

Behavior Changes

1. Query layer and storage layer vectorization is enabled by default

2. Disable Mini Load

The `/_load` interface is disabled by default. Please use the `/_stream_load` interface instead.
You can re-enable it by turning off the FE configuration item `disable_mini_load`.

The Mini Load interface will be completely removed in version 1.2.
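When migrating client scripts, the URL change can be sketched as below. Only the `/_load` and `/_stream_load` path suffixes come from this note. The host, port, database, and table in the usage example are placeholders, `to_stream_load` is a hypothetical helper, and how each former query parameter should be passed to stream load is not verified here, so the parameters are returned separately for review.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def to_stream_load(mini_load_url: str) -> Tuple[str, Dict[str, str]]:
    """Rewrite a Mini Load URL to the matching Stream Load URL.

    Returns the new URL without a query string, plus the original query
    parameters so the caller can decide how to pass each one.
    """
    parts = urlsplit(mini_load_url)
    if not parts.path.endswith("/_load"):
        raise ValueError("not a /_load URL: " + mini_load_url)
    new_path = parts.path[: -len("/_load")] + "/_stream_load"
    params = dict(parse_qsl(parts.query))
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", "")), params


# Placeholder host, port, database, and table names.
url, params = to_stream_load("http://fe_host:8030/api/db1/tbl1/_load?label=job_1")
print(url, params)
```

Returning the parameters instead of re-attaching them keeps the sketch honest about what it does not know: check each one against the stream load documentation before sending requests.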

3. Completely disable the SegmentV1 storage format

Data in SegmentV1 format is no longer allowed to be created. Existing data can continue to be accessed normally.
You can use the `ADMIN SHOW TABLET STORAGE FORMAT` statement to check whether data in SegmentV1 format
still exists in the cluster, and convert it to SegmentV2 with the data conversion command.

Access to SegmentV1 data will no longer be supported in version 1.2.

4. Limit the maximum length of String type

#8567
In previous versions, the String type allowed a maximum length of 2GB.
Version 1.1 limits the maximum length of the String type to 1MB. Longer strings can no longer be written.
In addition, using the String type as a partitioning or bucketing column of a table is no longer supported.

String data that has already been written can still be accessed normally.
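Clients that write String columns may want to check values before loading. A minimal sketch follows, assuming that "1MB" means 1,048,576 bytes and that length is measured on the UTF-8 encoding. Both are assumptions, since the note states only "1MB", and `fits_string_limit` is a hypothetical helper.

```python
# Assumption: "1MB" in the note means 1024 * 1024 bytes.
MAX_STRING_BYTES = 1024 * 1024


def fits_string_limit(value: str) -> bool:
    """Return True if the value's UTF-8 size is within the assumed limit."""
    # Assumption: the limit applies to encoded bytes, not to characters.
    return len(value.encode("utf-8")) <= MAX_STRING_BYTES
```

Measuring bytes rather than characters matters for non-ASCII text: a string of 600,000 two-byte characters is under the limit by character count but over it by encoded size.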

5. Fix fastjson related vulnerabilities

#9763

6. Added `ADMIN DIAGNOSE TABLET` command

Used to quickly diagnose problems with the specified tablet.

Thanks

Thanks to everyone who has contributed to this release:

@adonis0147
@airborne12
@amosbird
@aopangzi
@arthuryangcs
@awakeljw
@BePPPower
@BiteTheDDDDt
@bridgeDream
@caiconghui
@cambyzju
@ccoffline
@chenlinzhong
@daikon12
@DarvenDuan
@dataalive
@dataroaring
@deardeng
@Doris-Extras
@emerkfu
@EmmyMiao87
@englefly
@Gabriel39
@GoGoWen
@gtchaos
@HappenLee
@hello-stephen
@Henry2SS
@hewei-nju
@hf200012
@jacktengg
@jackwener
@Jibing-Li
@JNSimba
@kangshisen
@Kikyou1997
@kylinmac
@Lchangliang
@leo65535
@liaoxin01
@liutang123
@lovingfeel
@luozenglin
@luwei16
@luzhijing
@mklzl
@morningman
@morrySnow
@nextdreamblue
@Nivane
@pengxiangyu
@qidaye
@qzsee
@SaintBacchus
@SleepyBear96
@smallhibiscus
@spaces-X
@stalary
@starocean999
@steadyBoy
@SWJTU-ZhangLei
@Tanya-W
@tarepanda1024
@tianhui5
@Userwhite
@wangbo
@wangyf0555
@weizuo93
@whutpencil
@wsjz
@wunan1210
@xiaokang
@xinyiZzz
@xlwh
@xy720
@yangzhg
@Yankee24
@yiguolei
@yinzhijian
@yixiutt
@zbtzbtzbt
@zenoyang
@zhangstar333
@zhangyifan27
@zhannngchen
@zhengshengjun
@zhengshiJ
@zingdle
@zuochunwei
@zy-kkk


Gabriel39 · 2022-06-02T16:35:44Z

Maybe Java UDF should be mentioned in features as we have supported all types except HLL and Bitmap

morningman · 2022-06-17T12:02:54Z

> Maybe Java UDF should be mentioned in features as we have supported all types except HLL and Bitmap

The Java UDF is not fully tested, and UDAF is still a work in progress, so I suggest leaving it to the next release.

kpfly · 2022-06-24T03:13:05Z

Is Hive on S3 supported?
Are there any performance improvements?

morningman · 2022-07-03T14:14:51Z

> Is Hive on S3 supported?
>
> Are there any performance improvements?

Hive on S3 is supported in this release. The performance improvements can be seen here: http://doris.apache.org/docs/benchmark/ssb.html

vkingnew · 2022-07-13T09:28:28Z

Has this edition been fully tested with TPC-DS?

morningman · 2022-07-14T05:42:02Z

Not yet. We have only tested it with TPC-DS 1000. It works with 96 of the 99 queries, but without performance tuning.
That may be done in later releases.

morningman added the release notes label Jun 2, 2022

morningman pinned this issue Jun 2, 2022

morningman changed the title ~~Release note 1.1 (Draft)~~ Release Note 1.1 (Draft) Jun 2, 2022

morningman changed the title ~~Release Note 1.1 (Draft)~~ Release Note 1.1.0 Jul 3, 2022

yiguolei unpinned this issue Jul 30, 2022

morningman closed this as completed Oct 14, 2022
