Doris Roadmap 2024 #30669

Open
morningman opened this issue Jan 31, 2024 · 25 comments
Labels
Discuss kind/community Issues or PRs related to Doris community

Comments

@morningman
Contributor

morningman commented Jan 31, 2024

Roadmap 2023
Roadmap 2022

Separation of Storage and Computation

  • Flexibility & Stateless
    • Stateless BE node
    • Stateless FE node
  • Load Isolation
    • Multi cluster support
    • Read & write isolation
  • More storage support
    • AWS S3
    • Aliyun OSS
    • Tencent Cloud COS
    • Huawei Cloud OBS
    • Baidu Cloud BOS
    • GCP
    • Azure
    • HDFS
  • Performance
    • Optimized cache policy
    • Optimization for cold data querying
  • Support data deletion
  • SLA
    • Upgrade BE with no impact
    • Upgrade FE with no impact
  • Reliability
    • Snapshot & Time travel
    • Enhanced backup & restore
  • Data sharing

Async Materialized View

  • Build materialized view

    • Support full refresh
    • Support partition level refresh
    • Support building mv from OLAP table
    • Support building mv from Hive table
    • Support building mv from Iceberg table
    • Support building mv from Hudi table
    • Nested materialized view with DAG
    • Incremental building for external tables at partition granularity
    • Support partition rollup
    • Support partition TTL
    • Support REPLACE operation
    • Support refresh materialized view by time range
  • Transparent Rewriting

    • Support aggregation and rollup
    • Support join
    • Partial query rewriting
    • Rewriting supports nested materialized view
  • Materialized view management

    • Materialized view recommendation

Semi-Structured Data Analysis

  • Inverted Index

    • Support Inverted Index
    • Merging index files
    • Working with separation of storage and computation
    • Speed up data loading with inverted indexes
  • VARIANT data type

    • Support VARIANT data type
    • Working with inverted index

Query Optimizer

  • Basic framework

    • Fully supports DQL, DML and DDL
    • Optimized memory consumption
    • Optimized application order of RBO rules
    • Improved efficiency of Cascades enumeration
  • Planning quality

    • Statistics
      • Support statistics collection for synchronous materialized views
      • Support partition-level statistics collection
      • Support histogram statistics collection
    • New distributed cost model
      • Optimized distributed cost model framework
      • Support runtime cost re-evaluation
      • Support more accurate operator cost fitting models
    • Rules and enumerations
      • Expand RBO rules
      • Improve the quality of Cascades enumeration plan
      • Enhanced DPhyper enumeration framework, supporting outer join enumeration and CDC
    • Enhance runtime filter adaptive capability
      • Adaptive runtime filter size
      • Adaptive runtime filter type
      • Adaptive runtime filter waiting time
    • Support a histogram-based adaptive framework for handling data skew

DataLake Analysis

  • Support more file formats

    • RCFile
    • SequenceFile
  • Support more lake formats

    • Support Iceberg with ORC
    • Support Iceberg Equality Delete
    • Support more system tables on Hudi
    • Support CDC scan on Hudi
    • Support more system tables on Paimon
  • Trino Connector compatibility

    • Trino Connector compatibility framework
    • Support Trino DeltaLake Connector
    • Support Trino BigQuery Connector
    • Support Trino Cassandra Connector
  • Data lake write-back

    • Hive
      • Support unpartitioned table
      • Support partitioned table
      • Support INSERT OVERWRITE
      • Support INSERT
    • Iceberg
      • Support unpartitioned table
      • Support partitioned table
      • Support update and delete
    • Hudi
    • Paimon
  • Enhanced JDBC Catalog

    • Support DB2
    • Support sharded database
    • Support query concurrency
  • Enhanced file analysis

    • Support INSERT INTO table-valued functions
  • Enhanced file cache

    • Support memory-level file cache
    • Enhanced cache statistics and hit analysis
  • Integrate with Apache Ranger

    • Support Catalog/Database/Table/Resource/WorkloadGroup authorization
    • Support row policy
    • Support data mask
    • Support column level privilege
  • SQL dialect support

    • Presto/Trino
    • Spark
    • Hive
    • ClickHouse
    • Oracle
    • Postgres

Query Processing

  • Resource Isolation
    • Support hard/soft resource isolation for Query & Load
    • Enhance the visibility of resource usage
    • Automatic workload management at runtime
  • Support stored procedures
  • Support spill to disk
    • Sort Operator
    • Aggregate Operator
    • Join Operator
  • Working with shuffle service
  • Stage-by-stage query processing

Storage Engine

  • Data Loading
    • Support auto partition when loading
    • Zero-ETL: Built-in data integration from OLTP CDC to Doris
    • Support transactional multi-table INSERT INTO
    • Support MERGE INTO
  • Data Modeling
  • Cross cluster replication
    • Support Master/Slave switch
    • Support cross region deployment
    • Work with separation of storage and computation
  • Support data binlog
  • Enhanced Z-order index
  • Optimized high-concurrency point queries

Ecosystem & Tools

  • Cluster Manager for Apache Doris
    • Support agent mode
    • Support k8s
    • Enhanced monitor and alert management
    • Visualized profile analysis
    • Support Notebook
    • Built-in visualized BI reports
  • Doris StreamLoader tool
  • Doris Operator
  • X2Doris
    • Support Hive to Doris
    • Support Doris to Doris
    • Support Kudu to Doris
    • Support StarRocks to Doris
    • Support ClickHouse to Doris
  • BI tools compatibility
    • Superset
    • Metabase
    • Navicat
    • DataGrip
    • DBeaver
    • SmartBI
    • FineBI
  • Data Integration
    • Kettle
@morningman morningman added kind/community Issues or PRs related to Doris community Discuss labels Jan 31, 2024
@morningman morningman pinned this issue Jan 31, 2024
@vinlee19
Contributor

I have completed the development and testing of a JDBC catalog for Apache Druid, and I would like to contribute this feature if possible. PR: #27270

@vinlee19
Contributor

vinlee19 commented Jan 31, 2024

Flink-connector-doris will use FlinkCDC to synchronize multiple tables or the entire database from MongoDB and DB2 to Doris.

@michael1991
Contributor

typo "Mutlt cluster support" => "Multi cluster support"

@morningman morningman changed the title [Draft] Doris Roadmap 2024 Doris Roadmap 2024 Feb 2, 2024
@Hanchers

Hanchers commented Feb 4, 2024

Looking forward to version 2.1

@longzmkm

longzmkm commented Feb 6, 2024

> Looking forward to version 2.1

me too

@vonwind

vonwind commented Feb 6, 2024

Walking with innovators

@zhbdesign

Support generated columns

@morningman
Contributor Author

> Currently, I have completed the development and testing of the JDBC catalog for Apache Druid. If possible, I would like to contribute this feature. PR:#27270

Hi @vinlee19, thanks for your contribution. I'm not sure JDBC is a suitable connector for Druid, since I'm concerned about performance. That said, Trino does connect to Druid via JDBC.
I will take a look at this feature. Could you also provide test cases (e.g., a Druid Docker Compose setup)?

@liugddx
Member

liugddx commented Feb 18, 2024

dbeaver/dbeaver#22836

@liugddx
Member

liugddx commented Feb 18, 2024

@cs3163077

> Looking forward to version 2.1

me too

@sdhzwc
Contributor

sdhzwc commented Feb 19, 2024

> Looking forward to version 2.1

me too

@dragonkid

Why is there no 'Support building mv from Paimon table'?

@qianmoQ

qianmoQ commented Feb 28, 2024

Regarding BI tools compatibility: can Doris be adapted to https://github.com/devlive-community/datacap?

@morningman
Contributor Author

> why there is no 'Support building mv from Paimon table'

It will be supported

@morningman
Contributor Author

> BI tools compatibility Can it be adapted to https://github.com/devlive-community/datacap?
Hi @qianmoQ,
I am not familiar with DataCap, but you are very welcome to help Doris adapt to it.
I saw that Doris is on the logo wall. Maybe you could post a blog on the Doris website about how to connect to Doris using DataCap?

@ShawshankLin

ShawshankLin commented Mar 9, 2024

Support transactional multi-table DELETE and INSERT, to adapt aggregate tables to dbt's incremental models:
https://docs.getdbt.com/docs/build/incremental-models

@JiangJamm

Looking forward to support for DataGrip and Kettle!

@zhangm365

zhangm365 commented May 16, 2024

The correct URL for the async materialized view item is:
https://doris.apache.org/docs/query/view-materialized-view/async-materialized-view

@shiliming

binlog, binlog, binlog!

@mohuaiyuan

Which version of Doris will support DB2?

@johnpyp

johnpyp commented Jul 12, 2024

On the "Index Overview" page in the docs, I see that inverted indexes have "Accelerates LIKE" marked as "COMING". Is that part of the 2024 roadmap? That would be amazing :)

@rudyricci

Hi,
The 2024 roadmap does not include support for accessing AWS S3 via IAM roles, an item that was listed in the 2023 roadmap. I think this is very important for security, since it avoids hard-coded credentials.
See #35928

@malthe

malthe commented Aug 9, 2024

According to this roadmap, inverted indexes are not yet "working with separation of storage and computation". Is there an issue tracking this?

morningman pushed a commit that referenced this issue Sep 23, 2024
…al/hdfs/s3 (#41080)

## Proposed changes

Issue Number: #30669 


This change supports reading the contents of external file tables from
rcbinary, rctext, and sequence files via the JNI connector.

todo-lists:
- [x] Support reading rc_binary files using the local TVF
- [x] Support reading rc_text/sequence files using the local TVF
- [x] Support the S3/HDFS TVFs

Example:

**sequence file:**
input:
```mysql
select * from local( "file_path" = "test/test.seq", "format" = "sequence", "backend_id" = "10011", "hive_schema"="k1:tinyint;k2:smallint;k3:int;k4:bigint;k5:float;k6:double;k7:decimal(10,2);k8:string;k9:char(10);k10:varchar(20);k11:boolean;k12:timestamp;k13:date;k14:array<string>;k15:map<string,int>;k16:struct<name:string,age:int>");
```
output:
```
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
| k1   | k2   | k3   | k4          | k5   | k6    | k7    | k8    | k9         | k10     | k11  | k12                 | k13        | k14             | k15                  | k16                       |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
|    7 |   13 |   74 | 13000000000 | 6.15 | 4.376 | 57.30 | world | Char       | Varchar |    1 | 2022-01-01 10:00:00 | 2022-01-01 | ["A", "B", "C"] | {"key2":2, "key1":1} | {"name":"John", "age":30} |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
1 row in set (0.07 sec)
```

**rc_binary file:**
input:
```mysql
select * from local( "file_path" = "test/test.rcbinary", "format" = "rc_binary", "backend_id" = "10011", "hive_schema"="k1:tinyint;k2:smallint;k3:int;k4:bigint;k5:float;k6:double;k7:decimal(10,2);k8:string;k9:char(10);k10:varchar(20);k11:boolean;k12:timestamp;k13:date;k14:array<string>;k15:map<string,int>;k16:struct<name:string,age:int>");
```
output:
```
+------+------+------+-------------+------+------+--------+------+------------+-----------+------+---------------------+------------+-----------------+------------------+-------------------------------+
| k1   | k2   | k3   | k4          | k5   | k6   | k7     | k8   | k9         | k10       | k11  | k12                 | k13        | k14             | k15              | k16                           |
+------+------+------+-------------+------+------+--------+------+------------+-----------+------+---------------------+------------+-----------------+------------------+-------------------------------+
|    1 |    2 |    3 | 10000000000 | 1.23 | 3.14 | 100.50 | you  | are        | beautiful |    0 | 2023-10-29 02:00:00 | 2023-10-29 | ["D", "E", "F"] | {"k2":5, "k1":3} | {"name":"chandler", "age":54} |
+------+------+------+-------------+------+------+--------+------+------------+-----------+------+---------------------+------------+-----------------+------------------+-------------------------------+
1 row in set (0.12 sec)
```

**rc_text file:**
input:
```mysql
select * from local( "file_path" = "test/test.rctext", "format" = "rc_text", "backend_id" = "10011", "hive_schema"="k1:tinyint;k2:smallint;k3:int;k4:bigint;k5:float;k6:double;k7:decimal(10,2);k8:string;k9:char(10);k10:varchar(20);k11:boolean;k12:timestamp;k13:date;k14:array<string>;k15:map<string,int>;k16:struct<name:string,age:int>");
```
output:
```
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
| k1   | k2   | k3   | k4          | k5   | k6    | k7    | k8    | k9         | k10     | k11  | k12                 | k13        | k14             | k15                  | k16                       |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
|    7 |   13 |   74 | 13000000000 | 6.15 | 4.376 | 57.30 | world | Char       | Varchar |    1 | 2022-01-01 10:00:00 | 2022-01-01 | ["A", "B", "C"] | {"key2":2, "key1":1} | {"name":"John", "age":30} |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
1 row in set (0.06 sec)
```
morningman pushed a commit that referenced this issue Sep 24, 2024
…al/hdfs/s3 (#41080)

@msridhar78
Contributor

The Stateless FE feature looks interesting. What does "stateless FE" mean? That the FE stores no local state or context, such as session state or user authentication state? That another FE can take over in the middle of query execution? Could you please elaborate, at a high level, on what is planned?
