[Feature][API & Connector & Doc] add parallelism and column projection interface (apache#3829)

* add transform doc

* add transform v2 document

* remove transform v1 from document

* improve document

* fix dead link

* fix dead link

* fix dead link

* update supported connector num

* Update docs/en/transform-v2/replace.md

Co-authored-by: Zongwen Li <zongwen.li.tech@gmail.com>

* fix ci

* fix ci error

* add Parallelism and SchemaProjection interface to Source Connector

* update schemaprojection to columnprojection

* fix code style

* tmp

* revert FactoryUtil update

Co-authored-by: Zongwen Li <zongwen.li.tech@gmail.com>
2 people authored and lhyundeadsoul committed Jan 3, 2023
1 parent 212ac5d commit 0e7638d
Showing 104 changed files with 211 additions and 147 deletions.
12 changes: 5 additions & 7 deletions docs/en/concept/connector-v2-features.md
Original file line number Diff line number Diff line change
@@ -24,11 +24,13 @@ and then locate the **Split** and **offset** read last time and continue to send

For example `File`, `Kafka`.

### schema projection
### column projection

If the source connector supports selective reading of certain columns or redefine columns order or supports the data format read through `schema` params, we think it supports schema projection.
If the connector supports reading only the specified columns from the data source, we think it supports column projection. (Note that if a connector reads all columns first and then filters out the unnecessary ones through the schema, that is not real column projection.)

For example `JDBCSource` can use sql define read columns, `KafkaSource` can use `schema` params to define the read schema.
For example, `JDBCSource` can use SQL to define the columns it reads.

`KafkaSource` will read all content from the topic and then use `schema` to filter unnecessary columns; this is not `column projection`.
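To make the distinction concrete, here is a sketch of a source block where the pushed-down SQL performs real column projection (parameter names follow the Jdbc source connector; the database, table, and column names are hypothetical):

```hocon
source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "123456"
    # Real column projection: only `id` and `name` are read from the database,
    # instead of reading all columns and filtering afterwards
    query = "select id, name from test.users"
  }
}
```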

### batch

@@ -60,10 +62,6 @@ For sink connector, the sink connector supports exactly-once if any piece of dat
* The target database supports key deduplication. For example `MySQL`, `Kudu`.
* The target support **XA Transaction**(This transaction can be used across sessions. Even if the program that created the transaction has ended, the newly started program only needs to know the ID of the last transaction to resubmit or roll back the transaction). Then we can use **Two-phase Commit** to ensure **exactly-once**. For example `File`, `MySQL`.
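As a sketch of the XA route described above, the Jdbc sink exposes it through `is_exactly_once` (option names follow the Jdbc sink connector docs; the connection details and table are hypothetical):

```hocon
sink {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "123456"
    query = "insert into test_table(name, age) values(?, ?)"
    # Enables XA-transaction-based two-phase commit for exactly-once delivery
    is_exactly_once = "true"
    xa_data_source_class_name = "com.mysql.cj.jdbc.MysqlXADataSource"
  }
}
```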

### schema projection

If a sink connector supports the fields and their types or redefine columns order written in the configuration, we think it supports schema projection.

### cdc(change data capture)

If a sink connector supports writing row kinds(INSERT/UPDATE_BEFORE/UPDATE_AFTER/DELETE) based on primary key, we think it supports cdc(change data capture).
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/AmazonDynamoDB.md
@@ -10,7 +10,6 @@ Write data to Amazon DynamoDB
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Assert.md
@@ -9,7 +9,6 @@ A flink sink plugin which can assert illegal data by user defined rules
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Cassandra.md
@@ -9,7 +9,6 @@ Write data to Apache Cassandra.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Clickhouse.md
@@ -12,7 +12,6 @@ Used to write data to Clickhouse.

The Clickhouse sink plug-in can achieve exactly-once by implementing idempotent writing, and needs to cooperate with AggregatingMergeTree and other engines that support deduplication.

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

:::tip
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/ClickhouseFile.md
@@ -11,7 +11,6 @@ should be `true`. Supports Batch and Streaming mode.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Console.md
@@ -10,7 +10,6 @@ Used to send data to Console. Both support streaming and batch mode.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Datahub.md
@@ -9,7 +9,6 @@ A sink plugin which use send message to DataHub
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/DingTalk.md
@@ -9,7 +9,6 @@ A sink plugin which use DingTalk robot send message
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Doris.md
@@ -8,7 +8,6 @@ The internal implementation of Doris sink connector is cached and imported by st
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Elasticsearch.md
@@ -7,7 +7,6 @@ Output data to `Elasticsearch`.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

:::tip
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Email.md
@@ -11,7 +11,6 @@ Send the data as a file to email.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Enterprise-WeChat.md
@@ -16,7 +16,6 @@ A sink plugin which use Enterprise WeChat robot send message
## Key features
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)
## Options
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Feishu.md
@@ -13,7 +13,6 @@ Used to launch Feishu web hooks using data.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/FtpFile.md
@@ -20,7 +20,6 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Greenplum.md
@@ -9,7 +9,6 @@ Write data to Greenplum using [Jdbc connector](Jdbc.md).
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/HdfsFile.md
@@ -20,7 +20,6 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Hive.md
@@ -19,7 +19,6 @@ If you use SeaTunnel Engine, You need put seatunnel-hadoop3-3.1.4-uber.jar and h

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Http.md
@@ -13,7 +13,6 @@ Used to launch web hooks using data.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/InfluxDB.md
@@ -9,7 +9,6 @@ Write data to InfluxDB.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

14 changes: 6 additions & 8 deletions docs/en/connector-v2/sink/IoTDB.md
@@ -6,21 +6,19 @@

Used to write data to IoTDB.

:::tip

There is a thrift version conflict between IoTDB and Spark. Therefore, you need to execute `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/` to resolve it.

:::

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)

IoTDB supports the `exactly-once` feature through idempotent writing. If two pieces of data have
the same `key` and `timestamp`, the new data will overwrite the old one.

- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

There is a thrift version conflict between IoTDB and Spark. Therefore, you need to execute `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/` to resolve it.

:::

## Options

| name | type | required | default value |
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Jdbc.md
@@ -22,7 +22,6 @@ e.g. If you use MySQL, should download and copy `mysql-connector-java-xxx.jar` t
Use `Xa transactions` to ensure `exactly-once`. So only support `exactly-once` for the database which is
support `Xa transactions`. You can set `is_exactly_once=true` to enable it.

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

## Options
2 changes: 0 additions & 2 deletions docs/en/connector-v2/sink/Kafka.md
@@ -11,8 +11,6 @@ Write Rows to a Kafka topic.

By default, we will use 2pc to guarantee the message is sent to kafka exactly once.

- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Kudu.md
@@ -11,7 +11,6 @@ Write data to Kudu.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/LocalFile.md
@@ -20,7 +20,6 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Maxcompute.md
@@ -9,7 +9,6 @@ Used to read data from Maxcompute.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

5 changes: 0 additions & 5 deletions docs/en/connector-v2/sink/MongoDB.md
@@ -8,12 +8,7 @@ Write data to `MongoDB`

## Key features

- [x] [batch](../../concept/connector-v2-features.md)
- [x] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Neo4j.md
@@ -11,7 +11,6 @@ Write data to Neo4j.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/OssFile.md
@@ -23,7 +23,6 @@ It only supports hadoop version **2.9.X+**.

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/OssJindoFile.md
@@ -23,7 +23,6 @@ It only supports hadoop version **2.9.X+**.

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Phoenix.md
@@ -15,7 +15,6 @@ Two ways of connecting Phoenix with Java JDBC. One is to connect to zookeeper th
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Rabbitmq.md
@@ -9,7 +9,6 @@ Used to write data to Rabbitmq.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Redis.md
@@ -9,7 +9,6 @@ Used to write data to Redis.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/S3-Redshift.md
@@ -18,7 +18,6 @@ Output data to AWS Redshift.

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/S3File.md
@@ -22,7 +22,6 @@ To use this connector you need put hadoop-aws-3.1.4.jar and aws-java-sdk-bundle-

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
2 changes: 0 additions & 2 deletions docs/en/connector-v2/sink/Sentry.md
@@ -7,8 +7,6 @@ Write message to Sentry.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)


## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/SftpFile.md
@@ -20,7 +20,6 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Slack.md
@@ -11,7 +11,6 @@ Used to send data to Slack Channel. Both support streaming and batch mode.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Socket.md
@@ -10,7 +10,6 @@ Used to send data to Socket Server. Both support streaming and batch mode.
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/StarRocks.md
@@ -8,7 +8,6 @@ The internal implementation of StarRocks sink connector is cached and imported b
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

1 change: 0 additions & 1 deletion docs/en/connector-v2/sink/Tablestore.md
@@ -9,7 +9,6 @@ Write data to `Tablestore`
## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

2 changes: 1 addition & 1 deletion docs/en/connector-v2/source/AmazonDynamoDB.md
@@ -11,7 +11,7 @@ Read data from Amazon DynamoDB.
- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [schema projection](../../concept/connector-v2-features.md)
- [ ] [column projection](../../concept/connector-v2-features.md)
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

2 changes: 1 addition & 1 deletion docs/en/connector-v2/source/Cassandra.md
@@ -11,7 +11,7 @@ Read data from Apache Cassandra.
- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [schema projection](../../concept/connector-v2-features.md)
- [x] [column projection](../../concept/connector-v2-features.md)
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)
