Merged
36 changes: 18 additions & 18 deletions docs/content.zh/docs/core-concept/data-pipeline.md
@@ -24,23 +24,23 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
Since events in Flink CDC flow from the upstream to the downstream in a pipeline manner, the whole ETL task is referred as a **Data Pipeline**.
# 定义
由于在 Flink CDC 中,事件从上游流转到下游遵循 Pipeline 的模式,因此整个 ETL 作业也被称为 **Data Pipeline**

# Parameters
A pipeline corresponds to a chain of operators in Flink.
To describe a Data Pipeline, the following parts are required:
# 参数
一个 pipeline 包含着 Flink 的一组算子链。
为了描述 Data Pipeline,我们需要定义以下部分:
- [source]({{< ref "docs/core-concept/data-source" >}})
- [sink]({{< ref "docs/core-concept/data-sink" >}})
- [pipeline](#pipeline-configurations)

the following parts are optional:
下面 是 Data Pipeline 的一些可选配置:
- [route]({{< ref "docs/core-concept/route" >}})
- [transform]({{< ref "docs/core-concept/transform" >}})

# Example
## Only required
We could use following yaml file to define a concise Data Pipeline describing synchronize all tables under MySQL app_db database to Doris :
# 示例
## 只包含必须部分
我们可以使用以下 yaml 文件来定义一个简单的 Data Pipeline 来同步 MySQL app_db 数据库下的所有表到 Doris

```yaml
source:
@@ -62,8 +62,8 @@ We could use following yaml file to define a concise Data Pipeline describing sy
parallelism: 2
```

## With optional
We could use following yaml file to define a complicated Data Pipeline describing synchronize all tables under MySQL app_db database to Doris and give specific target database name ods_db and specific target table name prefix ods_ :
## 包含可选部分
我们可以使用以下 yaml 文件来定义一个复杂的 Data Pipeline 来同步 MySQL app_db 数据库下的所有表到 Doris,并给目标数据库名 ods_db 和目标表名前缀 ods_

```yaml
source:
@@ -108,11 +108,11 @@ We could use following yaml file to define a complicated Data Pipeline describin
classpath: com.example.functions.FormatFunctionClass
```

# Pipeline Configurations
The following config options of Data Pipeline level are supported:
# Pipeline 配置
下面 是 Data Pipeline 的一些可选配置:

| parameter | meaning | optional/required |
|-----------------|-----------------------------------------------------------------------------------------|-------------------|
| name | The name of the pipeline, which will be submitted to the Flink cluster as the job name. | optional |
| parallelism | The global parallelism of the pipeline. Defaults to 1. | optional |
| local-time-zone | The local time zone defines current session time zone id. | optional |
| 参数 | 含义 | optional/required |
|-----------------|---------------------------------------|-------------------|
| name | 这个 pipeline 的名称,会用在 Flink 集群中作为作业的名称。 | optional |
| parallelism | pipeline的全局并发度,默认值是1。 | optional |
| local-time-zone | 作业级别的本地时区。 | optional |
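
As a rough sketch of how the three options in the table fit together, a `pipeline` block could look like the following. The job name and the time zone value are illustrative assumptions, not taken from the changed file.

```yaml
pipeline:
  name: Sync MySQL app_db to Doris   # hypothetical job name, shown in the Flink cluster
  parallelism: 2                     # global parallelism; defaults to 1 when omitted
  local-time-zone: Asia/Shanghai     # hypothetical session time zone id
```

All three keys are optional according to the table, so the block can be reduced to whichever options a job actually needs.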
24 changes: 12 additions & 12 deletions docs/content.zh/docs/core-concept/data-sink.md
@@ -24,21 +24,21 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
**Data Sink** is used to apply schema changes and write change data to external systems.
A Data Sink can write to multiple tables simultaneously.
# 定义
**Data Sink** 是 用来应用 schema 变更并写入 change data 到外部系统的组件。
一个 Data Sink 可以同时写入多个表。

# Parameters
To describe a data sink, the follows are required:
# 参数
为了定义一个 Data Sink,需要提供以下参数:

| parameter | meaning | optional/required |
|-----------------------------|-------------------------------------------------------------------------------------------------|-------------------|
| type | The type of the sink, such as doris or starrocks. | required |
| name | The name of the sink, which is user-defined (a default value provided). | optional |
| configurations of Data Sink | Configurations to build the Data Sink e.g. connection configurations and sink table properties. | optional |
| 参数 | 含义 | optional/required |
|-----------------------------|---------------------------------|-------------------|
| type | sink 的类型,例如 doris 或者 starrocks | required |
| name | sink 的名称,允许用户配置 (提供了一个默认值)。 | optional |
| configurations of Data Sink | 用于构建 sink 组件的配置,例如连接参数或者表属性的配置。 | optional |

# Example
We could use this yaml file to define a doris sink:
# 示例
我们可以使用以下的 yaml 文件来定义一个 doris sink
```yaml
sink:
type: doris
24 changes: 12 additions & 12 deletions docs/content.zh/docs/core-concept/data-source.md
@@ -24,21 +24,21 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
**Data Source** is used to access metadata and read the changed data from external systems.
A Data Source can read data from multiple tables simultaneously.
# 定义
**Data Source** 是 用来访问元数据以及从外部系统读取变更数据的组件。
一个 Data Source 可以同时访问多个表。

# Parameters
To describe a data source, the follows are required:
# 参数
为了定义一个 Data Source,需要提供以下参数:

| parameter | meaning | optional/required |
|-------------------------------|-----------------------------------------------------------------------------------------------------|-------------------|
| type | The type of the source, such as mysql. | required |
| name | The name of the source, which is user-defined (a default value provided). | optional |
| configurations of Data Source | Configurations to build the Data Source e.g. connection configurations and source table properties. | optional |
| 参数 | 含义 | optional/required |
|-------------------------------|-----------------------------------|-------------------|
| type | source 的类型,例如 mysql | required |
| name | source 的名称,允许用户配置 (提供了一个默认值)。 | optional |
| configurations of Data Source | 用于构建 source 组件的配置,例如连接参数或者表属性的配置。 | optional |

# Example
We could use yaml files to define a mysql source:
# 示例
我们可以使用yaml文件来定义一个mysql source
```yaml
source:
type: mysql
43 changes: 21 additions & 22 deletions docs/content.zh/docs/core-concept/route.md
@@ -24,24 +24,24 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
**Route** specifies the rule of matching a list of source-table and mapping to sink-table. The most typical scenario is the merge of sub-databases and sub-tables, routing multiple upstream source tables to the same sink table.
# 定义
**Route** 代表一个路由规则,用来匹配一个或多个source 表,并映射到 sink 表。最常见的场景是合并子数据库和子表,将多个上游源表路由到同一个目标表。

# Parameters
To describe a route, the follows are required:
# 参数
为了定义一个路由规则,需要提供以下参数:

| parameter | meaning | optional/required |
|----------------|---------------------------------------------------------------------------------------------|-------------------|
| source-table | Source table id, supports regular expressions | required |
| sink-table | Sink table id, supports symbol replacement | required |
| replace-symbol | Special symbol in sink-table for pattern replacing, will be replaced by original table name | optional |
| description | Routing rule description(a default value provided) | optional |
| 参数 | 含义 | optional/required |
|----------------|------------------------------------------|-------------------|
| source-table | Source table id, 支持正则表达式 | required |
| sink-table | Sink table id,支持符号替换 | required |
| replace-symbol | 用于在 sink-table 中进行模式替换的特殊字符串, 会被源表中的表名替换 | optional |
| description | Route 规则的描述(提供了一个默认描述) | optional |

A route module can contain a list of source-table/sink-table rules.
一个 Route 模块可以包含一个或多个 source-table/sink-table 规则。

# Example
## Route one Data Source table to one Data Sink table
if synchronize the table `web_order` in the database `mydb` to a Doris table `ods_web_order`, we can use this yaml file to define this route
# 示例
## 路由一个 Data Source 表到一个 Data Sink
如果同步一个 `mydb` 数据库中的 `web_order` 表到一个相同库的 `ods_web_order` 表,我们可以使用下面的 yaml 文件来定义这个路由

```yaml
route:
@@ -50,17 +50,16 @@ route:
description: sync table to one destination table with given prefix ods_
```

## Route multiple Data Source tables to one Data Sink table
What's more, if you want to synchronize the sharding tables in the database `mydb` to a Doris table `ods_web_order`, we can use this yaml file to define this route
## 路由多个 Data Source 表到一个 Data Sink
更进一步的,如果同步一个 `mydb` 数据库中的多个分表到一个相同库的 `ods_web_order` 表,我们可以使用下面的 yaml 文件来定义这个路由
```yaml
route:
- source-table: mydb\.*
sink-table: mydb.ods_web_order
description: sync sharding tables to one destination table
```

## Complex Route via combining route rules
What's more, if you want to specify many different mapping rules, we can use this yaml file to define this route:
## 使用多个路由规则
更进一步的,如果需要定义多个路由规则,我们可以使用下面的 yaml 文件来定义这个路由:
```yaml
route:
- source-table: mydb.orders
@@ -74,9 +73,9 @@ route:
description: sync products table to ods_products
```

## Pattern Replacement in routing rules
## 包含符号替换的路由规则

If you'd like to route source tables and rename them to sink tables with specific patterns, `replace-symbol` could be used to resemble source table names like this:
如果你想将源表路由到 sink 表,并使用特定的模式替换源表名,那么 `replace-symbol` 就可以做到这一点:

```yaml
route:
@@ -86,4 +85,4 @@ route:
description: route all tables in source_db to sink_db
```

Then, all tables including `source_db.XXX` will be routed to `sink_db.XXX` without hassle.
然后,`source_db` 库下所有的表都会被同步到 `sink_db` 库下。
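
A minimal sketch of a complete rule of this kind, using only the four parameters listed in the route table. The `<>` placeholder and the `source_db.\.*` pattern are assumptions chosen for illustration; they are not recovered from the collapsed part of the hunk.

```yaml
route:
  - source-table: source_db.\.*   # assumed pattern: every table in source_db
    sink-table: sink_db.<>        # <> stands in for the matched source table name
    replace-symbol: <>            # assumed placeholder string
    description: route all tables in source_db to sink_db
```

Under these assumptions, a table such as `source_db.orders` would be routed to `sink_db.orders`. The placeholder should presumably be a string that cannot appear in a real table name.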
22 changes: 11 additions & 11 deletions docs/content.zh/docs/core-concept/table-id.md
@@ -24,17 +24,17 @@ specific language governing permissions and limitations
under the License.
-->

# Definition
When connecting to external systems, it is necessary to establish a mapping relationship with the storage objects of the external system. This is what **Table Id** refers to.
# 定义
在连接外部系统时,有必要建立一个与外部系统存储对象(例如表)的映射关系。这就是 **Table Id** 所代表的含义。

# Example
To be compatible with most external systems, the Table Id is represented by a 3-tuple : (namespace, schemaName, tableName).
Connectors should establish the mapping between Table Id and storage objects in external systems.
# 示例
为了兼容大部分外部系统,Table Id 被表示为 3 元组:(namespace, schemaName, tableName)
连接器应该在连接外部系统时建立与外部系统存储对象的映射关系。

The following table lists the parts in table Id of different data systems:
下面是不同数据系统对应的 tableId 的格式:

| data system | parts in tableId | String example |
|-----------------------|--------------------------|---------------------|
| Oracle/PostgreSQL | database, schema, table | mydb.default.orders |
| MySQL/Doris/StarRocks | database, table | mydb.orders |
| Kafka | topic | orders |
| 数据系统 | tableId 的组成 | 字符串示例 |
|-----------------------|-------------------------|---------------------|
| Oracle/PostgreSQL | database, schema, table | mydb.default.orders |
| MySQL/Doris/StarRocks | database, table | mydb.orders |
| Kafka | topic | orders |