220 changes: 179 additions & 41 deletions docs/content.zh/docs/deployment/kubernetes.md
@@ -24,53 +24,50 @@ specific language governing permissions and limitations
under the License.
-->

# Introduction

Kubernetes (K8s) is a popular container-orchestration system for automating the deployment, scaling, and management of applications.
Flink's native Kubernetes integration allows you to deploy Flink directly on a running Kubernetes cluster.
Moreover, because Flink can talk to Kubernetes directly, it can dynamically allocate and de-allocate TaskManagers depending on the required resources.

Apache Flink also provides a Kubernetes Operator for managing Flink clusters on Kubernetes. It supports both standalone and native deployment modes and greatly simplifies the deployment, configuration, and life cycle management of Flink resources on Kubernetes.

For more information, please refer to the [Flink Kubernetes Operator documentation](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/).

## Preparation

This document assumes a running Kubernetes cluster that meets the following requirements:

- Kubernetes >= 1.9.
- A KubeConfig with permission to list, create, and delete pods and services, configurable via `~/.kube/config`. You can verify permissions by running `kubectl auth can-i <list|create|edit|delete> pods`.
- Kubernetes DNS enabled.
- A `default` service account with [RBAC](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#rbac) permissions to create and delete pods.
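
The permission check in the list above can be scripted. The following is a minimal sketch, not part of the Flink tooling: the function name `check_flink_k8s_permissions` is hypothetical, and it assumes `kubectl` is on the `PATH` and already points at the target cluster and namespace.

```bash
# Hypothetical helper: check the verbs Flink needs on pods and services.
# Prints one line per check and returns non-zero if any check is not "yes".
check_flink_k8s_permissions() {
  local failed=0 verb resource answer
  for resource in pods services; do
    for verb in list create delete; do
      # `kubectl auth can-i` prints "yes" or "no" (and exits non-zero on "no")
      answer=$(kubectl auth can-i "$verb" "$resource" 2>/dev/null) || true
      echo "$verb $resource: ${answer:-unknown}"
      [ "$answer" = "yes" ] || failed=1
    done
  done
  return "$failed"
}
```

Calling `check_flink_k8s_permissions` before starting a cluster gives an early, readable failure instead of a permission error in the middle of a deployment.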

If you have problems setting up a Kubernetes cluster, please take a look at [how to set up a Kubernetes cluster](https://kubernetes.io/docs/setup/).

## Session Mode

Flink runs on all UNIX-like environments, such as Linux, Mac OS X, and Cygwin (for Windows).
Refer to the [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) to check the supported Flink versions, download [the binary release](https://flink.apache.org/downloads/) of Flink,
then extract the archive:

```bash
tar -xzf flink-*.tgz
```

Set the `FLINK_HOME` environment variable, for example:

```bash
export FLINK_HOME=/path/flink-*
```

### Start a session cluster

To start a session cluster on Kubernetes, run the bash script that comes with Flink:

```bash
cd /path/flink-*
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster
```

After a successful startup, the output looks like this:

```
org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
@@ -79,33 +76,33 @@ org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note th
org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster my-first-flink-cluster successfully, JobManager Web Interface: http://my-first-flink-cluster-rest.default:8081
```

{{< hint info >}}
Please refer to the [Flink documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) to expose Flink's Web UI and REST endpoint.
Make sure the REST endpoint is reachable from the node you submit jobs from.
{{< /hint >}}
Then add the following two settings to your `flink-conf.yaml`:

```yaml
rest.bind-port: {{REST_PORT}}
rest.address: {{NODE_IP}}
```

Replace {{REST_PORT}} and {{NODE_IP}} with the actual port and address of your JobManager Web Interface.
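
As a rough illustration, the two settings can be appended from the shell. This is a sketch under assumptions: the helper name `write_rest_config` is hypothetical, and the address and port in the usage line are placeholder values, not values from this guide.

```bash
# Hypothetical helper: append the two REST settings to a Flink configuration file.
# Usage: write_rest_config <node-ip> <rest-port> <path-to-flink-conf.yaml>
write_rest_config() {
  local node_ip=$1 rest_port=$2 conf_file=$3
  printf 'rest.bind-port: %s\nrest.address: %s\n' "$rest_port" "$node_ip" >> "$conf_file"
}
```

For example, `write_rest_config 192.0.2.10 30081 "$FLINK_HOME/conf/flink-conf.yaml"` would append both keys. If the file already contains either key, edit the existing entry instead of appending a second one.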

### Set up Flink CDC
Download the Flink CDC tar file from the [release page](https://github.com/apache/flink-cdc/releases), then extract the archive:

```bash
tar -xzf flink-cdc-*.tar.gz
```

The extracted `flink-cdc` directory contains four directories: `bin`, `lib`, `log`, and `conf`.

Download the connector jars from the [release page](https://github.com/apache/flink-cdc/releases) and move them to the `lib` directory.
Download links are available only for stable releases; SNAPSHOT dependencies need to be built from the corresponding branch yourself.

### Submit a Flink CDC Job
Here is an example configuration file, `mysql-to-doris.yaml`, for synchronizing an entire MySQL database to Doris:

```yaml
################################################################################
@@ -133,27 +130,168 @@ pipeline:

```

Modify the configuration file according to your needs; refer to the connector documentation for more information.
- [MySQL pipeline connector]({{< ref "docs/connectors/pipeline-connectors/mysql.md" >}})
- [Apache Doris pipeline connector]({{< ref "docs/connectors/pipeline-connectors/doris.md" >}})

Finally, submit the job to the Flink Standalone cluster using the CLI.

```bash
cd /path/flink-cdc-*
./bin/flink-cdc.sh mysql-to-doris.yaml
```

After a successful submission, the output looks like this:

```bash
Pipeline has been submitted to cluster.
Job ID: ae30f4580f1918bebf16752d4963dc54
Job Description: Sync MySQL Database to Doris
```

You can then find a running job named `Sync MySQL Database to Doris` in the Flink Web UI.
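
The same check can be made from the command line through Flink's REST API, whose `/jobs/overview` endpoint returns a JSON summary of jobs including their names and states. The sketch below assumes `curl` is installed and that the REST endpoint is reachable at the address and port configured earlier; the function name `list_flink_jobs` is hypothetical.

```bash
# Hypothetical helper: query the Flink REST API for an overview of all jobs.
# Usage: list_flink_jobs <node-ip> <rest-port>
list_flink_jobs() {
  local node_ip=$1 rest_port=$2
  curl -s "http://${node_ip}:${rest_port}/jobs/overview"
}
```

The job submitted above should appear in the response with the name `Sync MySQL Database to Doris`.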

## Kubernetes Operator Mode
Assuming you have already deployed the [Flink Kubernetes Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/) on your Kubernetes cluster, you only need to build a custom Flink CDC Docker image.

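### Build a custom Docker image
1. Download the Flink CDC tar file and the required connectors from the [release page](https://github.com/apache/flink-cdc/releases), and move them to the Docker image build directory.
Assuming your Docker build directory is `/opt/docker/flink-cdc`, its file structure at this point is as follows: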
```text
/opt/docker/flink-cdc
├── flink-cdc-3.1.0-bin.tar.gz
├── flink-cdc-pipeline-connector-doris-3.1.0.jar
├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
├── mysql-connector-java-8.0.27.jar
└── ...
```
2. Create a Dockerfile that builds a custom image from the official `flink` image and adds the Flink CDC dependencies.
```dockerfile
FROM flink:1.18.0-java8
ADD *.jar $FLINK_HOME/lib/
ADD flink-cdc*.tar.gz $FLINK_HOME/
RUN mv $FLINK_HOME/flink-cdc-3.1.0/lib/flink-cdc-dist-3.1.0.jar $FLINK_HOME/lib/
```
The Docker image build directory finally looks like this:
```text
/opt/docker/flink-cdc
├── Dockerfile
├── flink-cdc-3.1.0-bin.tar.gz
├── flink-cdc-pipeline-connector-doris-3.1.0.jar
├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
├── mysql-connector-java-8.0.27.jar
└── ...
```
3. Build the custom image and push it to a registry.
```bash
docker build -t flink-cdc-pipeline:3.1.0 .

docker push flink-cdc-pipeline:3.1.0
```
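
Pushing to a private registry usually requires a registry-qualified image name. The following is a sketch under assumptions: both function names are hypothetical, and the registry path in the usage note is a placeholder.

```bash
# Hypothetical helper: build a registry-qualified reference <registry>/<name>:<tag>.
qualified_image() {
  local registry=$1 name=$2 tag=$3
  echo "${registry%/}/${name}:${tag}"   # strip one trailing slash from the registry
}

# Hypothetical helper: tag the locally built image for the registry and push it.
# Usage: push_to_registry <registry-path>
push_to_registry() {
  local target
  target=$(qualified_image "$1" flink-cdc-pipeline 3.1.0)
  docker tag flink-cdc-pipeline:3.1.0 "$target" && docker push "$target"
}
```

For example, `push_to_registry registry.example.com/my-team` would push `registry.example.com/my-team/flink-cdc-pipeline:3.1.0`. The `image` field of the FlinkDeployment would then need to use the same qualified name.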

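### Create a ConfigMap for mounting the Flink CDC configuration files
The following is an example file; replace the connection parameters with your actual values: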
```yaml
---
apiVersion: v1
data:
  flink-cdc.yaml: |-
    parallelism: 4
    schema.change.behavior: EVOLVE
  mysql-to-doris.yaml: |-
    source:
      type: mysql
      hostname: localhost
      port: 3306
      username: root
      password: 123456
      tables: app_db.\.*
      server-id: 5400-5404
      server-time-zone: UTC

    sink:
      type: doris
      fenodes: 127.0.0.1:8030
      username: root
      password: ""

    pipeline:
      name: Sync MySQL Database to Doris
      parallelism: 2
kind: ConfigMap
metadata:
  name: flink-cdc-pipeline-configmap
```

### Create the FlinkDeployment YAML file
The following is an example file, `flink-cdc-pipeline-job.yaml`:
```yaml
---
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-cdc-pipeline-job
spec:
  flinkConfiguration:
    classloader.resolve-order: parent-first
    state.checkpoints.dir: 'file:///tmp/checkpoints'
    state.savepoints.dir: 'file:///tmp/savepoints'
  flinkVersion: v1_18
  image: 'flink-cdc-pipeline:3.1.0'
  imagePullPolicy: Always
  job:
    args:
      - '--use-mini-cluster'
      - /opt/flink/flink-cdc-3.1.0/conf/mysql-to-doris.yaml
    entryClass: org.apache.flink.cdc.cli.CliFrontend
    # the Dockerfile moves the dist jar into $FLINK_HOME/lib (/opt/flink/lib)
    jarURI: 'local:///opt/flink/lib/flink-cdc-dist-3.1.0.jar'
    parallelism: 1
    state: running
    upgradeMode: savepoint
  jobManager:
    replicas: 1
    resource:
      cpu: 1
      memory: 1024m
  podTemplate:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        # don't modify this name
        - name: flink-main-container
          volumeMounts:
            - mountPath: /opt/flink/flink-cdc-3.1.0/conf
              name: flink-cdc-pipeline-config
      volumes:
        - configMap:
            name: flink-cdc-pipeline-configmap
          name: flink-cdc-pipeline-config
  restartNonce: 0
  serviceAccount: flink
  taskManager:
    resource:
      cpu: 1
      memory: 1024m
```
{{< hint info >}}
1. Because of Flink's class loading mechanism, the `classloader.resolve-order` parameter must be set to `parent-first`.
2. By default, Flink CDC submits jobs to a remote Flink cluster. In Operator mode, you need to pass the `--use-mini-cluster` argument to start a Standalone Flink cluster inside the pod.
{{< /hint >}}

### Submit a Flink CDC Job
Once the ConfigMap and the FlinkDeployment YAML file have been created, you can submit the job to the Operator with kubectl:
```bash
kubectl apply -f flink-cdc-pipeline-job.yaml
```

After a successful submission, the output looks like this:
```shell
flinkdeployment.flink.apache.org/flink-cdc-pipeline-job created
```
If you need to view logs, expose the Flink Web UI, and so on, please refer to the [Flink Kubernetes Operator documentation](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/).
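
The FlinkDeployment mounts the ConfigMap as a volume, so the ConfigMap has to exist in the cluster before the job pod starts. The sketch below applies both manifests in order and then shows the resource status. It assumes the ConfigMap example was saved to its own file; the function name `submit_cdc_pipeline` and the file name `flink-cdc-pipeline-configmap.yaml` in the usage note are hypothetical.

```bash
# Hypothetical helper: apply the ConfigMap first so the volume referenced by the
# pod template can be mounted, then apply the FlinkDeployment and show its status.
# Usage: submit_cdc_pipeline <configmap-file> <flinkdeployment-file>
submit_cdc_pipeline() {
  local configmap_file=$1 deployment_file=$2
  kubectl apply -f "$configmap_file" &&
    kubectl apply -f "$deployment_file" &&
    kubectl get flinkdeployment flink-cdc-pipeline-job
}
```

For example: `submit_cdc_pipeline flink-cdc-pipeline-configmap.yaml flink-cdc-pipeline-job.yaml`. The `&&` chain stops at the first failing step, so the deployment is not created if the ConfigMap cannot be applied.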


{{< hint info >}}
Please note that submitting jobs with **native application mode** is not supported for now.
{{< /hint >}}