6 changes: 3 additions & 3 deletions docs/content.zh/docs/concepts/flink-architecture.md
Original file line number Diff line number Diff line change
@@ -71,7 +71,7 @@ The _JobManager_ has a number of responsibilities related to coordinating the distributed execution of Flink applications

The sample dataflow in the figure below is executed with 5 subtasks, and therefore with 5 parallel threads.

<img src="{% link /fig/tasks_chains.svg %}" alt="Operator chaining into Tasks" class="offset" width="80%" />
{{< img src="/fig/tasks_chains.svg" alt="Operator chaining into Tasks" class="offset" width="80%" >}}

{{< top >}}

@@ -83,15 +83,15 @@ The _JobManager_ has a number of responsibilities related to coordinating the distributed execution of Flink applications

By adjusting the number of task slots, users can define how subtasks are isolated from each other. Having one slot per TaskManager means that each task group runs in a separate JVM (which can, for example, be started in a separate container). Having multiple slots means more subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, thus reducing the per-task overhead.

<img src="{% link /fig/tasks_slots.svg %}" alt="A TaskManager with Task Slots and Tasks" class="offset" width="80%" />
{{< img src="/fig/tasks_slots.svg" alt="A TaskManager with Task Slots and Tasks" class="offset" width="80%" >}}

By default, Flink allows subtasks to share slots even if they are subtasks of different tasks, as long as they are from the same job. The result is that one slot may hold an entire pipeline of the job. Allowing this *slot sharing* has two main benefits:

- A Flink cluster needs exactly as many task slots as the highest parallelism used in the job. There is no need to calculate how many tasks (with varying parallelism) a program contains in total.

- It is easier to get better resource utilization. Without slot sharing, the non-intensive *source/map()* subtasks would block as many resources as the resource-intensive *window* subtasks. With slot sharing, increasing the base parallelism in our example from 2 to 6 yields full utilization of the slotted resources, while making sure that the heavy subtasks are fairly distributed among the TaskManagers.

<img src="{% link /fig/slot_sharing.svg %}" alt="TaskManagers with shared Task Slots" class="offset" width="80%" />
{{< img src="/fig/slot_sharing.svg" alt="TaskManagers with shared Task Slots" class="offset" width="80%" >}}
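The first benefit above amounts to a simple calculation, sketched here in plain Python for illustration (this is not Flink API; the parallelism values are the ones from the example in the text):

```python
# Illustrative sketch (not Flink API): with slot sharing, a cluster needs
# as many task slots as the highest parallelism in the job, because one
# slot can hold one parallel slice of the whole pipeline.
def required_slots(operator_parallelisms):
    return max(operator_parallelisms)

def required_slots_without_sharing(operator_parallelisms):
    # Without sharing, every subtask occupies its own slot.
    return sum(operator_parallelisms)

# Example job: source/map(), window, and sink operators.
parallelisms = [6, 6, 1]
print(required_slots(parallelisms))                  # 6 slots with sharing
print(required_slots_without_sharing(parallelisms))  # 13 slots without sharing
```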

## Flink Application Execution

16 changes: 4 additions & 12 deletions docs/content.zh/docs/dev/table/concepts/dynamic_tables.md
@@ -75,9 +75,7 @@ Relational Queries on Data Streams

The following figure visualizes the relationship of streams, dynamic tables, and continuous queries:

<center>
<img alt="Dynamic tables" src="{% link /fig/table-streaming/stream-query-stream.png %}" width="80%">
</center>
{{< img alt="Dynamic tables" src="/fig/table-streaming/stream-query-stream.png" width="80%">}}

1. A stream is converted into a dynamic table.
2. A continuous query is evaluated on the dynamic table, yielding a new dynamic table.
@@ -102,9 +100,7 @@ Relational Queries on Data Streams

The following figure shows how the stream of click events (left-hand side) is converted into a table (right-hand side). The resulting table grows continuously as more click-stream records are inserted.

<center>
<img alt="Append mode" src="{% link /fig/table-streaming/append-mode.png %}" width="60%">
</center>
{{< img alt="Append mode" src="/fig/table-streaming/append-mode.png" width="60%">}}

**Note:** A table that is defined on a stream is internally not materialized.

@@ -117,17 +113,13 @@ Relational Queries on Data Streams

The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the `clicks` table on the `user` field and counts the number of visited URLs. The following figure shows how the query is evaluated over time as the `clicks` table is updated with additional rows.

<center>
<img alt="Continuous Non-Windowed Query" src="/fig/table-streaming/query-groupBy-cnt.png" width="90%">
</center>
{{< img alt="Continuous Non-Windowed Query" src="/fig/table-streaming/query-groupBy-cnt.png" width="90%">}}

When the query starts, the `clicks` table (left-hand side) is empty. The query starts to compute the result table when the first row is inserted into the `clicks` table. After the first row `[Mary, ./home]` is inserted, the result table (right-hand side, top) consists of a single row `[Mary, 1]`. When the second row `[Bob, ./cart]` is inserted into the `clicks` table, the query updates the result table and inserts a new row `[Bob, 1]`. The third row, `[Mary, ./prod?id=1]`, yields an update of an already computed result row, such that `[Mary, 1]` is updated to `[Mary, 2]`. Finally, the query inserts a third row `[Liz, 1]` into the result table when the fourth row is appended to the `clicks` table.
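The step-by-step evaluation described above can be simulated with a short sketch (plain Python, only to illustrate the semantics; the rows are the ones from the walkthrough):

```python
# Illustrative simulation (not Flink API) of the continuous GROUP-BY COUNT
# query: each inserted click either inserts a new result row or updates an
# existing one.
from collections import OrderedDict

clicks = [("Mary", "./home"), ("Bob", "./cart"),
          ("Mary", "./prod?id=1"), ("Liz", "./home")]

result = OrderedDict()  # user -> URL count, i.e. the result table
for user, url in clicks:
    result[user] = result.get(user, 0) + 1

print(list(result.items()))  # [('Mary', 2), ('Bob', 1), ('Liz', 1)]
```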

The second query is similar to the first one, but groups the `clicks` table, in addition to the `user` attribute, on an [hourly tumbling window]({{< ref "docs/dev/table/sql/overview" >}}#group-windows) before it counts the number of URLs (time-based computations such as windows are based on special [time attributes](time_attributes.html), which are discussed later). Again, the figure shows the input and output at different points in time to visualize the changing nature of dynamic tables.

<center>
<img alt="Continuous Group-Window Query" src="/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
</center>
{{< img alt="Continuous Group-Window Query" src="/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">}}

As before, the input table `clicks` is shown on the left. The query continuously computes results every hour and updates the result table. The `clicks` table contains four rows with timestamps (`cTime`) between `12:00:00` and `12:59:59`. The query computes two result rows from this input (one per `user`) and appends them to the result table. For the next window between `13:00:00` and `13:59:59`, the `clicks` table contains three rows, which results in another two rows being appended to the result table. The result table is updated as more rows are appended to `clicks` over time.
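The windowed variant can be sketched the same way (plain Python, not Flink API; the timestamps and users below are hypothetical, chosen to match the row counts in the description — four rows/two users in the first hour, three rows/two users in the second):

```python
# Illustrative simulation (not Flink API) of the hourly tumbling-window
# GROUP-BY COUNT: rows are grouped by (hour window, user), and each window
# appends its own result rows to the result table.
from collections import Counter

clicks = [  # (cTime, user) -- hypothetical example data
    ("12:00:00", "Mary"), ("12:10:00", "Bob"),
    ("12:20:00", "Mary"), ("12:40:00", "Bob"),   # window 12:00 - 12:59
    ("13:05:00", "Bob"), ("13:30:00", "Liz"),
    ("13:45:00", "Liz"),                         # window 13:00 - 13:59
]

# Truncate each timestamp to its hour to obtain the tumbling-window key.
counts = Counter((t.split(":")[0] + ":00", user) for t, user in clicks)
print(sorted(counts.items()))
```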

2 changes: 1 addition & 1 deletion docs/content.zh/docs/learn-flink/datastream_api.md
@@ -139,7 +139,7 @@ The DataStream API builds your application into a job graph and attaches it to a `StreamEx

Note that if you do not call execute(), your application will not be run.

<img src="{% link /fig/distributed-runtime.svg %}" alt="Flink runtime: client, job manager, task managers" class="offset" width="80%" />
{{< img src="/fig/distributed-runtime.svg" alt="Flink runtime: client, job manager, task managers" class="offset" width="80%" >}}

This distributed runtime depends on your application being serializable. It also requires that all dependencies are available to every node in the cluster.

2 changes: 1 addition & 1 deletion docs/content.zh/docs/libs/gelly/bipartite_graph.md
@@ -108,7 +108,7 @@ Graph Transformations
* <strong>Projection</strong>: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and creates a link between two top nodes in the new graph only if there is an intermediate bottom node both of them connect to in the original graph. Bottom projection is the opposite of top projection, i.e. it preserves only bottom nodes and connects a pair of bottom nodes if they share an intermediate top node in the original graph.

<p class="text-center">
<img alt="Bipartite Graph Projections" width="80%" src="{% link /fig/bipartite_graph_projections.png %}"/>
<img alt="Bipartite Graph Projections" width="80%" src="/fig/bipartite_graph_projections.png"/>
</p>
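The top-projection rule above can be sketched in a few lines (plain Python, not Gelly API; the edge list is a made-up example):

```python
# Illustrative sketch (not Gelly API) of a top projection: two top nodes
# become connected in the projected graph only if they share at least one
# intermediate bottom node in the original bipartite graph.
from itertools import combinations

# Hypothetical bipartite edges: (top node, bottom node)
edges = [("t1", "b1"), ("t2", "b1"), ("t2", "b2"), ("t3", "b2")]

bottoms = {}  # bottom node -> set of top nodes it connects to
for top, bottom in edges:
    bottoms.setdefault(bottom, set()).add(top)

projected = set()
for tops in bottoms.values():
    # Every pair of top nodes sharing this bottom node gets an edge.
    for a, b in combinations(sorted(tops), 2):
        projected.add((a, b))

print(sorted(projected))  # [('t1', 't2'), ('t2', 't3')]
```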

Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is what data is associated with edges in the result graph.
10 changes: 5 additions & 5 deletions docs/content.zh/docs/libs/gelly/graph_api.md
@@ -457,7 +457,7 @@ graph.subgraph((vertex => vertex.getValue > 0), (edge => edge.getValue < 0))
{{< /tabs >}}

<p class="text-center">
<img alt="Filter Transformations" width="80%" src="{% link /fig/gelly-filter.png %}"/>
<img alt="Filter Transformations" width="80%" src="/fig/gelly-filter.png"/>
</p>

* <strong>Join</strong>: Gelly provides specialized methods for joining the vertex and edge datasets with other input datasets. `joinWithVertices` joins the vertices with a `Tuple2` input data set. The join is performed using the vertex ID and the first field of the `Tuple2` input as the join keys. The method returns a new `Graph` where the vertex values have been updated according to a provided user-defined transformation function.
@@ -499,7 +499,7 @@ val networkWithWeights = network.joinWithEdgesOnSource(vertexOutDegrees, (v1: Do
* <strong>Union</strong>: Gelly's `union()` method performs a union operation on the vertex and edge sets of the specified graph and the current graph. Duplicate vertices are removed from the resulting `Graph`, while duplicate edges, if any, are preserved.

<p class="text-center">
<img alt="Union Transformation" width="50%" src="{% link /fig/gelly-union.png %}"/>
<img alt="Union Transformation" width="50%" src="/fig/gelly-union.png"/>
</p>
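The asymmetry in `union()` semantics — vertices deduplicated, edges kept — can be sketched as follows (plain Python, not Gelly API; graphs and IDs are made up):

```python
# Illustrative sketch (not Gelly API) of union() semantics:
# duplicate vertices are removed, duplicate edges are preserved.
g1_vertices, g1_edges = {1, 2, 3}, [(1, 2), (2, 3)]
g2_vertices, g2_edges = {2, 3, 4}, [(2, 3), (3, 4)]

union_vertices = g1_vertices | g2_vertices  # set union deduplicates
union_edges = g1_edges + g2_edges           # duplicate edge (2, 3) is kept

print(sorted(union_vertices))  # [1, 2, 3, 4]
print(union_edges)             # [(1, 2), (2, 3), (2, 3), (3, 4)]
```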

* <strong>Difference</strong>: Gelly's `difference()` method performs a difference on the vertex and edge sets of the current graph and the specified graph.
@@ -630,7 +630,7 @@ The neighborhood scope is defined by the `EdgeDirection` parameter, which takes
For example, assume that you want to select the minimum weight of all out-edges for each vertex in the following graph:

<p class="text-center">
<img alt="reduceOnEdges Example" width="50%" src="{% link /fig/gelly-example-graph.png %}"/>
<img alt="reduceOnEdges Example" width="50%" src="/fig/gelly-example-graph.png"/>
</p>

The following code will collect the out-edges for each vertex and apply the `SelectMinWeight()` user-defined function on each of the resulting neighborhoods:
@@ -669,7 +669,7 @@ final class SelectMinWeight extends ReduceEdgesFunction[Double] {
{{< /tabs >}}

<p class="text-center">
<img alt="reduceOnEdges Example" width="50%" src="{% link /fig/gelly-reduceOnEdges.png %}"/>
<img alt="reduceOnEdges Example" width="50%" src="/fig/gelly-reduceOnEdges.png"/>
</p>

Similarly, assume that you would like to compute the sum of the values of all incoming neighbors for every vertex. The following code will collect the incoming neighbors for each vertex and apply the `SumValues()` user-defined function on each neighborhood:
@@ -708,7 +708,7 @@ final class SumValues extends ReduceNeighborsFunction[Long] {
{{< /tabs >}}

<p class="text-center">
<img alt="reduceOnNeighbors Example" width="70%" src="{% link /fig/gelly-reduceOnNeighbors.png %}"/>
<img alt="reduceOnNeighbors Example" width="70%" src="/fig/gelly-reduceOnNeighbors.png"/>
</p>

When the aggregation function is not associative and commutative, or when it is desirable to return more than one value per vertex, one can use the more general
4 changes: 1 addition & 3 deletions docs/content.zh/docs/ops/monitoring/checkpoint_monitoring.md
@@ -78,9 +78,7 @@ The checkpoint history keeps statistics about recently triggered checkpoints,

For subtasks, two more detailed statistics are available.

<center>
<img src="{% link /fig/checkpoint_monitoring-history-subtasks.png %}" width="700px" alt="Checkpoint Monitoring: History">
</center>
{{< img src="/fig/checkpoint_monitoring-history-subtasks.png" width="700px" alt="Checkpoint Monitoring: History">}}

- **Sync Duration**: The duration of the synchronous part of the checkpoint. This includes the snapshotting of the operators' state and blocks all other activity on the subtask (processing records, firing timers, etc.).
- **Async Duration**: The duration of the asynchronous part of the checkpoint. This includes the time it took to write the checkpoint to the configured filesystem. For unaligned checkpoints, this also includes the time the subtask had to wait for the last checkpoint barrier to arrive (checkpoint alignment duration) and the time needed to persist the data.
2 changes: 1 addition & 1 deletion docs/content.zh/docs/try-flink/table_api.md
@@ -304,7 +304,7 @@ $ docker-compose up -d

You can see information on the running job via the [Flink console](http://localhost:8082/).

![Flink Console]({% link /fig/spend-report-console.png %}){:height="400px" width="800px"}
{{< img src="/fig/spend-report-console.png" height="400px" width="800px" alt="Flink Console">}}

Explore the results from inside MySQL.

5 changes: 2 additions & 3 deletions docs/content/docs/dev/table/concepts/dynamic_tables.md
@@ -173,9 +173,8 @@ When converting a dynamic table into a stream or writing it to an external syste

* **Upsert stream:** An upsert stream is a stream with two types of messages, *upsert messages* and *delete messages*. A dynamic table that is converted into an upsert stream requires a (possibly composite) unique key. A dynamic table with a unique key is transformed into a stream by encoding `INSERT` and `UPDATE` changes as upsert messages and `DELETE` changes as delete messages. The stream-consuming operator needs to be aware of the unique key attribute in order to apply messages correctly. The main difference to a retract stream is that `UPDATE` changes are encoded with a single message and are hence more efficient. The following figure visualizes the conversion of a dynamic table into an upsert stream.

<center>
<img alt="Dynamic tables" src="{% link /fig/table-streaming/redo-mode.png %}" width="85%">
</center>
{{< img alt="Dynamic tables" src="/fig/table-streaming/redo-mode.png" width="85%">}}

<br><br>
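The upsert encoding described above can be sketched in a few lines (plain Python, only to illustrate the message encoding; this is not Flink API, and the key/value pairs are made up):

```python
# Illustrative sketch (not Flink API): encoding the changelog of a keyed
# dynamic table as an upsert stream. INSERT and UPDATE each become a single
# upsert message; DELETE becomes a delete message.
def to_upsert_stream(changes):
    """changes: list of (op, key, value) with op in INSERT/UPDATE/DELETE."""
    stream = []
    for op, key, value in changes:
        if op in ("INSERT", "UPDATE"):
            stream.append(("upsert", key, value))
        else:  # DELETE
            stream.append(("delete", key, None))
    return stream

changes = [("INSERT", "Mary", 1), ("UPDATE", "Mary", 2), ("DELETE", "Bob", None)]
print(to_upsert_stream(changes))
# [('upsert', 'Mary', 1), ('upsert', 'Mary', 2), ('delete', 'Bob', None)]
```

Note how the update for `Mary` is a single upsert message, whereas a retract stream would need a retraction followed by a new insertion.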

The API to convert a dynamic table into a `DataStream` is discussed on the [Common Concepts]({{< ref "docs/dev/table/common" >}}#convert-a-table-into-a-datastream) page. Please note that only append and retract streams are supported when converting a dynamic table into a `DataStream`. The `TableSink` interface to emit a dynamic table to an external system is discussed on the [TableSources and TableSinks](../sourceSinks.html#define-a-tablesink) page.
2 changes: 1 addition & 1 deletion docs/content/docs/libs/gelly/bipartite_graph.md
@@ -108,7 +108,7 @@ Graph Transformations
* <strong>Projection</strong>: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and creates a link between two top nodes in the new graph only if there is an intermediate bottom node both of them connect to in the original graph. Bottom projection is the opposite of top projection, i.e. it preserves only bottom nodes and connects a pair of bottom nodes if they share an intermediate top node in the original graph.

<p class="text-center">
<img alt="Bipartite Graph Projections" width="80%" src="{% link /fig/bipartite_graph_projections.png %}"/>
<img alt="Bipartite Graph Projections" width="80%" src="/fig/bipartite_graph_projections.png"/>
</p>

Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is what data is associated with edges in the result graph.
10 changes: 5 additions & 5 deletions docs/content/docs/libs/gelly/graph_api.md
@@ -457,7 +457,7 @@ graph.subgraph((vertex => vertex.getValue > 0), (edge => edge.getValue < 0))
{{< /tabs >}}

<p class="text-center">
<img alt="Filter Transformations" width="80%" src="{% link /fig/gelly-filter.png %}"/>
<img alt="Filter Transformations" width="80%" src="/fig/gelly-filter.png"/>
</p>

* <strong>Join</strong>: Gelly provides specialized methods for joining the vertex and edge datasets with other input datasets. `joinWithVertices` joins the vertices with a `Tuple2` input data set. The join is performed using the vertex ID and the first field of the `Tuple2` input as the join keys. The method returns a new `Graph` where the vertex values have been updated according to a provided user-defined transformation function.
@@ -499,7 +499,7 @@ val networkWithWeights = network.joinWithEdgesOnSource(vertexOutDegrees, (v1: Do
* <strong>Union</strong>: Gelly's `union()` method performs a union operation on the vertex and edge sets of the specified graph and the current graph. Duplicate vertices are removed from the resulting `Graph`, while duplicate edges, if any, are preserved.

<p class="text-center">
<img alt="Union Transformation" width="50%" src="{% link /fig/gelly-union.png %}"/>
<img alt="Union Transformation" width="50%" src="/fig/gelly-union.png"/>
</p>

* <strong>Difference</strong>: Gelly's `difference()` method performs a difference on the vertex and edge sets of the current graph and the specified graph.
@@ -630,7 +630,7 @@ The neighborhood scope is defined by the `EdgeDirection` parameter, which takes
For example, assume that you want to select the minimum weight of all out-edges for each vertex in the following graph:

<p class="text-center">
<img alt="reduceOnEdges Example" width="50%" src="{% link /fig/gelly-example-graph.png %}"/>
<img alt="reduceOnEdges Example" width="50%" src="/fig/gelly-example-graph.png"/>
</p>

The following code will collect the out-edges for each vertex and apply the `SelectMinWeight()` user-defined function on each of the resulting neighborhoods:
@@ -669,7 +669,7 @@ final class SelectMinWeight extends ReduceEdgesFunction[Double] {
{{< /tabs >}}

<p class="text-center">
<img alt="reduceOnEdges Example" width="50%" src="{% link /fig/gelly-reduceOnEdges.png %}"/>
<img alt="reduceOnEdges Example" width="50%" src="/fig/gelly-reduceOnEdges.png"/>
</p>

Similarly, assume that you would like to compute the sum of the values of all incoming neighbors for every vertex. The following code will collect the incoming neighbors for each vertex and apply the `SumValues()` user-defined function on each neighborhood:
@@ -708,7 +708,7 @@ final class SumValues extends ReduceNeighborsFunction[Long] {
{{< /tabs >}}

<p class="text-center">
<img alt="reduceOnNeighbors Example" width="70%" src="{% link /fig/gelly-reduceOnNeighbors.png %}"/>
<img alt="reduceOnNeighbors Example" width="70%" src="/fig/gelly-reduceOnNeighbors.png"/>
</p>

When the aggregation function is not associative and commutative, or when it is desirable to return more than one value per vertex, one can use the more general
2 changes: 1 addition & 1 deletion docs/content/docs/try-flink/table_api.md
@@ -304,7 +304,7 @@ $ docker-compose up -d

You can see information on the running job via the [Flink console](http://localhost:8082/).

![Flink Console]({% link /fig/spend-report-console.png %}){:height="400px" width="800px"}
{{< img src="/fig/spend-report-console.png" height="400px" width="800px" alt="Flink Console">}}

Explore the results from inside MySQL.
