Skip to content

Commit 078dc82

Browse files
authored
[FLINK-38563][python][docs] Update documentation of Python AsyncFunction (#27195)
1 parent 4989549 commit 078dc82

File tree

9 files changed

+352
-12
lines changed

9 files changed

+352
-12
lines changed

docs/content.zh/docs/dev/datastream/operators/asyncio.md

Lines changed: 80 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -61,11 +61,14 @@ Flink 的异步 I/O API 允许用户在流处理中使用异步请求客户端
6161
在具备异步数据库客户端的基础上,实现数据流转换操作与数据库的异步 I/O 交互需要以下三部分:
6262

6363
- 实现分发请求的 `AsyncFunction`
64-
- 获取数据库交互的结果并发送给 `ResultFuture`*回调* 函数
64+
- 如果使用的 Java API,获取数据库交互的结果并发送给 `ResultFuture`*回调* 函数;如果使用的 Python API,可以通过 await 获取数据库交互的结果
6565
- 将异步 I/O 操作应用于 `DataStream` 作为 `DataStream` 的一次转换操作, 启用或者不启用重试。
6666

6767
下面是基本的代码模板:
6868

69+
{{< tabs "6c8c009c-4c12-4338-9eeb-3be83cfa9e36" >}}
70+
{{< tab "Java" >}}
71+
6972
```java
7073
// 这个例子使用 Java 8 的 Future 接口(与 Flink 的 Future 相同)实现了异步请求和回调。
7174

@@ -132,7 +135,74 @@ DataStream<Tuple2<String, String>> resultStream =
132135
AsyncDataStream.unorderedWaitWithRetry(stream, new AsyncDatabaseRequest(), 1000, TimeUnit.MILLISECONDS, 100, asyncRetryStrategy);
133136
```
134137

135-
**重要提示**: 第一次调用 `ResultFuture.complete``ResultFuture` 就完成了。
138+
{{< /tab >}}
139+
{{< tab "Python" >}}
140+
141+
```python
142+
from typing import List
143+
144+
from pyflink.common import Time, Types
145+
from pyflink.datastream import AsyncFunction, AsyncDataStream, async_retry_predicates
146+
from pyflink.datastream.functions import RuntimeContext, AsyncRetryStrategy
147+
148+
149+
class AsyncDatabaseRequest(AsyncFunction[str, (str, str)]):
150+
151+
def __init__(self, host, port, credentials):
152+
self._host = host
153+
self._port = port
154+
self._credentials = credentials
155+
156+
def open(self, runtime_context: RuntimeContext):
157+
# The database specific client that can issue concurrent requests with callbacks
158+
self._client = DatabaseClient(self._host, self._port, self._credentials)
159+
160+
def close(self):
161+
if self._client:
162+
self._client.close()
163+
164+
async def async_invoke(self, value: str) -> List[(str, str)]:
165+
try:
166+
# issue the asynchronous request
167+
result = await self._client.query(value)
168+
return [(value, str(result))]
169+
except Exception:
170+
return [(value, None)]
171+
172+
173+
# 创建初始 DataStream
174+
stream = ...
175+
176+
# 应用异步 I/O 转换操作,不启用重试
177+
result_stream = AsyncDataStream.unordered_wait(
178+
data_stream=stream,
179+
async_function=AsyncDatabaseRequest("127.0.0.1", "1234", None),
180+
timeout=Time.seconds(10),
181+
capacity=100,
182+
output_type=Types.TUPLE([Types.STRING(), Types.STRING()]))
183+
184+
# 或应用异步 I/O 转换操作并启用重试
185+
# 通过工具类创建一个异步重试策略, 或用户实现自定义的策略
186+
async_retry_strategy = AsyncRetryStrategy.fixed_delay(
187+
max_attempts=3,
188+
backoff_time_millis=100,
189+
result_predicate=async_retry_predicates.empty_result_predicate,
190+
exception_predicate=async_retry_predicates.has_exception_predicate)
191+
192+
# 应用异步 I/O 转换操作并启用重试
193+
result_stream_with_retry = AsyncDataStream.unordered_wait_with_retry(
194+
data_stream=stream,
195+
async_function=AsyncDatabaseRequest("127.0.0.1", "1234", None),
196+
timeout=Time.seconds(10),
197+
async_retry_strategy=async_retry_strategy,
198+
capacity=1000,
199+
output_type=Types.TUPLE([Types.STRING(), Types.STRING()]))
200+
```
201+
202+
{{< /tab >}}
203+
{{< /tabs >}}
204+
205+
**重要提示**: 在 Java API 中,第一次调用 `ResultFuture.complete``ResultFuture` 就完成了。
136206
后续的 `complete` 调用都将被忽略。
137207

138208
下面两个参数控制异步操作:
@@ -149,9 +219,13 @@ DataStream<Tuple2<String, String>> resultStream =
149219

150220
当异步 I/O 请求超时的时候,默认会抛出异常并重启作业。
151221
如果你想处理超时,可以重写 `AsyncFunction#timeout` 方法。
152-
重写 `AsyncFunction#timeout` 时别忘了调用 `ResultFuture.complete()` 或者 `ResultFuture.completeExceptionally()`
222+
223+
在 Java API 中,重写 `AsyncFunction#timeout` 时别忘了调用 `ResultFuture.complete()` 或者 `ResultFuture.completeExceptionally()`
153224
以便告诉Flink这条记录的处理已经完成。如果超时发生时你不想发出任何记录,你可以调用 `ResultFuture.complete(Collections.emptyList())`
154225

226+
在 Python API 中,可以返回一个 List 或者抛异常以便告诉Flink这条记录的处理已经完成。如果超时发生时你不想发出任何记录,
227+
你可以调用 `return []` 以返回一个空列表。
228+
155229
### 结果的顺序
156230

157231
`AsyncFunction` 发出的并发请求经常以不确定的顺序完成,这取决于请求得到响应的顺序。
@@ -160,9 +234,9 @@ Flink 提供两种模式控制结果记录以何种顺序发出。
160234
- **无序模式**: 异步请求一结束就立刻发出结果记录。
161235
流中记录的顺序在经过异步 I/O 算子之后发生了改变。
162236
当使用 *处理时间* 作为基本时间特征时,这个模式具有最低的延迟和最少的开销。
163-
此模式使用 `AsyncDataStream.unorderedWait(...)` 方法。
237+
此模式使用 `AsyncDataStream.unorderedWait(...)` 或者 `AsyncDataStream.unordered_wait(...)` 方法。
164238

165-
- **有序模式**: 这种模式保持了流的顺序。发出结果记录的顺序与触发异步请求的顺序(记录输入算子的顺序)相同。为了实现这一点,算子将缓冲一个结果记录直到这条记录前面的所有记录都发出(或超时)。由于记录或者结果要在 checkpoint 的状态中保存更长的时间,所以与无序模式相比,有序模式通常会带来一些额外的延迟和 checkpoint 开销。此模式使用 `AsyncDataStream.orderedWait(...)` 方法。
239+
- **有序模式**: 这种模式保持了流的顺序。发出结果记录的顺序与触发异步请求的顺序(记录输入算子的顺序)相同。为了实现这一点,算子将缓冲一个结果记录直到这条记录前面的所有记录都发出(或超时)。由于记录或者结果要在 checkpoint 的状态中保存更长的时间,所以与无序模式相比,有序模式通常会带来一些额外的延迟和 checkpoint 开销。此模式使用 `AsyncDataStream.orderedWait(...)` 或者 `AsyncDataStream.ordered_wait(...)` 方法。
166240

167241

168242
### 事件时间
@@ -203,6 +277,7 @@ Flink 提供两种模式控制结果记录以何种顺序发出。
203277
`DirectExecutor` 可以通过 `org.apache.flink.util.concurrent.Executors.directExecutor()`
204278
`com.google.common.util.concurrent.MoreExecutors.directExecutor()` 获得。
205279

280+
**注意:** 这仅适用于 Java API,在 Python API 中,您可以使用 await 等待异步执行的结果。
206281

207282
### 警告
208283

docs/content/docs/dev/datastream/operators/asyncio.md

Lines changed: 81 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -75,11 +75,14 @@ Assuming one has an asynchronous client for the target database, three parts are
7575
with asynchronous I/O against the database:
7676

7777
- An implementation of `AsyncFunction` that dispatches the requests
78-
- A *callback* that takes the result of the operation and hands it to the `ResultFuture`
78+
- A *callback* that takes the result of the operation and hands it to the `ResultFuture` in Java API or await the result of the operation in Python API
7979
- Applying the async I/O operation on a DataStream as a transformation with or without retry
8080

8181
The following code example illustrates the basic pattern:
8282

83+
{{< tabs "6c8c009c-4c12-4338-9eeb-3be83cfa9e36" >}}
84+
{{< tab "Java" >}}
85+
8386
```java
8487
// This example implements the asynchronous request and callback with Futures that have the
8588
// interface of Java 8's futures (which is the same one followed by Flink's Future)
@@ -147,7 +150,74 @@ DataStream<Tuple2<String, String>> resultStream =
147150
AsyncDataStream.unorderedWaitWithRetry(stream, new AsyncDatabaseRequest(), 1000, TimeUnit.MILLISECONDS, 100, asyncRetryStrategy);
148151
```
149152

150-
**Important note**: The `ResultFuture` is completed with the first call of `ResultFuture.complete`.
153+
{{< /tab >}}
154+
{{< tab "Python" >}}
155+
156+
```python
157+
from typing import List
158+
159+
from pyflink.common import Time, Types
160+
from pyflink.datastream import AsyncFunction, AsyncDataStream, async_retry_predicates
161+
from pyflink.datastream.functions import RuntimeContext, AsyncRetryStrategy
162+
163+
164+
class AsyncDatabaseRequest(AsyncFunction[str, (str, str)]):
165+
166+
def __init__(self, host, port, credentials):
167+
self._host = host
168+
self._port = port
169+
self._credentials = credentials
170+
171+
def open(self, runtime_context: RuntimeContext):
172+
# The database specific client that can issue concurrent requests with callbacks
173+
self._client = DatabaseClient(self._host, self._port, self._credentials)
174+
175+
def close(self):
176+
if self._client:
177+
self._client.close()
178+
179+
async def async_invoke(self, value: str) -> List[(str, str)]:
180+
try:
181+
# issue the asynchronous request
182+
result = await self._client.query(value)
183+
return [(value, str(result))]
184+
except Exception:
185+
return [(value, None)]
186+
187+
188+
# create the original stream
189+
stream = ...
190+
191+
# apply the async I/O transformation without retry
192+
result_stream = AsyncDataStream.unordered_wait(
193+
data_stream=stream,
194+
async_function=AsyncDatabaseRequest("127.0.0.1", "1234", None),
195+
timeout=Time.seconds(10),
196+
capacity=100,
197+
output_type=Types.TUPLE([Types.STRING(), Types.STRING()]))
198+
199+
# or apply the async I/O transformation with retry
200+
# create an async retry strategy via utility class or a user defined strategy
201+
async_retry_strategy = AsyncRetryStrategy.fixed_delay(
202+
max_attempts=3,
203+
backoff_time_millis=100,
204+
result_predicate=async_retry_predicates.empty_result_predicate,
205+
exception_predicate=async_retry_predicates.has_exception_predicate)
206+
207+
# apply the async I/O transformation with retry
208+
result_stream_with_retry = AsyncDataStream.unordered_wait_with_retry(
209+
data_stream=stream,
210+
async_function=AsyncDatabaseRequest("127.0.0.1", "1234", None),
211+
timeout=Time.seconds(10),
212+
async_retry_strategy=async_retry_strategy,
213+
capacity=1000,
214+
output_type=Types.TUPLE([Types.STRING(), Types.STRING()]))
215+
```
216+
217+
{{< /tab >}}
218+
{{< /tabs >}}
219+
220+
**Important note**: The `ResultFuture` is completed with the first call of `ResultFuture.complete` in the Java API.
151221
All subsequent `complete` calls will be ignored.
152222

153223
The following three parameters control the asynchronous operations:
@@ -162,17 +232,21 @@ The following three parameters control the asynchronous operations:
162232
accumulate an ever-growing backlog of pending requests, but that it will trigger backpressure once the capacity
163233
is exhausted.
164234

165-
- **AsyncRetryStrategy**: The asyncRetryStrategy defines what conditions will trigger a delayed retry and the delay strategy,
235+
- **AsyncRetryStrategy**: This parameter defines what conditions will trigger a delayed retry and the delay strategy,
166236
e.g., fixed-delay, exponential-backoff-delay, custom implementation, etc.
167237

168238
### Timeout Handling
169239

170240
When an async I/O request times out, by default an exception is thrown and job is restarted.
171241
If you want to handle timeouts, you can override the `AsyncFunction#timeout` method.
172-
Make sure you call `ResultFuture.complete()` or `ResultFuture.completeExceptionally()` when overriding
242+
243+
In the Java API, make sure you call `ResultFuture.complete()` or `ResultFuture.completeExceptionally()` when overriding
173244
in order to indicate to Flink that the processing of this input record has completed. You can call
174245
`ResultFuture.complete(Collections.emptyList())` if you do not want to emit any record when timeouts happen.
175246

247+
In the Python API, you can return a collection of results or raise an exception when overriding
248+
in order to indicate to Flink that the processing of this input record has completed. You can return
249+
empty list by calling `return []` if you do not want to emit any record when timeouts happen.
176250

177251
### Order of Results
178252

@@ -182,14 +256,14 @@ To control in which order the resulting records are emitted, Flink offers two mo
182256
- **Unordered**: Result records are emitted as soon as the asynchronous request finishes.
183257
The order of the records in the stream is different after the async I/O operator than before.
184258
This mode has the lowest latency and lowest overhead, when used with *processing time* as the basic time characteristic.
185-
Use `AsyncDataStream.unorderedWait(...)` for this mode.
259+
Use `AsyncDataStream.unorderedWait(...)` or `AsyncDataStream.unordered_wait(...)` for this mode.
186260

187261
- **Ordered**: In that case, the stream order is preserved. Result records are emitted in the same order as the asynchronous
188262
requests are triggered (the order of the operators input records). To achieve that, the operator buffers a result record
189263
until all its preceding records are emitted (or timed out).
190264
This usually introduces some amount of extra latency and some overhead in checkpointing, because records or results are maintained
191265
in the checkpointed state for a longer time, compared to the unordered mode.
192-
Use `AsyncDataStream.orderedWait(...)` for this mode.
266+
Use `AsyncDataStream.orderedWait(...)` or `AsyncDataStream.ordered_wait(...)` for this mode.
193267

194268

195269
### Event Time
@@ -240,6 +314,7 @@ with the checkpoint bookkeeping happens in a dedicated thread-pool anyways.
240314
A `DirectExecutor` can be obtained via `org.apache.flink.util.concurrent.Executors.directExecutor()` or
241315
`com.google.common.util.concurrent.MoreExecutors.directExecutor()`.
242316

317+
**NOTE:** This only applies for the Java API. In the Python API, you could just await the asynchronous result.
243318

244319
### Caveats
245320

flink-python/docs/reference/pyflink.common/index.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,4 +28,5 @@ This page gives an overview of all public PyFlink Common API.
2828
config
2929
typeinfo
3030
job_info
31-
serializer
31+
serializer
32+
time
Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
.. ################################################################################
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing, software
13+
distributed under the License is distributed on an "AS IS" BASIS,
14+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15+
See the License for the specific language governing permissions and
16+
limitations under the License.
17+
################################################################################
18+
19+
20+
====
21+
Time
22+
====
23+
24+
Duration
25+
--------
26+
27+
.. currentmodule:: pyflink.common.time
28+
29+
.. autosummary::
30+
:toctree: api/
31+
32+
Duration.of_days
33+
Duration.of_hours
34+
Duration.of_minutes
35+
Duration.of_millis
36+
Duration.of_seconds
37+
Duration.of_nanos
38+
39+
40+
Instant
41+
-------
42+
43+
.. currentmodule:: pyflink.common.time
44+
45+
.. autosummary::
46+
:toctree: api/
47+
48+
Instant.of_epoch_milli
49+
Instant.to_epoch_milli
50+
51+
52+
Time
53+
----
54+
55+
.. currentmodule:: pyflink.common.time
56+
57+
.. autosummary::
58+
:toctree: api/
59+
60+
Time.milliseconds
61+
Time.seconds
62+
Time.minutes
63+
Time.hours
64+
Time.days
65+
Time.to_milliseconds

0 commit comments

Comments
 (0)