[FLINK-36180] Fix batch message data loss #95

wenbingshen · 2024-08-29T13:35:01Z

Purpose of the change

When restoring from state, the unconsumed messages of a batch are lost.

We should return the message at the next batch index within the current batch.
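To make the intended behavior concrete, here is a minimal sketch of the resume logic. It uses a hypothetical `Position` type rather than the Pulsar client classes, and the handling of the last message in a batch (moving on to the next entry) is an assumption of this sketch, not something stated in the description above.

```java
// Hypothetical, simplified model of a message position; this is not the Pulsar client API.
final class BatchResumeSketch {

    /** A message position: (ledgerId, entryId) plus an index inside a batch. */
    static final class Position {
        final long ledgerId;
        final long entryId;
        final int batchIndex; // -1 means "not part of a batch" in this sketch
        final int batchSize;  // 0 means "not part of a batch" in this sketch

        Position(long ledgerId, long entryId, int batchIndex, int batchSize) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.batchIndex = batchIndex;
            this.batchSize = batchSize;
        }
    }

    /** Returns the position to resume from once lastConsumed has been fully processed. */
    static Position resumeAfter(Position lastConsumed) {
        boolean batched = lastConsumed.batchIndex >= 0 && lastConsumed.batchSize > 0;
        boolean lastInBatch = lastConsumed.batchIndex == lastConsumed.batchSize - 1;
        if (batched && !lastInBatch) {
            // Stay in the same entry and move to the next message of the batch,
            // so the remaining messages of the batch are not skipped.
            return new Position(
                    lastConsumed.ledgerId,
                    lastConsumed.entryId,
                    lastConsumed.batchIndex + 1,
                    lastConsumed.batchSize);
        }
        // Non-batched message or exhausted batch: move to the next entry (assumption).
        return new Position(lastConsumed.ledgerId, lastConsumed.entryId + 1, -1, 0);
    }

    public static void main(String[] args) {
        // The checkpoint recorded index 2 of a 5-message batch as the last consumed message.
        Position resumed = resumeAfter(new Position(10L, 5L, 2, 5));
        System.out.println(resumed.entryId + ":" + resumed.batchIndex);
    }
}
```

Resuming at `entryId + 1` for a mid-batch position is the behavior that would drop the remaining messages of the batch, which is the data loss this PR targets.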

wenbingshen · 2024-08-30T03:42:13Z

syhily · 2024-08-30T11:01:40Z

...src/main/java/org/apache/flink/connector/pulsar/source/enumerator/cursor/CursorPosition.java

+ messageIdImpl.getPartitionIndex(),
+ messageIdImpl.getBatchIndex() + 1,
+ messageIdImpl.getBatchSize(),
+ messageIdImpl.getAckSet());


Should we change the ack set here to make sure the current batch index has been acknowledged?

https://github.com/apache/pulsar/blob/dccc06bf50bb5ca510b39167908c02d2b4602ca5/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageIdAdvUtils.java#L50

Batch messaging is quite a complex feature. How can we verify that the messages in a batch are acknowledged cumulatively? They could be acknowledged individually.

Should we change the ack set here to make sure the current batch index has been acknowledged?

https://github.com/apache/pulsar/blob/dccc06bf50bb5ca510b39167908c02d2b4602ca5/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageIdAdvUtils.java#L50

There is no need to modify this AckSet, because we ultimately call Consumer seek. In the seek method, the messages before batchIndex are acknowledged cumulatively. The AckSet here has no effect regardless of its state.

Batch messaging is quite a complex feature. How can we verify that the messages in a batch are acknowledged cumulatively? They could be acknowledged individually.

In flink-pulsar-connector, the consumer's receiver queue size is set to 1, so wouldn't the messages before the current batchIndex already have been acknowledged?

Even if a single message is acknowledged individually, the current connector does not support BatchIndexAck.

As I said above, the seek operation during task failure recovery cannot guarantee that the AckSet takes effect.

The current connector's ack behavior is cumulative acknowledgment.

Finally, if we don't change the seeking behavior of CursorPosition, we cannot recover from the AckSet, whatever AckSet is stored in the state.
The changes in this PR are valid under the current cumulative acknowledgment behavior.
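A small sketch of the argument above, using `java.util.BitSet` as a stand-in for the ack set. The convention that a set bit means "not yet acknowledged" and the `deliveredAfterSeek` helper are assumptions made for illustration; they model the claim in this thread that a seek treats everything before batchIndex as cumulatively acknowledged, and are not a copy of the client implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative model only; names and conventions here are assumptions, not Pulsar client code.
final class SeekVsAckSetSketch {

    /** Creates an ack set for a batch; in this sketch a set bit means "not yet acknowledged". */
    static BitSet newAckSet(int batchSize) {
        BitSet ackSet = new BitSet(batchSize);
        ackSet.set(0, batchSize);
        return ackSet;
    }

    /** Cumulative ack: every index up to and including batchIndex becomes acknowledged. */
    static void ackCumulative(BitSet ackSet, int batchIndex) {
        ackSet.clear(0, batchIndex + 1);
    }

    /** Individual ack: only the given index becomes acknowledged. */
    static void ackIndividual(BitSet ackSet, int batchIndex) {
        ackSet.clear(batchIndex);
    }

    /**
     * Models delivery after seeking to seekIndex inside a batch: every index from
     * seekIndex onward is delivered. The ack set is deliberately ignored, which is
     * the point made in the comment above.
     */
    static List<Integer> deliveredAfterSeek(int seekIndex, int batchSize, BitSet ignoredAckSet) {
        List<Integer> delivered = new ArrayList<>();
        for (int i = seekIndex; i < batchSize; i++) {
            delivered.add(i);
        }
        return delivered;
    }

    public static void main(String[] args) {
        BitSet individual = newAckSet(5);
        ackIndividual(individual, 1); // only index 1 acknowledged
        BitSet cumulative = newAckSet(5);
        ackCumulative(cumulative, 2); // indexes 0..2 acknowledged

        // Both ack sets lead to the same delivery once we seek to index 3.
        System.out.println(deliveredAfterSeek(3, 5, individual));
        System.out.println(deliveredAfterSeek(3, 5, cumulative));
    }
}
```

Under this model the position passed to seek is the only input that decides what is redelivered, so the stored ack set can be carried along unchanged.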

The receive queue setting

flink-connector-pulsar/flink-connector-pulsar/src/main/java/org/apache/flink/connector/pulsar/source/PulsarSourceOptions.java

Line 282 in b37a8b3

public static final ConfigOption<Integer> PULSAR_RECEIVER_QUEUE_SIZE =

has been exposed to the user with a default value of 1000.

IIUC, support for the batch AckSet is handled locally by the pulsar-client after all the messages in the batch have been acked. (BTW, this shouldn't be touched by connector users or developers; it should be guaranteed by the Pulsar client developers.)

Recovery starts from the MessageId saved in the checkpoint, and I think the AckSet is controlled internally by the client.

syhily · 2024-08-30T11:03:41Z

...rc/main/java/org/apache/flink/connector/pulsar/source/reader/PulsarPartitionSplitReader.java

@@ -196,7 +197,14 @@ public void handleSplitsChanges(SplitsChange<PulsarPartitionSplit> splitsChanges
 MessageId latestConsumedId = registeredSplit.getLatestConsumedId();

 if (latestConsumedId != null) {
- LOG.info("Reset subscription position by the checkpoint {}", latestConsumedId);
+ if (latestConsumedId instanceof BatchMessageIdImpl) {


I prefer to use MessageIdAdv globally.

MessageIdAdv is too broad here. It also covers implementations such as MessageIdImpl, not only batch message IDs. Here we want to print out the correct batchSize.

I can see that all the message ID implementations implement the MessageIdAdv interface, which contains all the information the client requires. I think it's better to use MessageIdAdv instead of MessageId throughout the connector.

/**
 * The {@link MessageId} interface provided for advanced users.
 * <p>
 * All built-in MessageId implementations should be able to be cast to MessageIdAdv.
 * </p>
 */
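As a rough illustration of the two positions in this thread, the sketch below reads batch information through one common interface instead of an `instanceof` check. `IdView`, `PlainId`, and `BatchedId` are hypothetical stand-ins, and the default values (-1 for the batch index, 0 for the batch size) are assumptions of the sketch rather than a statement about the real interface.

```java
// Hypothetical stand-ins for illustration; these are not the Pulsar client types.
final class MessageIdViewSketch {

    /** Common view over all IDs; the defaults describe a non-batched message (assumption). */
    interface IdView {
        long ledgerId();

        long entryId();

        default int batchIndex() {
            return -1;
        }

        default int batchSize() {
            return 0;
        }
    }

    /** A non-batched ID that relies on the interface defaults. */
    static final class PlainId implements IdView {
        private final long ledgerId;
        private final long entryId;

        PlainId(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override public long ledgerId() { return ledgerId; }
        @Override public long entryId() { return entryId; }
    }

    /** A batched ID that overrides the batch accessors. */
    static final class BatchedId implements IdView {
        private final long ledgerId;
        private final long entryId;
        private final int batchIndex;
        private final int batchSize;

        BatchedId(long ledgerId, long entryId, int batchIndex, int batchSize) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.batchIndex = batchIndex;
            this.batchSize = batchSize;
        }

        @Override public long ledgerId() { return ledgerId; }
        @Override public long entryId() { return entryId; }
        @Override public int batchIndex() { return batchIndex; }
        @Override public int batchSize() { return batchSize; }
    }

    /** Formats an ID for logging; batch details are printed only for batched IDs. */
    static String describe(IdView id) {
        if (id.batchIndex() >= 0 && id.batchSize() > 0) {
            return id.ledgerId() + ":" + id.entryId() + ":" + id.batchIndex() + "/" + id.batchSize();
        }
        return id.ledgerId() + ":" + id.entryId();
    }

    public static void main(String[] args) {
        System.out.println(describe(new BatchedId(1L, 2L, 3, 5)));
        System.out.println(describe(new PlainId(1L, 2L)));
    }
}
```

With this shape the log statement can still print the batch size for batched IDs, while the rest of the code depends on a single interface.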

syhily · 2024-08-30T11:04:43Z

Thanks for your pull request @wenbingshen. I left some questions here. Can you check them?

wenbingshen · 2024-08-30T11:59:15Z

Thanks for your pull request @wenbingshen. I left some questions here. Can you check them?

@syhily I responded above, PTAL. Thanks.

syhily · 2024-08-31T06:11:57Z

After discussing with @wenbingshen, I think this PR does fix the batch message consumption issue when the connector recovers from a checkpoint. LGTM.

@reswqa @tisonkun Can you double-check this PR and merge it?

fix batch message data loss

6b59918

boring-cyborg bot added the component=Connectors/Pulsar label Aug 29, 2024

wenbingshen changed the title ~~[FLINK-36180][Connectors/Pulsar] Fix batch message data loss~~ [FLINK-36180] Fix batch message data loss Aug 29, 2024

fix spotless

91883e0

syhily reviewed Aug 30, 2024
