You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Apr 1, 2024. It is now read-only.
In supporting exactly-once stateful processing in Flink with Pulsar, the source function should be able to choose a key range to consume. So the events of a key range in a given partition would always be sent to the consumer to ensure the correctness for stateful processing. Because in Flink all the state management are locally to the instances. We need to make sure all the events for a given key range is always sent to a same instance.
Describe the solution you'd like
A consumer can subscribe with a key range for a partition. Such consumer is called a Sticky consumer.
A sticky consumer can exclusively consume the messages from that key range for a partition.
For a non-sticky consumer, they can consume the remaining key ranges that are not occupied by sticky consumers.
So in implementing Flink exactly-once connector, the Flink source can compute the key ranges based on the number of instances. Each instance will be consuming the events in a key range of a partition. This allows us scaling up the number of Flink instances without increasing the number of partitions, and also guaranteeing exactly-once stateful processing.
Describe alternatives you've considered
N/A. If we don't do so, we have to scale up the number of partitions. This couples producers and consumers to make a decision. The producers are typically owned by a business service team, while the consumers are owned by analytics team.
Additional context
The requirement comes from Pulsar + Flink integration for exactly-once stateful processing.
The text was updated successfully, but these errors were encountered:
Original Issue: apache#4169
Is your feature request related to a problem? Please describe.
Master Issue: #4077
In supporting exactly-once stateful processing in Flink with Pulsar, the source function should be able to choose a key range to consume. So the events of a key range in a given partition would always be sent to the consumer to ensure the correctness for stateful processing. Because in Flink all the state management are locally to the instances. We need to make sure all the events for a given key range is always sent to a same instance.
Describe the solution you'd like
So in implementing Flink exactly-once connector, the Flink source can compute the key ranges based on the number of instances. Each instance will be consuming the events in a key range of a partition. This allows us scaling up the number of Flink instances without increasing the number of partitions, and also guaranteeing exactly-once stateful processing.
Describe alternatives you've considered
N/A. If we don't do so, we have to scale up the number of partitions. This couples producers and consumers to make a decision. The producers are typically owned by a business service team, while the consumers are owned by analytics team.
Additional context
The requirement comes from Pulsar + Flink integration for exactly-once stateful processing.
The text was updated successfully, but these errors were encountered: