SinkContext: ability to seek/pause/resume consumer for a topic #10498

Merged
merged 4 commits into from
May 18, 2021

Conversation

dlg99
Contributor

@dlg99 dlg99 commented May 7, 2021

Motivation

Allow a sink to rewind a topic to a given offset and to pause/resume the consumer for a given topic.
This is needed for #9927 https://github.com/apache/pulsar/pull/9927/files#r595722189

Modifications

SinkContext API:
New methods to seek/pause/resume
Matching implementations in ContextImpl

Added ExtendedSourceContext interface (not public) to provide access to the PulsarSource's consumers.
Updated ContextImpl and PulsarSource's implementations to provide this functionality.
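
For readers skimming the thread, here is a minimal sketch of the "new methods with a default implementation" pattern described above. The interface and class names (`SketchSinkContext`, `LegacyContext`, `RecordingContext`), the `long position` parameter, and the choice to throw `UnsupportedOperationException` are assumptions made for illustration; this is not the actual SinkContext signature from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sink context interface; NOT the real Pulsar API.
interface SketchSinkContext {
    // A method that every implementation already provided before the change.
    String getSinkName();

    // New methods get default bodies, so existing implementations keep compiling.
    default void seek(String topic, int partition, long position) {
        throw new UnsupportedOperationException("seek is not supported");
    }

    default void pause(String topic, int partition) {
        throw new UnsupportedOperationException("pause is not supported");
    }

    default void resume(String topic, int partition) {
        throw new UnsupportedOperationException("resume is not supported");
    }
}

// Written before the new methods existed; compiles unchanged and inherits the defaults.
final class LegacyContext implements SketchSinkContext {
    @Override
    public String getSinkName() {
        return "legacy";
    }
}

// Opts in by overriding the new methods; here it only records the calls.
final class RecordingContext implements SketchSinkContext {
    final List<String> calls = new ArrayList<>();

    @Override
    public String getSinkName() {
        return "recording";
    }

    @Override
    public void seek(String topic, int partition, long position) {
        calls.add("seek " + topic + "-" + partition + " @" + position);
    }

    @Override
    public void pause(String topic, int partition) {
        calls.add("pause " + topic + "-" + partition);
    }

    @Override
    public void resume(String topic, int partition) {
        calls.add("resume " + topic + "-" + partition);
    }
}

final class DefaultMethodDemo {
    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        ctx.pause("orders", 0);
        ctx.seek("orders", 0, 100L);
        ctx.resume("orders", 0);
        System.out.println(ctx.calls);

        try {
            new LegacyContext().pause("orders", 0);
        } catch (UnsupportedOperationException e) {
            System.out.println("legacy context: " + e.getMessage());
        }
    }
}
```

The point of the pattern is source compatibility: implementations that predate the change are not forced to add the new methods.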

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change added tests and can be verified as follows:

  • Added unit tests

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • The public API: yes

SinkContext added new methods, default implementation provided.

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

@dlg99 dlg99 marked this pull request as draft May 7, 2021 01:08
@jerrypeng
Contributor

@dlg99 can you also please explain why this is needed for the Pulsar IO Sink and Kafka Sink compatibility layer?

@dlg99
Contributor Author

dlg99 commented May 7, 2021

@jerrypeng kafka-connect-adaptor implements some of the kafka interfaces. For the sink it is SinkTaskContext https://kafka.apache.org/23/javadoc/org/apache/kafka/connect/sink/SinkTaskContext.html - it has offset/pause/resume methods that need seek/pause/resume from the Sink context.
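
A rough sketch of the mapping described in this comment, under stated assumptions: the method names on `TaskContextAdapter` mirror the `offset`/`pause`/`resume` methods of Kafka Connect's SinkTaskContext, while `TopicControl`, `TopicPart`, and `LoggingControl` are hypothetical stand-ins invented here so the example compiles without Kafka or Pulsar on the classpath. It is not the adaptor code from #9927.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal view of the sink-side operations (not the real SinkContext).
interface TopicControl {
    void seek(String topic, int partition, long offset);
    void pause(String topic, int partition);
    void resume(String topic, int partition);
}

// Hypothetical stand-in for Kafka's TopicPartition.
final class TopicPart {
    final String topic;
    final int partition;

    TopicPart(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }
}

// Shaped like SinkTaskContext's offset/pause/resume; every call delegates to TopicControl.
final class TaskContextAdapter {
    private final TopicControl control;

    TaskContextAdapter(TopicControl control) {
        this.control = control;
    }

    // Reset offsets for several partitions at once.
    void offset(Map<TopicPart, Long> offsets) {
        for (Map.Entry<TopicPart, Long> e : offsets.entrySet()) {
            offset(e.getKey(), e.getValue());
        }
    }

    // "Reset the offset" on the Kafka side becomes "seek" on the sink side.
    void offset(TopicPart tp, long offset) {
        control.seek(tp.topic, tp.partition, offset);
    }

    void pause(TopicPart... parts) {
        for (TopicPart tp : parts) {
            control.pause(tp.topic, tp.partition);
        }
    }

    void resume(TopicPart... parts) {
        for (TopicPart tp : parts) {
            control.resume(tp.topic, tp.partition);
        }
    }
}

// Records calls instead of touching a real consumer.
final class LoggingControl implements TopicControl {
    final List<String> calls = new ArrayList<>();

    @Override
    public void seek(String topic, int partition, long offset) {
        calls.add("seek " + topic + "-" + partition + " @" + offset);
    }

    @Override
    public void pause(String topic, int partition) {
        calls.add("pause " + topic + "-" + partition);
    }

    @Override
    public void resume(String topic, int partition) {
        calls.add("resume " + topic + "-" + partition);
    }
}

final class AdapterDemo {
    public static void main(String[] args) {
        LoggingControl log = new LoggingControl();
        TaskContextAdapter adapter = new TaskContextAdapter(log);
        TopicPart tp = new TopicPart("orders", 1);
        adapter.pause(tp);
        adapter.offset(tp, 42L);
        adapter.resume(tp);
        System.out.println(log.calls);
    }
}
```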

@dlg99 dlg99 force-pushed the sink-rewind-topic branch from ad6fc9d to 70ab10d Compare May 10, 2021 22:03
@dlg99 dlg99 marked this pull request as ready for review May 10, 2021 23:24
Contributor

@eolivelli eolivelli left a comment

I left a few comments; overall the approach looks good to me.

Either this approach or the proposal from @jerrypeng works for me.

@dlg99 dlg99 requested a review from eolivelli May 11, 2021 20:11
Contributor

@eolivelli eolivelli left a comment

Lgtm

Contributor

@jerrypeng jerrypeng left a comment

I think there is a simpler approach to referencing consumers created in PulsarSource from Context. Please see my comment.

@dlg99
Contributor Author

dlg99 commented May 13, 2021

@jerrypeng @eolivelli I made the changes that @jerrypeng requested, and also figured out that MultiTopicsConsumerImpl (and, transitively, PatternMultiTopicsConsumerImpl) needs special handling because getTopic() for such consumers does not return the topic.

@dlg99 dlg99 requested a review from jerrypeng May 13, 2021 22:14
Contributor

@eolivelli eolivelli left a comment

LGTM

if (inputConsumers == null) {
    throw new PulsarClientException("Getting consumer is not supported");
}
for (int i = 0; i < 2; i++) {
Contributor

Why is this for loop needed?

Contributor Author

Two attempts to get the consumer:

  1. Try to get the consumer.
  2. If it is not found, reprocess the MultiTopicsConsumers in case new consumers have appeared (this happens on repartition, or when a new topic matches the provided pattern, if a pattern is used).
  3. Give it another try.
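
The three steps above can be sketched as a cache lookup with a single refresh on a miss. This is an illustration only: `ConsumerLookup`, its `Supplier`-based refresh source, and the `refreshCount` field are hypothetical names, and the consumer type is left generic rather than tied to the Pulsar client classes touched by the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical cache of "topic/partition name -> consumer"; C stands in for the consumer type.
final class ConsumerLookup<C> {
    private final Map<String, C> cache = new HashMap<>();
    // Supplies the current view of consumers, e.g. derived from multi-topic consumers.
    private final Supplier<Map<String, C>> currentConsumers;
    // Exposed only so the example can show how often a refresh happened.
    int refreshCount = 0;

    ConsumerLookup(Supplier<Map<String, C>> currentConsumers) {
        this.currentConsumers = currentConsumers;
        cache.putAll(currentConsumers.get()); // initial map
    }

    Optional<C> find(String topic) {
        // Step 1: try the cached map.
        C hit = cache.get(topic);
        if (hit == null) {
            // Step 2: a miss means the topic is unknown or was added after the map was built,
            // so rebuild the map once. There is no blocking and no polling.
            refreshCount++;
            cache.putAll(currentConsumers.get());
            // Step 3: give it another try.
            hit = cache.get(topic);
        }
        return Optional.ofNullable(hit);
    }
}

final class LookupDemo {
    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>();
        live.put("t-partition-0", "consumer-0");
        ConsumerLookup<String> lookup = new ConsumerLookup<>(() -> live);

        // A partition that shows up after the initial map was built.
        live.put("t-partition-1", "consumer-1");

        System.out.println(lookup.find("t-partition-0").isPresent());
        System.out.println(lookup.find("t-partition-1").isPresent());
        System.out.println(lookup.find("unknown-topic").isPresent());
        System.out.println("refreshes: " + lookup.refreshCount);
    }
}
```

A single refresh bounds the cost of a miss; a topic created after the refresh is still reported as missing, which is the race discussed further down the thread.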

Contributor

This doesn't seem like the right way to do this. Retrying twice seems arbitrary. I would rather let the user implement how many retries they want.

Contributor Author

@jerrypeng this is not arbitrary.
I have no access to callbacks/events when a MultiTopicsConsumer adds another partition or a topic (for a pattern-based subscription). MultiTopicsConsumers do have that internally.
So if the lookup in the previously cached topic/partition -> consumer map finds nothing, it means that the consumer:

  • either doesn't exist
  • or belongs to a topic/partition that was added after the initial map was built.
    In that case it makes sense to check the MultiTopicsConsumers once to update the map and try again.
    I don't think blocking and polling is justified; at least I don't have such a use case.

Contributor

@dlg99 if partitions / topics are added dynamically because of a topic pattern match or an increase in partition count, how would the function know when that happened, and why would trying to get the consumers twice help with that? There is an inherent race condition there. One can always argue that there is a scenario in which topics/partitions got created after you called this method. If the goal of the method is to get new consumers for new topics / partitions then this API is not appropriate.

Contributor Author

We talked offline about the reasons it is needed.
I expressed the retry more clearly and added comments. Please take a look.

@sijie sijie added this to the 2.8.0 milestone May 15, 2021
@dlg99 dlg99 requested a review from jerrypeng May 15, 2021 17:40
@dlg99
Contributor Author

dlg99 commented May 17, 2021

@jerrypeng do you have any additional feedback or is it good to go?

@eolivelli eolivelli merged commit 8183a1e into apache:master May 18, 2021
@dlg99 dlg99 deleted the sink-rewind-topic branch October 14, 2021 23:30
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022