Skip to content

KAFKA-10000: Use transactional producer for config topic (KIP-618)#11778

Merged
showuon merged 1 commit intoapache:trunkfrom
C0urante:kafka-10000-leader-transactional-producer
Jun 7, 2022
Merged

KAFKA-10000: Use transactional producer for config topic (KIP-618)#11778
showuon merged 1 commit intoapache:trunkfrom
C0urante:kafka-10000-leader-transactional-producer

Conversation

@C0urante
Copy link
Contributor

Implements the behavior described in KIP-618: using a transactional producer for writes to the config topic that should only be performed by the leader of the cluster.

Relies on changes from:

@C0urante C0urante changed the title KAFKA-10000: Use transactional producer for config topic KAFKA-10000: Use transactional producer for config topic (KIP-618) Feb 17, 2022
@C0urante C0urante marked this pull request as draft February 18, 2022 06:35
@C0urante
Copy link
Contributor Author

Converting to draft until upstream PRs are reviewed.

@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch 2 times, most recently from cb77582 to 0085eee Compare March 3, 2022 16:57
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a different package to leverage package-private fields and methods while testing other classes in this package (such as the KafkaConfigBackingStore class).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.kafka.connect.storage is a public package, and AFAICS this change in public API wasn't mentioned in KIP-618.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That package contains a mix of public and private API. The public parts are all documented in the Javadocs for the package and include the Converter interface and the SimpleHeaderConverter class; the private parts (which include everything in the connect/runtime module) contain the ClusterConfigState class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I missed that this package move on its own won't make this class a public API.

@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from 0085eee to 4599afe Compare May 10, 2022 00:44
@C0urante
Copy link
Contributor Author

Given that all merge conflicts have been resolved and #11775 has already been approved, marking this as ready for review.

@C0urante C0urante marked this pull request as ready for review May 10, 2022 00:46
@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from 4599afe to 65fddc7 Compare May 10, 2022 02:52
@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from 65fddc7 to a9bf688 Compare May 17, 2022 12:27
@C0urante
Copy link
Contributor Author

C0urante commented May 21, 2022

@tombentley the 3.3.0 feature freeze is a little over a month away and there are still seven open pull requests for this feature. Do you have time to take a look?

@C0urante
Copy link
Contributor Author

@tombentley If you do not have time to take a look, is there anyone you can recommend in your place?

@C0urante
Copy link
Contributor Author

C0urante commented Jun 1, 2022

@tombentley hello? Is anyone actually maintaining Kafka Connect anymore or should we all just stop contributing to it?

Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @C0urante! I left a few comments, but mostly this looks good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it is not necessary to wrap every write to the config topic in this method, only the writes that should be performed exclusively by the leader.

Should we make this part of the method name, e.g. writeToConfigTopicAsLeader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, done 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.kafka.connect.storage is a public package, and AFAICS this change in public API wasn't mentioned in KIP-618.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're talking about a "fencable" producer, then shouldn't this be "fencably", rather than "fencibly"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise that we're not trying to support a thread cancellation mechanism here, but shouldn't we set the thread's interrupted status when catching InterruptedException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part isn't new, it's just an inlining of a private method that was only being used in one place. Considering it follows the logic used in all similar methods for this class and there haven't been any bugs reported related to it that I'm aware of, I'd like to err on the side of caution here and keep this logic as-is for now, though we can address in a follow-up if this is a genuine issue.

@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from 335991c to 6c94f37 Compare June 1, 2022 22:08
@C0urante
Copy link
Contributor Author

C0urante commented Jun 1, 2022

Thanks Tom. I hope the package shuffling with ClusterConfigState isn't too painful; it seems warranted for the testing advantages and better fit with its purpose (since it's used by both the standalone and distributed herders) but if there's a less-intrusive, better-suited alternative, I'm all ears.

@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from 6c94f37 to c312555 Compare June 2, 2022 12:35
@showuon
Copy link
Member

showuon commented Jun 2, 2022

I'll take a look this week.

@C0urante
Copy link
Contributor Author

C0urante commented Jun 2, 2022

Thanks, Luke.

Copy link
Member

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the implementation code. Left some comments. Will continue with the test code tomorrow. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: [Please] disable the worker's exactly-once support....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, done

Comment on lines 387 to 388
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected even when we enable idempotence? I thought the idempotent producer can make sure the order when max in-flight <= 5? Could we set to 5 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed it to set to 5. Considered an alternative where we don't explicitly set anything and let the default (5) take effect unless the user overrides it in their worker config file, but we'd have to lower it to 5 if it were higher than that in order to prevent reordering, and there just doesn't seem to be enough value in permitting that kind of flexibility to do the extra work to add the necessary safeguards around it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we put default to 5 for fencableProducer for now. And improve it for customization to <= 5 value if necessary in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎

@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from c312555 to 7d84e77 Compare June 5, 2022 20:08
@C0urante
Copy link
Contributor Author

C0urante commented Jun 5, 2022

Thanks Luke, appreciate it.

Copy link
Member

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Left some comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we close the fencableProducer when ProducerFencedException thrown? Looks like we only recreate one afterwards, right? And what about other exceptions? Should we also close fencableProducer in those cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should close it in this case. I originally wanted to be more conservative about other cases, but after thinking it over, I'd rather not risk getting into a bad state with a transactional producer that can't be fixed and requires a new one. Given the expected low frequency of writes to the config topic, it's probably fine to also close the producer and then construct a new one in claimWritePrivileges in this case. Good catch!

Comment on lines 387 to 388
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we put default to 5 for fencableProducer for now. And improve it for customization to <= 5 value if necessary in the future.

@C0urante C0urante force-pushed the kafka-10000-leader-transactional-producer branch from 7d84e77 to ac9674c Compare June 7, 2022 03:03
Copy link
Member

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@showuon
Copy link
Member

showuon commented Jun 7, 2022

Failed tests are unrelated:

Build / JDK 8 and Scala 2.12 / kafka.api.ConsumerBounceTest.testConsumerReceivesFatalExceptionWhenGroupPassesMaxSize()

@showuon showuon merged commit 603502b into apache:trunk Jun 7, 2022
@C0urante C0urante deleted the kafka-10000-leader-transactional-producer branch June 7, 2022 15:13
@C0urante
Copy link
Contributor Author

C0urante commented Jun 7, 2022

Thanks Luke, and thanks Tom!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants