KAFKA-19176: Update Transactional producer to translate retriable into abortable exceptions #19522
    if (error == null)
        throw new IllegalArgumentException("Cannot transition to " + target + " with a null exception");

    if (error instanceof RetriableException) {
Can we leave a comment that RetriableExceptions from the Sender thread should be translated to abortable?
clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java
jolshan
left a comment
Thanks!
@jolshan Thanks for the review.
    // RetriableExceptions from the Sender thread are converted to Abortable errors
    // because they indicate that the transaction cannot be completed after all retry attempts.
    // This conversion ensures the application layer treats these errors as abortable,
    // preventing duplicate message delivery.
Maybe not a change we need now, but it isn't totally clear from the method name that this should only be called from the sender thread. Maybe we should refactor this in the future.
Agreed, that confused me too.
Thanks for the review.
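For readers following this thread, here is a minimal sketch of the translation being discussed. It uses hypothetical stand-in classes rather than the real `org.apache.kafka.common.errors` types, and the method name `translate` and the message text are assumptions for illustration, not the code merged in this PR.

```java
// Sketch only: local stand-ins for the Kafka exception types, so the example is self-contained.
class AbortableTranslationSketch {

    // Stand-in for org.apache.kafka.common.errors.RetriableException (hypothetical copy).
    static class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    // Stand-in for org.apache.kafka.common.errors.TransactionAbortableException (hypothetical copy).
    static class TransactionAbortableException extends RuntimeException {
        TransactionAbortableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical helper: decides which error the application should see
    // once the sender thread has given up retrying.
    static RuntimeException translate(RuntimeException error) {
        if (error == null)
            throw new IllegalArgumentException("Cannot translate a null exception");

        // A retriable error that survives all retries is surfaced as abortable,
        // keeping the original error as the cause for diagnostics.
        if (error instanceof RetriableException)
            return new TransactionAbortableException(
                "Retries exhausted, transaction must be aborted: " + error.getMessage(), error);

        // Every other error is passed through unchanged.
        return error;
    }

    public static void main(String[] args) {
        RuntimeException seen = translate(new RetriableException("coordinator load in progress"));
        System.out.println(seen.getClass().getSimpleName() + " caused by "
            + seen.getCause().getClass().getSimpleName());
    }
}
```

Keeping the retriable error as the cause is a design choice of this sketch; it lets an application log what actually went wrong while still reacting only to the abortable type.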
jolshan
left a comment
Thanks!
…o abortable exceptions (apache#19522)

### Problem
- Currently, when a transactional producer encounters retriable errors (like `COORDINATOR_LOAD_IN_PROGRESS`) and exhausts all retries, it finally returns the retriable error to the application layer.
- Application retries can then cause duplicate records. As a fix, all retriable errors are transitioned to abortable errors in the transactional producer path.
- Additionally, added `InvalidTxnStateException` as part of https://issues.apache.org/jira/browse/KAFKA-19177

### Solution
- Modified the `TransactionManager` to automatically transition retriable errors to abortable errors after all retries are exhausted. This ensures that applications can abort the transaction when they encounter `TransactionAbortableException`.
- A `RefreshRetriableException` such as `CoordinatorNotAvailableException` is refreshed internally [[code](https://github.com/k-raina/kafka/blob/6c26595ce3d1608ae98ad4958b2ff8776a025fc3/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1702-L1705)] until retries are exhausted; it is then treated as a retriable error and translated to `TransactionAbortableException`.
- Similarly for `InvalidTxnStateException`.

### Testing
Added test `testSenderShouldTransitionToAbortableAfterRetriesExhausted` to verify in the sender thread:
- Retriable errors are properly converted to abortable state after retries
- Transaction state transitions correctly and subsequent operations fail appropriately with `TransactionAbortableException`

Reviewers: Justine Olshan <jolshan@confluent.io>
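From the application side, the behaviour described under Solution might be handled roughly as follows. This is a sketch under stated assumptions: `TxnProducer` is a hypothetical, simplified interface that only mirrors the method names of a transactional producer, `RecordingProducer` is a fake used to make the example runnable, and the exception class is a local stand-in for the real Kafka type.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: shows an application aborting a transaction when it sees an abortable error.
class AbortOnAbortableSketch {

    // Local stand-in for org.apache.kafka.common.errors.TransactionAbortableException.
    static class TransactionAbortableException extends RuntimeException {
        TransactionAbortableException(String message) {
            super(message);
        }
    }

    // Hypothetical, simplified view of a transactional producer (not the real KafkaProducer API).
    interface TxnProducer {
        void beginTransaction();
        void send(String record);
        void commitTransaction();
        void abortTransaction();
    }

    // Returns true if the transaction committed, false if it was aborted.
    static boolean sendInTransaction(TxnProducer producer, List<String> records) {
        producer.beginTransaction();
        try {
            for (String record : records)
                producer.send(record);
            producer.commitTransaction();
            return true;
        } catch (TransactionAbortableException e) {
            // Abort instead of resending individual records inside the same transaction.
            producer.abortTransaction();
            return false;
        }
    }

    // Fake producer that records calls and can simulate exhausted retries on commit.
    static class RecordingProducer implements TxnProducer {
        final List<String> calls = new ArrayList<>();
        private final boolean failOnCommit;

        RecordingProducer(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        public void beginTransaction() { calls.add("begin"); }
        public void send(String record) { calls.add("send:" + record); }
        public void commitTransaction() {
            if (failOnCommit)
                throw new TransactionAbortableException("retries exhausted");
            calls.add("commit");
        }
        public void abortTransaction() { calls.add("abort"); }
    }

    public static void main(String[] args) {
        RecordingProducer producer = new RecordingProducer(true);
        boolean committed = sendInTransaction(producer, List.of("a", "b"));
        System.out.println("committed=" + committed + " calls=" + producer.calls);
    }
}
```

After an abort, the application can begin a new transaction and resend the whole batch; whether and how often to do so is left to the caller in this sketch.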