[backport] [v24.1.x] miscellaneous idempotency fixes #22687 #22781

Merged
merged 9 commits into from
Aug 15, 2024

Conversation

bharathv

@bharathv bharathv commented Aug 7, 2024

Two main changes in this patch

  • The broker can now handle epoch bumps for idempotent producers. A client can independently bump the producer epoch in certain situations (see KIP-360 and related code), since idempotency only pertains to a single session. The broker code had issues handling epoch bumps, which are now fixed.

  • For producer state that has been evicted on the broker (e.g. log prefix truncation, producer expiration), clients differ subtly in how they handle the producer reset scenario. The Java client, for example, bumps the epoch on OOOSN (out-of-order sequence number) when there are no other requests in flight, while librdkafka is stricter and only does so on the UNKNOWN_PRODUCER_ID error code (which explicitly tells the client that the broker has no state for the producer and that it should reset). This patch changes the code to match what Apache Kafka does: upon encountering an unknown producer ID, any sequence number is accepted to make forward progress, because the only way a broker doesn't know about a producer is if its state was evicted from memory. This doesn't seem foolproof, but it is consistent with AK behavior and, more importantly, works with all client implementations.

Fixes #22754

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

Factoring out the code into a utility, to be used in a later commit.

(cherry picked from commit 7ac5acf)
To be used to reset the producer state with a new epoch for idempotent
producers that decide to bump the epoch on the client side (which is
totally fine, as idempotency is per session and the client can
independently decide to bump the epoch on its side).

(cherry picked from commit 2f5b9cb)
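
As a rough illustration of the epoch-bump handling described above, here is a minimal sketch. It is not the Redpanda implementation: `producer_state_sketch`, `try_accept`, and the simplified `producer_identity` are hypothetical names, and the rule that a bumped epoch must restart at sequence 0 is an assumption modeled on Apache Kafka's behavior for known producers. Duplicate detection and sequence wraparound are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for the broker's producer identity.
struct producer_identity {
    int64_t id;
    int16_t epoch;
};

// Hypothetical per-producer state tracking only the epoch and last sequence.
class producer_state_sketch {
public:
    explicit producer_state_sketch(producer_identity pid) : _id(pid) {}

    // Returns true if a batch covering [first_seq, last_seq] is accepted.
    bool try_accept(int16_t epoch, int32_t first_seq, int32_t last_seq) {
        assert(first_seq <= last_seq);
        if (epoch < _id.epoch) {
            return false; // stale epoch: reject
        }
        if (epoch > _id.epoch) {
            // Client-side epoch bump: treat it as a new session.
            // Assumption: the new session must start at sequence 0.
            if (first_seq != 0) {
                return false;
            }
            _id.epoch = epoch;
            _last_seq.reset(); // forget sequences from the old epoch
        }
        // Within one epoch, sequences must be contiguous.
        const int32_t expected = _last_seq ? *_last_seq + 1 : 0;
        if (first_seq != expected) {
            return false;
        }
        _last_seq = last_seq;
        return true;
    }

    int16_t epoch() const { return _id.epoch; }

private:
    producer_identity _id;
    std::optional<int32_t> _last_seq;
};
```

In this sketch the reset is a side effect of accepting the first batch of the new epoch, so a rejected batch leaves the old epoch and sequence state untouched.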
For producers the broker no longer tracks, we now skip sequence
checks and allow any non-zero sequence. This can happen if the producer
produces after its state was evicted from the broker's memory (e.g. the
log got prefix truncated, or the producer hit expiration thresholds).

While KIP-360 suggests that the broker should return an unknown_producer_id
error in this case, Apache Kafka no longer does that. To complicate
matters, not every client implements unknown_producer_id logic
similarly, which can result in different behaviors on different clients.

With this patch, we just mimic what Apache Kafka does to be consistent.

Apache Kafka code for future reference.
https://github.com/apache/kafka/pull/7115/files#diff-5482b26d93c5d36f272f65e628c1692622b69f8ba4a2df04ba74fad23623828dR239

Cherry-picked from bc3d761
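
The relaxed check for untracked producers can be sketched as follows. This is an illustrative model only, under the assumption that the broker keeps a map from producer ID to its last sequence; `producer_table_sketch`, `tracked_producer`, `try_accept`, and `evict` are hypothetical names, not Redpanda or Apache Kafka APIs. Epoch bumps and duplicate detection are deliberately left out.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical record of what the broker remembers about one producer.
struct tracked_producer {
    int16_t epoch;
    int32_t last_seq;
};

// Hypothetical table of producers the broker currently tracks.
class producer_table_sketch {
public:
    // Returns true if a batch covering [first_seq, last_seq] is accepted.
    bool try_accept(
      int64_t pid, int16_t epoch, int32_t first_seq, int32_t last_seq) {
        auto it = _producers.find(pid);
        if (it == _producers.end()) {
            // Unknown producer: its state was never created or was evicted.
            // Skip the sequence check and start tracking from this batch.
            _producers.emplace(pid, tracked_producer{epoch, last_seq});
            return true;
        }
        auto& p = it->second;
        // Known producer: enforce the usual contiguous-sequence rule.
        if (epoch != p.epoch || first_seq != p.last_seq + 1) {
            return false;
        }
        p.last_seq = last_seq;
        return true;
    }

    // Models eviction (e.g. prefix truncation or producer expiration).
    void evict(int64_t pid) { _producers.erase(pid); }

private:
    std::unordered_map<int64_t, tracked_producer> _producers;
};
```

With this model, a batch starting at sequence 20 is rejected while the producer is tracked with a last sequence of 9, but the same batch is accepted after `evict`, which is the forward-progress behavior the commit message describes.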
The config definition says it is, but it is not; fixed.

(cherry picked from commit 0ed2a90)
If expire_old_txes kicks in and there are no tx topics, it means there
are no transactions, so this can be logged at a lower severity.

(cherry picked from commit 3c36e7c)

@ztlpn ztlpn left a comment


I guess 092a2b8 was "backported" earlier in #22738

@@ -209,6 +209,17 @@ class producer_state {
    void update_current_txn_start_offset(std::optional<kafka::offset> offset) {
        _current_txn_start_offset = offset;
    }

    model::producer_identity id() const { return _id; }

Nit: I guess this better fits in a5bb2d2?

@piyushredpanda piyushredpanda added this to the v24.1.15 milestone Aug 12, 2024
@piyushredpanda

/ci-repeat 1

@bharathv

Failure #17206

@piyushredpanda piyushredpanda merged commit 6156f49 into redpanda-data:v24.1.x Aug 15, 2024
15 of 18 checks passed