[backport] [v24.2.x] miscellaneous idempotency fixes #22687 #22757

Merged
merged 10 commits into redpanda-data:v24.2.x
Aug 6, 2024

Conversation

bharathv
Contributor

@bharathv bharathv commented Aug 6, 2024

Two main changes in this patch

  • The broker can now handle epoch bumps for idempotent producers. A client can independently bump the producer epoch in certain situations (see KIP-360 and related code), as idempotency only pertains to a single session. The broker code had issues handling epoch bumps, which are now fixed.

  • For producer state evicted from the broker (e.g. log prefix truncation, producer expiration), clients differ subtly in how they handle the producer reset scenario. The Java client, for example, bumps the epoch on OOOSN (out-of-order sequence number) if there are no other requests in flight, while librdkafka is stricter and only does so on the UNKNOWN_PRODUCER_ID error code (which explicitly tells the client that the broker has no state for the producer and that it should reset). The code now does what Apache Kafka does: upon encountering an unknown producer ID, any sequence number is accepted to make forward progress, because the only way a broker doesn't know about a producer is that its state got evicted from memory. This doesn't seem foolproof, but it is consistent with AK behavior and, more importantly, works with all client implementations.
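The two changes above can be illustrated with a minimal sketch of broker-side sequence validation. This is not the Redpanda implementation: `ProducerTable`, `ProducerState`, `SequenceError`, and the returned status strings are hypothetical names, sequence wraparound is ignored, and duplicate detection is simplified. The sketch also accepts the first batch of a bumped epoch as-is, which is an assumption; a real broker may apply stricter checks there.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProducerState:
    epoch: int
    last_seq: int  # last sequence number accepted in this epoch


class SequenceError(Exception):
    """Raised for stale epochs and out-of-order sequences (hypothetical)."""


class ProducerTable:
    """Hypothetical in-memory idempotent producer state, keyed by producer ID."""

    def __init__(self) -> None:
        self._producers: Dict[int, ProducerState] = {}

    def evict(self, pid: int) -> None:
        # Models eviction, e.g. log prefix truncation or producer expiration.
        self._producers.pop(pid, None)

    def validate(self, pid: int, epoch: int, first_seq: int, last_seq: int) -> str:
        state = self._producers.get(pid)
        if state is None:
            # Unknown producer: skip sequence checks and accept any sequence
            # so the client can make forward progress.
            self._producers[pid] = ProducerState(epoch, last_seq)
            return "accepted_unknown_producer"
        if epoch < state.epoch:
            raise SequenceError("stale producer epoch")
        if epoch > state.epoch:
            # Client-side epoch bump: reset the tracked state to the new epoch.
            # Assumption: the first batch of the new epoch is accepted as-is.
            self._producers[pid] = ProducerState(epoch, last_seq)
            return "accepted_epoch_bump"
        if first_seq == state.last_seq + 1:
            state.last_seq = last_seq
            return "accepted"
        if last_seq <= state.last_seq:
            # Simplified duplicate detection: the batch was already accepted.
            return "duplicate"
        raise SequenceError("out of order sequence number")
```

With this sketch, a producer whose state was evicted can resume at whatever sequence it had reached, and a producer that bumps its epoch gets fresh state instead of an error.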

Fixes #22753

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

bharathv added 10 commits August 6, 2024 11:41
Factoring out the code into a utility, to be used in a later commit.

(cherry picked from commit 7ac5acf)
To be used to reset the producer state with a new epoch for idempotent
producers that decide to bump the epoch on the client side (which is
totally fine, as idempotency is per session and the client can
independently decide to bump the epoch on its side).

(cherry picked from commit 2f5b9cb)
For producers the broker no longer tracks, we now skip sequence
checks and allow any sequence, including non-zero ones. This can happen
if the producer produces after its state got evicted from the broker's
memory (e.g. the log got prefix truncated, the producer hit expiration
thresholds).

While KIP-360 suggests that the broker should return an UNKNOWN_PRODUCER_ID
error in this case, Apache Kafka no longer does that. To complicate matters,
not every client implements the UNKNOWN_PRODUCER_ID logic
similarly, which can result in different behaviors on different clients.

With this patch, we just mimic what Apache Kafka does to be consistent.

Apache Kafka code for future reference.
https://github.com/apache/kafka/pull/7115/files#diff-5482b26d93c5d36f272f65e628c1692622b69f8ba4a2df04ba74fad23623828dR239

(cherry picked from commit bc3d761)
The config definition says it is, but it is not. Fixed.

(cherry picked from commit 0ed2a90)
If expire_old_txes kicks in and there are no tx topics, it means there
are no transactions, so this can be logged at a lower severity.

(cherry picked from commit 3c36e7c)
In some racy situations the request may already have
errored out. Consider the following sequence of actions.

replicate_f - succeeded but set_value() not called
-- scheduling point --
term change -> sync() -> GC of inflight requests, request is marked
timed out

Now set_value() is called in the original fiber, which triggers an
assert.

Relaxing the assert condition to make it idempotent. Subsequent client
retry of the request will be marked success (once the change is applied
in the stm and the request state is populated).

Unable to reproduce in a unit test mainly due to lack of an idempotent
client in the unit test fixture.
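The race and the relaxed check can be sketched as follows. This is a simplified illustration, not the actual stm code: `InflightRequest` and `RequestState` are hypothetical names, and the fiber interleaving is modeled by calling the methods in the order described above.

```python
from enum import Enum
from typing import Any, Optional


class RequestState(Enum):
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"
    ERROR = "error"


class InflightRequest:
    """Hypothetical in-flight request whose result can be set at most once."""

    def __init__(self) -> None:
        self.state = RequestState.IN_PROGRESS
        self.result: Optional[Any] = None

    def set_error(self, error: str) -> bool:
        # Called, for example, when in-flight requests are garbage collected
        # after a term change and the request is marked timed out.
        if self.state is not RequestState.IN_PROGRESS:
            return False
        self.state = RequestState.ERROR
        self.result = error
        return True

    def set_value(self, value: Any) -> bool:
        # A strict version would assert that the request is still in progress.
        # The relaxed version treats an already-completed request as a no-op.
        if self.state is not RequestState.IN_PROGRESS:
            return False
        self.state = RequestState.SUCCESS
        self.result = value
        return True


# Interleaving from the commit message: replication succeeded, then the
# request is timed out at a scheduling point, then set_value() runs late.
request = InflightRequest()
request.set_error("timed out")
late_set_applied = request.set_value(42)  # no assert; the error is kept
```

The late `set_value()` call leaves the recorded error untouched, so the client sees the timeout and its retry can then be answered once the request state is populated.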

(cherry picked from commit 092a2b8)
@bharathv bharathv requested review from mmaslankaprv and ztlpn August 6, 2024 19:12
@bharathv bharathv added this to the v24.2.2 milestone Aug 6, 2024
@piyushredpanda piyushredpanda merged commit b25f128 into redpanda-data:v24.2.x Aug 6, 2024
20 checks passed