@jason810496
Member

closes: #34213

Why

Based on the discussion in the issue

  • For commit_cadence ("never", "end_of_batch", or "end_of_operator") to be respected:
    • the enable.auto.commit option in the Kafka Connection must be set to false
    • otherwise enable.auto.commit stays on by default, and the consumer auto-commits offsets every 5 seconds regardless of commit_cadence.
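
As a rough illustration of the connection-side setting described above, the sketch below builds a consumer config with auto-commit disabled. Only enable.auto.commit comes from this discussion. The broker address, the group id, and the idea of storing the dict as JSON on the connection are assumptions for illustration.

```python
import json

# Hypothetical consumer settings; the broker address and group id are placeholders.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    # Disable background auto-commit so that commit_cadence alone decides
    # when offsets are committed.
    "enable.auto.commit": False,
}

# Assumption: the Kafka Connection stores this config as a JSON document.
connection_extra = json.dumps(consumer_config)
print(connection_extra)
```

If the key is left out, the consumer falls back to its default of auto-committing, which is the situation the issue describes.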

What

  • Add validation of commit_cadence against the enable.auto.commit option of the corresponding Kafka Connection
    • Log a warning if commit_cadence is set but enable.auto.commit is not false
  • Point out this behavior in the documentation as well
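
The validation described above could look roughly like the following. This is a minimal standalone sketch, not the code merged in this PR: the function names, the use of ValueError, and the handling of string values such as "false" are all assumptions made for illustration.

```python
import logging

log = logging.getLogger(__name__)

# The three cadence values named in this PR.
VALID_COMMIT_CADENCE = {"never", "end_of_batch", "end_of_operator"}


def auto_commit_enabled(config: dict) -> bool:
    """Return True unless the config explicitly disables enable.auto.commit."""
    # Assumption: the value may be a bool or a string such as "false".
    # A missing key counts as enabled, matching the consumer default.
    value = config.get("enable.auto.commit", True)
    return str(value).strip().lower() != "false"


def check_commit_cadence(commit_cadence: str, config: dict) -> bool:
    """Validate commit_cadence (hypothetical helper); return True if a warning was logged."""
    if commit_cadence not in VALID_COMMIT_CADENCE:
        raise ValueError(
            f"commit_cadence must be one of {sorted(VALID_COMMIT_CADENCE)}, "
            f"got {commit_cadence!r}"
        )
    if auto_commit_enabled(config):
        log.warning(
            "commit_cadence=%r is set, but enable.auto.commit is not false in the "
            "Kafka connection; offsets may still be auto-committed.",
            commit_cadence,
        )
        return True
    return False
```

For example, `check_commit_cadence("end_of_batch", {})` logs the warning because the option is missing, while passing `{"enable.auto.commit": False}` does not.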

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds validation for the commit_cadence parameter in the Kafka consumer operator and ensures that the Kafka connection configuration adheres to the expected auto-commit settings.

  • Added a private _validate_commit_cadence method in the operator code that validates commit_cadence and logs warnings when necessary.
  • Updated tests to cover various commit_cadence and enable.auto.commit configuration combinations.
  • Updated documentation to emphasize the need to set enable.auto.commit to false when commit_cadence is used.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| providers/apache/kafka/tests/unit/apache/kafka/operators/test_consume.py | Added parameterized tests to validate commit_cadence configuration and warning behavior. |
| providers/apache/kafka/src/airflow/providers/apache/kafka/operators/consume.py | Introduced _validate_commit_cadence method and refactored validation logic for commit_cadence and enable.auto.commit. |
| providers/apache/kafka/docs/operators/index.rst | Documented the required Kafka connection configuration when using commit_cadence. |

Comments suppressed due to low confidence (1)

providers/apache/kafka/src/airflow/providers/apache/kafka/operators/consume.py:207

  • Consider adding an inline comment explaining why the 'never' commit_cadence is converted to None, clarifying the intended behavior for future maintainers.
        if self.commit_cadence == "never":

@jason810496 jason810496 force-pushed the fix/providers/apache-kafka/commit_cadence branch from c806005 to 3f82e50 Compare June 22, 2025 11:02
@jason810496 jason810496 marked this pull request as draft June 23, 2025 09:20
@jason810496 jason810496 marked this pull request as ready for review June 23, 2025 14:19
Contributor

@amoghrajesh amoghrajesh left a comment

Nice test coverage! LGTM +1 (few nits)

@jason810496 jason810496 force-pushed the fix/providers/apache-kafka/commit_cadence branch from 77864c0 to 9fcee63 Compare June 24, 2025 13:29
Contributor

@amoghrajesh amoghrajesh left a comment

LGTM +1
@eladkal ?

@jason810496 jason810496 force-pushed the fix/providers/apache-kafka/commit_cadence branch from 9fcee63 to deed801 Compare June 30, 2025 18:20
@jason810496
Member Author

Just rebased and resolved the conflict.
Any comments on this PR, @eladkal?

@jason810496 jason810496 merged commit 1ebac39 into apache:main Jul 4, 2025
70 checks passed

Development

Successfully merging this pull request may close these issues.

Airflow Kafka Provider "commit_cadence" Not Working as Expected

2 participants