Skip to content

Airflow Kafka Provider "commit_cadence" Not Working as Expected #34213

@ahipp13

Description

@ahipp13

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

When running the Airflow Kafka Provider Operator "ConsumeFromTopicOperator", I had one of my runs fail. Naturally since I have the "commit_cadence" option set to "end_of_operator", I was expecting to have duplicate records since it should have not commit the offset because the operator failed. Well the day ended and my counts were off, and when I looked in my DB I found that during the time it failed is when it missed the messages. So when the DAG run failed the offset was for some reason still committed even though I had set it to "end_of_operator".

What you think should happen instead

Based on your description, the offset should not get committed until the operator has completed successfully. If the DAG fails, it should go back to the offset the operator started on.

How to reproduce

Run the Kafka Provider on a topic and mid DAG run fail it, and see if it goes back and gets the messages it missed. The connection information I used is:

{
"bootstrap.servers": SERVERS,
"group.id": GROUPID,
"auto.offset.reset": "earliest",
"security.protocol": "SSL",
"ssl.ca.location": "CA",
"ssl.certificate.location": "CERT",
"ssl.key.location": "KEY",
"ssl.key.password": "PW"
}

Operating System

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

Versions of Apache Airflow Providers

apache-airflow-providers-apache-kafka==1.1.2

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

Looking through the Confluent Kafka Documentation, I suspect what is happening here is because for Confluent's consumers they have an option "enable.auto.commit" that defaults to true, and it commits the offset every 5 seconds (https://docs.confluent.io/platform/current/clients/consumer.html#id1). When I turned this option to false, it worked as expected and I was getting duplicate messages on fails.

I don't really know what the expected behavior here is, but either 1) the code should be changed to turn this option off in the source code or 2) the documentation should specifically say that you need to turn this option to false in order for the commit_cadence option to work.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions