Description
Apache Airflow version
2.6.3
What happened
We were using Airflow Oracle Provider 2.0.1 and recently upgraded to 3.7.1 along with an Airflow upgrade, and have since hit a strange issue around TCP keepalive settings.
We set up OS-level TCP keepalive settings a long time ago to avoid firewall blocking issues (the firewall kills any connection that is idle for more than 5 minutes), and that worked well with the old version of the Oracle provider (which used cx_Oracle).
However, we started hitting firewall issues after upgrading, which I think may be related to the new version of the Oracle provider using python-oracledb.
https://github.com/oracle/python-oracledb/blob/v1.3.2/src/oracledb/connection.py#L1033
expire_time: int=0,
- expire_time: an integer indicating the number of minutes between the
sending of keepalive probes. If this parameter is set to a value greater
than zero it enables keepalive (default: 0)
It seems the default setting DISABLES TCP keepalive for oracledb, but I couldn't find any place in Airflow to override this setting, e.g. by passing it through to conn_config:
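As a stop-gap, one could bypass the hook and call python-oracledb directly, since expire_time is a documented oracledb.connect() parameter (minutes between keepalive probes, 0 = disabled). The credentials and DSN below are placeholders, and the connect call is left commented out:

```python
# Stop-gap sketch: connect with python-oracledb directly instead of via
# OracleHook, so expire_time can be passed explicitly. The user/password/dsn
# values are placeholders for illustration only.
connect_kwargs = dict(
    user="app_user",
    password="app_password",
    dsn="dbhost.example.com:1521/orclpdb1",
    expire_time=4,  # probe every 4 minutes, under the 5-minute firewall cutoff
)
# import oracledb
# conn = oracledb.connect(**connect_kwargs)
```

This defeats the purpose of managing credentials through Airflow connections, though, which is why a provider-level option would be preferable.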
https://github.com/apache/airflow/blob/providers-oracle/3.7.1/airflow/providers/oracle/hooks/oracle.py#L242
Did I miss anything?
What you think should happen instead
It should be possible to turn on TCP keepalive-related settings in the Airflow Oracle provider.
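A sketch of what the provider-side change could look like, as a standalone helper. The "expire_time" key name in the connection extra is an assumed convention (mirroring the oracledb.connect() parameter), not a current provider option:

```python
# Hypothetical helper: copy a keepalive interval from an Airflow connection's
# extra into the conn_config dict that OracleHook.get_conn() passes to
# oracledb.connect(). The "expire_time" extra key is an assumed name.
def add_keepalive_to_conn_config(conn_config: dict, extra_dejson: dict) -> dict:
    expire_time = extra_dejson.get("expire_time")
    if expire_time is not None:
        # Minutes between keepalive probes; any value > 0 enables keepalive.
        conn_config["expire_time"] = int(expire_time)
    return conn_config
```

With this in place, setting extra to {"expire_time": 4} on the Oracle connection would enable keepalive probes every 4 minutes, while connections without the key would keep today's behavior.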
How to reproduce
Just use the code below (an excerpt from our custom operator):

from datetime import datetime

from airflow.exceptions import AirflowException
from airflow.providers.oracle.hooks.oracle import OracleHook

oracle_hook = OracleHook(oracle_conn_id=self.oracle_conn_id)
with oracle_hook.get_conn() as src_conn:
    cursor = src_conn.cursor()
    cursor.outputtypehandler = self.output_type_handler
    self.log.info("Querying data from source: %s", self.oracle_conn_id)
    cursor.arraysize = self.cursor_size
    self.log.info("Setting cursor size to %s", self.cursor_size)
    self.log.info("Executing %s", self.sql)
    if self.parameters is not None:
        self.log.info("Querying data with the parameters: %s", self.parameters)
    start_time = datetime.now()
    try:
        cursor.execute(self.sql, self.parameters)
    except Exception as e:
        self.log.error("Error message: %s", e)
        duration = (datetime.now() - start_time).total_seconds()
        self.log.info(
            "OracleToS3Operator: Schema: %s, Table: %s, Duration: %s seconds, Status: Failed",
            self.table_config["SCHEMA_CODE"],
            self.table_config["TABLE_NAME"],
            duration,
        )
        raise AirflowException("The task has experienced a failure.")
    else:
        self.log.info("The query has been successfully executed")
Operating System
Debian GNU/Linux 11 (bullseye), which is used by the Airflow Helm chart
Versions of Apache Airflow Providers
apache-airflow 2.6.3
apache-airflow-providers-amazon 8.2.0
apache-airflow-providers-celery 3.2.1
apache-airflow-providers-cncf-kubernetes 7.1.0
apache-airflow-providers-common-sql 1.5.2
apache-airflow-providers-docker 3.7.1
apache-airflow-providers-elasticsearch 4.5.1
apache-airflow-providers-ftp 3.4.2
apache-airflow-providers-google 10.2.0
apache-airflow-providers-grpc 3.2.1
apache-airflow-providers-hashicorp 3.4.1
apache-airflow-providers-http 4.4.2
apache-airflow-providers-imap 3.2.2
apache-airflow-providers-microsoft-azure 6.1.2
apache-airflow-providers-mysql 5.1.1
apache-airflow-providers-odbc 4.0.0
apache-airflow-providers-oracle 3.7.1
apache-airflow-providers-postgres 5.5.1
apache-airflow-providers-redis 3.2.1
apache-airflow-providers-salesforce 5.4.1
apache-airflow-providers-sendgrid 3.2.1
apache-airflow-providers-sftp 4.3.1
apache-airflow-providers-slack 7.3.1
apache-airflow-providers-snowflake 4.2.0
apache-airflow-providers-sqlite 3.4.2
apache-airflow-providers-ssh 3.7.1
Deployment
Official Apache Airflow Helm Chart
Deployment details
Using the official Airflow Helm chart 1.9.0
Anything else
After upgrading, we can see many Airflow tasks using the new Oracle provider getting stuck, and those queries do not show up on the source Oracle DB, i.e. the connections were killed by the firewall.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct