Description
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
Issue faced on two versions:
apache-airflow-providers-google==12.0.0
apache-airflow-providers-google==15.1.0
Apache Airflow version
2.10.5
Operating System
Debian Linux 12
Deployment
Docker-Compose
Deployment details
Docker version 27.5.1
No tools used other than airflow.
What happened
Code reference that causes the issue:
airflow/providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py
Lines 1663 to 1666 in e142ab9
    if "schema" in query_results:
        self.description = _format_schema_for_description(query_results["schema"])
    else:
        self.description = []
airflow/providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py
Lines 2061 to 2062 in e142ab9
    description = []
    for field in schema["fields"]:
While running a job that creates an external table, the code above failed with the following error:
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2077, in _format_schema_for_description
for field in schema["fields"]:
~~~~~~^^^^^^^^^^
KeyError: 'fields'
After debugging, I noticed that this kind of job returns an empty dictionary for the key "schema".
Because the key is present, the check at lines 1663 to 1666 passes, and the second code reference (lines 2061 to 2062) then fails because the empty dictionary has no "fields" key.
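The failure can be reproduced without BigQuery. The sketch below is a minimal illustration, not the provider's code: `format_schema_for_description` is a simplified stand-in for `_format_schema_for_description`, `describe_unguarded` is a hypothetical wrapper around the key-based check, and the `{"schema": {}}` response shape is assumed from the debugging described above.

```python
def format_schema_for_description(schema: dict) -> list:
    """Simplified stand-in for the provider helper; only the loop matters here."""
    description = []
    for field in schema["fields"]:  # raises KeyError when schema is {}
        description.append((field["name"], field["type"]))
    return description


def describe_unguarded(query_results: dict) -> list:
    """Mirrors the key-based check quoted above (hypothetical wrapper)."""
    if "schema" in query_results:
        return format_schema_for_description(query_results["schema"])
    return []


# Assumed response shape for a DDL job: the key is present, the value is empty.
query_results = {"schema": {}}

try:
    describe_unguarded(query_results)
    raised = False
    missing_key = None
except KeyError as exc:
    raised = True
    missing_key = exc.args[0]

print(raised, missing_key)
```

The key check passes because "schema" is present, so the helper is called with an empty dictionary and the lookup of "fields" fails.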
What you think should happen instead
The following code should check whether the value for the key "schema" is non-empty, rather than only checking whether the key itself exists:
airflow/providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py
Lines 1663 to 1666 in e142ab9
    if "schema" in query_results:
        self.description = _format_schema_for_description(query_results["schema"])
    else:
        self.description = []
If the logic checks the value instead of the key, '_format_schema_for_description()' is never called without a populated schema.
To fix this, lines 1663 to 1666 can be changed to the following:
    # .get() keeps the missing-key case safe; an empty schema ({}) is falsy.
    if query_results.get("schema"):
        self.description = _format_schema_for_description(query_results["schema"])
    else:
        self.description = []
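As a rough check of the value-based approach, the sketch below runs it against three response shapes: no "schema" key, an empty schema, and a populated schema. It is an illustration under assumptions, not the provider's implementation: `format_schema_for_description` is a simplified stand-in for the real helper, `describe_guarded` is a hypothetical name, and `.get()` is used so that a missing key does not raise.

```python
def format_schema_for_description(schema: dict) -> list:
    """Simplified stand-in for the provider helper (assumption)."""
    return [(field["name"], field["type"]) for field in schema["fields"]]


def describe_guarded(query_results: dict) -> list:
    """Value-based check: only format when the schema is non-empty."""
    schema = query_results.get("schema")  # None if the key is absent
    if schema:  # False for both None and {}
        return format_schema_for_description(schema)
    return []


no_key = describe_guarded({})
empty_schema = describe_guarded({"schema": {}})
populated = describe_guarded(
    {"schema": {"fields": [{"name": "NUMBER", "type": "INTEGER"}]}}
)

print(no_key, empty_schema, populated)
```

Note that a schema that is non-empty but still lacks "fields" would fail in the helper; whether BigQuery can return such a response is not established in this report.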
How to reproduce
To reproduce the problem:
1- Install Airflow and the Google provider. The exact versions matter little, as this code exists in multiple versions.
2- Create a SQL script that creates an external table. This ensures that no schema is returned.
3- Execute the SQL script using the cursor from BigQueryHook:
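    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

    # gcp_conn_id is the ID of your Google Cloud connection in Airflow.
    hook = BigQueryHook(gcp_conn_id=gcp_conn_id, use_legacy_sql=False)
    conn = hook.get_conn()
    cursor = conn.cursor()
    # Placeholder query as given in this report; run the external-table script from step 2 here.
    cursor.execute(operation='SELECT 1 AS NUMBER')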
Anything else
This problem is in the hook code itself and appears whenever a query that returns no result set is executed,
such as creating or dropping tables, data transformations, and other system queries.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct