
Null key reference in bigquery cursor query results #53127

@FaresDev8

Description


Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

The issue occurs on two versions:
apache-airflow-providers-google==12.0.0
apache-airflow-providers-google==15.1.0

Apache Airflow version

2.10.5

Operating System

Debian Linux 12

Deployment

Docker-Compose

Deployment details

Docker version 27.5.1
No tools are used other than Airflow.

What happened

Code references that cause the issue:

# Caller, after the query results are fetched:
if "schema" in query_results:
    self.description = _format_schema_for_description(query_results["schema"])
else:
    self.description = []

# In _format_schema_for_description(schema):
description = []
for field in schema["fields"]:
    ...

While executing an external table creation job, I found that the code above fails with the following error:

File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2077, in _format_schema_for_description
    for field in schema["fields"]:
                 ~~~~~~^^^^^^^^^^
KeyError: 'fields'

After debugging, I noticed that jobs of this kind return an empty dictionary as the value of the "schema" key.
Because the key is present, `_format_schema_for_description()` is still called, and the second code reference (lines 2061-2062) tries to loop over the empty dictionary and fails because it has no "fields" key.
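The failure can be reproduced without BigQuery. The sketch below is a minimal illustration, assuming the empty-`schema` result shape described above; `format_schema_for_description` is a hypothetical, simplified stand-in for the provider's private helper, not its actual implementation.

```python
# Hypothetical stand-in for the provider's helper: like the excerpt quoted
# above, it indexes schema["fields"] directly.
def format_schema_for_description(schema: dict) -> list:
    description = []
    for field in schema["fields"]:
        # Simplified entry (name, type); the real helper builds a longer tuple.
        description.append((field["name"], field["type"]))
    return description


# Assumed shape of the query results for an external table creation job:
# the "schema" key is present, but its value is an empty dict.
query_results = {"schema": {}}

print("schema" in query_results)  # True: the key check passes
try:
    format_schema_for_description(query_results["schema"])
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'fields'
```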

What you think should happen instead

The following check should test whether the value of the "schema" key is non-empty, not merely whether the key itself exists:

if "schema" in query_results:
    self.description = _format_schema_for_description(query_results["schema"])
else:
    self.description = []

Checking the value instead of the key ensures that `_format_schema_for_description()` is never called without a proper schema value.

To fix this, the code at lines 1663-1666 can be changed to:

# .get() returns None when the key is missing; an empty dict is also falsy.
if query_results.get("schema"):
    self.description = _format_schema_for_description(query_results["schema"])
else:
    self.description = []
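A minimal sketch of the proposed check, runnable outside Airflow. `description_for` and `format_schema_for_description` are hypothetical stand-ins for the cursor logic and the provider's helper; the sketch assumes query results are plain dicts shaped as described above.

```python
from typing import Any


def format_schema_for_description(schema: dict) -> list:
    # Hypothetical simplified helper: reads schema["fields"] directly,
    # like the excerpt quoted in this report.
    return [(field["name"], field["type"]) for field in schema["fields"]]


def description_for(query_results: dict[str, Any]) -> list:
    # Proposed check: test the value of "schema", not just the key.
    # .get() also covers results that have no "schema" key at all.
    if query_results.get("schema"):
        return format_schema_for_description(query_results["schema"])
    return []


print(description_for({}))              # []
print(description_for({"schema": {}}))  # []
print(description_for({"schema": {"fields": [{"name": "NUMBER", "type": "INTEGER"}]}}))
# [('NUMBER', 'INTEGER')]
```

With this check, a missing key and an empty schema both yield an empty description. A schema that is non-empty but lacks "fields" would still raise; guarding inside the helper (for example with `schema.get("fields", [])`) would also cover that case.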

How to reproduce

To reproduce the problem:
1- Install Airflow and apache-airflow-providers-google (the exact versions matter little, as this code exists in multiple versions).
2- Create a SQL script that creates an external table (this ensures that no schema is returned).
3- Execute the SQL script using the cursor from BigQueryHook:

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

gcp_conn_id = "google_cloud_default"  # the connection used in your deployment
hook = BigQueryHook(gcp_conn_id=gcp_conn_id, use_legacy_sql=False)
conn = hook.get_conn()
cursor = conn.cursor()
sql = 'SELECT 1 AS NUMBER'  # replace with the external table script from step 2
cursor.execute(operation=sql)

Anything else

This problem is in the provider code itself and will appear whenever anyone executes a query that returns no results,
such as creating or dropping tables, data transformations, and other system queries.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
