missing pyarrow dependency in google provider? #42924
Labels
area:dependencies
Issues related to dependencies problems
area:providers
good first issue
kind:bug
This is clearly a bug
needs-triage
label for new issues that we didn't triage yet
provider:google
Google (including GCP) related issues
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
6.1
but it looks like it's the same in >10
Apache Airflow version
2.2
Operating System
tested linux, macos
Deployment
Virtualenv installation
Deployment details
linux/macos & uv pip to install the packages
What happened
I don't know if I'm missing some obvious reason for this, but pyarrow is not specified as a dependency for the google provider, while it definitely depends on it: https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/google/cloud/transfers/sql_to_gcs.py#L29
If I do an install in a venv with:
pyarrow won't be installed, and importing from sql_to_gcs will raise an exception.
If I remove google-cloud-bigquery, it WILL be installed. I have no idea what causes this behavior, since google-cloud-bigquery does list pyarrow as a dependency. But the version that gets installed is there because of pandas-gbq depending on it.
What you think should happen instead
IMO if a package is used directly, then it's a direct dependency, and the provider shouldn't rely on it being available via indirect dependencies.
I can just add pyarrow myself and solve my problem, but I think the dependency should be explicitly defined in the provider.
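To make the direct vs. indirect distinction concrete, here is a minimal sketch that checks whether an installed distribution lists a package in its own declared requirements. The helper names (`declares_dependency`, `provider_declares`) are hypothetical, and the distribution name `apache-airflow-providers-google` is assumed to be the one installed in the venv. Only the standard library is used, and requirement strings are parsed with a simple regex rather than a full PEP 508 parser.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a distribution name (lowercase, runs of -, _, . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the distribution name from a requirement string.

    Simplified parsing: takes the leading name and ignores extras,
    version specifiers, and environment markers.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def declares_dependency(requirements, package):
    """Return True if `package` appears among the declared requirement strings."""
    wanted = normalize(package)
    return any(requirement_name(r) == wanted for r in requirements or [])


def provider_declares(dist_name, package):
    """Check the installed metadata of `dist_name`; None if it is not installed."""
    try:
        return declares_dependency(metadata.requires(dist_name), package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust to whatever is installed in the venv.
    print(provider_declares("apache-airflow-providers-google", "pyarrow"))
```

If this prints False while pyarrow is still importable, the package is only arriving through some other distribution's requirements, which is the situation described above.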
How to reproduce
uv pip install .
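After the install, a quick check like the following can confirm whether pyarrow actually landed in the environment. This is only a sketch: the `is_importable` helper is hypothetical, and the module names checked are assumptions based on the packages mentioned in this report.

```python
from importlib import util


def is_importable(module_name):
    """Return True if the module can be located without importing it fully."""
    try:
        return util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is missing.
        return False


if __name__ == "__main__":
    # Module names assumed from the packages discussed in this issue.
    for name in ("pyarrow", "pandas_gbq"):
        status = "available" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```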
Anything else
No response
Are you willing to submit PR?
Code of Conduct