-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deps!: BigQuery Storage and pyarrow are required dependencies #776
Conversation
The annotations caused a mismatch
Since this is now a required dependency, there should not be any more pip quirks that used to allow installing BQ Storage as an extra, but without always respecting its minimum version pin.
@tswast It might be easier to review each commit separately, I tried to make them independent from each other. Also, double check if the assumption stated in the last commit's message really holds - if not, we can revert that bit. |
setup.py
Outdated
"google-cloud-core >= 1.4.1, < 3.0dev", | ||
"google-resumable-media >= 0.6.0, < 3.0dev", | ||
"packaging >= 14.3", | ||
"protobuf >= 3.12.0", | ||
"pyarrow >= 1.0.0, < 5.0dev", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making these previously-optional dependencies now required feels like it might be a semver: major
change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC @tswast said in a chat that it shouldn't be a problem for most users, and those who cannot install these dependencies for any reason can still pin to a lower (non-major version). But reading this as I write, we might actually want to re-consider this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been a debate on this on the issue itself. Also copying Tim's input:
The gist is that the problematic dependencies are already required transitively anyway
With pyarrow being the exception
It's probably worth the bump for pyarrow if we're being conservative about it. Esp since we do know of environments where adding it will break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with the major version bump. I'd rather people adopt this change a little more slowly than break someone who has a working 2.x branch without pyarrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger.
Is there perhaps anything else that we want to clean up and would be a breaking change? (possibly in a separate PR)
A major version bump is a good opportunity for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#319 is still on my mind. Perhaps easier now that proto-plus added more case options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another breaking change we might want to do: https://issuetracker.google.com/144712110 -- Better type conversions in the pandas connector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yet another one from today - maybe bumping the minimum pyarrow
version, as the older ones suffer from a bug (load data from dataframe can upload incorrect data, yikes!)
Update: Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a nasty bug. Thankfully pyarrow
didn't follow NEP-29 this time and kept support for Python 3.6 in the 4.x packages. I'd be okay with bumping the minimum for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yet another - if this gets delayed significantly (read: months), we might want to consider dropping Python 3.6 support as well. But Python 3.6 end of life will only happen in December this year.
As a follow-up to #781 I also bumped the minimum |
Failing systest is another instance of #733. |
"version or a different source format. " | ||
"See: https://github.com/googleapis/python-bigquery/issues/781" | ||
) | ||
warnings.warn(msg, category=RuntimeWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning was removed because the just-fixed bug will not come into play once we require pyarrow > 3.0.0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!
FYI: I pointed this PR at the v3
branch. Should be safe to merge.
deps!: BigQuery Storage and pyarrow are required dependencies (#776) fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786) feat!: destination tables are no-longer removed by `create_job` (#891) feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972) fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (#972) feat!: mark the package as type-checked (#1058) feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061) feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (#967) fix: improve type annotations for mypy validation (#1081) feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (#1117) docs: Add migration guide from version 2.x to 3.x (#1027) Release-As: 3.0.0
deps!: BigQuery Storage and pyarrow are required dependencies (googleapis#776) fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (googleapis#786) feat!: destination tables are no-longer removed by `create_job` (googleapis#891) feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (googleapis#972) fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (googleapis#972) feat!: mark the package as type-checked (googleapis#1058) feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (googleapis#1061) feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (googleapis#967) fix: improve type annotations for mypy validation (googleapis#1081) feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (googleapis#1117) docs: Add migration guide from version 2.x to 3.x (googleapis#1027) Release-As: 3.0.0
deps!: BigQuery Storage and pyarrow are required dependencies (googleapis#776) fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (googleapis#786) feat!: destination tables are no-longer removed by `create_job` (googleapis#891) feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (googleapis#972) fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (googleapis#972) feat!: mark the package as type-checked (googleapis#1058) feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (googleapis#1061) feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (googleapis#967) fix: improve type annotations for mypy validation (googleapis#1081) feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (googleapis#1117) docs: Add migration guide from version 2.x to 3.x (googleapis#1027) Release-As: 3.0.0
Closes #757.
This might already be a final version, but I still need to check a few additional things:
Remove logic for "ensure client"? Or maybe not, since we still allow tabledata.list?We still need that.
Remove min version check for BQ Storage, since now it is guaranteed>=2.0.0
will be installed (no morepip
quirks with extras).Done.
Remove checks for GAPIC? It's now a required dependency.Not applicable, the minimum GAPIC version was never checked at runtime.
PR checklist::