Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deps!: BigQuery Storage and pyarrow are required dependencies #776

Merged
merged 27 commits into from
Jul 27, 2021

Conversation

plamut
Copy link
Contributor

@plamut plamut commented Jul 16, 2021

Closes #757.

This might already be a final version, but I still need to check a few additional things:

  • Remove logic for "ensure client"? Or maybe not, since we still allow tabledata.list?
    We still need that.
  • Remove min version check for BQ Storage, since now it is guaranteed >=2.0.0 will be installed (no more pip quirks with extras).
    Done.
  • Remove checks for GAPIC? It's now a required dependency.
    Not applicable, the minimum GAPIC version was never checked at runtime.

PR checklist::

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 16, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 16, 2021
@tswast tswast changed the title process: make BigQUery Storage and pyarrow required dependencies deps: BigQuery Storage and pyarrow are required dependencies Jul 16, 2021
@plamut plamut marked this pull request as ready for review July 19, 2021 20:29
@plamut plamut requested a review from a team July 19, 2021 20:29
@plamut plamut requested a review from a team as a code owner July 19, 2021 20:29
@plamut plamut requested review from loferris and tswast and removed request for a team July 19, 2021 20:29
@plamut
Copy link
Contributor Author

plamut commented Jul 19, 2021

@tswast It might be easier to review each commit separately, I tried to make them independent from each other. Also, double check if the assumption stated in the last commit's message really holds - if not, we can revert that bit.

setup.py Outdated
"google-cloud-core >= 1.4.1, < 3.0dev",
"google-resumable-media >= 0.6.0, < 3.0dev",
"packaging >= 14.3",
"protobuf >= 3.12.0",
"pyarrow >= 1.0.0, < 5.0dev",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making these previously-optional dependencies now required feels like it might be a semver: major change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC @tswast said in a chat that it shouldn't be a problem for most users, and those who cannot install these dependencies for any reason can still pin to a lower (non-major version). But reading this as I write, we might actually want to re-consider this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has been a debate on this on the issue itself. Also copying Tim's input:

The gist is that the problematic dependencies are already required transitively anyway
With pyarrow being the exception
It's probably worth the bump for pyarrow if we're being conservative about it. Esp since we do know of environments where adding it will break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the major version bump. I'd rather people adopt this change a little more slowly than break someone who has a working 2.x branch without pyarrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger.

Is there perhaps anything else that we want to clean up and would be a breaking change? (possibly in a separate PR)
A major version bump is a good opportunity for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#319 is still on my mind. Perhaps easier now that proto-plus added more case options?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another breaking change we might want to do: https://issuetracker.google.com/144712110 -- Better type conversions in the pandas connector.

Copy link
Contributor Author

@plamut plamut Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yet another one from today - maybe bumping the minimum pyarrow version, as the older ones suffer from a bug (load data from dataframe can upload incorrect data, yikes!)

Update: Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a nasty bug. Thankfully pyarrow didn't follow NEP-29 this time and kept support for Python 3.6 in the 4.x packages. I'd be okay with bumping the minimum for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yet another - if this gets delayed significantly (read: months), we might want to consider dropping Python 3.6 support as well. But Python 3.6 end of life will only happen in December this year.

setup.py Show resolved Hide resolved
@plamut
Copy link
Contributor Author

plamut commented Jul 21, 2021

As a follow-up to #781 I also bumped the minimum pyarrow version.

@tseaver
Copy link
Contributor

tseaver commented Jul 21, 2021

Failing systest is another instance of #733.

"version or a different source format. "
"See: https://github.com/googleapis/python-bigquery/issues/781"
)
warnings.warn(msg, category=RuntimeWarning)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning was removed because the just-fixed bug will not come into play once we require pyarrow > 3.0.0.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 22, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 22, 2021
@plamut plamut requested a review from tswast July 23, 2021 06:36
@tswast tswast changed the base branch from master to v3.x.x July 27, 2021 17:21
@tswast tswast changed the base branch from v3.x.x to v3 July 27, 2021 17:28
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

FYI: I pointed this PR at the v3 branch. Should be safe to merge.

@tswast tswast merged commit 9cd7554 into googleapis:v3 Jul 27, 2021
@plamut plamut deleted the iss-757 branch July 28, 2021 07:17
tswast added a commit that referenced this pull request Mar 29, 2022
deps!: BigQuery Storage and pyarrow are required dependencies (#776)

fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786) 

feat!: destination tables are no-longer removed by `create_job` (#891)

feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972)

fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (#972)

feat!: mark the package as type-checked (#1058)

feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061)

feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (#967)

fix: improve type annotations for mypy validation (#1081)

feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (#1117)

docs: Add migration guide from version 2.x to 3.x (#1027)

Release-As: 3.0.0
waltaskew pushed a commit to waltaskew/python-bigquery that referenced this pull request Jul 20, 2022
deps!: BigQuery Storage and pyarrow are required dependencies (googleapis#776)

fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (googleapis#786) 

feat!: destination tables are no-longer removed by `create_job` (googleapis#891)

feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (googleapis#972)

fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (googleapis#972)

feat!: mark the package as type-checked (googleapis#1058)

feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (googleapis#1061)

feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (googleapis#967)

fix: improve type annotations for mypy validation (googleapis#1081)

feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (googleapis#1117)

docs: Add migration guide from version 2.x to 3.x (googleapis#1027)

Release-As: 3.0.0
abdelmegahedgoogle pushed a commit to abdelmegahedgoogle/python-bigquery that referenced this pull request Apr 17, 2023
deps!: BigQuery Storage and pyarrow are required dependencies (googleapis#776)

fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (googleapis#786) 

feat!: destination tables are no-longer removed by `create_job` (googleapis#891)

feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (googleapis#972)

fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (googleapis#972)

feat!: mark the package as type-checked (googleapis#1058)

feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (googleapis#1061)

feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (googleapis#967)

fix: improve type annotations for mypy validation (googleapis#1081)

feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (googleapis#1117)

docs: Add migration guide from version 2.x to 3.x (googleapis#1027)

Release-As: 3.0.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement. semver: major Hint for users that this is an API breaking change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pull google-cloud-bigquery-storage and pyarrow into required dependencies
5 participants