Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support INTERVAL data type #826

Open
4 tasks
tswast opened this issue Jul 28, 2021 · 8 comments
Open
4 tasks

Support INTERVAL data type #826

tswast opened this issue Jul 28, 2021 · 8 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Copy link
Contributor

tswast commented Jul 28, 2021

"BigQuery now supports the INTERVAL type, which represents a duration or an amount of time. This type is in Preview."

https://cloud.google.com/bigquery/docs/release-notes#July_27_2021

Name Range
INTERVAL -10000-0 -3660000 -87840000:0:0 to 10000-0 3660000 87840000:0:0

An INTERVAL object represents duration or amount of time. Interval is composed of three independent parts:

  • [sign]Y-M: Years and Months
  • [sign]D: Days
  • [sign]H:M:S.F: Hours, Minutes, Seconds and Subseconds.

Canonical format
[sign]Y-M [sign]D [sign]H:M:S[.F]
  • Y: Year
  • M: Month
  • D: Day
  • H: Hour
  • M: Minute
  • S: Second
  • [.F]: Up to six fractional digits (microsecond precision)

TODO:

Edit: Removed pandas, arrow, db-api support in favor of #836, as those implementations are currently blocked on Arrow and the BQ Storage API.

@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jul 28, 2021
@tswast tswast self-assigned this Jul 28, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 28, 2021
@tswast
Copy link
Contributor Author

tswast commented Jul 28, 2021

Full pandas / arrow support may have to wait. Getting an error from the BQ Storage API when I added

  {
    "mode": "NULLABLE",
    "name": "interval_col",
    "type": "INTERVAL"
  }

to the scalars_schema.json file, and the test value "interval_col": "P7Y11M9DT4H15M37.123456S" (ISO 8601, which is used as the canonical format for JSON)

@tswast
Copy link
Contributor Author

tswast commented Jul 29, 2021

Query parameter support doesn't appear to be implemented yet in the backend. I've filed internal issue 195050789.

@tswast
Copy link
Contributor Author

tswast commented Jul 29, 2021

insert_rows works, but only with "canonical" format. The "canonical" format is a pain to work with since I'll have to have a bunch of client-side logic to detect if years/months are different signs and recalculate. Likewise for hours/minutes/seconds/microseconds.

I've filed internal issue 195051077 to support ISO 8601 Duration.

@tswast
Copy link
Contributor Author

tswast commented Nov 23, 2021

Issue 195051077 is now resolved. Both insert_rows and query parameters are now supported (or will be once the changes roll out, hopefully in December)

@tboddyspargo
Copy link

Hi, @tswast - would you be able to summarize the status of this request? Are there outstanding blockers or is it just a question of prioritization? My team would love to be able to support INTERVAL with GBQ (using pandas-gbq or bigquery-sqlalchemy), but I'm hesitant to pursue a potentially complicated custom workaround if there might be a more official solution in the works. Thanks in advance!

@tswast
Copy link
Contributor Author

tswast commented Nov 27, 2023

@tboddyspargo Last I checked, the BQ Storage API now sends back data of this type https://arrow.apache.org/docs/python/generated/pyarrow.month_day_nano_interval.html which accounts for the calendar-based intervals that BigQuery supports.

Easiest thing for pandas-gbq and this package to do would be to wrap that type in a pandas ArrowDtype. Would love to see a PR for that at some point. Main tricky part is we need to transform the response from the REST API into that same arrow type. I believe that requires updating the type mappings here: https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pyarrow_helpers.py and the types mappers here: https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py

For context: pandas is making a lot of progress making the ArrowDtype type act like other timestamp/timedelta operations, but I'm not aware of any work in pandas to support month_day_nano_interval, which most closely resembles https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.DateOffset.html

@tswast
Copy link
Contributor Author

tswast commented Nov 27, 2023

For the write side, we might need some backend changes. I'm not seeing INTERVAL listed in the parquet data types here: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#parquet_conversions

@kiraksi
Copy link
Contributor

kiraksi commented Feb 27, 2024

The work to be done to get closer to interval support here:

  • PR 1

  • Delete _row_iterator_to_dataframe and download_dataframe_row_iterator, as these functions will be blocking interval support

  • Instead use and refactor RowIterator.to_dataframe / to_dataframe_iterable to make it shared helper code to do the work of converting to dataframes

  • PR 2

  • Add type support by adding interval to pyarrow_helpers scalars

  • Add data support by adding parsing logic to transform BigQuery Interval data to pyarrow.month_day_nano for use in our library. Possibly will need to do an additional transformation by transforming the arrow data to pandas.ArrowDType

I plan to complete this work in 3 weeks before the end of my internship, if I am unable to then this work can be taken up by anyone else interested in adding this support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants