
Support configuring types_mapper in read_gbq #45

Closed
bnaul opened this issue May 4, 2023 · 3 comments · Fixed by #46

Comments

@bnaul
Contributor

bnaul commented May 4, 2023

googleapis/python-bigquery#1529 and googleapis/python-bigquery#1547 have recently added arguments for overriding the default type conversions performed by record_batch.to_pandas(); this allows, for example, loading string data directly into dtype string[pyarrow] (which can be quite a bit more efficient) without doing any expensive conversions after the fact.

I think we could basically just copy the implementation from the above PRs, same kwarg names and everything. Anyone see any potential issues @jrbourbeau @j-bennet @ncclementi?
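For illustration, a minimal sketch of the kind of mapping those PRs enable (the toy table is made up; types_mapper is the kwarg on pyarrow's to_pandas):

import pandas as pd
import pyarrow as pa

table = pa.table({"name": ["a", "b"], "value": [1, 2]})

# Map Arrow string columns straight to pandas' string[pyarrow] dtype
# instead of the default object dtype, skipping a post-hoc conversion.
df = table.to_pandas(types_mapper={pa.string(): pd.StringDtype("pyarrow")}.get)
print(df.dtypes)  # name -> string[pyarrow], value -> int64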

@j-bennet
Contributor

j-bennet commented May 4, 2023

@bnaul You can already provide a custom mapper in read_gbq as part of arrow_options:

https://github.com/coiled/dask-snowflake/blob/42cb99e4e35aeb18f7ede95badba8f20a4113378/dask_snowflake/core.py#L215-L217
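The general shape of that pattern (the helper name here is illustrative, not the actual dask-snowflake code) is to forward a user-supplied dict of Arrow options into the to_pandas call:

import pyarrow as pa

def _batch_to_pandas(batch: pa.RecordBatch, arrow_options: dict):
    # Forward user-supplied to_pandas kwargs (e.g. types_mapper) verbatim
    # to pyarrow; an empty dict reproduces the default behavior.
    return batch.to_pandas(**arrow_options)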

@bnaul
Contributor Author

bnaul commented May 4, 2023

Ha, well, this looks great, but unfortunately that's dask-snowflake and this is dask-bigquery 😅 But yeah, that's basically what I had in mind! Do you think the same arrow_options approach makes sense here, as opposed to copying the many, many kwargs that google.cloud.bigquery.QueryJob.to_dataframe() exposes?
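For concreteness, a sketch of what the user-facing call could look like under the arrow_options proposal (the arrow_options kwarg is the suggestion being discussed, not a shipped API at this point):

import dask_bigquery
import pandas as pd
import pyarrow as pa

ddf = dask_bigquery.read_gbq(
    project_id="my-project",
    dataset_id="my_dataset",
    table_id="my_table",
    # Proposed: forwarded to RecordBatch.to_pandas() on each partition.
    arrow_options={"types_mapper": {pa.string(): pd.StringDtype("pyarrow")}.get},
)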

@j-bennet
Contributor

j-bennet commented May 4, 2023

:) :) :) Scratch that, I'm contributing to dask-snowflake and dask-bigquery right now and got completely confused about which one we're talking about.

Yes, I think arrow_kwargs would make sense in read_gbq. It would not take much; we would just need to pass those kwargs through to the point where we call to_pandas on a pyarrow record batch:

pyarrow.ipc.read_record_batch(
    pyarrow.py_buffer(message.arrow_record_batch.serialized_record_batch),
    schema,
).to_pandas()
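A hedged sketch of that change (the helper name and plumbing are assumptions, not necessarily what #46 ended up merging):

import pyarrow

def _read_batch(message, schema, arrow_options=None):
    arrow_options = arrow_options or {}
    batch = pyarrow.ipc.read_record_batch(
        pyarrow.py_buffer(message.arrow_record_batch.serialized_record_batch),
        schema,
    )
    # Thread the user's kwargs (e.g. types_mapper) through to pyarrow.
    return batch.to_pandas(**arrow_options)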

@j-bennet changed the title to Support configuring types_mapper in read_gbq on May 4, 2023
@bnaul closed this as completed in #46 on May 8, 2023