to_gbq: Allow creation of new tables from DataFrame (and generate schema) #8325

Closed
jtratner opened this issue Sep 19, 2014 · 5 comments · Fixed by #10857

@jtratner
Contributor

A small extension on top of to_gbq so that you can create new tables given only an existing DataFrame. Given an arbitrary DataFrame with a non-hierarchical index, create a schema from it. For now, we'd likely assume that object dtype columns are strings, and maybe allow the caller to specify some or all columns of the schema so that int columns with nulls come out correctly (otherwise they'd be coerced to float columns, because missing values are stored as NaN).

E.g.:

In [6]: import pandas as pd

In [7]: import pandas.util.testing as testing

In [8]: df = testing.makeMixedDataFrame()

In [9]: df
Out[9]:
   A  B     C          D
0  0  0  foo1 2009-01-01
1  1  1  foo2 2009-01-02
2  2  0  foo3 2009-01-05
3  3  1  foo4 2009-01-06
4  4  0  foo5 2009-01-07

In [10]: df.dtypes
Out[10]:
A           float64
B           float64
C            object
D    datetime64[ns]
dtype: object

Then you could do something like:

In [11]: generate_bq_schema(df)
Out[11]:
{'fields': [{'name': 'A', 'type': 'FLOAT'},
  {'name': 'B', 'type': 'FLOAT'},
  {'name': 'C', 'type': 'STRING'},
  {'name': 'D', 'type': 'TIMESTAMP'}]}

With a named index, the index could be added to the schema as well. For now, we could require a non-hierarchical index (no MultiIndex), but maybe in the future we could use record types for an index that's a MultiIndex?
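A minimal sketch of what such a helper might look like, assuming a simple lookup on NumPy dtype kind codes. The mapping table, the `overrides` parameter, and the `_bq_type` helper are hypothetical names chosen for illustration; this is not the implementation that was eventually merged into pandas.

```python
import pandas as pd

# Hypothetical mapping from NumPy dtype "kind" codes to BigQuery column types.
_KIND_TO_BQ_TYPE = {
    "i": "INTEGER",    # signed integers
    "u": "INTEGER",    # unsigned integers
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",     # object columns are assumed to hold strings
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",  # datetime64
}


def _bq_type(dtype):
    """Return a BigQuery type name for a dtype, falling back to STRING."""
    return _KIND_TO_BQ_TYPE.get(dtype.kind, "STRING")


def generate_bq_schema(df, overrides=None):
    """Build a {'fields': [...]} schema dict from a DataFrame.

    `overrides` (an assumption of this sketch) maps column names to
    BigQuery type names, e.g. to force INTEGER for an int column that
    contains nulls and was therefore stored as float64.
    """
    if isinstance(df.index, pd.MultiIndex):
        raise NotImplementedError("MultiIndex is not supported in this sketch")

    overrides = overrides or {}
    fields = []

    # A named (non-hierarchical) index is emitted as the first field.
    if df.index.name is not None:
        index_type = overrides.get(df.index.name, _bq_type(df.index.dtype))
        fields.append({"name": str(df.index.name), "type": index_type})

    for name, dtype in df.dtypes.items():
        fields.append({"name": str(name), "type": overrides.get(name, _bq_type(dtype))})

    return {"fields": fields}


# Usage: an int column with a missing value is stored as float64,
# so the caller overrides its type explicitly.
df_with_nulls = pd.DataFrame({"id": [1, None, 3], "label": ["a", "b", "c"]})
schema = generate_bq_schema(df_with_nulls, overrides={"id": "INTEGER"})
print(schema)
```

Keying on `dtype.kind` rather than on exact dtypes keeps the table short, since all integer widths and all float widths share one code each. Unknown kinds fall back to STRING here, which is a design choice of this sketch and could just as reasonably raise an error.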

@jtratner
Contributor Author

cc @jacobschaer - I think you're the main one to ask on this?

@jtratner
Contributor Author

@jacobschaer for context - there's a Bloomberg Hackathon that's happening next Saturday and I'm thinking this could be a good project for someone who uses BigQuery and/or Pandas

@jacobschaer
Contributor

Sounds like something someone asked a while ago on Stack. See

http://stackoverflow.com/questions/21886742/convert-pandas-dtypes-to-bigquery-type-representation

@jtratner
Contributor Author

yeah, it's certainly not that complicated; it would just make it easier for
people to do quick, slapdash writes to BigQuery. That SO post is probably
80% of the way there too.

@ghost

ghost commented Aug 17, 2015

I'm currently using pandas for a project I'm working on and would really like to see a new feature that allows users to create new tables in Google BigQuery using to_gbq. I noticed that the ability to create tables from a schema was removed in #6937.

I would like to try and develop this feature if no one else is working on it.

jreback added a commit that referenced this issue Sep 13, 2015
ENH: #8325 Add ability to create tables using the gbq module.
yarikoptic added a commit to neurodebian/pandas that referenced this issue Sep 16, 2015
* commit 'v0.17.0rc1-40-gd1feb49': (394 commits)
  DOC: fix ref to template for plot accessor
  ENH Move check for inferred compression to before `get_filepath_or_buffer`
  CI: add py3.5 build
  ENH Enable streaming from S3
  Fix Series.nunique groupby with object
  DOC: Update perf doc for 10953
  TST: Fix skipped unit tests in test_ga. Install python-gflags using pip. pandas-dev#11090
  ENH Recognize 's3n' and 's3a' as an S3 address
  DOC: Comparison with SAS
  BUG: Use StrictVersion instead of LooseVersion when testing for minimum google api client version pandas-dev#10652
  BLD: Install google-api-python-client and httplib2 using pip
  ENH: Add ability to create tables using the gbq module. pandas-dev#8325
  TST: make sure to close stata readers
  asv bench cleanup - groupby
  DOC: fix plot submethods whatsnew example
  CI: support *.pip for installations
  DOC: Modified incorrect doc-string for DataFrameFormatter and removed outdated doc-string (+1 squashed commit) Squashed commits: [068b1fd] DOC: Modified incorrect doc-string for DataFrameFormatter using new doc-string design  (+1 squashed commit) Squashed commits: [12e032d] DOC: Updated doc-string using new doc-string design for DataFrameFormatter
  ENH Enable bzip2 streaming for Python 3
  DOC: update release.rst with the highlites
  DOC: Categorize whatsnew
  ...