BigQuery: Deprecate automatic schema conversion in load_table_from_dataframe
#9042
Comments
#9049 is pending (needs unit test updates), which will make me slightly more comfortable with this deprecation. My current thought for the "load dataframe" algorithm is:
|
Added #5572 (comment) as another blocker. If we don't provide a way to explicitly serialize index(es), we actually lose the ability to write indexes outside the deprecated path. |
Thought: Do we want to 100% deprecate the |
Hmm ... I would still find it useful to be warned about any potential problems with schema detection, even if I explicitly instruct the method to detect it for me. Suppressing the warning might make me think that everything went well behind the scenes. How about having an |
In a lot of cases, My thought with #9042 (comment) is that there are still going to be times when you have an arbitrary DataFrame and can't exactly guess the schema and want to explicitly opt-in to whatever pandas is doing to convert DataFrames to Parquet files, even if we know it might not always pick the type we want (such as confusing nullable integer columns and float columns). |
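The nullable integer versus float confusion mentioned above can be reproduced with pandas alone. This is a minimal sketch, assuming only that pandas is installed; the column name `id` is made up for illustration.

```python
import pandas as pd

# An integer column with a missing value. Plain NumPy integers cannot
# represent a missing value, so pandas upcasts the column to float64.
df = pd.DataFrame({"id": [1, None, 3]})
inferred = str(df["id"].dtype)

# Opting in to the nullable integer extension dtype keeps the values
# as integers and stores the gap as a missing-value marker.
nullable = str(df["id"].astype("Int64").dtype)

print(inferred, nullable)
```

Any type detection that looks only at the default dtype would see a float column here, which is the kind of mismatch an opt-in flag would be accepting.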
I am not that familiar with what a "typical" use case is, but the scenario described above sounds plausible. And we can still mention the potential misdetection of schema in the "opt-in" parameter's docstring, and that warnings will not be issued. Users will then know that they are explicitly turning off this aspect. |
After discussion around #9024, I'm coming to realize that there are a lot of inconsistencies with the pandas DataFrame serialization when we have to autodetect the schema. I propose that we warn when we are given a DataFrame but can't determine the correct schema.
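One way the proposed warning could look is sketched below. Both helper names, the `schema_names` parameter, and the choice of `UserWarning` are assumptions made for illustration and are not part of the library. The sketch treats `object` dtype columns as the ones whose type cannot be determined.

```python
import datetime
import warnings

import pandas as pd


def ambiguous_columns(df, schema_names=()):
    """Return columns whose type cannot be inferred from the dtype alone.

    Hypothetical helper: object dtype columns may hold strings, bytes,
    dates, or mixed values, so they count as ambiguous unless the caller
    named them in an explicit schema.
    """
    return [
        name
        for name, dtype in df.dtypes.items()
        if pd.api.types.is_object_dtype(dtype) and name not in schema_names
    ]


def warn_if_schema_unknown(df, schema_names=()):
    """Emit a warning listing the ambiguous columns, if there are any."""
    columns = ambiguous_columns(df, schema_names)
    if columns:
        warnings.warn(
            "Could not determine the type of columns: "
            + ", ".join(str(c) for c in columns),
            UserWarning,  # assumed category; a deprecation class may fit better
            stacklevel=2,
        )
    return columns


# A date column is stored with object dtype, so it is flagged unless
# the caller supplies its type.
df = pd.DataFrame({"n": [1, 2], "d": [datetime.date(2019, 8, 1), None]})
flagged = ambiguous_columns(df)
resolved = ambiguous_columns(df, schema_names=("d",))
```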
I realize this will be a step backwards in terms of usability. I think the following feature requests need to be prioritized if we proceed with this deprecation:
- `load_table_from_dataframe` #8140 Allow partial schemas, so that someone can just override certain `object` dtype columns, for example.
- `load_table_from_dataframe` #8142 Get table schema if not supplied. This covers the case when appending rows to an existing table.
- `load_table_from_dataframe` automatically generate schema for known dtypes #9044 Generate a (partial) schema for known dtypes, and merge with the provided schema (`job_config.schema`).
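The partial-schema and known-dtype requests above can be sketched together as a merge step. This is only an illustration under stated assumptions: `KNOWN_DTYPES`, `merge_schema`, and the plain dict representation of a schema are hypothetical, and the dtype mapping is deliberately small. It uses dtype names as strings so that it runs without pandas or the BigQuery client.

```python
# Assumed, illustrative mapping from pandas dtype names to BigQuery
# type names. `object` is left out on purpose because it is ambiguous.
KNOWN_DTYPES = {
    "int64": "INTEGER",
    "float64": "FLOAT",
    "bool": "BOOLEAN",
}


def merge_schema(dtype_names, partial_schema):
    """Combine caller-supplied types with types generated from dtypes.

    dtype_names: dict of column name -> dtype name, for example built
        with {name: str(dtype) for name, dtype in df.dtypes.items()}.
    partial_schema: dict of column name -> type supplied by the caller;
        these always take precedence over generated types.
    Returns (schema, unresolved), where unresolved lists the columns
    that have neither a supplied type nor a known dtype.
    """
    schema, unresolved = {}, []
    for name, dtype_name in dtype_names.items():
        if name in partial_schema:
            schema[name] = partial_schema[name]
        elif dtype_name in KNOWN_DTYPES:
            schema[name] = KNOWN_DTYPES[dtype_name]
        else:
            unresolved.append(name)
    return schema, unresolved


# The caller overrides only the ambiguous object column.
dtypes = {"id": "int64", "score": "float64", "born": "object"}
schema, unresolved = merge_schema(dtypes, {"born": "DATE"})
```

Columns left in `unresolved` are the natural place to issue the proposed warning, or to fall back to the existing table's schema when appending.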