Allow for the overriding of stringify_dict for json export format on BaseSQLToGCSOperator #26277
Hey @patricker, any reason why this fix was not applied for
Hey @Perados, I didn't do Parquet because I wasn't prepared to test it. There are a number of other issues with the Parquet export format, and it needs quite a bit of work. As for CSV, I believe it already does this by default. I did some testing, and in those tests dictionaries were being stringified by the
Hey @patricker, may I know why stringify_dict = False is hardcoded in the convert_types function?
Because of this, the stringify_dict default value in PostgresToGCSOperator's implementation of the convert_type function does not take effect:
Hence the dict object won't be stringified.
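To illustrate the concern, here is a minimal, self-contained sketch. It is not the Airflow source: the class names BaseExporter and PostgresLikeExporter are hypothetical, and the subclass default of stringify_dict=True is an assumption made only for the example. It shows how a keyword argument hardcoded at the call site masks whatever default the overriding method declares.

```python
import json


class BaseExporter:
    """Hypothetical stand-in for a base operator; not the real Airflow class."""

    def convert_types(self, row):
        # The flag is hardcoded here, so the default declared on
        # convert_type in a subclass is never consulted.
        return [self.convert_type(value, stringify_dict=False) for value in row]

    def convert_type(self, value, stringify_dict=False):
        return value


class PostgresLikeExporter(BaseExporter):
    """Hypothetical subclass whose convert_type default is assumed to be True."""

    def convert_type(self, value, stringify_dict=True):
        if stringify_dict and isinstance(value, dict):
            return json.dumps(value)
        return value


exporter = PostgresLikeExporter()
# Called directly, the subclass default applies and the dict is serialized.
print(exporter.convert_type({"k": "v"}))
# Called through convert_types, the hardcoded False wins and the dict stays a dict.
print(exporter.convert_types([{"k": "v"}]))
```

In this sketch, the fix would be for convert_types to pass through a configurable value instead of a literal False.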
@sleepy-tiger sorry, I don't know. It was already like that when I made my changes. Please file an issue with the details of the problem you are having.
Followup PR #26876
closes: #26273
This change allows you to dump dict type objects returned from a database to a string. Schema generation already labels them as strings (at least from Postgres). Currently, JSON type columns are hard to ingest into BQ because a JSON field in a source database does not enforce a schema, so we can't reliably generate a RECORD schema for the column.
No change to default behavior; it must be enabled by setting stringify_dict=True.
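As a rough illustration of the behavior described above, here is a minimal sketch in plain Python. The helper name stringify_dicts is hypothetical and this is not the operator's actual code; it only shows the idea of serializing dict values to JSON strings when the flag is enabled and leaving the row untouched otherwise.

```python
import json


def stringify_dicts(row, stringify_dict=False):
    """Return the row as a list, with dict values dumped to JSON strings if enabled.

    Hypothetical helper for illustration only.
    """
    if not stringify_dict:
        # Default behavior: values are passed through unchanged.
        return list(row)
    return [json.dumps(value) if isinstance(value, dict) else value for value in row]


row = (1, {"a": 1, "b": [1, 2]}, "x")
print(stringify_dicts(row))                       # dict stays a dict
print(stringify_dicts(row, stringify_dict=True))  # dict becomes a JSON string
```

With the flag on, the JSON column arrives as a single string value, which matches a STRING column in the generated schema rather than requiring a RECORD definition.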