Google Sheet: Unformatted date column not loaded correctly if the first row contains no value for that column #553
Comments
@francescomucio I assume you have no control over what is in the Google Sheets? We do use the first row of data to infer the data types, but that happens via metadata, not via the values. So to make it work, you should just set the data types on the first row of data, even if the cells are empty.
Datetimes are represented as integers in a rather convoluted way, and it is impossible to infer a date type from the content alone. If you cannot do that, then we do need to start scanning deeper, i.e. obtain more rows of metadata.
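To illustrate the integer representation mentioned above: Google Sheets stores a date as a serial number counting days from 1899-12-30. The sketch below converts such a number to a date, and also shows one possible (assumed, not confirmed in this thread) way a 1970-01-01 value could appear: the serial number being read as seconds since the Unix epoch. Both function names are hypothetical and are not part of dlt.

```python
from datetime import date, datetime, timedelta, timezone

# Google Sheets serial dates count days from 1899-12-30.
SHEETS_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: float) -> date:
    """Convert a Sheets serial number to a calendar date (time part dropped)."""
    return SHEETS_EPOCH + timedelta(days=int(serial))


def misread_as_unix_seconds(serial: float) -> date:
    """Assumed failure mode: the serial number treated as seconds since 1970."""
    return datetime.fromtimestamp(serial, tz=timezone.utc).date()


serial = 45000  # raw number behind an unformatted date cell
print(serial_to_date(serial))           # intended date
print(misread_as_unix_seconds(serial))  # lands on the first day of the Unix epoch
```

The raw number alone does not say whether it is a date or a plain integer, which is why the type has to come from cell metadata.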
Formatting the row is certainly my suggestion, but I am not sure it is going to work. I think it would make sense to take this metadata from the first valid cell in the column rather than stopping at the first row.
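A minimal sketch of the "first valid cell" idea, assuming each cell is a dict shaped like the Sheets API cell data (`effectiveValue` and `effectiveFormat.numberFormat.type`). The helper `first_number_format` is hypothetical and is not how dlt currently infers types.

```python
from typing import Any, Optional


def first_number_format(column_cells: list[dict[str, Any]]) -> Optional[str]:
    """Return the number format type of the first cell that holds a value.

    Empty cells (no "effectiveValue" key) are skipped instead of being
    used as the basis for type inference.
    """
    for cell in column_cells:
        if "effectiveValue" not in cell:
            continue  # empty cell: keep scanning down the column
        number_format = cell.get("effectiveFormat", {}).get("numberFormat", {})
        return number_format.get("type")
    return None  # the whole column is empty


# Example: the first row is empty, the second row holds a date.
column_dt = [
    {},
    {
        "effectiveValue": {"numberValue": 45000},
        "effectiveFormat": {"numberFormat": {"type": "DATE"}},
    },
]
print(first_number_format(column_dt))
```

Scanning stops at the first populated cell, so the extra cost is limited to the leading empty rows of each column.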
dlt version
0.5.2
Source name
google_sheets
Describe the problem
Loading a Google Sheet, we discovered that a date column (let's call it column_dt) is not loaded correctly if:
If both conditions apply, the first row has column_dt empty (as expected), but in the second row we find 1970-01-01 instead of the actual value.
This is a problem because we often load sheets which we do not own.
I tried to use data_type hints, but the only thing that actually works is formatting the column in the sheet.
It happens with duckdb and Postgres, so I assume it is database-agnostic.
Expected behavior
The expected behaviour is to have the values loaded correctly.
Steps to reproduce
This is the issue and how to reproduce:
Important: The date columns are not formatted as date. If you start playing with formatting, it is possible that you won't be able to reproduce the issue. Just create a new sheet and copy there only the values.
How you are using the source?
I run this source in production.
Operating system
Linux
Runtime environment
Local
Python version
3.10
dlt destination
duckdb to test, postgres in production
Additional information
Slack thread is here.