-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autodetector for ISO date strings #767
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! I have a question to make iso_date
default detector, see my comment
dlt/common/schema/utils.py
Outdated
@@ -698,4 +698,4 @@ def standard_hints() -> Dict[TColumnHint, List[TSimpleRegex]]: | |||
|
|||
|
|||
def standard_type_detections() -> List[TTypeDetections]: | |||
return ["iso_timestamp"] | |||
return ["iso_timestamp", "iso_date"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the only thing I'd change. I'm not sure that generating dates should be the default. date-like strings are simple and common - IMO automatic conversion to date
database type is an unexpected behavior.
instead you can always use add_type_detection
- from within a dlt.source like here https://dlthub.com/docs/general-usage/schema#schema-is-modified-in-the-source-function-body
or just before loading: source().schema.add_type_detection("iso_date")
.
we'll be releasing dlt soon so if you are OK with it we can do this change here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
ISO 8601 defines date strings as
YYYY-MM-DD
. We already have autodetection for ISO timestamp strings and detect ISO-compliant date strings like2023-11-15
as date objects (see here). However, neither does theis_iso_timestamp
autodetection function pick date strings up as dates, nor is there a dedicatedis_iso_date
auto-detection function.As a result, sources that yield "native" date objects get loaded with data type
date
, whereas sources that yield ISO-compliant date strings get loaded with data typetext
This PR adds an
iso_date
autodetection function to the list of default type detectors. This change is fully backward compatible with existing schemas, so that we don't need to bump the schema engine version.Related Issues