Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autodetector for ISO date strings #767

Merged
merged 10 commits into from
Nov 17, 2023
Merged

Autodetector for ISO date strings #767

merged 10 commits into from
Nov 17, 2023

Conversation

codingcyclist
Copy link
Collaborator

Description

ISO 8601 defines date strings as YYYY-MM-DD. We already have autodetection for ISO timestamp strings and detect ISO-compliant date strings like 2023-11-15 as date objects (see here). However, neither does the is_iso_timestamp autodetection function pick date strings up as dates, nor is there a dedicated is_iso_date auto-detection function.

As a result, sources that yield "native" date objects get loaded with data type date, whereas sources that yield ISO-compliant date strings get loaded with data type text

This PR adds an iso_date autodetection function to the list of default type detectors. This change is fully backward compatible with existing schemas, so that we don't need to bump the schema engine version.

Related Issues

Copy link

netlify bot commented Nov 15, 2023

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 91b7293
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/65579dd252c4ed00084d4d0d

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! I have a question to make iso_date default detector, see my comment

@@ -698,4 +698,4 @@ def standard_hints() -> Dict[TColumnHint, List[TSimpleRegex]]:


def standard_type_detections() -> List[TTypeDetections]:
return ["iso_timestamp"]
return ["iso_timestamp", "iso_date"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only thing I'd change. I'm not sure that generating dates should be the default. date-like strings are simple and common - IMO automatic conversion to date database type is an unexpected behavior.

instead you can always use add_type_detection - from within a dlt.source like here https://dlthub.com/docs/general-usage/schema#schema-is-modified-in-the-source-function-body

or just before loading: source().schema.add_type_detection("iso_date").

we'll be releasing dlt soon so if you are OK with it we can do this change here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudolfix rudolfix merged commit e970741 into devel Nov 17, 2023
38 of 40 checks passed
@rudolfix rudolfix deleted the 766-iso-date-autodetector branch November 18, 2023 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ISO date strings don't get loaded with 'date' data type
2 participants