feat: date_bin supports MonthDayNano, microsecond and nanosecond units #5698

Merged
merged 1 commit into from
Mar 23, 2023

Conversation

stuartcarnie
Contributor

Which issue does this PR close?

Closes #5697

Rationale for this change

Teach date_bin to support microsecond- and nanosecond-precision intervals via the MonthDayNano type. Note that months are complicated to support and will require additional work. In doing so, we can also address #5689.

What changes are included in this PR?

date_bin can now parse fractional intervals with microsecond or finer precision. The interval parser also accepts the additional units microsecond, microseconds, nanosecond and nanoseconds; this should be a purely additive change.

Are these changes tested?

Yes: unit tests for parsing intervals and for validating arguments to date_bin, plus new SQL tests in the timestamps.slt file.

Are there any user-facing changes?

Yes, users can now use additional units for intervals, such as:

select date_bin('500 microseconds', ...)

and the date_bin function is capable of binning microsecond and nanosecond precision intervals.
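
To make the user-facing change concrete, here is a minimal sketch of binning with a sub-millisecond stride. It is not the code from this PR: `date_bin_nanos` is a hypothetical helper, all values are assumed to be plain `i64` nanoseconds, and rounding timestamps before the origin down to the start of their bin is an assumption of the sketch.

```rust
/// Hypothetical helper (not from the PR): returns the start of the
/// `stride`-wide bin that contains `source`. All arguments are
/// nanoseconds, and bins are aligned to `origin`.
/// Returns `None` for a non-positive stride or on arithmetic overflow.
fn date_bin_nanos(stride: i64, source: i64, origin: i64) -> Option<i64> {
    if stride <= 0 {
        return None;
    }
    let diff = source.checked_sub(origin)?;
    // rem_euclid is always in 0..stride, so a timestamp before the origin
    // is rounded down to the start of its bin, not toward the origin.
    source.checked_sub(diff.rem_euclid(stride))
}
```

Under these assumptions a stride of 500 microseconds is 500_000 nanoseconds, so `date_bin_nanos(500_000, 1_234_567, 0)` yields `Some(1_000_000)`.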

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Mar 23, 2023
Comment on lines +84 to +95
Century = 0b_0000_0000_0001,
Decade = 0b_0000_0000_0010,
Year = 0b_0000_0000_0100,
Month = 0b_0000_0000_1000,
Week = 0b_0000_0001_0000,
Day = 0b_0000_0010_0000,
Hour = 0b_0000_0100_0000,
Minute = 0b_0000_1000_0000,
Second = 0b_0001_0000_0000,
Millisecond = 0b_0010_0000_0000,
Microsecond = 0b_0100_0000_0000,
Nanosecond = 0b_1000_0000_0000,
Member

Nice Format 👍
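
As an illustration of the one-bit-per-unit layout quoted above, the sketch below mirrors those values in a standalone enum. The names `IntervalUnit`, `parse_unit`, `SUB_MILLISECOND` and `is_sub_millisecond` are hypothetical, the accepted spellings other than the four units named in the PR description are assumptions, and the mask is only one plausible use of such a layout, not necessarily what the PR does with it.

```rust
/// Hypothetical mirror of the flag values quoted in the review comment.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u16)]
enum IntervalUnit {
    Century = 0b_0000_0000_0001,
    Decade = 0b_0000_0000_0010,
    Year = 0b_0000_0000_0100,
    Month = 0b_0000_0000_1000,
    Week = 0b_0000_0001_0000,
    Day = 0b_0000_0010_0000,
    Hour = 0b_0000_0100_0000,
    Minute = 0b_0000_1000_0000,
    Second = 0b_0001_0000_0000,
    Millisecond = 0b_0010_0000_0000,
    Microsecond = 0b_0100_0000_0000,
    Nanosecond = 0b_1000_0000_0000,
}

/// Maps a unit name (singular or plural, any ASCII case) to its variant.
/// Only a subset of spellings is handled in this sketch.
fn parse_unit(s: &str) -> Option<IntervalUnit> {
    match s.trim().to_ascii_lowercase().as_str() {
        "day" | "days" => Some(IntervalUnit::Day),
        "hour" | "hours" => Some(IntervalUnit::Hour),
        "minute" | "minutes" => Some(IntervalUnit::Minute),
        "second" | "seconds" => Some(IntervalUnit::Second),
        "millisecond" | "milliseconds" => Some(IntervalUnit::Millisecond),
        "microsecond" | "microseconds" => Some(IntervalUnit::Microsecond),
        "nanosecond" | "nanoseconds" => Some(IntervalUnit::Nanosecond),
        _ => None,
    }
}

/// Because every variant owns a distinct bit, a set of units is a bitwise OR.
const SUB_MILLISECOND: u16 =
    IntervalUnit::Microsecond as u16 | IntervalUnit::Nanosecond as u16;

/// True for the two units this PR adds to the parser.
fn is_sub_millisecond(unit: IntervalUnit) -> bool {
    ((unit as u16) & SUB_MILLISECOND) != 0
}
```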

Contributor

@alamb alamb left a comment

Thank you @stuartcarnie . I think this looks good

However, one thing I worry about is that @doki23 ported this code to arrow-rs in apache/arrow-rs#3762 and I think the long term plan is to switch to using arrow-rs's version

https://github.com/apache/arrow-rs/blob/526c57a0f65ee7aaa838f252f48c8179f7d9ce03/arrow-cast/src/parse.rs#L754-L790

However, given this PR adds test coverage as well I think we will be able to avoid regressions.

Thus my plan is to file a ticket in arrow-rs to track porting support for microsecond and nanosecond in arrow and then we can merge this PR

let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(*v);
if months != 0 {
return Err(DataFusionError::NotImplemented(
"DATE_BIN stride does not support month intervals".to_string(),
Contributor

👍 this is because months are not a fixed number of nanoseconds, right?

Contributor Author

Correct.

We'll need to round to the first of the month, and those nanosecond intervals are not regular. We'll probably be best served using the chrono package to handle month and year rounding.
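
The point about irregular months can be sketched without any Arrow types. The helpers below are hypothetical (`stride_nanos`, `days_in_month`) and assume the interval has already been split into (months, days, nanos) and that a day is treated as a fixed 24 hours; they are meant to show why a non-zero month component is rejected, not to reproduce the PR's implementation.

```rust
/// Nanoseconds in a fixed 24-hour day (an assumption of this sketch).
const NANOS_PER_DAY: i64 = 24 * 60 * 60 * 1_000_000_000;

/// Hypothetical helper: collapses (months, days, nanos) into a single
/// nanosecond stride, rejecting months because their length varies.
fn stride_nanos(months: i32, days: i32, nanos: i64) -> Result<i64, String> {
    if months != 0 {
        return Err("DATE_BIN stride does not support month intervals".to_string());
    }
    (days as i64)
        .checked_mul(NANOS_PER_DAY)
        .and_then(|d| d.checked_add(nanos))
        .ok_or_else(|| "stride overflows i64 nanoseconds".to_string())
}

/// Days in a Gregorian calendar month, or `None` for an invalid month.
/// The varying result is why "one month" has no single nanosecond length.
fn days_in_month(year: i32, month: u32) -> Option<u32> {
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if leap { 29 } else { 28 }),
        _ => None,
    }
}
```

A one-month stride starting in January 2023 would span 31 days, while the next bin would span 28, so the bins cannot be computed by dividing by a constant.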

@alamb
Contributor

alamb commented Mar 23, 2023

Filed apache/arrow-rs#3916 in arrow-rs

@alamb alamb merged commit 3e24795 into apache:main Mar 23, 2023
@stuartcarnie stuartcarnie deleted the sgc/issue/date_bin_5697 branch March 23, 2023 20:05
@stuartcarnie
Contributor Author

@alamb thank you for the review. I can help port that code over to arrow-rs if I have some spare cycles – I see that will help us with implementing ::interval casting, which is 💯 .

@alamb
Contributor

alamb commented Mar 23, 2023

Thanks @stuartcarnie -- I actually already ported over the code in apache/arrow-rs#3916 (because I figured it would be just as fast as writing up a ticket)

@andygrove andygrove added the enhancement New feature or request label Mar 24, 2023
Labels
core Core DataFusion crate enhancement New feature or request logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_bin doesn't support microseconds or nanoseconds
4 participants