feat: date_bin supports MonthDayNano, microsecond and nanosecond units #5698
```rust
Century = 0b_0000_0000_0001,
Decade = 0b_0000_0000_0010,
Year = 0b_0000_0000_0100,
Month = 0b_0000_0000_1000,
Week = 0b_0000_0001_0000,
Day = 0b_0000_0010_0000,
Hour = 0b_0000_0100_0000,
Minute = 0b_0000_1000_0000,
Second = 0b_0001_0000_0000,
Millisecond = 0b_0010_0000_0000,
Microsecond = 0b_0100_0000_0000,
Nanosecond = 0b_1000_0000_0000,
```
Nice Format 👍
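Each variant above has its own bit, which suggests the values serve as bit flags. The excerpt does not say what they are for, so here is a minimal sketch that assumes one plausible use: tracking which units have already appeared, so a repeated unit is rejected while the remaining terms are summed additively. The `Unit` enum, `parse_sub_second` function and error messages are hypothetical. Only the sub-second units are modelled, and amounts are integers rather than fractional values.

```rust
/// Hypothetical subset of interval units, reusing the bit-flag values above.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u16)]
enum Unit {
    Second = 0b_0001_0000_0000,
    Millisecond = 0b_0010_0000_0000,
    Microsecond = 0b_0100_0000_0000,
    Nanosecond = 0b_1000_0000_0000,
}

impl Unit {
    /// Accept singular and plural spellings, e.g. "nanosecond" and "nanoseconds".
    fn parse(s: &str) -> Option<Unit> {
        match s.to_ascii_lowercase().as_str() {
            "second" | "seconds" => Some(Unit::Second),
            "millisecond" | "milliseconds" => Some(Unit::Millisecond),
            "microsecond" | "microseconds" => Some(Unit::Microsecond),
            "nanosecond" | "nanoseconds" => Some(Unit::Nanosecond),
            _ => None,
        }
    }

    /// Length of one unit in nanoseconds.
    fn nanos(self) -> i64 {
        match self {
            Unit::Second => 1_000_000_000,
            Unit::Millisecond => 1_000_000,
            Unit::Microsecond => 1_000,
            Unit::Nanosecond => 1,
        }
    }
}

/// Parse strings like "1 second 500 microseconds" into total nanoseconds.
/// Each unit may appear at most once; a bitmask of the flag values records
/// which units have been seen so far.
fn parse_sub_second(input: &str) -> Result<i64, String> {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    if tokens.is_empty() || tokens.len() % 2 != 0 {
        return Err(format!("expected '<amount> <unit>' pairs, got {input:?}"));
    }
    let mut seen: u16 = 0;
    let mut total: i64 = 0;
    for pair in tokens.chunks(2) {
        let amount: i64 = pair[0]
            .parse()
            .map_err(|_| format!("invalid amount {:?}", pair[0]))?;
        let unit = Unit::parse(pair[1]).ok_or_else(|| format!("unknown unit {:?}", pair[1]))?;
        let flag = unit as u16;
        if seen & flag != 0 {
            return Err(format!("repeated unit {:?}", pair[1]));
        }
        seen |= flag;
        // Additive: each term contributes amount * (length of one unit).
        total = amount
            .checked_mul(unit.nanos())
            .and_then(|n| total.checked_add(n))
            .ok_or_else(|| "interval overflows i64 nanoseconds".to_string())?;
    }
    Ok(total)
}
```

With this sketch, `parse_sub_second("1 second 500 microseconds")` yields `Ok(1_000_500_000)`, while `"1 microsecond 2 microseconds"` is rejected because both terms map to the same flag.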
Thank you @stuartcarnie. I think this looks good.
However, one thing I worry about is that @doki23 ported this code to arrow-rs in apache/arrow-rs#3762, and I think the long-term plan is to switch to using arrow-rs's version.
That said, given that this PR also adds test coverage, I think we will be able to avoid regressions.
Thus, my plan is to file a ticket in arrow-rs to track porting microsecond and nanosecond support to arrow, and then we can merge this PR.
```rust
let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(*v);
if months != 0 {
    return Err(DataFusionError::NotImplemented(
        "DATE_BIN stride does not support month intervals".to_string(),
    ));
}
```
👍 this is because months are not a fixed number of nanoseconds, right?
Correct.
We'll need to round to the first of the month, and those nanosecond intervals are not regular. We'll probably be best served by using the chrono crate to handle month and year rounding.
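To make the point about months concrete: a month's length in nanoseconds depends on which month and year it is, so no single multiplier turns a month interval into a fixed stride, whereas days and nanoseconds can be folded into one. The self-contained sketch below deliberately avoids chrono. It assumes the Gregorian leap-year rule and 24-hour days (no time-zone or leap-second adjustments). `days_in_month`, `month_len_nanos`, `stride_nanos` and `NANOS_PER_DAY` are illustrative names, not part of this PR or of chrono; `stride_nanos` takes the three parts already split out, as `IntervalMonthDayNanoType::to_parts` does in the excerpt above.

```rust
/// Nanoseconds in one day, assuming every day is exactly 24 hours.
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Gregorian leap-year rule.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month` (1-12) of `year` in the proleptic Gregorian calendar.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => panic!("month must be in 1..=12, got {month}"),
    }
}

/// Length of a calendar month in nanoseconds: it varies with month and year.
fn month_len_nanos(year: i32, month: u32) -> i64 {
    days_in_month(year, month) as i64 * NANOS_PER_DAY
}

/// Hypothetical helper: fold the (months, days, nanos) parts of a MonthDayNano
/// interval into a fixed stride in nanoseconds, rejecting any month component.
fn stride_nanos(months: i32, days: i32, nanos: i64) -> Result<i64, String> {
    if months != 0 {
        return Err("DATE_BIN stride does not support month intervals".to_string());
    }
    (days as i64)
        .checked_mul(NANOS_PER_DAY)
        .and_then(|d| d.checked_add(nanos))
        .ok_or_else(|| "stride overflows i64 nanoseconds".to_string())
}
```

February 2023 (28 days) and March 2023 (31 days) differ by three days' worth of nanoseconds, which is why the month branch is rejected rather than approximated, while a stride of one day plus 500 ns folds cleanly into `86_400_000_000_500`.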
Filed apache/arrow-rs#3916 in arrow-rs
@alamb thank you for the review. I can help port that code over to
Thanks @stuartcarnie -- I actually already ported over the code in apache/arrow-rs#3916 (because I figured it would be just as fast as writing up a ticket)
Which issue does this PR close?
Closes #5697
Rationale for this change
Teach `date_bin` to support microsecond and nanosecond precision intervals via the `MonthDayNano` type. Note that months are complicated to support and will require additional work. In doing so, we can also address #5689.
What changes are included in this PR?
`date_bin` can now parse fractional intervals with microsecond precision or finer. The interval parser can also interpret the additional units `microsecond`, `microseconds`, `nanosecond` and `nanoseconds`, which should combine additively with the other units in an interval.
Are these changes tested?
Yes, unit tests for parsing intervals, validating arguments to `date_bin`, and new SQL tests in the `timestamps.slt` file.
Are there any user-facing changes?
Yes, users can now use the additional interval units `microsecond`, `microseconds`, `nanosecond` and `nanoseconds`, and the `date_bin` function can bin timestamps into intervals with microsecond and nanosecond precision.
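For a fixed stride, binning amounts to flooring the offset from the origin to a multiple of the stride: `origin + floor((source - origin) / stride) * stride`. Below is a minimal sketch on `i64` nanosecond timestamps, assuming floor semantics for timestamps that precede the origin. `date_bin_nanos` is a hypothetical name whose argument order mirrors SQL `date_bin(stride, source, origin)`; this is not the PR's implementation.

```rust
/// Hypothetical fixed-stride binning on i64 nanosecond timestamps: returns the
/// start of the `stride`-wide bin, aligned to `origin`, that contains `source`.
fn date_bin_nanos(stride: i64, source: i64, origin: i64) -> Result<i64, String> {
    if stride <= 0 {
        return Err("stride must be a positive number of nanoseconds".to_string());
    }
    let diff = source
        .checked_sub(origin)
        .ok_or_else(|| "source - origin overflows i64".to_string())?;
    // For a positive stride, rem_euclid is in [0, stride), so subtracting it
    // moves `source` down onto the grid origin + k * stride (floor, even when
    // `source` is before `origin`).
    source
        .checked_sub(diff.rem_euclid(stride))
        .ok_or_else(|| "bin start overflows i64".to_string())
}
```

For example, with a 1 000 ns (one microsecond) stride and origin 0, a source of 2 500 ns lands in the bin starting at 2 000 ns, and a source of -1 ns lands in the bin starting at -1 000 ns. Using `rem_euclid` instead of `%` is the design choice that keeps pre-origin timestamps in the bin at or below them.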