Proposal: Add date_bin function #3015
Comments
I think following postgres is a good idea 👌 thanks @stuartcarnie
@ovr, @andygrove any concerns or thoughts?
Love it!
Sounds like a great feature 👍
I don't know how far DataFusion likes to deviate from PostgreSQL-flavored SQL, but here's an idea for this feature.
Suggestion 1
Make the default value of the origin argument the Unix epoch. It seems there is no default value specified by PostgreSQL. Certainly there exists a need for [...] The result is that 99% of queries written with this function will be shorter and less error prone. Some proposed, equivalent statements:
-- as specified by PostgreSQL - origin at Unix epoch
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp '1970-01-01 00:00:00 UTC');
-- 2022-08-03 14:45:00
-- proposed
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST');
Suggestion 2
Allow origin to be given as a time, date, or interval. The result enables better readability. Some proposed, equivalent statements:
-- as specified by PostgreSQL - origin shifts bins forward 5 minutes
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp '1970-01-01 00:05:00 UTC');
-- 2022-08-03 14:35:00
-- proposed
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', time '00:05:00');
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', interval '5 minutes');
-- as specified by PostgreSQL - origin shifts bins back 5 minutes
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp '1969-12-31 23:55:00 UTC');
-- 2022-08-03 14:40:00
-- proposed
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', - time '00:05:00');
select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', interval '- 5 minutes');
-- as specified by PostgreSQL - origin shifts bins forward one day
select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', timestamp '1970-01-02 00:00:00 UTC');
-- 2022-07-29 00:00:00
-- proposed
select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', date '1970-01-02');
select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', interval '1 day');
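For readers who want to check the arithmetic, here is a rough Python model of the two suggestions above. It is only a sketch: resolve_origin and this date_bin are hypothetical helpers, not DataFusion or PostgreSQL code. The sketch assumes naive timestamps (the PST/UTC annotations are ignored) and assumes that a time or interval origin is read as an offset from the Unix epoch. Python's time type cannot be negated, so the "- time '00:05:00'" form is modelled only through a negative interval.

```python
from datetime import date, datetime, time, timedelta

EPOCH = datetime(1970, 1, 1)


def resolve_origin(origin=None) -> datetime:
    """Hypothetical helper: map the proposed shorthand origins to a timestamp."""
    if origin is None:                 # Suggestion 1: default to the Unix epoch
        return EPOCH
    if isinstance(origin, datetime):   # already a full timestamp (check before date)
        return origin
    if isinstance(origin, timedelta):  # interval: offset from the epoch, may be negative
        return EPOCH + origin
    if isinstance(origin, time):       # time of day on the epoch date
        return datetime.combine(EPOCH.date(), origin)
    if isinstance(origin, date):       # calendar date at midnight
        return datetime.combine(origin, time())
    raise TypeError(f"unsupported origin type: {type(origin).__name__}")


def date_bin(stride: timedelta, source: datetime, origin=None) -> datetime:
    """Start of the stride-wide bin, aligned to origin, that contains source."""
    base = resolve_origin(origin)
    return source - (source - base) % stride


src = datetime(2022, 8, 3, 14, 49, 50)
quarter_hour = timedelta(minutes=15)

print(date_bin(quarter_hour, src))                         # 2022-08-03 14:45:00
print(date_bin(quarter_hour, src, time(0, 5)))             # 2022-08-03 14:35:00
print(date_bin(quarter_hour, src, timedelta(minutes=-5)))  # 2022-08-03 14:40:00
print(date_bin(timedelta(days=7), src, date(1970, 1, 2)))  # 2022-07-29 00:00:00
```

The printed values match the results quoted in the comment, which suggests the shorthand forms are equivalent to the full timestamp origins under these assumptions.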
@jacobmarble these are great suggestions.
This definitely seems like a worthwhile extension, but it also makes sense to not expand the scope of this PR.
Introduction
This proposal suggests adding a new scalar function, date_bin, to DataFusion, for transforming timestamp values to arbitrary intervals for the purpose of grouping and aggregating time-series data.
Motivation
Time-series data is typically analysed in aggregate, where one axis is almost always time. DataFusion's date_trunc is modelled after the PostgreSQL date_trunc function, which truncates a timestamp column for the purpose of grouping. However, its intervals are limited to an enumeration of units, such as second, minute, hour, day, week, month, quarter and year. To address this limitation, PostgreSQL 14 introduced the date_bin function, which can bin or adjust the input timestamp to arbitrary intervals.
Describe the solution you'd like
Add a new function, date_bin, to DataFusion with the same semantics as the PostgreSQL function.
Name: date_bin(stride, source, origin)
Per the PostgreSQL 14 docs
Required arguments
stride
source
origin
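As a rough illustration of how the three arguments interact, the sketch below models the binning arithmetic in Python. It assumes stride is a fixed-length, positive interval (no months or years) and that timestamps are naive. The function is an illustrative model, not the DataFusion implementation, and the sample readings are made up.

```python
from collections import Counter
from datetime import datetime, timedelta


def date_bin(stride: timedelta, source: datetime, origin: datetime) -> datetime:
    """Return the start of the stride-wide bin, aligned to origin, containing source."""
    if stride <= timedelta(0):
        raise ValueError("stride must be positive")
    # timedelta % timedelta uses floor semantics in Python, so the offset lies
    # in [0, stride) even when source is earlier than origin.
    offset = (source - origin) % stride
    return source - offset


EPOCH = datetime(1970, 1, 1)
quarter_hour = timedelta(minutes=15)

print(date_bin(quarter_hour, datetime(2022, 8, 3, 14, 49, 50), EPOCH))
# 2022-08-03 14:45:00

# A source before the origin still maps to the start of its bin.
print(date_bin(quarter_hour, datetime(1969, 12, 31, 23, 50), EPOCH))
# 1969-12-31 23:45:00

# Grouping and aggregating: count made-up readings per 15-minute bin.
readings = [
    datetime(2022, 8, 3, 14, 49, 50),
    datetime(2022, 8, 3, 14, 52, 10),
    datetime(2022, 8, 3, 15, 1, 0),
]
counts = Counter(date_bin(quarter_hour, ts, EPOCH) for ts in readings)
for bin_start, n in sorted(counts.items()):
    print(bin_start, n)
```

Unlike date_trunc, the stride here is any fixed-length interval, which is what makes 15-minute or 7-day buckets possible.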
Example Usage
Demonstrate date_bin [1]:
producing the following output:
Example Usage: time offset for origin
producing the following output:
Describe alternatives you've considered
date_trunc, as mentioned, provides limited support for binning timestamps; beyond that, there is no alternative to providing a native function.
Footnotes
[1] DataFusion does not support "typed string" literals in a VALUES statement, like VALUES ((TIMESTAMP '2021-06-10 17:05:00Z')), but feat: Enable typed strings expressions for VALUES clause #3018 will address that.