
DATE_TRUNC('day', date_day) not supported #49

Closed
2 tasks done
BigBerny opened this issue Jul 13, 2024 · 5 comments · Fixed by #34 or #95
Labels: bug (Something isn't working), good first issue (Good for newcomers), priority-high (High priority issue)

Comments

@BigBerny commented Jul 13, 2024

What happens?

When running this query I get an error:

SELECT DATE_TRUNC('day', date_day), AVG("queryLatency")
FROM predictions
GROUP BY DATE_TRUNC('day', date_day)

Error:
Query 1 ERROR at Line 1: : ERROR: Column date_trunc has Arrow data type Date32 but is mapped to the BuiltIn(TIMESTAMPOID) type in Postgres, which are incompatible. If you believe this conversion should be supported, please submit a request at https://github.com/paradedb/paradedb/issues.

Since DATE_TRUNC() is fundamental for many analyses, whenever you want to group by day, week, month, or year, it would be awesome if this could be fixed somehow.
Tools like Metabase often let users choose the granularity of an analysis.

To Reproduce

Do a GROUP BY on DATE_TRUNC(). It happens with date and timestamp columns and also for 'week' etc.
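
Until this is fixed, one possible workaround (a sketch, untested against 0.8.3; it assumes the explicit cast survives the DuckDB pushdown and resolves the Arrow-to-Postgres type mapping) would be to cast the DATE_TRUNC() result:

```sql
-- Hypothetical workaround: cast the Date32 result to a plain date so the
-- Arrow-to-Postgres type mapping is unambiguous. Untested sketch.
SELECT DATE_TRUNC('day', date_day)::date AS day, AVG("queryLatency")
FROM predictions
GROUP BY day;
```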

OS:

Ubuntu(?) with PostgreSQL 16.3

ParadeDB Version:

0.8.3

Full Name:

Janis

Affiliation:

Typewise

What is the latest build you tested with? If possible, we recommend testing by compiling the latest dev branch.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • Yes, I have
@philippemnoel philippemnoel added the bug Something isn't working label Jul 13, 2024
@philippemnoel (Collaborator)

Thank you for reporting, we'll get this fixed ASAP.

@Weijun-H (Contributor) commented Aug 6, 2024

Hi @BigBerny, could you provide more details for the reproduction? I am digging into this case but have found no error.

pg_lakehouse=# CREATE TEMP TABLE predictions (
  date_day DATE,
  queryLatency INT
);
CREATE TABLE

pg_lakehouse=# INSERT INTO predictions (date_day, queryLatency) VALUES
  ('2023-03-01', 100),
  ('2023-03-01', 150),
  ('2023-03-01', 200),
  ('2023-03-02', 50),
  ('2023-03-02', 100),
  ('2023-03-03', 250),
  ('2023-03-03', 300);
INSERT 0 7

pg_lakehouse=# SELECT DATE_TRUNC('day', date_day), AVG(queryLatency) 
FROM predictions 
GROUP BY DATE_TRUNC('day', date_day);
       date_trunc       |         avg          
------------------------+----------------------
 2023-03-03 00:00:00+08 | 275.0000000000000000
 2023-03-01 00:00:00+08 | 150.0000000000000000
 2023-03-02 00:00:00+08 |  75.0000000000000000
(3 rows)

@evanxg852000 (Contributor)

@Weijun-H Unless I missed something, that temp table sits directly in PostgreSQL instead of in pg_analytics' DuckDB. I used your example to create a parquet file, loaded it, and got the same error as reported.

-- LOAD PARQUET
CREATE FOREIGN DATA WRAPPER parquet_wrapper
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

CREATE SERVER parquet_server
FOREIGN DATA WRAPPER parquet_wrapper;

CREATE FOREIGN TABLE latencies ()
SERVER parquet_server
OPTIONS (files '/datasets/latencies.parquet');

SELECT DATE_TRUNC('day', date_day), AVG(query_latency) 
FROM latencies 
GROUP BY DATE_TRUNC('day', date_day);
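
For completeness, the parquet file referenced above could be produced from Weijun-H's sample rows, e.g. in the DuckDB CLI (a sketch; the path /datasets/latencies.parquet and the column name query_latency are taken from the query above):

```sql
-- DuckDB: build the sample dataset and write it out as parquet (sketch).
CREATE TABLE latencies (date_day DATE, query_latency INT);
INSERT INTO latencies VALUES
  ('2023-03-01', 100), ('2023-03-01', 150), ('2023-03-01', 200),
  ('2023-03-02',  50), ('2023-03-02', 100),
  ('2023-03-03', 250), ('2023-03-03', 300);
COPY latencies TO '/datasets/latencies.parquet' (FORMAT PARQUET);
```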

@BigBerny (Author) commented Aug 6, 2024

I used parquet files read by pg_lakehouse/DuckDB, as @evanxg852000 said.

@philippemnoel (Collaborator)

@evanxg852000 I've reverted this PR, as the tests were not passing. We simply hadn't noticed that the CI was not running.

@philippemnoel philippemnoel added priority-high High priority issue and removed priority-medium Medium priority issue labels Aug 23, 2024