Skip to content

DATE/TIMESTAMP filter expression issues #152

@gregrahn

Description

@gregrahn

spark 1.6.0, spark-redshift 0.6.0

A few different issues here, seemingly the same place in the code, so lumping them together.
https://github.com/databricks/spark-redshift/blob/master/src/main/scala/com/databricks/spark/redshift/FilterPushdown.scala#L44

DATEs not quoted
For columns of (Redshift) DATE type, when using the following syntax
t.filter("c_date = cast('2016-01-01' as DATE)")
or
t.filter("c_date = to_date('2016-01-01')")
or
t.filter("c_date < current_date()")
the filter is pushed as part of the UNLOAD command, but the date string ends up not being quoted on the SQL side as such
WHERE "c_date" = 2016-01-01
which really results in the following
WHERE "c_date" = 2014
because 2016-01-01 is interpreted as a numeric expression (2016 minus 1 minus 1), not a date string.

redshift=# select 2016-01-01;
 ?column?
----------
     2014
(1 row)

Thus, the wrong results are returned to the user (bad).

DATE strings not pushed
Interestingly enough using
t.filter($"c_date" >= "2016-01-01")
does not push any SQL filter through to the UNLOAD command which, while returns the correct result, scans much more data on the Redshift side than need be and should also be fixed.

TIMESTAMPs not quoted
If the column type is TIMESTAMP, we have the same problem as DATE when a string is passed in
t.filter($"c_timestamp" === "2016-01-01 00:00:00")
results in the following SQL
WHERE "c_timestamp" = 2016-01-01 00:00:00.0
which simply throws a Redshift SQL parse error.
Also happens when a Spark TimestampType is passed
t.filter($"c_timestamp" < current_timestamp())
which converts to
WHERE "c_timestamp" < 2016-01-14 15:03:43.928

It seems the right thing to do here for DATE/TIMESTAMP is check for both types and use the correct casting functions on the database side with the corresponding string format masks.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions