[SPARK-28432][SQL] Add make_date function
#25210
Conversation
Test build #107941 has finished for PR 25210 at commit

Thank you for working on this, @MaxGekk. cc @wangyum

The R failure seems to be a CRAN check failure (which is irrelevant to this PR) again, but let's rerun this to confirm.

Retest this please.
make_date function
I think it is relevant. Need to describe the parameters.

@felixcheung @HyukjinKwon Could you take a look at the PR, especially the R-related changes, please?
felixcheung left a comment
And add an R test?
Test build #107943 has finished for PR 25210 at commit

I will add a test, but R tests crash with a segmentation fault on my laptop running macOS.

Test build #107945 has finished for PR 25210 at commit

@MaxGekk Got it. Is the following fixed in the latest commit?

BTW, I'm wondering if we need to add this function for all APIs. For the hyperbolic math functions, we only added them in SQL later because it's a PostgreSQL compatibility effort.

Test build #107946 has finished for PR 25210 at commit

I see. I will remove the function from the Scala, Python, and R APIs.

Test build #107950 has finished for PR 25210 at commit

All tests passed and only the CRAN check fails. The failure is irrelevant to this PR.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
-- !query 48 schema
struct<make_date(-44, 3, 15):date>
-- !query 48 output
0045-03-15
The year -44 is out of the valid range according to the SQL standard. We are getting 45 instead of -44 while converting to java.sql.Date. If you switch to the Java 8 API for dates/timestamps:
scala> spark.conf.set("spark.sql.datetime.java8API.enabled", true)
scala> spark.sql("select make_date(-44, 3, 15)").collect
res7: Array[org.apache.spark.sql.Row] = Array([-0044-03-15])

The returned instance of java.time.LocalDate seems reasonable.
You need to file a JIRA issue for this difference in make_date input range checking. Also, please add the JIRA ID to the date.sql file.
FYI, the following is PostgreSQL output.
Actually, the reason for the difference here is how Spark converts the internal type DateType to the external one, java.sql.Date (by default) or java.time.LocalDate (when spark.sql.datetime.java8API.enabled is set to true), and how the external type is converted to a string. For example, to get the same format as PostgreSQL, you need to provide the appropriate formatter.
For java.sql.Date:
spark.conf.set("spark.sql.datetime.java8API.enabled", false)
val date = spark.sql("select make_date(-44, 3, 15)").first.getAs[java.sql.Date](0)
val sdf = new java.text.SimpleDateFormat("MM-dd-yyyy G")
scala> sdf.format(date)
res18: String = 03-17-0045 BC

For Java 8 java.time.LocalDate:
spark.conf.set("spark.sql.datetime.java8API.enabled", true)
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val formatter = DateTimeFormatter.ofPattern("MM-dd-yyyy G")
val localDate = spark.sql("select make_date(-44, 3, 15)").first.getAs[LocalDate](0)
localDate.format(formatter)
res16: String = 03-15-0045 BC

The difference in days is due to the different calendars (Julian in the first case, and Proleptic Gregorian in the second one).
I see Postgres formats the year -44 as 44 BC, which is wrong according to ISO 8601. See https://en.wikipedia.org/wiki/Year_zero, for example:
The "basic" format for year 0 ... year 1 BC. ... hence -0001 = 2 BC.
I don't think we should implement Postgres bugs.
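To make the year-zero convention concrete, here is a minimal sketch (not part of the PR) showing how java.time, which follows the same proleptic ISO numbering, maps negative proleptic years to years of the BC era:

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// "yyyy G" prints the year of era followed by the era, e.g. "0045 BC".
val eraFmt = DateTimeFormatter.ofPattern("yyyy G", Locale.ENGLISH)

// ISO 8601 proleptic numbering: year 0 is 1 BC, year -1 is 2 BC, ..., year -44 is 45 BC.
Seq(1, 0, -1, -44).foreach { y =>
  println(f"proleptic year $y%5d -> ${LocalDate.of(y, 3, 15).format(eraFmt)}")
}
// proleptic year     1 -> 0001 AD
// proleptic year     0 -> 0001 BC
// proleptic year    -1 -> 0002 BC
// proleptic year   -44 -> 0045 BC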
No, what I mean is we need a JIRA report and JIRA ID comment because we don't follow PostgreSQL like that.
Ideally, you can file a PostgreSQL bug and use that instead of a SPARK JIRA.
Either way, we should report the difference.
Here is the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-28471
Test build #107961 has finished for PR 25210 at commit

Test build #107967 has finished for PR 25210 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
cc @ueshin

@MaxGekk Could you also improve the PR description with the reasons why we need this new function?
HyukjinKwon left a comment
Looks good from my side.
ueshin left a comment
LGTM.
Test build #107999 has finished for PR 25210 at commit

Test build #108003 has finished for PR 25210 at commit
dongjoon-hyun left a comment
+1, LGTM. Merged to master.
Thank you so much, @MaxGekk, @felixcheung, @HyukjinKwon, @gatorsmile, @ueshin!
What changes were proposed in this pull request?
New function make_date() takes 3 columns year, month and day, and makes a new column of the DATE type. If values in the input columns are null or out of the valid ranges, the function returns null. Valid ranges are:
- year - [1, 9999]
- month - [1, 12]
- day - [1, 31]
Also, the constructed date must be valid, otherwise make_date returns null. The function is implemented similarly to make_date in PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html) to maintain feature parity with it. Here is an example:
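A minimal usage sketch (illustrative values, not the PR description's original example; expected results are shown as comments):

// Valid inputs produce a DATE value.
spark.sql("SELECT make_date(2019, 7, 25)").first
// expected: [2019-07-25]

// An otherwise-invalid date such as February 30 produces null.
spark.sql("SELECT make_date(2019, 2, 30)").first
// expected: [null]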
How was this patch tested?
Added new tests to DateExpressionsSuite.
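A rough sketch of what such a test might look like (hypothetical: the expression class name MakeDate is assumed, and the snippet is meant to live inside DateExpressionsSuite, which mixes in ExpressionEvalHelper for checkEvaluation):

// Hypothetical sketch, assuming the PR adds a MakeDate expression.
test("make_date") {
  import java.time.LocalDate
  // A valid date evaluates to days since the epoch (DateType's internal form).
  checkEvaluation(
    MakeDate(Literal(2019), Literal(7), Literal(25)),
    LocalDate.of(2019, 7, 25).toEpochDay.toInt)
  // Out-of-range or invalid components evaluate to null.
  checkEvaluation(MakeDate(Literal(2019), Literal(13), Literal(1)), null)
  checkEvaluation(MakeDate(Literal(2019), Literal(2), Literal(30)), null)
}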