Conversation

@MaxGekk (Member) commented Jul 20, 2019

What changes were proposed in this pull request?

The new function make_date() takes three columns (year, month, and day) and produces a new column of the DATE type. If any input value is null or out of its valid range, the function returns null. The valid ranges are:

  • year - [1, 9999]
  • month - [1, 12]
  • day - [1, 31]

The constructed date must also be a valid calendar date; otherwise make_date returns null.

The function is implemented similarly to make_date in PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html) to maintain feature parity with it.

Here is an example:

select make_date(2013, 7, 15);
2013-07-15
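The null-on-invalid semantics described above can be sketched in plain Python. This is an illustrative helper only, not Spark's implementation; it uses datetime.date (Proleptic Gregorian) to validate the day/month combination:

```python
from datetime import date

def make_date(year, month, day):
    """Return a date, or None if any argument is null, out of its
    valid range, or the combination is not a valid calendar date."""
    if year is None or month is None or day is None:
        return None  # null input -> null result
    if not (1 <= year <= 9999 and 1 <= month <= 12 and 1 <= day <= 31):
        return None  # out of the documented ranges
    try:
        return date(year, month, day)  # rejects e.g. Feb 30
    except ValueError:
        return None

print(make_date(2013, 7, 15))   # 2013-07-15
print(make_date(2019, 2, 30))   # None (not a valid calendar date)
print(make_date(2019, None, 1)) # None (null input)
```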

How was this patch tested?

Added new tests to DateExpressionsSuite.

@SparkQA commented Jul 20, 2019

Test build #107941 has finished for PR 25210 at commit fea2621.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Thank you for working on this, @MaxGekk .
Since this is one function of SPARK-28432, I split SPARK-28432 into two JIRAs by cloning.
This one is for you and make_date. The new one is for make_timestamp.

cc @wangyum

@dongjoon-hyun (Member)

The R failure again seems to be the CRAN check failure (which is irrelevant to this PR), but let's rerun to confirm.

@dongjoon-hyun (Member)

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28432][SQL] New function - make_date [SPARK-28432][SQL] Add make_date function Jul 20, 2019
@MaxGekk (Member, Author) commented Jul 20, 2019

The R failure seems to be CRAN check failure (which is irrelevant to this PR) again ...

I think it is relevant. I need to describe the parameters.

@MaxGekk (Member, Author) commented Jul 20, 2019

@felixcheung @HyukjinKwon Could you take a look at the PR, especially the R-related changes, please?

@felixcheung (Member) left a comment

and add R test?

@SparkQA commented Jul 20, 2019

Test build #107943 has finished for PR 25210 at commit fea2621.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Jul 20, 2019

and add R test?

I will add a test, but R tests crash with a segmentation fault on my macOS laptop.

@SparkQA commented Jul 20, 2019

Test build #107945 has finished for PR 25210 at commit a57754c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

@MaxGekk. Got it. Is the following fixed in the latest commit?

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]

@dongjoon-hyun (Member)

BTW, I'm wondering if we need to add this function to all APIs. For the hyperbolic math functions, we later added them only in SQL because this is a PostgreSQL compatibility effort.

@SparkQA commented Jul 21, 2019

Test build #107946 has finished for PR 25210 at commit 318cc18.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Jul 21, 2019

BTW, I'm wondering if we need to add this function for all.

I see. I will remove the function from the Scala, Python, and R APIs.

@SparkQA commented Jul 21, 2019

Test build #107950 has finished for PR 25210 at commit a5ef08b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) commented Jul 21, 2019

All tests passed and only the CRAN check fails. The failure is irrelevant to this PR.

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 24] do not match the length of object [0]
Execution halted

-- !query 48 schema
struct<make_date(-44, 3, 15):date>
-- !query 48 output
0045-03-15
@MaxGekk (Member, Author) commented:

The year -44 is out of the valid range according to the SQL standard. We get 45 instead of -44 while converting to java.sql.Date. If you switch to the Java 8 API for dates/timestamps:

scala> spark.conf.set("spark.sql.datetime.java8API.enabled", true)

scala> spark.sql("select make_date(-44, 3, 15)").collect
res7: Array[org.apache.spark.sql.Row] = Array([-0044-03-15])

the returned instance of java.time.LocalDate seems reasonable.

(Member) left a comment:

You need to file a JIRA issue for this difference in make_date input range checking. Also, please add the JIRA ID to the date.sql file.

FYI, the following is PostgreSQL output.

@MaxGekk (Member, Author) commented:

Actually, the reason for the difference here is how Spark converts the internal DateType to an external type, java.sql.Date (by default) or java.time.LocalDate (when spark.sql.datetime.java8API.enabled is set to true), and how that external type is converted to a string. For example, to get the same format as PostgreSQL, you need to provide an appropriate formatter.

For java.sql.Date:

scala> spark.conf.set("spark.sql.datetime.java8API.enabled", false)
scala> val date = spark.sql("select make_date(-44, 3, 15)").first.getAs[java.sql.Date](0)
scala> val sdf = new java.text.SimpleDateFormat("MM-dd-yyyy G")
scala> sdf.format(date)
res18: String = 03-17-0045 BC

For Java8 java.time.LocalDate:

scala> spark.conf.set("spark.sql.datetime.java8API.enabled", true)
scala> import java.time.LocalDate
scala> import java.time.format.DateTimeFormatter
scala> val formatter = DateTimeFormatter.ofPattern("MM-dd-yyyy G")
scala> val localDate = spark.sql("select make_date(-44, 3, 15)").first.getAs[LocalDate](0)
scala> localDate.format(formatter)
res16: String = 03-15-0045 BC

The difference in days is due to different calendars (Julian in the first case, and Proleptic Gregorian in the second).

I see Postgres formats the year -44 as 44 BC, which is wrong according to ISO 8601. See https://en.wikipedia.org/wiki/Year_zero, for example:

The "basic" format for year 0 ... year 1 BC. ... hence -0001 = 2 BC.

I don't think we should implement Postgres bugs.
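The ISO 8601 numbering being argued here (year 0 exists and equals 1 BC, hence year -N is (N+1) BC) can be checked with a tiny Python helper. This is an illustrative sketch of the mapping only, not anything from Spark or PostgreSQL:

```python
def iso_year_to_era(year):
    """Map an ISO 8601 (astronomical) year number to a BC/AD label.
    ISO 8601 has a year 0 (= 1 BC), so year -N corresponds to (N+1) BC."""
    if year <= 0:
        return f"{1 - year} BC"
    return f"{year} AD"

print(iso_year_to_era(-44))  # 45 BC
print(iso_year_to_era(0))    # 1 BC
print(iso_year_to_era(-1))   # 2 BC
```

Under this mapping, ISO year -44 is 45 BC, which matches the "0045 BC" produced by the Java era formatters above, not Postgres's "44 BC".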

(Member) left a comment:

No, what I mean is that we need a JIRA report and a JIRA ID comment, because we don't follow PostgreSQL in this case.

(Member) left a comment:

Ideally, you could file a PostgreSQL bug and reference that instead of a SPARK JIRA.
Either way, we should report the difference.

@SparkQA commented Jul 21, 2019

Test build #107961 has finished for PR 25210 at commit 30e62c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 21, 2019

Test build #107967 has finished for PR 25210 at commit 646456b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

cc @ueshin

@gatorsmile (Member)

@MaxGekk Could you also improve the PR description with the reasons why we need this new function?

@HyukjinKwon (Member) left a comment

Looks good from my side.

@ueshin (Member) left a comment

LGTM.

@SparkQA commented Jul 22, 2019

Test build #107999 has finished for PR 25210 at commit 639c6b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 22, 2019

Test build #108003 has finished for PR 25210 at commit 15c64d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Merged to master.
Thank you so much, @MaxGekk , @felixcheung , @HyukjinKwon , @gatorsmile , @ueshin !
