
Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Jan 10, 2019

What changes were proposed in this pull request?

In the PR, I propose to use java.time classes in stringToDate and stringToTimestamp. This switches the methods from the hybrid calendar (Gregorian+Julian) to the Proleptic Gregorian calendar, and it makes the casting consistent with other Spark classes that convert textual representations of dates/timestamps to DateType/TimestampType.
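As a minimal sketch (mine, not Spark's actual implementation; the helper name stringToEpochDays is hypothetical), parsing on top of java.time looks like this in plain Java. LocalDate is inherently Proleptic Gregorian, so its epoch-day count is exactly the days-since-epoch value Spark stores for DateType:

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.OptionalLong;

public class StringToDateSketch {
    // Hypothetical helper sketching the java.time-based approach:
    // LocalDate uses the Proleptic Gregorian calendar, so toEpochDay()
    // yields the days-since-epoch count without any calendar cutover logic.
    static OptionalLong stringToEpochDays(String s) {
        try {
            return OptionalLong.of(LocalDate.parse(s.trim()).toEpochDay());
        } catch (DateTimeParseException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(stringToEpochDays("2019-01-10")); // OptionalLong[17906]
    }
}
```

Invalid input yields an empty OptionalLong rather than throwing, mirroring the Option-style result the Spark parsing methods use.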

How was this patch tested?

The changes were tested by existing suites - HashExpressionsSuite, CastSuite and DateTimeUtilsSuite.

@SparkQA

SparkQA commented Jan 10, 2019

Test build #101034 has finished for PR 23512 at commit 2c811af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 11, 2019

Test build #101073 has finished for PR 23512 at commit b9a52a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…p-parsing

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@SparkQA

SparkQA commented Jan 11, 2019

Test build #101097 has finished for PR 23512 at commit c2bc4b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Jan 11, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Jan 12, 2019

Test build #101108 has finished for PR 23512 at commit c2bc4b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Jan 12, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Jan 12, 2019

Test build #101122 has finished for PR 23512 at commit c2bc4b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 12, 2019

Test build #101130 has finished for PR 23512 at commit 0f7ddd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Jan 12, 2019

@srowen @cloud-fan Please, review the PR.

Member

@srowen srowen left a comment


I think you know the change much better than I, and so I defer to your judgment. The actual change looks reasonable and consistent with the others in this effort. (One tiny suggestion, not important.) Are there any further compatibility issues introduced by this change that need a note in the release notes? We can change behavior if needed, and certainly if it's fixing problematic behavior.


import java.sql.{Date, Timestamp}
import java.text.{DateFormat, SimpleDateFormat}
import java.time._
Member


If there aren't like 10 classes to import, I'd make them explicit. At least, the following line is currently redundant.

Member Author


Yeah, IntelliJ IDEA collapses the imports automatically. I will try to revert them.

Member Author


DateTimeUtils uses many classes from java.time. I will keep import java.time._ if you don't mind.

@MaxGekk
Member Author

MaxGekk commented Jan 12, 2019

Are there any further compatibility issues introduced by this change that need a note in the release notes?

Besides switching from the hybrid calendar (Julian+Gregorian) to the Proleptic Gregorian calendar used by java.time classes, the PR shouldn't introduce any behaviour change. At least, our tests didn't reveal any.

I will add a note to the migration guide.
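To make the calendar difference concrete for such a migration note, this small check (mine, not from the PR) shows the cutover dates that the hybrid calendar skips but java.time accepts:

```java
import java.time.LocalDate;

public class CutoverDates {
    public static void main(String[] args) {
        // The hybrid calendar jumps from 1582-10-04 straight to 1582-10-15;
        // the Proleptic Gregorian calendar used by java.time has no such gap.
        LocalDate skipped = LocalDate.parse("1582-10-10"); // a valid date here
        LocalDate cutover = LocalDate.parse("1582-10-15");
        System.out.println(cutover.getDayOfYear()); // 288
    }
}
```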


checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288)
checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-04 13:10:15").getTime))), 277)
checkEvaluation(DayOfYear(Cast(Literal("1582-10-15 13:10:15"), DateType)), 288)
Member Author


This was changed to avoid using SimpleDateFormat.

Contributor


can we remove sdf from this test suite then?

Member Author


I removed it from the suite.


// boundary cases
checkHiveHashForDateType("0000-01-01", -719530)
checkHiveHashForDateType("0000-01-01", -719528)
Member Author


The hash was changed because the number of days since the epoch is different for the old date.
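The new expected value can be double-checked against java.time directly (my check, not part of the PR):

```java
import java.time.LocalDate;

public class EpochDayCheck {
    public static void main(String[] args) {
        // In the Proleptic Gregorian calendar, 0000-01-01 falls 719528 days
        // before 1970-01-01; the hybrid calendar counts 719530 days, which
        // is why the expected input above changed from -719530 to -719528.
        long days = LocalDate.of(0, 1, 1).toEpochDay();
        System.out.println(days); // -719528
    }
}
```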

Seq(
Row(1, Timestamp.valueOf("2016-01-01 10:11:12.123456")),
Row(2, null),
Row(3, Timestamp.valueOf("1965-01-01 10:11:12.123456"))))
Member Author


Using Timestamp to hold the number of seconds since the epoch is OK, but any other method that involves the hybrid calendar can cause calendar-related issues. I changed it to a cast since casting was ported to the new Java 8 API (java.time) in this PR.
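As an illustration of parsing without touching the hybrid calendar (my sketch; the pattern is not Spark's internal format), the timestamp literal from the test rows can be taken apart at microsecond precision using java.time alone:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class MicrosParse {
    public static void main(String[] args) {
        // Parse the literal used in the test rows with java.time only;
        // SSSSSS parses exactly six fractional-second digits.
        DateTimeFormatter fmt =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");
        LocalDateTime ts = LocalDateTime.parse("1965-01-01 10:11:12.123456", fmt);
        System.out.println(ts.getNano()); // 123456000
    }
}
```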

@MaxGekk
Member Author

MaxGekk commented Jan 12, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Jan 13, 2019

Test build #101132 has finished for PR 23512 at commit 0f7ddd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2019

Test build #101140 has finished for PR 23512 at commit 61ad31a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 13, 2019

Test build #101142 has finished for PR 23512 at commit 0a89446.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…p-parsing

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@SparkQA

SparkQA commented Jan 15, 2019

Test build #101208 has finished for PR 23512 at commit f02c85c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Jan 15, 2019

LGTM with only one comment: #23512 (comment)

@SparkQA

SparkQA commented Jan 16, 2019

Test build #101326 has finished for PR 23512 at commit 1af865f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Jan 16, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Jan 16, 2019

Test build #101328 has finished for PR 23512 at commit 1af865f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Jan 16, 2019

jenkins, retest this, please

@MaxGekk
Member Author

MaxGekk commented Jan 16, 2019

It seems the test failures are not related to these changes.

@SparkQA

SparkQA commented Jan 17, 2019

Test build #101333 has finished for PR 23512 at commit 1af865f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

Merging to master. Thanks.

@asfgit asfgit closed this in 6f8c0e5 Jan 17, 2019
cloud-fan pushed a commit that referenced this pull request Jan 23, 2019
…ower/upper bounds

## What changes were proposed in this pull request?

In the PR, I propose using the `stringToDate` and `stringToTimestamp` methods to parse the JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods were ported to the Proleptic Gregorian calendar by #23512, the PR switches parsing of the JDBC bounds of the partition column to that calendar as well.

## How was this patch tested?

This was tested by `JDBCSuite`.

Closes #23597 from MaxGekk/jdbc-parse-timestamp-bounds.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ring to Date/TimestampType

## What changes were proposed in this pull request?

In the PR, I propose to use *java.time* classes in `stringToDate` and `stringToTimestamp`. This switches the methods from the hybrid calendar (Gregorian+Julian) to the Proleptic Gregorian calendar, and it makes the casting consistent with other Spark classes that convert textual representations of dates/timestamps to `DateType`/`TimestampType`.

## How was this patch tested?

The changes were tested by existing suites - `HashExpressionsSuite`, `CastSuite` and `DateTimeUtilsSuite`.

Closes apache#23512 from MaxGekk/utf8string-timestamp-parsing.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ower/upper bounds

## What changes were proposed in this pull request?

In the PR, I propose using the `stringToDate` and `stringToTimestamp` methods to parse the JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods were ported to the Proleptic Gregorian calendar by apache#23512, the PR switches parsing of the JDBC bounds of the partition column to that calendar as well.

## How was this patch tested?

This was tested by `JDBCSuite`.

Closes apache#23597 from MaxGekk/jdbc-parse-timestamp-bounds.

Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the utf8string-timestamp-parsing branch August 17, 2019 13:36