-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26593][SQL] Use Proleptic Gregorian calendar in casting UTF8String to Date/TimestampType #23512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #101034 has finished for PR 23512 at commit
|
|
Test build #101073 has finished for PR 23512 at commit
|
…p-parsing # Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
|
Test build #101097 has finished for PR 23512 at commit
|
|
jenkins, retest this, please |
|
Test build #101108 has finished for PR 23512 at commit
|
|
jenkins, retest this, please |
|
Test build #101122 has finished for PR 23512 at commit
|
|
Test build #101130 has finished for PR 23512 at commit
|
|
@srowen @cloud-fan Please, review the PR. |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you know the change much better than I, and so I defer to your judgment. The actual change looks reasonable and consistent with the others in this effort. (One tiny suggestion, not important.) Are there any further compatibility issues introduced by this change that need a note in the release notes? We can change behavior if needed, and certainly if it's fixing problematic behavior.
|
|
||
| import java.sql.{Date, Timestamp} | ||
| import java.text.{DateFormat, SimpleDateFormat} | ||
| import java.time._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there aren't like 10 classes to import, I'd make them explicit. At least, the following line is currently redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah, IntelliJ IDEA collapses the imports automatically. I will try to revert them back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DateTimeUtils uses many classes from java.time. I will keep import java.time._ if you don't mind.
Besides of switching from the hybrid calendar (Julian+Gregorian) to Proleptic Gregorian calendar used by java.time classes, the PR shouldn't introduce any behaviour change. At least our tests did show that. I will add a note to the migration guide. |
|
|
||
| checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) | ||
| checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-04 13:10:15").getTime))), 277) | ||
| checkEvaluation(DayOfYear(Cast(Literal("1582-10-15 13:10:15"), DateType)), 288) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changed to avoid using of SimpleDateFormat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove sdf from this test suite then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it from the suite
|
|
||
| // boundary cases | ||
| checkHiveHashForDateType("0000-01-01", -719530) | ||
| checkHiveHashForDateType("0000-01-01", -719528) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash was changed because number of days since epoch is different for the old date.
| Seq( | ||
| Row(1, Timestamp.valueOf("2016-01-01 10:11:12.123456")), | ||
| Row(2, null), | ||
| Row(3, Timestamp.valueOf("1965-01-01 10:11:12.123456")))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Timestamp to hold number of seconds from epoch is ok but any other methods that involve the hybrid calendar can cause calendar related issue. I changed it to cast since it is ported on new Java 8 API (java.time) in the PR.
|
jenkins, retest this, please |
|
Test build #101132 has finished for PR 23512 at commit
|
|
Test build #101140 has finished for PR 23512 at commit
|
|
Test build #101142 has finished for PR 23512 at commit
|
…p-parsing # Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
|
Test build #101208 has finished for PR 23512 at commit
|
|
LGTM with only one comment: #23512 (comment) |
|
Test build #101326 has finished for PR 23512 at commit
|
|
jenkins, retest this, please |
|
Test build #101328 has finished for PR 23512 at commit
|
|
jenkins, retest this, please |
|
It seems test failures are not related to this changes. |
|
Test build #101333 has finished for PR 23512 at commit
|
|
Merging to master. Thanks. |
…ower/upper bounds ## What changes were proposed in this pull request? In the PR, I propose using of the `stringToDate` and `stringToTimestamp` methods in parsing JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods have been ported on Proleptic Gregorian calendar by #23512, the PR switches parsing of JDBC bounds of the partition column on the calendar as well. ## How was this patch tested? This was tested by `JDBCSuite`. Closes #23597 from MaxGekk/jdbc-parse-timestamp-bounds. Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ring to Date/TimestampType ## What changes were proposed in this pull request? In the PR, I propose to use *java.time* classes in `stringToDate` and `stringToTimestamp`. This switches the methods from the hybrid calendar (Gregorian+Julian) to Proleptic Gregorian calendar. And it should make the casting consistent to other Spark classes that converts textual representation of dates/timestamps to `DateType`/`TimestampType`. ## How was this patch tested? The changes were tested by existing suites - `HashExpressionsSuite`, `CastSuite` and `DateTimeUtilsSuite`. Closes apache#23512 from MaxGekk/utf8string-timestamp-parsing. Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
…ower/upper bounds ## What changes were proposed in this pull request? In the PR, I propose using of the `stringToDate` and `stringToTimestamp` methods in parsing JDBC lower/upper bounds of the partition column if it has `DateType` or `TimestampType`. Since those methods have been ported on Proleptic Gregorian calendar by apache#23512, the PR switches parsing of JDBC bounds of the partition column on the calendar as well. ## How was this patch tested? This was tested by `JDBCSuite`. Closes apache#23597 from MaxGekk/jdbc-parse-timestamp-bounds. Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In the PR, I propose to use java.time classes in
stringToDateandstringToTimestamp. This switches the methods from the hybrid calendar (Gregorian+Julian) to Proleptic Gregorian calendar. And it should make the casting consistent to other Spark classes that converts textual representation of dates/timestamps toDateType/TimestampType.How was this patch tested?
The changes were tested by existing suites -
HashExpressionsSuite,CastSuiteandDateTimeUtilsSuite.