[SPARK-28471][SQL] Replace yyyy by uuuu in date-timestamp patterns without era
#25230
Test build #108017 has finished for PR 25230 at commit
Test build #108022 has finished for PR 25230 at commit
Test build #108037 has finished for PR 25230 at commit
jenkins, retest this, please
Test build #108040 has finished for PR 25230 at commit
srowen
left a comment
I had never seen this! There is a good discussion at https://stackoverflow.com/questions/41177442/uuuu-versus-yyyy-in-datetimeformatter-formatting-pattern-codes-in-java which supports your conclusion. It really won't matter except for years before 1 AD.
How about a basic test to demonstrate the behavior difference?
@srowen I added tests for formatting of negative years. Just in case: parsing of negative years didn't work before and won't work after the changes, though for different reasons:
Test build #108080 has finished for PR 25230 at commit
felixcheung
left a comment
Is this a breaking change, though?
@felixcheung No, negative years are out of the valid range for the
It could be considered a fix for a correctness issue. Before: a date in year -99 that was written out was loaded back as year 100, which is incorrect.
scala> Seq(java.time.LocalDate.of(-99, 1, 1)).toDF("d").write.mode("overwrite").json("neg_year2")
scala> spark.read.schema("d date").json("/Users/maxim/tmp/neg_year2").show
+----------+
|         d|
+----------+
|0100-01-01|
+----------+
After: the same value is loaded back as null instead of a wrong date.
scala> Seq(java.time.LocalDate.of(-99, 1, 1)).toDF("d").write.mode("overwrite").json("neg_year")
scala> spark.read.schema("d date").json("neg_year").show
+----+
|   d|
+----+
|null|
+----+
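For reference, the lossy "Before" round trip can be reproduced outside Spark with java.time alone. This is a sketch under the assumption that the old behavior matches formatting and parsing with DateTimeFormatter.ofPattern("yyyy-MM-dd") under its default SMART resolver; it is not Spark's formatter code, and the object name YyyyRoundTrip is hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Sketch only: assumes the pre-change behavior equals java.time's "yyyy-MM-dd"
// pattern with the default SMART resolver. YyyyRoundTrip is a hypothetical name.
object YyyyRoundTrip {
  // "yyyy" is year-of-era: it is printed without a sign, so the era is lost in the text.
  private val yearOfEra = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  /** Formats `d` with "yyyy-MM-dd", then parses that text back with the same pattern. */
  def roundTrip(d: LocalDate): (String, LocalDate) = {
    val text = yearOfEra.format(d)
    // With no era in the text, the SMART resolver treats the year-of-era as a CE year.
    (text, LocalDate.parse(text, yearOfEra))
  }

  def main(args: Array[String]): Unit = {
    val original = LocalDate.of(-99, 1, 1) // proleptic year -99, i.e. 100 BC
    val (text, restored) = roundTrip(original)
    println(s"$original -> '$text' -> $restored") // -0099-01-01 -> '0100-01-01' -> 0100-01-01
  }
}
```

The restored date differs from the original, which is the same silent corruption the spark-shell session shows.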
jenkins, retest this, please
Test build #108082 has finished for PR 25230 at commit
srowen
left a comment
My only hesitation is that most users who totally understand what yyyy-MM-dd means in the docs won't necessarily get uuuu-MM-dd. I could see an argument that it's not worth it. But I think it's the right thing to do for full correctness, even if it should virtually never affect an actual query or data.
Retest this please.
@MaxGekk, does this PR cover all instances? It seems that there are some leftovers. Could you elaborate a little more on the criteria for replacement?
Test build #108255 has finished for PR 25230 at commit
@dongjoon-hyun Everything with the
For find . -type f -name "*.scala" -print0|xargs -0 grep -n 'yyyy-'|grep -v -i SimpleDateFormat|grep -v '> SELECT _FUNC_'|grep -v '\*'|grep -v '//'|grep -v Suite|grep -v Benchmark|grep -v Test
Similar criteria were used for Python and R.
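The SimpleDateFormat exclusion in the grep above matters because the legacy java.text.SimpleDateFormat already assigns the letter u to "day number of week" (1 = Monday, ..., 7 = Sunday). Replacing yyyy with uuuu there would not fail; it would silently print the weekday instead of the year. A minimal sketch (the object and method names are hypothetical, chosen only for illustration):

```scala
import java.text.SimpleDateFormat
import java.util.{Calendar, GregorianCalendar, Locale, TimeZone}

// Illustration only: SimpleDateFormatLetterU is a hypothetical name, not Spark code.
object SimpleDateFormatLetterU {
  private val utc = TimeZone.getTimeZone("UTC")

  /** Formats midnight UTC of 2019-01-01 (a Tuesday) with the given legacy pattern. */
  def formatNewYear2019(pattern: String): String = {
    val cal = new GregorianCalendar(utc, Locale.US)
    cal.clear()
    cal.set(2019, Calendar.JANUARY, 1)
    val f = new SimpleDateFormat(pattern, Locale.US) // fixed locale keeps ASCII digits
    f.setTimeZone(utc)
    f.format(cal.getTime)
  }

  def main(args: Array[String]): Unit = {
    println(formatNewYear2019("yyyy-MM-dd")) // 2019-01-01
    println(formatNewYear2019("uuuu-MM-dd")) // 0002-01-01: 'u' is the ISO day of week, not a year
  }
}
```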
Please put that into the PR description, @MaxGekk ~
@dongjoon-hyun I have updated the description.
dongjoon-hyun
left a comment
+1, LGTM. This looks more intuitive.
Thank you, @MaxGekk, @srowen, @felixcheung.
Merged to master.
cc @gatorsmile and @cloud-fan
+1 nice
What changes were proposed in this pull request?
In the PR, I propose to use uuuu for years instead of yyyy in date/timestamp patterns without the era pattern G (https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). Parsing/formatting of positive years (current era) will be the same. The difference is in formatting negative years, which belong to the previous era, BC (Before Christ).
I replaced the yyyy pattern by uuuu everywhere except:
SimpleDateFormat, because it doesn't support the uuuu pattern.
Before the changes, the year 100 of the common era and the year -99 of the BC era were both shown as 100. After the changes, negative years are formatted with the - sign.
Before:
After:
How was this patch tested?
By existing test suites, and by new tests for negative years added to DateFormatterSuite and TimestampFormatterSuite.
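A minimal sketch of the behavior those tests pin down, written directly against java.time rather than Spark's DateFormatter/TimestampFormatter (the object and method names below are hypothetical, not Spark code). It checks both claims from the description: positive years format identically with yyyy and uuuu, while a BC year keeps its sign only with uuuu.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Illustration only: YearPatternCheck and formatBoth are hypothetical names.
object YearPatternCheck {
  private val yearOfEra = DateTimeFormatter.ofPattern("yyyy-MM-dd")     // y = year-of-era, unsigned
  private val prolepticYear = DateTimeFormatter.ofPattern("uuuu-MM-dd") // u = proleptic year, signed

  /** Returns the text produced by "yyyy-MM-dd" and by "uuuu-MM-dd" for `d`. */
  def formatBoth(d: LocalDate): (String, String) =
    (yearOfEra.format(d), prolepticYear.format(d))

  def main(args: Array[String]): Unit = {
    val dates = Seq(LocalDate.of(2019, 7, 23), LocalDate.of(100, 1, 1), LocalDate.of(-99, 1, 1))
    for (d <- dates) {
      val (y, u) = formatBoth(d)
      println(s"$d: yyyy -> $y, uuuu -> $u")
    }
  }
}
```

For -99 the yyyy text is 0100-01-01, indistinguishable from year 100, whereas uuuu yields -0099-01-01.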