Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jul 22, 2019

What changes were proposed in this pull request?

In the PR, I propose to use uuuu for years instead of yyyy in date/timestamp patterns without the era pattern G (https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). Parsing/formatting of positive years (current era) will be the same. The difference is in formatting negative years belong to previous era - BC (Before Christ).

I replaced the yyyy pattern by uuuu everywhere except:

  1. Test, Suite & Benchmark. Existing tests must work as is.
  2. SimpleDateFormat because it doesn't support the uuuu pattern.
  3. Comments and examples (except comments related to already replaced patterns).

Before the changes, the year of common era 100 and the year of BC era -99, showed similarly as 100. After the changes negative years will be formatted with the - sign.

Before:

scala> Seq(java.time.LocalDate.of(-99, 1, 1)).toDF().show
+----------+
|     value|
+----------+
|0100-01-01|
+----------+

After:

scala> Seq(java.time.LocalDate.of(-99, 1, 1)).toDF().show
+-----------+
|      value|
+-----------+
|-0099-01-01|
+-----------+

How was this patch tested?

By existing test suites, and added tests for negative years to DateFormatterSuite and TimestampFormatterSuite.

@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #108017 has finished for PR 25230 at commit 3e4426c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 22, 2019

Test build #108022 has finished for PR 25230 at commit 8b4e733.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 23, 2019

Test build #108037 has finished for PR 25230 at commit cdfe56d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 23, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Jul 23, 2019

Test build #108040 has finished for PR 25230 at commit cdfe56d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had never seen this! good discussion at https://stackoverflow.com/questions/41177442/uuuu-versus-yyyy-in-datetimeformatter-formatting-pattern-codes-in-java which supports your conclusion. It really won't matter except for years before 1 AD.

How about a basic test to demonstrate the behavior difference?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 24, 2019

How about a basic test to demonstrate the behavior difference?

@srowen I added tests for formatting of negative years. Just in case, parsing of negative years didn't work before and won't work after the changes even due to different reasons:

@SparkQA
Copy link

SparkQA commented Jul 24, 2019

Test build #108080 has finished for PR 25230 at commit b47d6d5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a breaking change though?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 24, 2019

is this a breaking change though?

@felixcheung No, negative years are out of the valid range for the DATE type:

* Valid range is [0001-01-01, 9999-12-31].

It could be considered as a fix for correctness issue.

Before: written -99 year was loaded back as 100. That's incorrect.

scala> Seq(java.time.LocalDate.of(-99, 1, 1)).toDF("d").write.mode("overwrite").json("neg_year2")
scala> spark.read.schema("d date").json("/Users/maxim/tmp/neg_year2").show
+----------+
|         d|
+----------+
|0100-01-01|
+----------+

After:

scala> Seq(java.time.LocalDate.of(-99, 1, 1)).toDF("d").write.mode("overwrite").json("neg_year")
scala> spark.read.schema("d date").json("neg_year").show
+----+
|   d|
+----+
|null|
+----+

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 24, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Jul 24, 2019

Test build #108082 has finished for PR 25230 at commit b47d6d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only hesitation is that most users who totally undesrtand what yyyy-MM-dd means in the docs won't necessarily get uuuu-MM-dd. I could see an argument that it's not worth it. But I think it's the right thing to do for full correctness even if it should virtually never affect an actual query or data.

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

@MaxGekk . Does this PR cover all instances? It seems that there are some leftovers. Could you elaborate a little bit more about the criteria of replacement?

@SparkQA
Copy link

SparkQA commented Jul 27, 2019

Test build #108255 has finished for PR 25230 at commit b47d6d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 28, 2019

Could you elaborate a little bit more about the criteria of replacement?

@dongjoon-hyun Everything with the yyyy- pattern except of:

  1. Test, Suite & Benchmark. Existing tests must work as is.
  2. SimpleDateFormat because it doesn't support the uuuu pattern.
  3. Comments and examples (except comments related to already replaced patterns).

For *.scala:

find . -type f -name "*.scala" -print0|xargs -0 grep -n 'yyyy-'|grep -v -i SimpleDateFormat|grep -v '> SELECT _FUNC_'|grep -v '\*'|grep -v '//'|grep -v Suite|grep -v Benchmark|grep -v Test

Similar criteria for python and R.

@dongjoon-hyun
Copy link
Member

Please put that into the PR description, @MaxGekk ~

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 28, 2019

Please put that into the PR description, @MaxGekk ~

@dongjoon-hyun I have updated the description.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. This looks more intuitive.
Thank you, @MaxGekk , @srowen , @felixcheung .
Merged to master.

@dongjoon-hyun
Copy link
Member

cc @gatorsmile and @cloud-fan

@HyukjinKwon
Copy link
Member

+1 nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants