Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 26, 2020

What changes were proposed in this pull request?

In the PR, I propose to change types of DateTimeTestUtils values and functions by replacing java.util.TimeZone to java.time.ZoneId. In particular:

  1. Type of ALL_TIMEZONES is changed to Seq[ZoneId].
  2. Remove val outstandingTimezones: Seq[TimeZone].
  3. Change the type of the time zone parameter in withDefaultTimeZone to ZoneId.
  4. Modify affected test suites.

Why are the changes needed?

Currently, Spark SQL's date-time expressions and functions have been already ported on Java 8 time API but tests still use old time APIs. In particular, DateTimeTestUtils exposes functions that accept only TimeZone instances. This is inconvenient, and CPU consuming because need to convert TimeZone instances to ZoneId instances via strings (zone ids).

Does this PR introduce any user-facing change?

No

How was this patch tested?

By affected test suites executed by jenkins builds.

@SparkQA
Copy link

SparkQA commented Mar 26, 2020

Test build #120414 has finished for PR 28033 at commit 13f6054.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 26, 2020

jenkins, retest this, please

@MaxGekk MaxGekk changed the title [WIP][SQL][TESTS] Migrate DateTimeTestUtils from TimeZone to ZoneId [SPARK-31277][SQL][TESTS] Migrate DateTimeTestUtils from TimeZone to ZoneId Mar 26, 2020
@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 26, 2020

@cloud-fan @HyukjinKwon Please, have a look at the PR.

@SparkQA
Copy link

SparkQA commented Mar 26, 2020

Test build #120433 has finished for PR 28033 at commit 13f6054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val tz = LA.getId
withDefaultTimeZone(TimeZone.getTimeZone(tz)) {
withDefaultTimeZone(getZoneId(tz)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just LA?

val UTC_OPT = Option("UTC")

val ALL_TIMEZONES: Seq[TimeZone] = TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone)
val ALL_TIMEZONES: Seq[ZoneId] = ZoneId.getAvailableZoneIds.asScala.map(getZoneId).toSeq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should only test the outstandingZoneIds instead of ALL_TIMEZONES. The test can be very slow if we test all timezones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already discussed this some time ago, see #22657 . We replaced ALL_TIMEZONES by outstanding time zones in some places but decided to leave in date-time tests. @srowen proposed to run tests for ALL_TIMEZONES in parallel, see #22672 (#22672 (comment)). Total time of testing all time zone in parallel ~0.24 seconds. Not so much, I think.

@SparkQA
Copy link

SparkQA commented Mar 27, 2020

Test build #120453 has finished for PR 28033 at commit 7d2b5bd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 27, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Mar 27, 2020

Test build #120464 has finished for PR 28033 at commit 7d2b5bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0 !

@cloud-fan cloud-fan closed this in 9f0c010 Mar 27, 2020
cloud-fan pushed a commit that referenced this pull request Mar 27, 2020
… to `ZoneId`

In the PR, I propose to change types of `DateTimeTestUtils` values and functions by replacing `java.util.TimeZone` to `java.time.ZoneId`. In particular:
1. Type of `ALL_TIMEZONES` is changed to `Seq[ZoneId]`.
2. Remove `val outstandingTimezones: Seq[TimeZone]`.
3. Change the type of the time zone parameter in `withDefaultTimeZone` to `ZoneId`.
4. Modify affected test suites.

Currently, Spark SQL's date-time expressions and functions have been already ported on Java 8 time API but tests still use old time APIs. In particular, `DateTimeTestUtils` exposes functions that accept only TimeZone instances. This is inconvenient, and CPU consuming because need to convert TimeZone instances to ZoneId instances via strings (zone ids).

No

By affected test suites executed by jenkins builds.

Closes #28033 from MaxGekk/with-default-time-zone.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9f0c010)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
… to `ZoneId`

### What changes were proposed in this pull request?
In the PR, I propose to change types of `DateTimeTestUtils` values and functions by replacing `java.util.TimeZone` to `java.time.ZoneId`. In particular:
1. Type of `ALL_TIMEZONES` is changed to `Seq[ZoneId]`.
2. Remove `val outstandingTimezones: Seq[TimeZone]`.
3. Change the type of the time zone parameter in `withDefaultTimeZone` to `ZoneId`.
4. Modify affected test suites.

### Why are the changes needed?
Currently, Spark SQL's date-time expressions and functions have been already ported on Java 8 time API but tests still use old time APIs. In particular, `DateTimeTestUtils` exposes functions that accept only TimeZone instances. This is inconvenient, and CPU consuming because need to convert TimeZone instances to ZoneId instances via strings (zone ids).

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By affected test suites executed by jenkins builds.

Closes apache#28033 from MaxGekk/with-default-time-zone.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the with-default-time-zone branch June 5, 2020 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants