Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Oct 6, 2018

What changes were proposed in this pull request?

After the changes, total execution time of JsonExpressionsSuite.scala dropped from 12.5 seconds to 3 seconds.

@MaxGekk MaxGekk changed the title [SPARK-25670][SQL] Reduce number of tested timezones in JsonExpressionsSuite [SPARK-25670][TEST] Reduce number of tested timezones in JsonExpressionsSuite Oct 6, 2018
@srowen
Copy link
Member

srowen commented Oct 6, 2018

-1 We definitely don't want to randomly test subsets of functionality just to make things faster. 9 seconds isn't worth it.

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97051 has finished for PR 22657 at commit e7d17d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 6, 2018

@srowen What is the difference between this test suite and this PR #22631 . Also I took a sub-set of timezones in the PR: #22379 (comment) . I think we should apply consistent approach across all test suites. @gatorsmile @cloud-fan What do you think?

9 seconds isn't worth it.

It isn't worth if you run the test not so often. I run the test suite manually pretty often, and for me it is worth.

@srowen
Copy link
Member

srowen commented Oct 6, 2018

I didn't see that one, and I object to it. I'll follow up there.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 8, 2018

I didn't see that one, and I object to it.

I believe we don't need to test all timezones in the case of JSON datasource. Actually we just check how FastDateFormat of commons-lang3 is able to parse/generate JSON by using different timezones. From my point of view, It should be enough to have one test case which checks that we properly pass a timezone to FastDateFormat. Don't think we have to test apache commons-lang3 fully. @HyukjinKwon What do you think of it?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Oct 8, 2018

I agree. We don't test commons-lang3 library here - one test should be good enough since what we need to check is if it delegates and passes the timezone properly or not.

@cloud-fan
Copy link
Contributor

Then we don't need any randomness here, just pick one timezone(like PST?) and test it.


val ALL_TIMEZONES: Seq[TimeZone] = TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone)

val TIMEZONES = Seq(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we move it to SparkFunSuite and name it like outstandingTimezones?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it should be useful across a few test suites. I will do that.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 10, 2018

Then we don't need any randomness here, just pick one timezone(like PST?) and test it.

I took 8 out of 627. Going to do the same in the PR: #22379

}
}

lazy val outstandingTimezones = Seq(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this lazy so that it's not evaluated by each test suite? I get it although TimeZone caches these, it seems. It won't matter really either way. I'm neutral, it doesn't matter

@SparkQA
Copy link

SparkQA commented Oct 10, 2018

Test build #97201 has finished for PR 22657 at commit 5f8bf79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 10, 2018

Test build #97202 has finished for PR 22657 at commit 778629d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes from 14 seconds to almost no time. That's a small nice win. Where else can we apply this subset?

@kiszk
Copy link
Member

kiszk commented Oct 11, 2018

Do we need to apply the same reduction to CastSuite?
I believe that DateTimeUtilsSuite should check all of the time zones, but may skip some values between-20000 to 20000.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 11, 2018

Where else can we apply this subset?

I am going to make similar changes in tests for from_csv where we don't need to test all timezones too: https://github.com/apache/spark/pull/22379/files#diff-95f5ea298edecd7edf5578466969d7eeR93

@kiszk
Copy link
Member

kiszk commented Oct 11, 2018

How about defining this subset in object DateTimeTestUtils like ALL_TIMEZONES?

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 11, 2018

How about defining this subset in object DateTimeTestUtils like ALL_TIMEZONES?

This is what I already did in the PR: #22657 (comment)

@cloud-fan
Copy link
Contributor

If it's only used in test let's define it in the test scope.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 11, 2018

let's define it in the test scope.

What's the test scope here? Both DateTimeTestUtils and SparkFunSuite are used in test suites only.

@cloud-fan
Copy link
Contributor

Ah sorry I misread it as DateTimeUtils before...

Then yes, it's a better place. Sorry for the back and forth!

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 11, 2018

Then yes, it's a better place. Sorry for the back and forth!

@cloud-fan Never mind. I will return it back. Thank you for reviewing it.

@SparkQA
Copy link

SparkQA commented Oct 11, 2018

Test build #97259 has finished for PR 22657 at commit 30be9f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Oct 11, 2018

Sorry for bothering you again. Do we need to apply the same reduction to CastSuite?

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 11, 2018

@kiszk It has been already parallelized by @srowen:

. What is the reason to reduce the number of zones there?

@srowen
Copy link
Member

srowen commented Oct 11, 2018

Yeah, I could see the argument both ways for keeping all the tests in CastSuite or just checking a subset. We already got the test down considerably, though it's still like 24 seconds. Is there any reasonable reason to think the subset of timezones doesn't cover a case we can imagine that could fail here?

@srowen
Copy link
Member

srowen commented Oct 12, 2018

Merged to master

@asfgit asfgit closed this in d47a25f Oct 12, 2018
@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 12, 2018

@srowen @HyukjinKwon @cloud-fan Thank you for your review of the PR.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…onsSuite

## What changes were proposed in this pull request?

After the changes, total execution time of `JsonExpressionsSuite.scala` dropped from 12.5 seconds to 3 seconds.

Closes apache#22657 from MaxGekk/json-timezone-test.

Authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
@MaxGekk MaxGekk deleted the json-timezone-test branch August 17, 2019 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants