[SPARK-25670][TEST] Reduce number of tested timezones in JsonExpressionsSuite #22657
|
-1 We definitely don't want to randomly test subsets of functionality just to make things faster. 9 seconds isn't worth it. |
|
Test build #97051 has finished for PR 22657 at commit
|
|
@srowen What is the difference between this test suite and PR #22631? I also took a subset of timezones in PR #22379 (comment). I think we should apply a consistent approach across all test suites. @gatorsmile @cloud-fan What do you think?
It isn't worth it if you rarely run the test. I run the test suite manually pretty often, and for me it is worth it. |
|
I didn't see that one, and I object to it. I'll follow up there. |
I believe we don't need to test all timezones in the case of JSON datasource. Actually we just check how |
|
I agree. We don't test |
|
Then we don't need any randomness here; just pick one timezone (like PST?) and test it. |
|
|
||
| val ALL_TIMEZONES: Seq[TimeZone] = TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone) | ||
|
|
||
| val TIMEZONES = Seq( |
shall we move it to SparkFunSuite and name it like outstandingTimezones?
It seems it should be useful across a few test suites. I will do that.
I took 8 out of 627. Going to do the same in the PR: #22379 |
| } | ||
| } | ||
|
|
||
| lazy val outstandingTimezones = Seq( |
Is this lazy so that it's not evaluated by each test suite? I get it, although TimeZone seems to cache these. I'm neutral; it won't really matter either way.
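For readers following the thread, here is a minimal sketch of the idea under discussion: a small, fixed list of timezones held in a `lazy val` next to the full list from the diff. The object name `TimezoneSubsetSketch`, the value name `sampleTimezones`, and the four IDs are assumptions for illustration; the IDs actually chosen in this PR are not shown in the snippet above.

```scala
import java.util.TimeZone

object TimezoneSubsetSketch {
  // Every timezone known to the JVM, mirroring the ALL_TIMEZONES line in the diff.
  val allTimezones: Seq[TimeZone] =
    TimeZone.getAvailableIDs.toSeq.map(id => TimeZone.getTimeZone(id))

  // Hypothetical subset, for illustration only (not the PR's list).
  // `lazy` defers building the Seq until a test first touches it.
  lazy val sampleTimezones: Seq[TimeZone] = Seq(
    "UTC",
    "America/Los_Angeles",
    "Asia/Hong_Kong",
    "Europe/Amsterdam"
  ).map(id => TimeZone.getTimeZone(id))

  def main(args: Array[String]): Unit = {
    println(s"all: ${allTimezones.size}, subset: ${sampleTimezones.size}")
    sampleTimezones.foreach(tz => println(tz.getID))
  }
}
```

A fixed list keeps runs deterministic, which is the point made earlier in the thread about avoiding randomness.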
|
Test build #97201 has finished for PR 22657 at commit
|
|
Test build #97202 has finished for PR 22657 at commit
|
srowen
left a comment
This goes from 14 seconds to almost no time. That's a nice small win. Where else can we apply this subset?
|
Do we need to apply the same reduction to |
I am going to make similar changes in tests for |
|
How about defining this subset in |
This is what I already did in the PR: #22657 (comment) |
|
If it's only used in test let's define it in the test scope. |
What's the test scope here? Both |
|
Ah sorry I misread it as Then yes, it's a better place. Sorry for the back and forth! |
@cloud-fan Never mind. I will move it back. Thank you for reviewing it. |
|
Test build #97259 has finished for PR 22657 at commit
|
|
Sorry for bothering you again. Do we need to apply the same reduction to |
|
@kiszk It has already been parallelized by @srowen: spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala Line 115 in eaafcd8
|
|
Yeah, I could see the argument both ways for keeping all the tests in CastSuite or just checking a subset. We already got the test time down considerably, though it's still about 24 seconds. Is there any good reason to think the subset of timezones misses a case that we can imagine failing here? |
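As a rough illustration of the other option weighed here (keep every timezone but run the checks concurrently), the sketch below uses standard-library Futures. It is not the CastSuite change itself; `ParallelTimezoneCheckSketch`, `checkTimezone`, and `checkAllInParallel` are hypothetical names, and the offset bound is a placeholder check standing in for a real assertion.

```scala
import java.util.TimeZone
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelTimezoneCheckSketch {
  val allTimezones: Seq[TimeZone] =
    TimeZone.getAvailableIDs.toSeq.map(id => TimeZone.getTimeZone(id))

  // Placeholder check (an assumption): the current UTC offset stays within 18 hours.
  def checkTimezone(tz: TimeZone): Boolean =
    math.abs(tz.getOffset(System.currentTimeMillis())) <= 18 * 60 * 60 * 1000

  // Runs the check for every zone on the global execution context
  // and returns the IDs of the zones that failed it.
  def checkAllInParallel(zones: Seq[TimeZone]): Seq[String] = {
    val results = Future.traverse(zones) { tz =>
      Future((tz.getID, checkTimezone(tz)))
    }
    Await.result(results, 1.minute).collect { case (id, false) => id }
  }

  def main(args: Array[String]): Unit = {
    val failed = checkAllInParallel(allTimezones)
    println(s"checked ${allTimezones.size} zones, failures: ${failed.size}")
  }
}
```

Parallelism keeps full coverage at the cost of more cores per run, while a subset trades coverage for speed; the thread leaves that choice open for CastSuite.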
|
Merged to master |
|
@srowen @HyukjinKwon @cloud-fan Thank you for your review of the PR. |
…onsSuite ## What changes were proposed in this pull request? After the changes, total execution time of `JsonExpressionsSuite.scala` dropped from 12.5 seconds to 3 seconds. Closes apache#22657 from MaxGekk/json-timezone-test. Authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
After the changes, total execution time of `JsonExpressionsSuite.scala` dropped from 12.5 seconds to 3 seconds.