[SPARK-25605][TESTS] Run cast string to timestamp tests for a subset of timezones #22631

mgaido91 · 2018-10-04T15:53:37Z

What changes were proposed in this pull request?

The `cast string to timestamp` test used to run for all time zones, so it ran more than 600 times. Running it for a significant subset of time zones is probably good enough, and choosing that subset randomly ensures that all time zones are still covered across different runs.

How was this patch tested?

The test time drops from more than 2 minutes to 11 seconds.

mgaido91 · 2018-10-04T15:53:43Z

cc @gatorsmile

MaxGekk · 2018-10-04T18:25:25Z

JsonExpressionsSuite has a test for all time zones too. Probably it makes sense to apply the same approach there:

spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala

Line 506 in 1007cae

for (tz <- DateTimeTestUtils.ALL_TIMEZONES) {

I guess the number of time zones can be reduced in DateTimeUtilsSuite too.

SparkQA · 2018-10-04T19:54:05Z

Test build #96945 has finished for PR 22631 at commit 4853479.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile

LGTM

Thanks! Merged to master.

srowen · 2018-10-06T18:44:04Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala


  test("cast string to timestamp") {
-    for (tz <- ALL_TIMEZONES) {
+    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {


@mgaido91 @gatorsmile surely we shouldn't do this. We're introducing nondeterminism into the test, and also no longer testing things we were testing before. If there are a lot of timezones we don't need to test, then don't test them. Right? I didn't see this PR, but I would have objected to this change if I had.

@srowen I think the point here is to test that this works with different timezones. In DateExpressionsSuite, for instance, we test only 3-4 timezones. I don't think it makes sense to test every one of the 650 possible timezones: if it works with many of them, then the code is generic and respects timezones. We could also define a fixed subset of timezones, but IMHO picking some of them randomly provides additional safety: if a single specific timezone causes a problem, we will be able to identify it over several subsequent runs.

We have many places where we generate data randomly in tests, so we already have randomness in the tests. I think the rationale behind them is the same: if the function works with a variety of data, then it generalizes properly.

Tests should be deterministic, ideally; any sources of randomness should be seeded. Do you see one that isn't?
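As a minimal sketch of what seeding could look like here (the object name, the `sample` helper, and the seed value are hypothetical and not taken from the PR), a fixed seed makes the shuffled subset reproducible from run to run:

```scala
import java.util.TimeZone
import scala.util.Random

object SeededTimezoneSample {
  // Every zone ID known to the JVM, sorted so the input order is stable.
  val allTimezoneIds: List[String] = TimeZone.getAvailableIDs().toList.sorted

  // Hypothetical helper (not from the PR): the same seed always yields
  // the same subset, so a failing run can be replayed exactly.
  def sample(seed: Long, n: Int): List[String] =
    new Random(seed).shuffle(allTimezoneIds).take(n)
}
```

A variant could draw a fresh seed on each run and log it, which keeps the sampling while still allowing a failure to be reproduced by reusing the logged seed.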

I think this is like deciding we'll run just 90% of all test suites every time randomly, to save time. I think it's just well against good practice.

There are other solutions:

pick a subset of timezones that we're confident do exercise the code and just explicitly test those

parallelize these tests within the test suite

The latter should be trivial in this case: ALL_TIMEZONES.par.foreach { tz => instead. It's the same amount of work but 8x, 16x faster by wall clock time, depending on how many cores are available. What about that?
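A rough sketch of the parallel idea follows. The comment above proposes `.par`, which is part of the standard library in Scala 2.12 and earlier but lives in a separate module from Scala 2.13 on, so this sketch swaps in plain Futures to stay self-contained. `rawOffsetMillis` is a hypothetical stand-in for the real per-timezone assertions, and the two-minute timeout is an arbitrary assumption.

```scala
import java.util.TimeZone
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelTimezoneCheck {
  // Every zone ID known to the JVM.
  val allTimezoneIds: List[String] = TimeZone.getAvailableIDs().toList

  // Hypothetical stand-in for the per-timezone test body.
  def rawOffsetMillis(id: String): Int =
    TimeZone.getTimeZone(id).getRawOffset

  // Start one task per zone on the global thread pool, then wait for all of them.
  def runAll(): List[(String, Int)] = {
    val tasks = allTimezoneIds.map(id => Future(id -> rawOffsetMillis(id)))
    Await.result(Future.sequence(tasks), 2.minutes)
  }
}
```

Every zone is still exercised on every run, so the test stays deterministic; only the wall-clock time changes, depending on how many cores are available.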

Yes, there are many tests where data is randomly generated. And they are not seeded of course.

As I said, I think the goal here is to test that the function works well with different timezones: then picking a subset of timezones would be fine too, but I prefer taking them randomly among all because if there is a single timezone creating issues (very unlikely IMHO), we would discover it anyway (not on the single run though).

Moreover, it would be great to be consistent across the whole codebase about what we test. In DateExpressionsSuite we test only 3 timezones and here we test all 650: that is weird, isn't it? We should probably define the right thing to do when timezones are involved and always test the same way. Otherwise, testing 650 timezones for one specific function and 3 for most of the others makes no sense IMHO.

Surely not by design? tests need to be deterministic, or else what's the value? failures can't be reproduced. (I know that in practice many things are hard to make deterministic.)

Of course, if you're worried that we might not be testing an important case, we have to test it. We can't just not test it sometimes to make some tests run a little faster.

Testing just 3 timezones might be fine too; I don't know. Testing 50 randomly seems suboptimal in all cases.

I'll open a PR to try simply testing in parallel instead.

I think tests need to be deterministic in general as well.

In this particular case, ideally, we should categorize timezones, pick some timezones representing each category, and test a fixed set: for instance, timezones with DST, timezones without DST, and some exceptions, such as this particular case which Python 3.6 addressed lately (https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/datetime.py#L1572-L1574), IMHO.
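To make the categorization idea concrete, here is a small sketch that splits zone IDs by whether the JVM reports them as using daylight saving time. The object name, the `byDst` helper, and the three representative zones are illustrative assumptions only; choosing a set that actually gives good coverage would need the investigation mentioned in this thread.

```scala
import java.util.TimeZone

object TimezoneCategories {
  // Split zone IDs into (uses DST, does not use DST), based on the JVM's
  // current rules for each zone.
  def byDst(ids: Seq[String]): (Seq[String], Seq[String]) =
    ids.partition(id => TimeZone.getTimeZone(id).useDaylightTime())

  // Hypothetical fixed representatives, chosen only for illustration:
  // a zero-offset zone, a zone commonly cited for DST, and a zone with a
  // half-hour offset.
  val representatives: Seq[String] =
    Seq("UTC", "America/Los_Angeles", "Asia/Kolkata")
}
```

A fixed list like `representatives` keeps the test deterministic, at the cost of only covering the categories someone thought to include.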

Of course, this approach requires a lot of investigation and overhead. So, as an alternative, I would be inclined to go for Sean's approach (https://github.com/apache/spark/pull/22631/files#r223224573) for this particular case.

For randomness, I think we should primarily have a deterministic set of tests first. Maybe we could additionally have a set of randomized inputs to cover cases we haven't foreseen, but that's secondary.

I mean, some tests with randomized input (say, input drawn from an integer range) are fine by common sense, but this case is different, isn't it?

I don't think that adding parallelism is a good way to improve test time. The amount of resources available for testing is limited anyway. I think the goal here is not (only) to reduce the overall time of the test but also to reduce the amount of resources needed to test.

Problems with a specific timezone like the one you mentioned, @HyukjinKwon, are exactly the reason why I am proposing this randomized approach, rather than picking 3 timezones and always using them as is done in DateExpressionsSuite: if there is a problem with a specific timezone, this way we will be able to catch it (even though not in a single run). With a fixed subset of them, we will not.

The only safe deterministic way would be to run against all of them, as it was done before, but then I'd argue that we should do the same everywhere we have different timezones involved in tests (why are we testing all timezones only for casting to timestamp and not for all other functions involving dates and times, if it is so important to check all timezones?). But then the amount of time needed to run all the tests would be crazy, so it is not doable.

we should categorize timezones, pick up some timezones representing them and test fixed set

That would be the best, but we need a deep understanding of timezones to make sure the test coverage is good.

We aren't blocked on CPU time or resources, no. The tests are mostly single-threaded, and the big test machines mostly idle throughout the runs. I've opened #22672 to evaluate that.

I feel strongly that we can't do this kind of thing and am opening a thread on dev@ for wider discussion.

…ts for a subset of timezones

## What changes were proposed in this pull request?

Try testing timezones in parallel instead in CastSuite, instead of random sampling. See also #22631

## How was this patch tested?

Existing test.

Closes #22672 from srowen/SPARK-25605.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>

…of timezones

## What changes were proposed in this pull request?

The test `cast string to timestamp` used to run for all time zones. So it run for more than 600 times. Running the tests for a significant subset of time zones is probably good enough and doing this in a randomized manner enforces anyway that we are going to test all time zones in different runs.

## How was this patch tested?

the test time reduces to 11 seconds from more than 2 minutes

Closes apache#22631 from mgaido91/SPARK-25605.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

MaxGekk mentioned this pull request Oct 4, 2018

[SPARK-25393][SQL] Adding new function from_csv() #22379

Closed

gatorsmile reviewed Oct 5, 2018

asfgit closed this in 8113b9c Oct 5, 2018

MaxGekk mentioned this pull request Oct 6, 2018

[SPARK-25670][TEST] Reduce number of tested timezones in JsonExpressionsSuite #22657

Closed

srowen reviewed Oct 6, 2018

srowen mentioned this pull request Oct 8, 2018

[SPARK-25605][TESTS] Alternate take. Run cast string to timestamp tests for a subset of timezones #22672

Closed

HyukjinKwon · Oct 8, 2018 (edited)
7 participants