[SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows #16586
Conversation
Test build #71399 has finished for PR 16586 at commit
```diff
 |ALTER TABLE $tab ADD
-|PARTITION (ds='2008-04-08', hr=11) LOCATION '$part1Path'
-|PARTITION (ds='2008-04-08', hr=12) LOCATION '$part2Path'
+|PARTITION (ds='2008-04-08', hr=11) LOCATION '${part1Path.toURI}'
```
Just wondering what is the reason?
It seems to be due to the parser. If the path is something like C:\tmp\b\c, it ends up like C: mpbc (the backslashes are consumed as escape sequences: \t becomes a tab and the other backslashes are dropped). To deal with this, we should write it as C:\\tmp\\b\\c or as a URI. The simplest choice seems to be the URI, unless the test is dedicated to such a case.
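To make the failure mode concrete, here is a minimal sketch of the two interpolations (assumptions for illustration: a Windows-style `part1Path`; only the `LOCATION '${part1Path.toURI}'` form appears in the diff above):

```scala
import java.io.File

// A Windows-style path, mirroring the `part1Path` variable in the diff.
val part1Path = new File("C:\\tmp\\b\\c")

// Interpolating the raw path puts literal backslashes into the SQL text;
// the SQL parser then treats them as escapes (\t becomes a tab, the other
// backslashes are simply dropped), so 'C:\tmp\b\c' is read back mangled.
val fragile = s"PARTITION (ds='2008-04-08', hr=11) LOCATION '$part1Path'"

// The URI form (file:/C:/tmp/b/c on Windows) contains no backslashes,
// so there is nothing for the parser to mis-escape.
val safe = s"PARTITION (ds='2008-04-08', hr=11) LOCATION '${part1Path.toURI}'"
```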
Thanks! I will keep it in mind. We have not been following this rule when writing the test cases.
Hm, it seems the builds are somehow blocked again. I will contact AppVeyor again if things do not proceed by tomorrow. This happened to me before when I cancelled and restarted builds frequently.
```diff
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 1e4.toInt, numSlices).map(w)
```
While here, feel free to just write "0 to 10000"
Sure, thanks.
@srowen, @shivaram and @felixcheung, this is the problem I previously reported to the three of you - I observed it in the ASF account at Spark - https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/history. This could simply be jobs queued up in other projects, or the same thing happening there. To verify it, I should check whether any job is running in any Apache project on AppVeyor. In my case it is easy to check because I have only a single project, spark. So, do any of you (as a committer or PMC) know an easy way to retrieve a list of Apache projects that use AppVeyor? I am willing to check each project once I get the list, and to report this to AppVeyor if the same thing happens to the ASF account when jobs are blocked for days or a couple of weeks. (cc @dongjoon-hyun FYI, as REEF uses the same ASF account IIRC)
@HyukjinKwon I have no info on this, I'm afraid. From searching the internet for "appveyor apache", I think Thrift and NiFi might be using it too. Are you saying your jobs never run in AppVeyor, or that they take a long time to schedule? You're welcome to tackle the problem if you can. What's the status here -- do we still need a positive result from AppVeyor before merging this?
Ah, I see. Yes, I am sorry for not clarifying. Actually, I meant two problems. One is that the jobs in my account never run on AppVeyor, as above. The other is that I suspect the same thing is happening in the current ASF account (which might just be due to many jobs queued up in other Apache projects), because I observed that the SparkR jobs (for other PRs) have not been running for the past two days, up to now. Let me try my best to verify this. As for the status of this PR, I wanted to show you all a green run from AppVeyor to complete fixing the tests on Windows for now (of course, I will keep fixing newly introduced ones in the future). Let me try to show you a green run after contacting AppVeyor to unblock my account.
I have no information on this either. We could file an INFRA ticket if we wanted the information.
Yeah, not sure why, but AppVeyor has been stuck since around 3 days ago.
```diff
 |WITH SERDEPROPERTIES ('serialization.last.column.takes.rest'='true') FROM src;
-""".stripMargin.replaceAll(System.lineSeparator(), " "))
+""".stripMargin.replaceAll(System.lineSeparator(), " "),
+skip = !TestUtils.testCommandAvailable("/bin/bash"))
```
What is the cause that we need to skip this test? Perhaps add a comment, to help keep a record of this?
Script transformation, such as `USING 'cat'`, requires a hard-coded /bin/bash, which seems to be missing or located elsewhere on Windows (with or without Cygwin).
I will add a single de-duplicated comment around the first instance of it (there are many instances) if I happen to push more commits.
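For the record, a rough sketch of the guard being added (the helper body below is an assumption for illustration; only the `skip = !TestUtils.testCommandAvailable("/bin/bash")` call itself appears in the diff):

```scala
import scala.sys.process._
import scala.util.Try

// Hypothetical stand-in for TestUtils.testCommandAvailable: true only if
// the shell can resolve the command. On Windows without Cygwin, spawning
// `sh` itself fails, the Try fails, and this returns false.
def testCommandAvailable(command: String): Boolean =
  Try(Process(Seq("sh", "-c", s"command -v $command")).! == 0).getOrElse(false)

// The transformation tests shell out through a hard-coded /bin/bash,
// so they are skipped wherever that binary does not exist.
val skip = !testCommandAvailable("/bin/bash")
```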
Ah, thank you @shivaram and @felixcheung |
(Just FYI, it is now fixed for both my account and the ASF account. It was due to a recent incident at AppVeyor - https://appveyor.statuspage.io/incidents/k06ydx9hkhbt)
Test build #71595 has finished for PR 16586 at commit
Current status of this PR: the tests below seem to be failing consistently across 6 builds (please check the logs in https://ci.appveyor.com/project/spark-test/spark/history, in particular https://ci.appveyor.com/project/spark-test/spark/build/606-E7636D1D-41D6-4E36-9E15-B26EBB03B9E1, which has the fewest test failures). I think they are flaky, because I remember they passed before. Let me try running them as individual tests, because building everything multiple times takes too long.
They all pass in individual tests with […], although I am wondering how/why those tests seem more flaky in package-level tests (e.g. […]). Let me remove […].
```diff
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 10000, numSlices).map(w)
```
I am pretty sure the deserialization time test is less flaky now, judging from the individual test runs below (see also the sketch after the results):
Before - 4 failures out of 5.
1 (failed)
2 (failed)
3 (failed)
4 (passed)
5 (failed)
After - 1 failure out of 7.
1 (passed)
2 (passed)
3 (passed)
4 (passed)
5 (failed)
6 (passed)
7 (passed)
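For context on why scaling the input helps, here is a rough sketch of the kind of job the local metrics test runs (assumptions for illustration: a local SparkContext and an assertion that per-task metric times are positive; neither is shown in this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[4]").setAppName("local-metrics-sketch"))

val numSlices = 16
// With only ~1e3 elements over 16 slices, a task can deserialize and run
// in well under a millisecond, so a metric asserted to be > 0 can
// legitimately be recorded as 0; ~1e4 elements keeps it measurable.
val d = sc.parallelize(0 to 10000, numSlices).map(identity)
d.count() // runs the tasks whose metrics a SparkListener would inspect

sc.stop()
```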
Merged to master |
[SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows

## What changes were proposed in this pull request?

**Failed tests**

```
org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - transform with SerDe3 *** FAILED ***
 - transform with SerDe4 *** FAILED ***
```

```
org.apache.spark.sql.hive.execution.HiveDDLSuite:
 - create hive serde table with new syntax *** FAILED ***
 - add/drop partition with location - managed table *** FAILED ***
```

```
org.apache.spark.sql.hive.ParquetMetastoreSuite:
 - Explicitly added partitions should be readable after load *** FAILED ***
 - Non-partitioned table readable after load *** FAILED ***
```

**Aborted tests**

```
Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.execution.HiveSerDeSuite *** ABORTED *** (157 milliseconds)
   org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: C:projectssparksqlhive argetscala-2.11 est-classesdatafilessales.txt;
```

**Flaky tests (failed 9ish out of 10)**

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics *** FAILED ***
```

## How was this patch tested?

Manually tested via AppVeyor.

**Failed tests**

```
org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - transform with SerDe3 !!! CANCELED !!! (0 milliseconds)
 - transform with SerDe4 !!! CANCELED !!! (0 milliseconds)
```

```
org.apache.spark.sql.hive.execution.HiveDDLSuite:
 - create hive serde table with new syntax (1 second, 672 milliseconds)
 - add/drop partition with location - managed table (2 seconds, 391 milliseconds)
```

```
org.apache.spark.sql.hive.ParquetMetastoreSuite:
 - Explicitly added partitions should be readable after load (609 milliseconds)
 - Non-partitioned table readable after load (344 milliseconds)
```

**Aborted tests**

```
spark.sql.hive.execution.HiveSerDeSuite:
 - Read with RegexSerDe (2 seconds, 142 milliseconds)
 - Read and write with LazySimpleSerDe (tab separated) (2 seconds)
 - Read with AvroSerDe (1 second, 47 milliseconds)
 - Read Partitioned with AvroSerDe (1 second, 422 milliseconds)
```

**Flaky tests (failed 9ish out of 10)**

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics (4 seconds, 562 milliseconds)
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#16586 from HyukjinKwon/set-path-appveyor.