[SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows #16586
Conversation
Test build #71399 has finished for PR 16586 at commit
```diff
 |ALTER TABLE $tab ADD
-|PARTITION (ds='2008-04-08', hr=11) LOCATION '$part1Path'
-|PARTITION (ds='2008-04-08', hr=12) LOCATION '$part2Path'
+|PARTITION (ds='2008-04-08', hr=11) LOCATION '${part1Path.toURI}'
```
Just wondering what is the reason?
It seems to be due to the parser. If the path is something like C:\tmp\b\c, it ends up like C: mpbc (the backslashes are consumed as escape sequences: \t becomes a tab and the other backslashes are dropped). To deal with this, we should write it as C:\\tmp\\b\\c or as a URI. The simplest choice seems to be the URI, unless the test is dedicated to such a case.
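To make the failure mode concrete, here is a minimal sketch of the two interpolations (assumptions for illustration: a Windows-style `part1Path`; only the `LOCATION '${part1Path.toURI}'` form appears in the diff above):

```scala
import java.io.File

// A Windows-style path, mirroring the `part1Path` variable in the diff.
val part1Path = new File("C:\\tmp\\b\\c")

// Interpolating the raw path puts literal backslashes into the SQL text;
// the SQL parser then treats them as escapes (\t becomes a tab, the other
// backslashes are simply dropped), so 'C:\tmp\b\c' is read back mangled.
val fragile = s"PARTITION (ds='2008-04-08', hr=11) LOCATION '$part1Path'"

// The URI form (file:/C:/tmp/b/c on Windows) contains no backslashes,
// so there is nothing for the parser to mis-escape.
val safe = s"PARTITION (ds='2008-04-08', hr=11) LOCATION '${part1Path.toURI}'"
```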
Thanks! I will keep it in mind. We have not been following this rule when writing the test cases.
Hm, it seems the builds are somehow blocked again. I will contact AppVeyor again if things do not proceed by tomorrow. This happened to me before when I cancelled and restarted builds frequently.
```diff
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 1e4.toInt, numSlices).map(w)
```
While here, feel free to just write "0 to 10000"
Sure, thanks.
@srowen, @shivaram and @felixcheung, this is the problem I previously reported to the three of you - I observed it in the ASF account at Spark - https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/history. This could simply be jobs queued up in other projects, or the same thing happening there. To verify it, I should check whether any job is running in any Apache project on AppVeyor. In my case it is easy to check because I have only a single project, spark. So, do any of you (as a committer or PMC) know an easy way to retrieve a list of Apache projects that use AppVeyor? I am willing to check each project once I get the list, and to report this to AppVeyor if the same thing happens to the ASF account when jobs are blocked for days or a couple of weeks. (cc @dongjoon-hyun FYI, as REEF uses the same ASF account IIRC)
@HyukjinKwon I have no info on this, I'm afraid. From searching the internet for "appveyor apache", I think Thrift and NiFi might be using it too. Are you saying your jobs never run in AppVeyor, or that they take a long time to schedule? You're welcome to tackle the problem if you can. What's the status here -- do we still need a positive result from AppVeyor before merging this?
Ah, I see. Yes, I am sorry for not clarifying. Actually, I meant two problems. One is that the jobs in my account never run on AppVeyor, as above. The other is that I suspect the same thing is happening in the current ASF account (which might just be due to many jobs queued up in other Apache projects), because I observed that the SparkR jobs (for other PRs) have not been running for the past two days, up to now. Let me try my best to verify this. As for the status of this PR, I wanted to show you all a green run from AppVeyor to complete fixing the tests on Windows for now (of course, I will keep fixing newly introduced ones in the future). Let me try to show you a green run after contacting AppVeyor to unblock my account.
I have no information on this either. We could file an INFRA ticket if we wanted the information.
Yeah, not sure why, but AppVeyor has been stuck since around 3 days ago.
```diff
 |WITH SERDEPROPERTIES ('serialization.last.column.takes.rest'='true') FROM src;
-""".stripMargin.replaceAll(System.lineSeparator(), " "))
+""".stripMargin.replaceAll(System.lineSeparator(), " "),
+skip = !TestUtils.testCommandAvailable("/bin/bash"))
```
What is the cause that we need to skip this test? Perhaps add a comment, to help keep a record of this?
Script transformation, such as `USING 'cat'`, requires a hard-coded /bin/bash, which seems to be missing or located elsewhere on Windows (with or without Cygwin).
I will add a single de-duplicated comment around the first instance of it (there are many instances) if I happen to push more commits.
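For the record, a rough sketch of the guard being added (the helper body below is an assumption for illustration; only the `skip = !TestUtils.testCommandAvailable("/bin/bash")` call itself appears in the diff):

```scala
import scala.sys.process._
import scala.util.Try

// Hypothetical stand-in for TestUtils.testCommandAvailable: true only if
// the shell can resolve the command. On Windows without Cygwin, spawning
// `sh` itself fails, the Try fails, and this returns false.
def testCommandAvailable(command: String): Boolean =
  Try(Process(Seq("sh", "-c", s"command -v $command")).! == 0).getOrElse(false)

// The transformation tests shell out through a hard-coded /bin/bash,
// so they are skipped wherever that binary does not exist.
val skip = !testCommandAvailable("/bin/bash")
```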
Ah, thank you @shivaram and @felixcheung |
(Just FYI, it is now fixed for both my account and the ASF account. It was due to a recent incident at AppVeyor - https://appveyor.statuspage.io/incidents/k06ydx9hkhbt)
Test build #71595 has finished for PR 16586 at commit
Current status of this PR: the tests below seem to be failing consistently across 6 builds (please check the logs in https://ci.appveyor.com/project/spark-test/spark/history, in particular https://ci.appveyor.com/project/spark-test/spark/build/606-E7636D1D-41D6-4E36-9E15-B26EBB03B9E1, which has the fewest test failures). I think they are flaky, because I remember they passed before. Let me try running them as individual tests, because building everything multiple times takes too long.
They all pass in individual tests with […], although I am wondering how/why those tests seem more flaky in package-level tests (e.g. […]). Let me remove […].
```diff
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 10000, numSlices).map(w)
```
I am pretty sure the deserialization time test is less flaky now, judging from the individual test runs below (see also the sketch after the results):
Before - 4 failures out of 5.
1 (failed)
2 (failed)
3 (failed)
4 (passed)
5 (failed)
After - 1 failure out of 7.
1 (passed)
2 (passed)
3 (passed)
4 (passed)
5 (failed)
6 (passed)
7 (passed)
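For context on why scaling the input helps, here is a rough sketch of the kind of job the local metrics test runs (assumptions for illustration: a local SparkContext and an assertion that per-task metric times are positive; neither is shown in this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[4]").setAppName("local-metrics-sketch"))

val numSlices = 16
// With only ~1e3 elements over 16 slices, a task can deserialize and run
// in well under a millisecond, so a metric asserted to be > 0 can
// legitimately be recorded as 0; ~1e4 elements keeps it measurable.
val d = sc.parallelize(0 to 10000, numSlices).map(identity)
d.count() // runs the tasks whose metrics a SparkListener would inspect

sc.stop()
```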
Merged to master |
[SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows

## What changes were proposed in this pull request?

**Failed tests**

```
org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - transform with SerDe3 *** FAILED ***
 - transform with SerDe4 *** FAILED ***
```

```
org.apache.spark.sql.hive.execution.HiveDDLSuite:
 - create hive serde table with new syntax *** FAILED ***
 - add/drop partition with location - managed table *** FAILED ***
```

```
org.apache.spark.sql.hive.ParquetMetastoreSuite:
 - Explicitly added partitions should be readable after load *** FAILED ***
 - Non-partitioned table readable after load *** FAILED ***
```

**Aborted tests**

```
Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.execution.HiveSerDeSuite *** ABORTED *** (157 milliseconds)
   org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: C:projectssparksqlhive argetscala-2.11 est-classesdatafilessales.txt;
```

**Flaky tests (failed 9ish out of 10)**

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics *** FAILED ***
```

## How was this patch tested?

Manually tested via AppVeyor.

**Failed tests**

```
org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - transform with SerDe3 !!! CANCELED !!! (0 milliseconds)
 - transform with SerDe4 !!! CANCELED !!! (0 milliseconds)
```

```
org.apache.spark.sql.hive.execution.HiveDDLSuite:
 - create hive serde table with new syntax (1 second, 672 milliseconds)
 - add/drop partition with location - managed table (2 seconds, 391 milliseconds)
```

```
org.apache.spark.sql.hive.ParquetMetastoreSuite:
 - Explicitly added partitions should be readable after load (609 milliseconds)
 - Non-partitioned table readable after load (344 milliseconds)
```

**Aborted tests**

```
spark.sql.hive.execution.HiveSerDeSuite:
 - Read with RegexSerDe (2 seconds, 142 milliseconds)
 - Read and write with LazySimpleSerDe (tab separated) (2 seconds)
 - Read with AvroSerDe (1 second, 47 milliseconds)
 - Read Partitioned with AvroSerDe (1 second, 422 milliseconds)
```

**Flaky tests (failed 9ish out of 10)**

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics (4 seconds, 562 milliseconds)
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#16586 from HyukjinKwon/set-path-appveyor.