
Conversation

@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Jan 15, 2017

## What changes were proposed in this pull request?

**Failed tests**

```
org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - transform with SerDe3 *** FAILED ***
 - transform with SerDe4 *** FAILED ***
```

```
org.apache.spark.sql.hive.execution.HiveDDLSuite:
 - create hive serde table with new syntax *** FAILED ***
 - add/drop partition with location - managed table *** FAILED ***
```

```
org.apache.spark.sql.hive.ParquetMetastoreSuite:
 - Explicitly added partitions should be readable after load *** FAILED ***
 - Non-partitioned table readable after load *** FAILED ***
```

**Aborted tests**

```
Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.execution.HiveSerDeSuite *** ABORTED *** (157 milliseconds)
   org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: C:projectssparksqlhive   argetscala-2.11   est-classesdatafilessales.txt;
```

**Flaky tests (failed 9ish out of 10)**

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics *** FAILED ***
```

## How was this patch tested?

Manually tested via AppVeyor.

**Failed tests**

```
org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - transform with SerDe3 !!! CANCELED !!! (0 milliseconds)
 - transform with SerDe4 !!! CANCELED !!! (0 milliseconds)
```

```
org.apache.spark.sql.hive.execution.HiveDDLSuite:
 - create hive serde table with new syntax (1 second, 672 milliseconds)
 - add/drop partition with location - managed table (2 seconds, 391 milliseconds)
```

```
org.apache.spark.sql.hive.ParquetMetastoreSuite:
 - Explicitly added partitions should be readable after load (609 milliseconds)
 - Non-partitioned table readable after load (344 milliseconds)
```

**Aborted tests**

```
spark.sql.hive.execution.HiveSerDeSuite:
 - Read with RegexSerDe (2 seconds, 142 milliseconds)
 - Read and write with LazySimpleSerDe (tab separated) (2 seconds)
 - Read with AvroSerDe (1 second, 47 milliseconds)
 - Read Partitioned with AvroSerDe (1 second, 422 milliseconds)
```

**Flaky tests (failed 9ish out of 10)**

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics (4 seconds, 562 milliseconds)
```

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 15, 2017

Build started: [TESTS] ALL PR-16586

There is something wrong with AppVeyor. Let me re-trigger all the builds.

@SparkQA

SparkQA commented Jan 15, 2017

Test build #71399 has finished for PR 16586 at commit 9ce8846.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
 |ALTER TABLE $tab ADD
-|PARTITION (ds='2008-04-08', hr=11) LOCATION '$part1Path'
 |PARTITION (ds='2008-04-08', hr=12) LOCATION '$part2Path'
+|PARTITION (ds='2008-04-08', hr=11) LOCATION '${part1Path.toURI}'
```
Member

Just wondering, what is the reason?

@HyukjinKwon HyukjinKwon Jan 15, 2017

It seems due to the parser. If the path is something like C:\tmp\b\c, it becomes something like C:	mpc because the backslash sequences are interpreted as escapes. To deal with this, we should write it as C:\\tmp\\b\\c or use a URI. The simplest choice seems to be using a URI, unless the test is dedicated to such a case.
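For illustration, here is a minimal sketch of the pattern (the table name and path are hypothetical, and a SparkSession named `spark` is assumed):

```scala
import java.io.File

// On Windows, interpolating a raw path embeds backslashes that the SQL
// parser treats as escape sequences ("\t" becomes a tab), mangling the
// location. A file URI uses forward slashes, so it parses safely.
val part1Path = new File("C:\\tmp\\b\\c")

// Problematic: '$part1Path' renders as 'C:\tmp\b\c' in the SQL text.
// Safe: '${part1Path.toURI}' renders as 'file:/C:/tmp/b/c'.
spark.sql(
  s"""
     |ALTER TABLE tab ADD
     |PARTITION (ds='2008-04-08', hr=11) LOCATION '${part1Path.toURI}'
   """.stripMargin)
```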

Member

Thanks! I will keep it in mind. We have not been following this rule when writing the test cases.

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 16, 2017

Hm.. it seems the builds are somehow blocked again. I think I should contact AppVeyor again if it does not proceed further by tomorrow. This has happened to me before when I cancelled and restarted builds frequently.


```
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 1e4.toInt, numSlices).map(w)
```
Member

While here, feel free to just write "0 to 10000"

Member Author

Sure, thanks.

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 17, 2017

@srowen, @shivaram and @felixcheung, this is the problem I previously reported to the three of you before - PR-16586. I have jobs in the queue at AppVeyor but they do not start.

I observed the same in the ASF account for Spark - https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/history. This could simply be jobs queued up in other projects, or the same thing happening there. To verify this, I should check whether any job is running in any Apache project on AppVeyor.

In my case, it is easy to check because I have only a single project, spark.

So, do any of you (as a committer or PMC) know an easy way to retrieve a list of Apache projects that use AppVeyor? I am willing to check each project once I get the list, and to report this to AppVeyor if the same thing happens to the ASF account too when jobs stay blocked for days or a couple of weeks.

(cc @dongjoon-hyun FYI, as REEF uses the same ASF account IIRC)

@srowen
Member

srowen commented Jan 17, 2017

@HyukjinKwon I have no info on this, I'm afraid. From searching the internet for "appveyor apache", I think Thrift and NiFi might be using it too. Are you saying your jobs never run in AppVeyor, or that they take a long time to schedule? You're welcome to tackle the problem if you can.

What's the status here -- do we still need a positive result from AppVeyor before merging this?

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 17, 2017

Ah, I see. Sorry for not clarifying. Actually, I meant two problems. One is that the jobs in my account never run in AppVeyor, as above.

The other is that, separately, I suspect the same thing is happening in the current ASF account (which might just be due to many jobs queued up in other Apache projects), because I observed that the jobs for SparkR (for other PRs) have not been running for the past two days, up to now. Let me try my best to verify this.

As for the status of this PR, I wanted to show you all a green result from AppVeyor to complete fixing the tests on Windows for now (of course, I will keep fixing newly introduced failures in the future). Let me try to show a green result after contacting AppVeyor to unblock my account.

@shivaram
Contributor

I have no information on this either. We could file an INFRA ticket if we wanted the information.

@felixcheung
Member

Yeah, not sure why, but AppVeyor has been stuck since around 3 days ago.

```
 |WITH SERDEPROPERTIES ('serialization.last.column.takes.rest'='true') FROM src;
-""".stripMargin.replaceAll(System.lineSeparator(), " "))
+""".stripMargin.replaceAll(System.lineSeparator(), " "),
+skip = !TestUtils.testCommandAvailable("/bin/bash"))
```
Member

What's the cause that we need to skip this test? Perhaps add a comment to help keep a record of it?

@HyukjinKwon HyukjinKwon Jan 18, 2017

Script transformation such as USING 'cat' requires a hard-coded /bin/bash, which seems to be missing or located elsewhere on Windows (with or without Cygwin).

I will add a single de-duplicated comment around the first instance (there are many instances of it) if I happen to push more commits.
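For the record, a minimal sketch of the cancellation mechanism (this is not the exact Spark test helper; the suite and test names are reused from above purely for illustration):

```scala
import org.scalatest.funsuite.AnyFunSuite

class ScriptTransformationSuite extends AnyFunSuite {

  // Checks whether the command's path is executable; a simple
  // stand-in for Spark's TestUtils.testCommandAvailable.
  private def commandAvailable(cmd: String): Boolean =
    new java.io.File(cmd).canExecute

  test("transform with SerDe3") {
    // assume() cancels the test instead of failing it when the
    // predicate is false, which is why the skipped tests report as
    // "!!! CANCELED !!!" rather than "*** FAILED ***".
    assume(commandAvailable("/bin/bash"), "/bin/bash is not available")
    // ... run the script transformation query here ...
  }
}
```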

@HyukjinKwon
Member Author

Ah, thank you @shivaram and @felixcheung

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 18, 2017

(Just FYI, it is now fixed for both my account and the ASF account. It was due to a recent issue in AppVeyor - https://appveyor.statuspage.io/incidents/k06ydx9hkhbt)

@HyukjinKwon
Member Author

Build started: [TESTS] ALL PR-16586
Build started: [TESTS] ALL PR-16586
Build started: [TESTS] ALL PR-16586

@SparkQA

SparkQA commented Jan 18, 2017

Test build #71595 has finished for PR 16586 at commit 6859569.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 20, 2017

Current status of this PR:

The tests below fail consistently across 6 builds (please check the logs at https://ci.appveyor.com/project/spark-test/spark/history, in particular https://ci.appveyor.com/project/spark-test/spark/build/606-E7636D1D-41D6-4E36-9E15-B26EBB03B9E1, which has the fewest test failures). I think these are flaky ones because I remember they passed before.

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics *** FAILED *** (1 second, 487 milliseconds)

org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - constant null testing *** FAILED *** (562 milliseconds)

org.apache.spark.sql.hive.execution.HashAggregationQueryWithControlledFallbackSuite:
 - udaf with all data types *** FAILED *** (641 milliseconds)

org.apache.spark.sql.hive.StatisticsSuite:
 - verify serialized column stats after analyzing columns *** FAILED *** (1 second, 110 milliseconds)

org.apache.spark.sql.hive.execution.SQLQuerySuite:
 - dynamic partition value test *** FAILED *** (547 milliseconds)
 - SPARK-6785: HiveQuerySuite - Date cast *** FAILED *** (156 milliseconds)
```

Let me try to run the individual tests for them, because building everything multiple times takes too long.

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 20, 2017

Build started: [TESTS] org.apache.spark.scheduler.SparkListenerSuite PR-16586
Build started: [TESTS] org.apache.spark.sql.hive.execution.HiveQuerySuite PR-16586
Build started: [TESTS] org.apache.spark.sql.hive.execution.HashAggregationQueryWithControlledFallbackSuite PR-16586
Build started: [TESTS] org.apache.spark.sql.hive.StatisticsSuite PR-16586
Build started: [TESTS] org.apache.spark.sql.hive.execution.SQLQuerySuite PR-16586

@HyukjinKwon
Member Author

HyukjinKwon commented Jan 20, 2017

They all pass when run individually with test-only (please check the logs above).

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics (8 seconds, 656 milliseconds)

org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - constant null testing (531 milliseconds)

org.apache.spark.sql.hive.execution.AggregationQuerySuite:
 - udaf with all data types (4 seconds, 285 milliseconds)

org.apache.spark.sql.hive.StatisticsSuite:
 - verify serialized column stats after analyzing columns (2 seconds, 844 milliseconds)

org.apache.spark.sql.hive.execution.SQLQuerySuite:
 - dynamic partition value test (1 second, 407 milliseconds)
 - SPARK-6785: HiveQuerySuite - Date cast (188 milliseconds)
```

Although I am wondering how/why those tests seem flakier in package-level runs, e.g., sbt sql/test, than when run individually with, e.g., sbt test-only ... (judging from observations in the builds), I think it is fair to say that, at least, Spark tests (the way I run them) are able to pass on Windows.

Let me remove [WIP] and try to make the tests more stable on Windows, even in package-level runs, in the future, if this sounds reasonable.


```
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 10000, numSlices).map(w)
```
@HyukjinKwon HyukjinKwon Jan 20, 2017

I am pretty sure the deserialization time test is less flaky now, judging from the individual test runs below:

Before - 4 failures out of 5.

1 (failed)
2 (failed)
3 (failed)
4 (passed)
5 (failed)

After - 1 failure out of 7.

1 (passed)
2 (passed)
3 (passed)
4 (passed)
5 (failed)
6 (passed)
7 (passed)
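For context, a hedged sketch of the mechanism (assuming a SparkContext named `sc`; this is not the suite's actual code): the test checks timing metrics collected by a listener, and with only ~1e3 tiny records, deserialization can finish within one clock tick, recording 0 and failing a strictly-positive check.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Collect per-task deserialization times via a listener.
var deserializeTimes = List.empty[Long]
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for failed tasks, so guard defensively.
    Option(taskEnd.taskMetrics).foreach { m =>
      deserializeTimes ::= m.executorDeserializeTime
    }
  }
})

sc.parallelize(0 to 10000, 16).map(identity).count()
Thread.sleep(500)  // crude wait for listener events; a real test drains the bus

// With 10000 records per job, the work is measurably non-zero, so a
// positivity check like this passes far more reliably on Windows'
// coarser timer than with 1000 records.
assert(deserializeTimes.exists(_ > 0))
```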

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows [SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows Jan 20, 2017

@srowen
Member

srowen commented Jan 21, 2017

Merged to master

@asfgit asfgit closed this in 6113fe7 Jan 21, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

[SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#16586 from HyukjinKwon/set-path-appveyor.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017

[SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky, newly introduced and missed test failures on Windows

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#16586 from HyukjinKwon/set-path-appveyor.
@HyukjinKwon HyukjinKwon deleted the set-path-appveyor branch January 2, 2018 03:38