Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Sep 5, 2018

What changes were proposed in this pull request?

HiveExternalCatalogVersionsSuite Scala-2.12 test has been failing due to class path issue. It is marked as ABORTED because it fails at beforeAll during data population stage.

org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
  Exception encountered when invoking run on a nested suite - spark-submit returned with exit code 1.

The root cause of the failure is that runSparkSubmit mixes 2.4.0-SNAPSHOT classes and old Spark (2.1.3/2.2.2/2.3.1) together during spark-submit. This PR aims to provide non-test mode execution mode to runSparkSubmit by removing the followings.

  • SPARK_TESTING
  • SPARK_SQL_TESTING
  • SPARK_PREPEND_CLASSES
  • SPARK_DIST_CLASSPATH

Previously, in the class path, new Spark classes are behind the old Spark classes. So, new ones are unseen. However, Spark 2.4.0 reveals this bug due to the recent data source class changes.

How was this patch tested?

Manual test. After merging, it will be tested via Jenkins.

$ dev/change-scala-version.sh 2.12
$ build/mvn -DskipTests -Phive -Pscala-2.12 clean package
$ build/mvn -Phive -Pscala-2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite test
...
HiveExternalCatalogVersionsSuite:
- backward compatibility
...
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-25337][SQL] runSparkSubmit should provide non-testing mode [SPARK-25337][SQL][TEST] runSparkSubmit should provide non-testing mode Sep 5, 2018
@dongjoon-hyun
Copy link
Member Author

cc @srowen and @cloud-fan

@SparkQA
Copy link

SparkQA commented Sep 5, 2018

Test build #95705 has finished for PR 22340 at commit d451f98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

How does the non-test mode resolve the class path issue?

@srowen
Copy link
Member

srowen commented Sep 5, 2018

Yeah same question, but I see why that could cause a problem. Is the point here that while this is a test, the spark-submit run by the test should be run 'normally'? I am happy for a solution just what to understand the implications. I wonder why the classpath stuff hasn't caused a problem before if so, but who knows?

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Sep 5, 2018

HiveExternalCatalogVersionsSuite downloads and executes old Spark in their directory. However, it give the childs the followings. None of them is expected here. Especially, SPARK_DIST_CLASSPATH is related to this bug.

  • SPARK_TESTING
  • SPARK_SQL_TESTING
  • SPARK_PREPEND_CLASSES
  • SPARK_DIST_CLASSPATH

Previously, in the class path, new Spark classes are behind the old Spark classes. So, new ones are unseen. However, Spark 2.4.0 reveals this bug due to the recent data source class changes.

@dongjoon-hyun
Copy link
Member Author

Also, cc @gatorsmile .

@srowen
Copy link
Member

srowen commented Sep 5, 2018

I see. It does seem like we don't want to run with test env variables in this context. I was going to ask if we ever do? should this function always strip the env variables for testing? I can see being conservative and restricting it to this case.

It seems like just stripping SPARK_DIST_CLASSPATH and maybe SPARK_PREPEND_CLASSES is the issue?

@dongjoon-hyun
Copy link
Member Author

  • Since runSparkSubmit is a part of SparkSubmitTestUtils, some test case will fail if we always remove the env variables like SPARK_DIST_CLASSPATH. It gives the child the correct test-time class-path.

  • For executing old Spark distribution, yes. SPARK_DIST_CLASSPATH was the direct root cause and SPARK_PREPEND_CLASSES also doesn't look safe to me. I added the others in order to be proactively preventive.

@dongjoon-hyun
Copy link
Member Author

Thank you for approval, @srowen .

@dongjoon-hyun
Copy link
Member Author

Could you review this Scala-2.12 PR again, @cloud-fan and @gatorsmile ?

@HyukjinKwon
Copy link
Member

Seems okay to me too

@dongjoon-hyun
Copy link
Member Author

Thank you, @HyukjinKwon !

@srowen
Copy link
Member

srowen commented Sep 6, 2018

Merged to master (which looks like is still 2.4)

@asfgit asfgit closed this in 0a5a49a Sep 6, 2018
@dongjoon-hyun
Copy link
Member Author

Thank you for merging, @srowen .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-25337 branch September 6, 2018 04:31
@dongjoon-hyun
Copy link
Member Author

It's great to have this in Spark 2.4!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants