
Conversation

@tone-zhang

What changes were proposed in this pull request?

Check the database warehouse used in the Spark UTs, and remove the existing database files before running the UT (SPARK-8368).

How was this patch tested?

Run the Spark UTs with the following command several times:
./build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver "test-only *HiveSparkSubmitSuite*"
Without the patch, the test case passes only on the first run and always fails from the second run onward.
With the patch, the test case always passes.

@srowen
Member

srowen commented Aug 31, 2016

How would this relate to SPARK-8368?

We already have a Utils method for deleting recursively; please use that.

Is this the only path that leaves behind spark-warehouse?
If it's definitely cleaned up, we should remove it from .gitignore.

@tone-zhang
Author

@srowen Thanks for the comments.
I wrote SPARK-8368 here just because the UT case name is "SPARK-8368: includes jars passed in through --jars".
For the Utils method you mentioned, could you please share a few more hints? Do you mean I just need to remove "spark-warehouse" from .gitignore? If so, I will give it a try and update the PR.
Thanks a lot!

@srowen
Member

srowen commented Aug 31, 2016

OK, but the change isn't actually related to SPARK-8368, so I think you can remove that link in the title and the link that was created in the JIRA.

See Utils.deleteRecursively. You don't need to implement it yourself.
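
For reference, a minimal sketch of the suggested cleanup using Utils.deleteRecursively (the call site and relative path are assumptions for illustration):

```scala
import java.io.File

import org.apache.spark.util.Utils

// Delete any warehouse directory left behind by a previous run so the
// test starts from a clean state. The relative path is assumed here;
// the suite should resolve it the same way Spark does (working directory).
val warehouseDir = new File("spark-warehouse")
if (warehouseDir.exists()) {
  Utils.deleteRecursively(warehouseDir)
}
```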

No, I'm saying that if you find all instances of tests that don't clean up spark-warehouse (there may be more, and we should try to fix all of them), then there's also no reason to have to ignore it in git. But that's not the fix, just a cleanup.

@tone-zhang tone-zhang changed the title [SPARK-17330] [SPARK UT] Fix the failure Spark UT (SPARK-8368) case [SPARK-17330] [SPARK UT] Clean up spark-warehouse in UT Sep 1, 2016
@tone-zhang
Author

@srowen Thanks a lot for your help.
I have updated the code to use Utils.deleteRecursively; the Utils method is very helpful. Thanks!
So far, this UT case is the only one I have found to be affected.
I will update the title in JIRA for SPARK-17330 and remove the link in SPARK-8368.
Could you please review the updated code?
Thanks a lot!

@srowen
Member


This seems like a band-aid. Where is this directory created and not cleaned up in the first place? I see other tests that create it, so it's probably best to fix this at the source?

@tone-zhang
Author


@srowen Thanks a lot for your help!
Yes, it is better to clean up the temporary path at the source. I will check and update the PR.
Thanks!

@tone-zhang
Author

In the Spark UTs, temporary data files should be stored in spark/target/tmp, not in spark/spark-warehouse. The Spark UT suite cleans up spark/target/tmp when the tests finish; that is the existing mechanism.
In some test cases, the SQLConf property "spark.sql.warehouse.dir" is not set, so spark/spark-warehouse is used by default to store the temporary data files, which is incorrect.

@srowen Thanks a lot for your comments. Could you please take a look? Thanks!
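
To illustrate the point above, a hedged sketch of redirecting the warehouse to a managed temp directory instead of the default (the session setup and app name are assumptions; "spark.sql.warehouse.dir" is the real property):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.Utils

// Utils.createTempDir registers the directory for deletion on JVM exit,
// so nothing is left behind in ./spark-warehouse.
val tempWarehouse = Utils.createTempDir()

val spark = SparkSession.builder()
  .master("local")
  .appName("warehouse-cleanup-demo")  // hypothetical app name
  .config("spark.sql.warehouse.dir", tempWarehouse.getAbsolutePath)
  .getOrCreate()
```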

@srowen
Member

srowen commented Sep 6, 2016

This is better. It looks like DDLSuite tests the default location, which is ./spark-warehouse, in "Create Database using Default Warehouse Path". That's fine, but it fails to clean up the directory. I think it needs to clean up there too. You can see that elsewhere it uses the withTempDir { tempDir => ... } idiom to correctly set this to a temp dir.
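
For context, a sketch of that idiom as it is used in Spark's SQL test suites (the SQL statements and database name are illustrative):

```scala
// withTempDir (from SQLTestUtils) creates a temp directory, runs the body,
// and deletes the directory afterwards, even if the body throws.
withTempDir { tempDir =>
  val path = tempDir.getCanonicalPath
  // A database created at an explicit temp location leaves nothing
  // behind in ./spark-warehouse.
  sql(s"CREATE DATABASE db_test LOCATION '$path'")
  sql("DROP DATABASE db_test")
}
```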

Although I agree with your changes, aren't there other places in the same file that need the same treatment?

Ultimately the goal is to see that after a successful test run, no spark-warehouse exists. If that's true then we can remove it from .gitignore and .rat-excludes. CC @liancheng re: 63db2bd

@tone-zhang
Author

@srowen I checked DDLSuite.scala; the case "Create Database using Default Warehouse Path" is used to test the default "spark-warehouse" setting. Per the docs, the default path should be ("user.dir")/spark-warehouse. I prefer to keep that path and just remove it manually after the case finishes.
For the "tempDir" in DDLSuite.scala, I confirm it points to a temporary path. It is safe enough.
When checking the cases in DDLSuite.scala, I found that "spark-warehouse" is always created by the UT framework. At the end, the directory is empty, so it has no negative impact on the UT. I tried to set SQLConf's WAREHOUSE_PATH to redirect to the temporary path, but failed. May I delete "spark-warehouse" in the afterEach() function? Or could you please give me some suggestions? Thanks a lot!

@srowen
Member

srowen commented Sep 7, 2016

Yes, I think it wouldn't hurt to always try to delete the default spark-warehouse in this suite after each test. Try running all tests after you're done to see if anything leaves it behind; the goal is to get rid of the entry in gitignore.
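
A hedged sketch of what that per-test cleanup could look like in the suite (the override shape and path resolution are assumptions):

```scala
import java.io.File

import org.apache.spark.util.Utils

override def afterEach(): Unit = {
  try {
    super.afterEach()
  } finally {
    // Always try to remove the default warehouse directory that a test
    // may have created in the working directory.
    Utils.deleteRecursively(new File("spark-warehouse"))
  }
}
```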

When running the Spark UTs based on the latest master branch,
one UT case fails because the temporary data files have not been
cleaned up.
The UT name is "SPARK-8368: includes jars passed in through --jars".
In the Spark UTs, the temporary data files should be stored in spark/target/tmp,
not in spark/spark-warehouse.
@tone-zhang
Author

Updated the PR to clean up "spark-warehouse" in DDLSuite.scala.
@srowen I have run the whole Spark UT suite with the PR, and I cannot see spark/spark-warehouse after the tests. It seems to reach the goal.
Could you please have a look? Thanks a lot!

@srowen
Member

srowen commented Sep 10, 2016

Jenkins test this please

@SparkQA

SparkQA commented Sep 10, 2016

Test build #65205 has finished for PR 14894 at commit cd8e9a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Sep 11, 2016

I'm OK with this as minor cleanup. It can't hurt.

@srowen
Member

srowen commented Sep 11, 2016

Merged to master

@asfgit asfgit closed this in bf22217 Sep 11, 2016
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?

Check the database warehouse used in the Spark UTs, and remove the existing database files before running the UT (SPARK-8368).

## How was this patch tested?

Run the Spark UTs with the following command several times:
./build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver "test-only *HiveSparkSubmitSuite*"
Without the patch, the test case passes only on the first run and always fails from the second run onward.
With the patch, the test case always passes.

Author: tone-zhang <tone.zhang@linaro.org>

Closes apache#14894 from tone-zhang/issue1.