[SPARK-5830][Core]Don't create unnecessary directory for local root dir #4620
|
Since this appears to be exactly the same as https://issues.apache.org/jira/browse/SPARK-5801, I wonder if you can reassociate this PR with that JIRA, and consider whether it solves the deeper nesting reported there? |
|
ok to test |
|
Test build #27553 timed out for PR 4620 at commit |
|
retest this please |
|
Test build #27560 has finished for PR 4620 at commit
|
|
@srowen yes, this is the same as SPARK-5801. In standalone mode, the worker creates temp directories for each executor, so if we also create an unnecessary subdirectory under the local root directory, creating a temp directory ends up producing too many nested directories. |
The PR "Make sure only owner can read / write to directories created for the …" creates a subdirectory under the local root dir, but this causes nested directories, and that subdirectory is never deleted, which leaves too many empty directories in the local root dir.
|
Test build #27604 has finished for PR 4620 at commit
|
|
Test build #27611 has finished for PR 4620 at commit
|
|
Test build #27623 has finished for PR 4620 at commit
|
|
@Sephiroth-Lin yes, I think this should be directed at SPARK-5801 then; SPARK-5830 is a duplicate. Does this correct the many levels of extra temp dirs? It sounds like you're addressing a case where there is one extra level, but I probably misunderstand. |
|
@srowen the function "getOrCreateLocalRootDirs" creates a subdirectory under the local root dir, and "getLocalDir" calls getOrCreateLocalRootDirs directly, so calling "getLocalDir" creates that subdirectory too. In the current master branch, creating a tmp dir calls getLocalDir first, so the result is nested directories. In standalone mode, a tmp dir is also created first when launching an executor, so in total there are 4 levels of directories; in other modes there are 2 levels of directories for every tmp dir. |
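To make the nesting described above concrete, here is a minimal Python sketch. It is not Spark's code: `create_spark_subdir` and `nesting_depth` are hypothetical helpers, and a throwaway temp directory stands in for `spark.local.dir`. It only assumes that each of the two steps creates one `spark-<uuid>` directory under whatever it is given.

```python
import os
import tempfile
import uuid


def create_spark_subdir(parent: str) -> str:
    """Create a 'spark-<uuid>' directory under parent and return its path."""
    path = os.path.join(parent, f"spark-{uuid.uuid4()}")
    os.makedirs(path)
    return path


def nesting_depth(root: str, leaf: str) -> int:
    """Count how many directory levels separate leaf from root."""
    return len(os.path.relpath(leaf, root).split(os.sep))


# Stand-in for spark.local.dir (assumption: a scratch directory, safe to run).
local_root = tempfile.mkdtemp()

# Step 1: hypothetical stand-in for the per-root subdirectory.
root_subdir = create_spark_subdir(local_root)

# Step 2: hypothetical stand-in for a temp-dir helper that builds on step 1.
temp_dir = create_spark_subdir(root_subdir)

depth = nesting_depth(local_root, temp_dir)
print(depth)
```

Running both steps again with `temp_dir` as the root would give the four levels reported for standalone mode, under the same assumption.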
|
@srowen ok, thank you. If this subdirectory is really needed, maybe we can add code to delete it after the JVM exits or on sc.stop(). |
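As a rough illustration of the delete-on-exit idea, the sketch below uses Python's `atexit` as an analogue of a JVM shutdown hook. The helper names (`remove_dir`, `create_root_subdir`) are hypothetical and this is not how Spark implements cleanup; it only shows the pattern of registering the removal at the moment the directory is created.

```python
import atexit
import os
import shutil
import tempfile


def remove_dir(path: str) -> None:
    """Delete path recursively; a directory that is already gone is not an error."""
    shutil.rmtree(path, ignore_errors=True)


def create_root_subdir(parent: str) -> str:
    """Create a 'spark-' prefixed subdirectory and schedule its removal at exit."""
    path = tempfile.mkdtemp(prefix="spark-", dir=parent)
    atexit.register(remove_dir, path)
    return path


# Stand-in for the shared local root (assumption: a scratch directory).
shared_root = tempfile.mkdtemp()
atexit.register(remove_dir, shared_root)

subdir = create_root_subdir(shared_root)
print(os.path.isdir(subdir))
```

An explicit stop call could invoke `remove_dir` directly; because missing paths are ignored, the later exit hook is then harmless.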
The default value of spark.local.dir is /tmp, which should exist on all Linux systems at least. If I've read the patch right, that means that getOrCreateLocalRootDirs will simply return /tmp, which is usually readable by everyone. Presumably, the Spark worker will then just write a load of stuff into /tmp. It seems that the whole purpose of the previous behaviour was to create an isolated directory inside spark.local.dir, because it was assumed that this value was a shared directory.
Is this not desirable behaviour, or have I missed something?
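For illustration, the isolation being described can be sketched in Python as follows. This is not the patch or Spark's code: `create_private_subdir` is a hypothetical helper, the scratch directory stands in for a shared `/tmp`, and the permission check assumes a POSIX filesystem.

```python
import os
import stat
import tempfile
import uuid


def create_private_subdir(shared_root: str) -> str:
    """Create a subdirectory that only its owner can read, write, or enter."""
    path = os.path.join(shared_root, f"spark-{uuid.uuid4()}")
    os.mkdir(path, 0o700)
    # Set the mode explicitly in case the process umask altered it.
    os.chmod(path, 0o700)
    return path


# Stand-in for a shared, world-readable root such as /tmp (assumption).
shared_root = tempfile.mkdtemp()

private_dir = create_private_subdir(shared_root)
mode = stat.S_IMODE(os.stat(private_dir).st_mode)
print(oct(mode))
```

Writing directly into the shared root would skip this step, and files would then be exposed to whatever permissions the root itself has.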
|
Yeah, the problem was that it was creating a cascade of |
|
Sorry, but this patch is not correct. As @growse mentions, when The correct fix here, if you really care about cleaning up the extra directory, is to export a different env variable from the @srowen the current code shouldn't create a cascade of directories, but it does create a two-level-deep "spark-xxxx" hierarchy for executors in standalone mode. |
|
Ah, wait, there's a second problem (which would result in the cascading directories, I think). |
|
Hi, me again, sorry for the spam. Regarding my last comment, it's probably better if |
|
I think we should close this PR, on the grounds that the proposed change apparently goes a step too far to remove all nested directories. SPARK-5830 is a duplicate of SPARK-5801, which concerns creating more than 1 nested |
|
Mind closing this PR? |
|
@srowen ok, pls help to close this. |
|
@Sephiroth-Lin you will have to close it yourself |
Currently an unnecessary directory is created under the local root directory, and this directory is not deleted after the application exits.
For example:
before, the tmp dir was created like "/tmp/spark-UUID"
now, the tmp dir is created like "/tmp/spark-UUID/spark-UUID"
so the dir "/tmp/spark-UUID" is treated as a local root directory and is not deleted.
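The leftover directory in this example can be reproduced with a small Python sketch. It assumes, for illustration only, that cleanup removes just the innermost directory it was handed; the paths are created in a scratch directory rather than `/tmp`, and none of this is Spark's code.

```python
import os
import shutil
import tempfile
import uuid

# Stand-in for the configured local dir (assumption: a scratch directory).
local_root = tempfile.mkdtemp()

# Outer directory: the extra per-root subdirectory.
outer = os.path.join(local_root, f"spark-{uuid.uuid4()}")
# Inner directory: the temp dir the application actually uses.
inner = os.path.join(outer, f"spark-{uuid.uuid4()}")
os.makedirs(inner)

# Cleanup that only knows about the directory it created.
shutil.rmtree(inner)

leftover = os.listdir(local_root)
print(leftover)
```

After the run, `leftover` holds the single outer `spark-<uuid>` name and that directory is empty, which matches the empty directories accumulating in the local root.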