[SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES. #5504
Conversation
The fix for SPARK-6406 broke the case where sub-processes are launched when SPARK_PREPEND_CLASSES is set, because the code now would only add the launcher's build directory to the sub-process's classpath instead of the complete assembly. This patch fixes the problem by having the launch scripts stash the assembly's location in an environment variable. This is not the prettiest solution, but it avoids having to plumb that location all the way through the Worker code that launches executors. The env variable is always set by the launch scripts, so users cannot override it.
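For illustration, a minimal sketch of how the launcher side might consume such a variable. The variable name `_SPARK_ASSEMBLY` and the surrounding class are assumptions made here, not necessarily what the patch actually uses:

```java
import java.util.ArrayList;
import java.util.List;

class AssemblyClasspathSketch {
  // Sketch only: read the assembly location stashed by the launch scripts.
  // "_SPARK_ASSEMBLY" is an assumed variable name for illustration.
  static List<String> buildClasspath() {
    List<String> classpath = new ArrayList<String>();
    String assembly = System.getenv("_SPARK_ASSEMBLY");
    if (assembly != null && !assembly.isEmpty()) {
      // The scripts always set this, so sub-processes (e.g. executors
      // launched by the Worker) see the complete assembly even when
      // SPARK_PREPEND_CLASSES is set.
      classpath.add(assembly);
    }
    return classpath;
  }
}
```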
I tested on Linux with and without SPARK_PREPEND_CLASSES. I'll try it on Windows tomorrow just to make sure I didn't break anything there. /cc @davies @andrewor14
Test build #30214 has finished for PR 5504 at commit
Update: tested on Windows too, looks fine.
Test build #30258 has finished for PR 5504 at commit
Test output was weird. Jenkins retest this please
The changes LGTM. I'm testing this out locally.
Test build #30271 has finished for PR 5504 at commit
This works locally. I wonder why it's failing random MLlib tests.
retest this please
Test build #30273 has finished for PR 5504 at commit
The test results are really flaky, and I don't believe these tests are failing outside of this PR. For this reason I don't think we should merge this patch as is yet, but I haven't dug into why the MLlib tests are failing.
Let me see if I can reproduce it locally.
All tests fail with an NPE in lines that are calling
Actually, it might be caused by this change. |
Ignore the fact that the assembly location may not be in the environment when running tests. Also handle the case where user code may end up calling this code path, by restoring the code that looks up the Spark assembly under SPARK_HOME.
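A rough sketch of what such a fallback lookup under SPARK_HOME might look like; the directory layout and the class wrapper are assumptions for illustration, not the patch's actual code:

```java
import java.io.File;
import java.io.FilenameFilter;

class FindAssemblySketch {
  // Assumed fallback: search SPARK_HOME's lib directory for the assembly jar.
  static File findAssembly(String sparkHome) {
    File libdir = new File(sparkHome, "lib");
    File[] candidates = libdir.listFiles(new FilenameFilter() {
      @Override
      public boolean accept(File dir, String name) {
        // Match jars named like spark-assembly-*.jar.
        return name.startsWith("spark-assembly") && name.endsWith(".jar");
      }
    });
    if (candidates == null || candidates.length != 1) {
      throw new IllegalStateException("Expected exactly one assembly jar in " + libdir);
    }
    return candidates[0];
  }
}
```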
The latest change should fix the tests. I realize that now there's some duplication, and that because I had to restore the Java code to look for the assembly (see comment in patch), we don't necessarily need to expose it in an env variable. But being able to control the assembly's location from the shell scripts was the whole point of #5085, so I guess if we want to support that use case, the env variable is still needed.
Test build #30281 has finished for PR 5504 at commit
Alright, I tested the latest changes locally again and can verify that this patch does fix the problem. Now that it's also passing tests, I will merge this into master. Thanks for fixing this so promptly @vanzin.
@vanzin I'm ok with the introduction of a Spark assembly env var, but it does seem unnecessary. The logic could have been as simple as:
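The snippet itself did not survive extraction; judging from the inline review comments below, the suggestion was presumably along these lines (a reconstructed fragment, not the original code, and the env var name is an assumption):

```java
String assembly = System.getenv("_SPARK_ASSEMBLY");  // set by the launch scripts
if (assembly == null) {
  assembly = findAssembly();  // fall back to the lookup under SPARK_HOME
}
```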
Why do we need "&& isEmpty(getenv("SPARK_TESTING"))"?
This has already been pushed, but it's explained in the big comment right before the code.
if (assembly == null) findAssembly() ?
This code has already been pushed.
we can always push a hot fix
But there is nothing to fix.
We don't want to look for the assembly when tests are running because it may not exist. Tests do a lot of "new SparkContext()" with master = local-cluster[blah], and this check ensures those tests work even if the assembly is not there.
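For reference, the shape of the check being debated, sketched from the calls quoted in the review above; the surrounding code and method names are assumptions, not the exact code from the patch:

```java
// Skip the assembly lookup while tests are running, since the assembly
// jar may not have been built; local-cluster tests then proceed without it.
if (assembly == null && isEmpty(getenv("SPARK_TESTING"))) {
  assembly = findAssembly();
}
```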
I don't think this check 'ensures' that those tests work; it 'requires' that those tests work even if the assembly is not there (more like an assert). I don't feel strongly for or against it, but it does seem unnecessary.
Will submit a hot fix in a bit
I think @vanzin is saying the logic is correct? At least the intent is to not bother searching for the assembly when testing. Do you mean you think that has to change?
Maintaining a non-null value for spark_assembly is a good thing unless there is a good reason to not do so. |
The reason is explained in the code comments and in my comments above. Please don't fix what doesn't need fixing. |