
Conversation

vanzin (Contributor) commented Apr 14, 2015

The fix for SPARK-6406 broke the case where sub-processes are launched with
SPARK_PREPEND_CLASSES set: the code would now add only the launcher's build
directory to the sub-process's classpath instead of the complete assembly.

This patch fixes the problem by having the launch scripts stash the
assembly's location in an environment variable. This is not the prettiest
solution, but it avoids having to plumb that location all the way through
the Worker code that launches executors. The env variable is always
set by the launch scripts, so users cannot override it.
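
To make the mechanism concrete, here is a minimal sketch of the idea (the
environment variable name, class, and method below are assumptions for
illustration, not the actual patch): the launch script locates the assembly
jar and exports its path, and the Java code that builds a sub-process's
classpath picks it up.

    import java.util.ArrayList;
    import java.util.List;

    // Shell side (paraphrased): the launch script exports the assembly's
    // location, e.g. export _SPARK_ASSEMBLY="$ASSEMBLY_JAR" (name assumed).
    class SubProcessClasspathSketch {
      static List<String> buildClassPath() {
        List<String> cp = new ArrayList<>();
        String assembly = System.getenv("_SPARK_ASSEMBLY");  // stashed by the scripts
        if (assembly != null) {
          cp.add(assembly);  // the complete assembly, not just the launcher's build dir
        }
        return cp;
      }
    }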

vanzin (Contributor, Author) commented Apr 14, 2015

I tested on Linux with and without SPARK_PREPEND_CLASSES. I'll try it on Windows tomorrow just to make sure I didn't break anything there.

/cc @davies @andrewor14

SparkQA commented Apr 14, 2015

Test build #30214 has finished for PR 5504 at commit 31d3ce8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class MinOf(left: Expression, right: Expression) extends Expression
  • This patch does not change any dependencies.

vanzin (Contributor, Author) commented Apr 14, 2015

Update: tested on Windows too, looks fine.

SparkQA commented Apr 14, 2015

Test build #30258 has finished for PR 5504 at commit ff87a60.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

andrewor14 (Contributor):

Test output was weird. Jenkins retest this please

andrewor14 (Contributor):

The changes LGTM. I'm testing this out locally.

SparkQA commented Apr 14, 2015

Test build #30271 has finished for PR 5504 at commit ff87a60.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

andrewor14 (Contributor):

This works locally. I wonder why it's failing random MLlib tests.

andrewor14 (Contributor):

retest this please

SparkQA commented Apr 14, 2015

Test build #30273 has finished for PR 5504 at commit ff87a60.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

andrewor14 (Contributor):

The test results are really flaky, but I don't believe these tests are failing outside of this PR, so I don't think we should merge this patch as is yet. That said, I haven't dug into why the MLlib tests are failing.

vanzin (Contributor, Author) commented Apr 14, 2015

Let me see if I can reproduce it locally.

vanzin (Contributor, Author) commented Apr 14, 2015

All tests fail with an NPE on lines that call SparkEnv.get... It shouldn't be caused by this change, but I'll take a look anyway.

vanzin (Contributor, Author) commented Apr 14, 2015

Actually, it might be caused by this change.

Ignore the fact that the assembly location may not be in the environment when
running tests. Also handle the case where user code may end up calling
this code path, by restoring the code that looks up the Spark assembly
under SPARK_HOME.
vanzin (Contributor, Author) commented Apr 14, 2015

The latest change should fix the tests. I realize there's now some duplication, and because I had to restore the Java code that looks for the assembly (see the comment in the patch), we don't strictly need to expose it in an env variable. But being able to control the assembly's location from the shell scripts was the whole point of #5085, so if we want to support that use case the env variable is still needed.
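
As a rough illustration of the fallback described above (a sketch only; the
names here are assumptions, and the real logic lives in the launcher library):
prefer the location exported by the launch scripts, and only search under
SPARK_HOME when it is absent.

    import java.io.File;

    class AssemblyLocatorSketch {
      static String getSparkAssembly() {
        String assembly = System.getenv("_SPARK_ASSEMBLY");  // set by the launch scripts
        if (assembly != null) {
          return assembly;  // the scripts stay in control of the location
        }
        return findAssembly();  // restored fallback: look under SPARK_HOME
      }

      // Stand-in for the restored lookup; the actual launcher code is more thorough.
      static String findAssembly() {
        File libDir = new File(System.getenv("SPARK_HOME"), "lib");
        File[] jars = libDir.listFiles(
            (dir, name) -> name.startsWith("spark-assembly") && name.endsWith(".jar"));
        return (jars != null && jars.length > 0) ? jars[0].getAbsolutePath() : null;
      }
    }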

SparkQA commented Apr 15, 2015

Test build #30281 has finished for PR 5504 at commit 7aec921.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • snappy-java-1.1.1.7.jar
  • This patch removes the following dependencies:
    • snappy-java-1.1.1.6.jar

andrewor14 (Contributor):

Alright, I tested the latest changes locally again and can verify that this patch does fix the problem. Now that it's also passing tests I will merge this into master. Thanks for fixing this so promptly @vanzin.

asfgit closed this in 9717389 on Apr 15, 2015
nishkamravi2 (Contributor):

@vanzin I'm ok with the introduction of a Spark assembly env var, but it does seem unnecessary. The logic could have been as simple as:

    if (SPARK_PREPEND_CLASSES)
        find_assembly()
    else
        getLocation().getPath()

Contributor (inline review comment):

Why do we need "&& isEmpty(getenv("SPARK_TESTING"))"?

vanzin (Contributor, Author):

This has already been pushed, but it's explained in the big comment right before the code.

Contributor (inline review comment):

if (assembly == null) findAssembly() ?

vanzin (Contributor, Author):

This code has already been pushed.

Contributor (inline review comment):

we can always push a hot fix

vanzin (Contributor, Author):

But there is nothing to fix.

We don't want to look for the assembly when tests are running because it may not exist. Tests do a lot of "new SparkContext()" with master = local-cluster[blah], and this check ensures those tests work even if the assembly is not there.
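
For reference, a hedged reconstruction of the guard being debated here (names
are assumptions, not the actual launcher code): the SPARK_HOME lookup only runs
when SPARK_TESTING is unset, so local-cluster tests can run even if no assembly
jar has been built.

    import java.util.function.Supplier;

    class TestingGuardSketch {
      static boolean isEmpty(String s) { return s == null || s.isEmpty(); }

      // 'lookup' stands in for the SPARK_HOME search (findAssembly in this thread).
      static String resolveAssembly(String fromEnv, Supplier<String> lookup) {
        if (fromEnv == null && isEmpty(System.getenv("SPARK_TESTING"))) {
          // Only search when not testing: tests that do new SparkContext(...) with a
          // local-cluster master must still work even if the assembly is not there.
          return lookup.get();
        }
        return fromEnv;
      }
    }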

Contributor (inline review comment):

I don't think this check 'ensures' that those tests work; rather, it 'requires' that those tests work even if the assembly is not there (more like an assert). I don't feel strongly for or against it, but it does seem unnecessary.

vanzin deleted the SPARK-6890 branch on April 16, 2015 at 16:30
nishkamravi2 (Contributor):

Will submit a hot fix in a bit

srowen (Member) commented Apr 16, 2015

I think @vanzin is saying the logic is correct? At least the intent is to not bother searching for the assembly when testing. Do you mean you think that has to change?

nishkamravi2 (Contributor):

Maintaining a non-null value for spark_assembly is a good thing unless there is a good reason to not do so.

vanzin (Contributor, Author) commented Apr 16, 2015

The reason is explained in the code comments and in my comments above. Please don't fix what doesn't need fixing.
