Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2471] remove runtime scope for jets3t #1402

Closed
wants to merge 1 commit into from

Conversation

mengxr
Copy link
Contributor

@mengxr mengxr commented Jul 14, 2014

The assembly jar (built by sbt) doesn't include jets3t if we set it to runtime only, but I don't know whether it was set this way for a particular reason.

CC: @srowen @ScrapCodes

@SparkQA
Copy link

SparkQA commented Jul 14, 2014

QA tests have started for PR 1402. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16618/consoleFull

@srowen
Copy link
Member

srowen commented Jul 14, 2014

It's set that way just because it is only used by Hadoop's FileSystem to access S3. Code shouldn't call it directly. Maven should therefore include it in the runtime classpath and assembled jars, but not the compile classpath.

I seem to distantly recall that sbt doesn't quite do this. I hope there is an equivalent for sbt as that's the right thing to do. It's not the end of the world to make jets3t appear on the compile classpath.

(So I assume the sbt assembly still has to be supported? the Maven one should be OK in this regard)

@mengxr
Copy link
Contributor Author

mengxr commented Jul 14, 2014

I see. sbt-assembly doesn't pick up runtime only jars by default or maybe it doesn't read pom correctly.

@ScrapCodes Do you know whether we can tell sbt-assembly to include runtime jars? It is weird that it doesn't do that by default or maybe it is the sbt-pom-reader's problem.

@ScrapCodes
Copy link
Member

Hi,

I just checked it is correctly included by doing show runtime:full-classpath. Have to check why it is not in assembly.

Prashant Sharma

On Mon, Jul 14, 2014 at 2:35 PM, Xiangrui Meng notifications@github.com
wrote:

I see. sbt-assembly doesn't pick up runtime only jars by default or maybe
it doesn't read pom correctly.

@ScrapCodes https://github.com/ScrapCodes Do you know whether we can
tell sbt-assembly to include runtime jars? It is weird that it doesn't do
that by default or maybe it is the sbt-pom-reader's problem.


Reply to this email directly or view it on GitHub
#1402 (comment).

@ScrapCodes
Copy link
Member

It is actually intended that way, assembly is not intended as a replacement for runtime classpath. One way to go about is to not have it as runtime dependency.

@srowen
Copy link
Member

srowen commented Jul 14, 2014

What is the assembly then? I always took this term to mean "all of the runtime dependencies together". How would I make a runnable JAR in SBT in general?

@ScrapCodes
Copy link
Member

assembly should just include compile scope. Take for example the case of slf4j where end user can bring in any of the log4j or logback. So they are supposed to be at runtime classpath. And putting them in assembly jar will kill the purpose of having this pluggability at runtime.

@srowen
Copy link
Member

srowen commented Jul 14, 2014

The SLF4J binding is a runtime dependency, and it may be one that a down-stream consumer wants to override. But leaving it out entirely yields no runtime binding at all -- no logging. There is an argument for not including any SLF4J runtime dependency in Spark at all, and which requires users to add their own to get any log messages. Debatable -- but a different question.

SLF4J is a bit of a funny example though. What about other runtime dependencies? jets3t is a good one. It has to be on the runtime classpath, so needs to go into a runtime assembly JAR. It does not need to appear on the compile classpath, and shouldn't ideally.

This is not the same as "provided" scope, where the user is expected to bring the artifact from the local environment.

What's the standard SBT magic for this?

@SparkQA
Copy link

SparkQA commented Jul 14, 2014

QA results for PR 1402:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16618/consoleFull

@mengxr
Copy link
Contributor Author

mengxr commented Jul 14, 2014

@ScrapCodes I agree with @srowen that the assembly jar should include runtime libraries except those marked "provided". If we cannot find a way to let sbt understand maven runtime scope, we should merge this change because without jets3t all S3 operations are broken.

@pwendell
Copy link
Contributor

LGTM for now - I opened a separate JIRA to improve the sbt plug-in to fix this.

@asfgit asfgit closed this in a21f9a7 Jul 15, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
The assembly jar (built by sbt) doesn't include jets3t if we set it to runtime only, but I don't know whether it was set this way for a particular reason.

CC: srowen ScrapCodes

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#1402 from mengxr/jets3t and squashes the following commits:

bfa2d17 [Xiangrui Meng] remove runtime scope for jets3t
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants