SPARK-2192 [BUILD] Examples Data Not in Binary Distribution #3480
|
Test build #23894 has started for PR 3480 at commit
|
|
Test build #23894 has finished for PR 3480 at commit
|
|
Test PASSed. |
|
Hey Sean - any reason not to put these in |
|
Generally yes I'd put resources in |
|
Hey Sean - I don't quite understand. The only use of the examples project is to produce the assembly jar for use in distributions, so it seems legitimate to include them as resources for that project. Putting them in the jar would not increase the total size of the distribution, it would just relocate them to being inside of the jar. The examples jar is not used outside of this context, so embedding more data in there doesn't matter, from what I can tell. We actually removed examples from the set of published jars for 1.2. It was sort of weird that we were publishing it since there is no public API in there and just standalone programs. |
|
@pwendell Sure, I agree that the size isn't a big deal, and there's not really a case where you would use the distribution without the examples .jar. Forget the size issue. Really it is that none of the code would read the data files from If there's interest, of course I can make another PR to move these files and update all the docs to look for |
|
Oh I see - yeah I meant we'd also re-write the examples to correctly load example data from the classpath. If something is in |
|
@pwendell The example data do not need to be on the classpath. They are sample data files used by mllib examples, e.g., BinaryClassification, MovieLensALS. Usually the example code is the starting point for users. @srowen's change makes it easy to run examples:
The change looks good to me. |
Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.
Author: Sean Owen <sowen@cloudera.com>
Closes #3480 from srowen/SPARK-2192 and squashes the following commits:
47688f1 [Sean Owen] Add data/ to distributions
(cherry picked from commit 6384f42)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
Merged into master and branch-1.2. Thanks! |