Add option to download Spark from a custom URL #125
            'git_commit': git_commit,
            'git_repository': git_repository}

    def install(
            self,
            ssh_client: paramiko.client.SSHClient,
            cluster: FlintrockCluster):
        # TODO: Allow users to specify the Spark "distribution". (?)
        distribution = 'hadoop2.6'
As a follow-up, we could support a {d} template in download_source, as is done for the version with {v}.
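To make the suggestion concrete, here is a minimal sketch of how a {d} placeholder could be resolved alongside {v}. The function name resolve_download_source, its parameters, and the example.com URL are hypothetical and only for illustration; this is not the code in this PR.

```python
def resolve_download_source(
        download_source: str,
        version: str,
        distribution: str = 'hadoop2.6') -> str:
    """Fill the {v} and (hypothetical) {d} placeholders in a URL template."""
    # str.format() ignores unused keyword arguments, so a template that
    # hardcodes the distribution (no {d}) still resolves correctly.
    return download_source.format(v=version, d=distribution)


# Hypothetical template; example.com stands in for a real mirror.
template = 'https://example.com/spark/spark-{v}/spark-{v}-bin-{d}.tgz'
print(resolve_download_source(template, version='1.6.2', distribution='hadoop2.4'))
```

With this approach, existing templates that only use {v} would keep working unchanged, since the extra d argument is simply ignored when the placeholder is absent.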
Agreed, and that would address #88, though it seems like with this PR you can already choose your distribution at will, right?
Yup, you can choose your distribution if you specify your own download source.
However, we might want to support the use case of someone specifying only the Spark version and distribution. What do you think?
Hmm, for now let's leave it like this. I have some vague concerns about "officially" supporting other distributions, in case they have annoying problems that we would have to work around. With the download source option, people who really want a different distribution can get it, and we have a bit more of an excuse to deflect support if there are serious issues.
It's definitely something I am open to revisiting in the future, though.
Thank you for this PR @BenFradet. Looks good to me! I left some minor comments.
Great, thanks for your review, will update accordingly.
This is slick -- S3 support via the Spark hadoop-2.4 binary is pretty convenient. Is there anything remaining to get this merged in? I confirmed this PR works by merging into master and setting the following:
    download-source: "http://mirror.cc.columbia.edu/pub/software/apache/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.4.tgz"
I then successfully fetched some data from S3.
Took a second look at this. Looks good to me. And thanks @ereed-tesla for testing it out. It speeds things up for me, since I can skip testing it myself if I am already comfortable with the PR. Merging this in.
* master:
  0.6.0 dev begins
  add some minor steps
  update standalone version in example
  this is 0.5.0
  upgrade dependencies (nchammas#128)
  use latest Amazon Linux AMI
  rephrase note about future Windows support
  remove note about squashing PR commits
  up default Spark version to 1.6.2
  add CHANGES for spark download source and additional security groups
  rename some internals related to security groups
  Resolve nchammas#72 add --ec2-security-group flag support (nchammas#112)
  added HADOOP_LIBEXEC_DIR env var (nchammas#127)
  Add option to download Spark from a custom URL (nchammas#125)
  add custom Hadoop URL change; reformat Markdown links
This PR adds a download_source option to the Spark service, as has been done in #118.
I created clusters both with and without a download_source to test the new feature.
Fixes #101, fixes #88.
Closes #104.