Add option to download Spark from a custom URL #125
Changes from all commits: 6d8169a, d43012a, 93227fc, ee69d99, ee9c00c
install-spark.sh:

```diff
@@ -2,22 +2,20 @@
 set -e
 
-spark_version="$1"
-distribution="$2"
+url="$1"
 
 echo "Installing Spark..."
-echo "  version: ${spark_version}"
-echo "  distribution: ${distribution}"
+echo "  from: ${url}"
 
-file="spark-${spark_version}-bin-${distribution}.tgz"
+file="$(basename ${url})"
 
 # S3 is generally reliable, but sometimes when launching really large
 # clusters it can hiccup on us, in which case we'll need to retry the
 # download.
 set +e
 tries=1
 while true; do
-    curl --remote-name "https://s3.amazonaws.com/spark-related-packages/${file}"
+    curl --remote-name "${url}"
     curl_ret=$?
 
     if ((curl_ret == 0)); then
```

Review thread on install-spark.sh:
- Comment: Out of curiosity, why is the …
- Reply: This is an unfortunate inconsistency, and eventually I think both scripts should be in Python. The reason …
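The shell script's retry loop can be sketched in Python for clarity. This is a hypothetical illustration of the same logic, not code from this PR; the fetcher is injected (playing the role of `curl --remote-name`, returning an exit code) so the retry behavior stands alone:

```python
import time

def download_with_retries(fetch, url, max_tries=3, wait_seconds=0):
    """Retry a download on transient failure, mirroring the loop in
    install-spark.sh. `fetch` returns an exit code (0 = success),
    like curl."""
    for attempt in range(1, max_tries + 1):
        if fetch(url) == 0:
            return True
        if attempt < max_tries:
            time.sleep(wait_seconds)  # brief pause before retrying
    return False
```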
```diff
@@ -202,28 +202,29 @@ def health_check(self, master_host: str):
 
 
 class Spark(FlintrockService):
-    def __init__(self, version: str=None, git_commit: str=None, git_repository: str=None):
+    def __init__(self, version: str=None, download_source: str=None,
+                 git_commit: str=None, git_repository: str=None):
         # TODO: Convert these checks into something that throws a proper exception.
         # Perhaps reuse logic from CLI.
         assert bool(version) ^ bool(git_commit)
         if git_commit:
             assert git_repository
 
         self.version = version
+        self.download_source = download_source
         self.git_commit = git_commit
         self.git_repository = git_repository
 
         self.manifest = {
             'version': version,
+            'download_source': download_source,
             'git_commit': git_commit,
             'git_repository': git_repository}
 
     def install(
             self,
             ssh_client: paramiko.client.SSHClient,
             cluster: FlintrockCluster):
-        # TODO: Allow users to specify the Spark "distribution". (?)
-        distribution = 'hadoop2.6'
 
         print("[{h}] Installing Spark...".format(
             h=ssh_client.get_transport().getpeername()[0]))
```

Review thread on install():
- Comment: As a follow up we could support a …
- Reply: Agreed, and that would address #88, though it seems like with this PR you can already choose your distribution at will, right?
- Comment: Yup, you can choose your distribution if you specify your own download source. However, we might want to support the use case of someone only specifying the Spark version and distribution. What do you think?
- Reply: Hmm, for now let's leave it like this. I have some vague concerns about "officially" supporting other distributions, in case they have annoying problems that we would have to work around. With the download source option, people who really want a different distribution can get it, and we have a bit more of an excuse to deflect support if there are serious issues. It's definitely something I am open to revisiting in the future, though.
```diff
@@ -235,15 +236,14 @@ def install(
                 localpath=os.path.join(SCRIPTS_DIR, 'install-spark.sh'),
                 remotepath='/tmp/install-spark.sh')
             sftp.chmod(path='/tmp/install-spark.sh', mode=0o755)
+            url = self.download_source.format(v=self.version)
             ssh_check_output(
                 client=ssh_client,
                 command="""
                     set -e
-                    /tmp/install-spark.sh {spark_version} {distribution}
+                    /tmp/install-spark.sh {url}
                     rm -f /tmp/install-spark.sh
-                """.format(
-                    spark_version=shlex.quote(self.version),
-                    distribution=shlex.quote(distribution)))
+                """.format(url=shlex.quote(url)))
         else:
             ssh_check_output(
                 client=ssh_client,
```
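The new `download_source` is a format string whose `{v}` placeholder is expanded with the Spark version before the URL is shell-quoted and handed to install-spark.sh. A minimal illustration of that flow, using a hypothetical URL and version (the PR itself does not fix either):

```python
import shlex
from posixpath import basename  # URLs use forward slashes

# Example values only; download_source is user-supplied in the PR.
download_source = "https://s3.amazonaws.com/spark-related-packages/spark-{v}-bin-hadoop2.6.tgz"
version = "1.6.0"

# As in the diff: expand the {v} placeholder, then shell-quote the URL
# before passing it as the single argument to install-spark.sh.
url = download_source.format(v=version)
command = "/tmp/install-spark.sh {url}".format(url=shlex.quote(url))

# install-spark.sh derives the local file name with basename.
file_name = basename(url)
```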
Review thread:
- Comment: Maybe we should also mention that the download_source has to be prebuilt?
- Reply: Perhaps something like this? And then we should update the matching comment for the Hadoop download source to follow similar formatting.
- Comment: Yup, this seems clearer.
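The documentation change the reviewers discuss could land as a comment in the cluster config template. A hypothetical sketch only: the key name follows Flintrock's kebab-case config style, and the exact file, wording, and URL are assumptions, not confirmed by this diff:

```yaml
services:
  spark:
    version: 1.6.0
    # Optional: URL of a *prebuilt* Spark package to download instead of
    # the default location. {v} is replaced with the version above.
    # download-source: "https://example.com/spark/spark-{v}-bin-hadoop2.6.tgz"
```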