-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new option for an alternate mirror for spark binaries #104
Conversation
@rmessner thank you for tackling this. As you experienced, it is useful in times when, for whatever reason, the default Spark packages on S3 that Flintrock uses are corrupt. We will probably want to use this pattern here to also solve #71. As such, I'd like the option name and URL template to be consistent with what's proposed there. Is that OK with you? |
@@ -1,6 +1,7 @@ | |||
services: | |||
spark: | |||
version: 1.6.0 | |||
# preferred-mirror: # optional; default to 'https://s3.amazonaws.com/spark-related-packages/${file}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As proposed in #71, I prefer {v}
or even {version}
instead of ${file}
, since the latter's exact meaning to the user is not clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so to be quite clear, i will use these variables in my default value in the script :
spark_version
hadoop_distribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're talking about the config template, I think we just need a {version}
variable. It seems unnecessary to say spark_...
inside the Spark config.
As for the Hadoop distribution, we currently don't support specifying that, unless you are also intending to tackle #88.
Okay, i will use mirror instead of preferred_mirror to be consistent across flintrock |
#71 uses |
Okay, i'll use download-source instead of mirror. For the variable names, it's okay for you as well ? |
Not sure what you're referring to. I think the proposal laid out in #71 for Hadoop should give you a good template for what to name things. If you're still not sure, it might be easier to just update the PR and I'll comment on the line items as necessary. |
@@ -1,6 +1,8 @@ | |||
services: | |||
spark: | |||
version: 1.6.0 | |||
# distribution: # optional; default to '2.6' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nitpick: Two spaces before the #
; "defaults" and not "default"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can we leave out the ability to specify distribution for now? I'm not sure about how best to name this option (e.g. there are non-Hadoop distributions like CDH, but we are assuming Hadoop) and, more importantly, I haven't fully considered the implications of supporting user-specified distributions.
@rmessner are you still working on this? |
Fixes #101.