[SPARK-5242]: Add --private-ips flag to EC2 script #5244
Can one of the admins verify this patch?
You could take this all the way to implement what's described in SPARK-6220 as a more general version of this. And I think it would also be considered a resolution to SPARK-5242 and SPARK-5246. This overlaps with existing work at #4038 (see also mesos/spark-ec2#91 and mesos/spark-ec2#92). It'd be great to get a resolution to all of this since it keeps coming up in duplicate.
Oops. I didn't realize this was a duplicate. I don't think I'm going to have time in the immediate future to make a general version, but the private IP support is a total blocker for us.
@nchammas @voukka what do you think about this as a solution for SPARK-5242 and/or SPARK-5246?
I'll look into this next week, but @mdagost, do you mind sharing what you are using spark-ec2 for? It's always good to hear about real-life use cases from users.
@nchammas At the moment, we're using Spark for large-scale ALS. In the future, probably other things too. And we use spark-ec2 to spin up and tear down clusters as we need them.
@nchammas Did you get a chance to take a look?
Review comment on ec2/spark_ec2.py (outdated):
Nit: No possession or contraction, so "IPs" and not "IP's".
Use private instead of public IPs for instances if the VPC or subnet requires that.
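For illustration, a minimal sketch of how a boolean flag with that help text could be declared. It assumes the script parses options with Python's standard `optparse` module using a `store_true` action; the `build_parser` function and the usage string are hypothetical and not taken from the patch.

```python
from optparse import OptionParser


def build_parser():
    # Hypothetical, trimmed-down parser: only the new flag is shown here.
    parser = OptionParser(usage="spark-ec2 [options] <action> <cluster_name>")
    # store_true: the flag defaults to False and becomes True when passed.
    # optparse derives the attribute name "private_ips" from "--private-ips".
    parser.add_option(
        "--private-ips", action="store_true", default=False,
        help="Use private instead of public IPs for instances "
             "if the VPC or subnet requires that.")
    return parser


if __name__ == "__main__":
    opts, args = build_parser().parse_args(["--private-ips", "launch", "my-cluster"])
    print(opts.private_ips, args)  # True ['launch', 'my-cluster']
```

An explicit on/off flag like this keeps the default behavior (public addresses) unchanged for existing users.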
This is a well thought-out change. I prefer the explicit […] @mdagost, did you try the patch in #4038? Did it not meet your needs? Also, while this change looks good, it touches a lot of code. Unfortunately, we don't have any automated test suites for spark-ec2 at this time. In lieu of that, could you use […] Sorry to put this extra work on you, but that will help us merge this in with some confidence. cc @shivaram
@srowen My 2 cents: I like https://issues.apache.org/jira/browse/SPARK-6220 better as a general solution, since it is naturally extensible, but it is a lot less user-friendly. They probably need to complement each other. For example, one could use --private-ips as a parameter on the CLI, pass further options through the feature from https://issues.apache.org/jira/browse/SPARK-6220, and use the feature from https://issues.apache.org/jira/browse/SPARK-925 for a cleaner CLI. I admit it looks like overkill. Still, I'd say that explicit parameters are easier to use.
@mdagost How about addressing the minor comment typo and considering wrapping up the conditional logic? It's up to you whether that is worthwhile. Otherwise, this looks good to go.
Okay. I fixed the "IP's" misspelling and refactored the conditional logic into a couple of helper functions, which should be more readable now. @nchammas I wrote this code and submitted the PR before I realized it was a dupe, so I never tried #4038. In terms of testing, I've verified that this works with our AWS setup to […] Do you actually want me to run the […]
Oh, if you've tested out the relevant commands, then that's great. I just wanted to know that we checked nothing was broken with this change. :)
Yep. I've been using it for the last week :)
On Mon, Apr 6, 2015 at 10:45 AM, Nicholas Chammas notifications@github.com
Sorry I missed some of the earlier discussion, but the change looks pretty simple and seems like a useful addition. Does this also address #4038?
Yes, I believe it does.
@mdagost Would you mind rebasing this one? It looks good but doesn't merge cleanly at the moment.
Also, since this was a duplicate of SPARK-5242, please edit the title to point at the non-duplicate JIRA.
Branch force-pushed from c67a1a9 to a4a2eac.
ok to test
Test build #29853 has started for PR 5244 at commit
Test build #29853 has finished for PR 5244 at commit
Test FAILed.
@srowen @nchammas Sorry about that. The lint checker should pass now. One thing I forgot to mention yesterday: apparently someone removed the last place in the code where an instance was referred to by IP rather than by DNS name. That means the helper function […]
Test build #29862 has started for PR 5244 at commit
Test build #29862 has finished for PR 5244 at commit
Test PASSed.
The `spark_ec2.py` script currently references the `ip_address` and `public_dns_name` attributes of an instance. On private networks these fields aren't set, so the script has no usable address for the instances.

This PR introduces a `--private-ips` flag that makes the script use the `private_ip_address` attribute in both cases instead.
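A minimal sketch of that selection logic, assuming it is factored into two small helpers. The thread says the conditional logic was moved into helper functions, but the names `get_ip_address` and `get_dns_name` used here are illustrative assumptions. A `namedtuple` stands in for a boto EC2 instance and models only the three attributes named above.

```python
from collections import namedtuple

# Hypothetical stand-in for a boto EC2 instance: only the three attributes
# discussed in the PR description are modeled.
Instance = namedtuple("Instance", ["ip_address", "public_dns_name", "private_ip_address"])


def get_ip_address(instance, private_ips=False):
    """Address to use where an IP is needed (hypothetical helper)."""
    return instance.private_ip_address if private_ips else instance.ip_address


def get_dns_name(instance, private_ips=False):
    """Host name to use, e.g. for SSH (hypothetical helper)."""
    return instance.private_ip_address if private_ips else instance.public_dns_name


if __name__ == "__main__":
    # An instance in a private subnet: no public IP or public DNS name is set.
    node = Instance(ip_address=None, public_dns_name="", private_ip_address="10.0.1.23")
    print(repr(get_dns_name(node)))              # ''
    print(get_dns_name(node, private_ips=True))  # 10.0.1.23
```

Making `private_ips` a keyword argument that defaults to `False` means call sites keep the public-address behavior unless the flag is passed through.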