[SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py #2872
|
Can one of the admins verify this patch? |
|
Awesome! I am glad to see that this was a priority to someone with the time. 👍 |
|
Can one of the admins verify this patch? |
|
+1 |
|
I would be interested in merging this patch as well :) |
|
We had a couple of issues with this patch; in particular, the script depends on instances having a public DNS name or IP. I had to modify the script a little to get our cluster started, but of course we didn't invest in making it general. See https://github.com/relateiq/spark/commit/48ab2d1c8cccc00a5d26145b4d19a414c17f62c2 Does boto have a best practice for handling VPC? From the bugs I've read, it feels like boto+VPC is something everyone has their own workaround for... |
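A minimal sketch of the kind of fallback the fork linked above needs: prefer an instance's public DNS name, then its public IP, then its private IP. The attribute names follow boto's `Instance` objects, but `best_address` itself is a hypothetical helper, not code from the actual patch.

```python
# Hypothetical helper: pick the most broadly reachable address for an instance.
# Inside a VPC without public DNS, boto typically leaves public_dns_name empty,
# so we fall through to the private IP.

def best_address(instance):
    """Return the first non-empty address attribute on an instance."""
    for attr in ("public_dns_name", "ip_address", "private_ip_address"):
        value = getattr(instance, attr, None)
        if value:
            return value
    raise ValueError("Instance has no usable address")
```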
|
I'd love to see this as well. We have a strict vpc policy. |
|
I'll buy anyone willing to take care of this merge coffee via @changetip :) |
|
Hi mvj101, dreid93 sent you a Bitcoin tip worth 1 lunch (21,255 bits/$8.00), and I'm here to deliver it ➔ collect your tip at ChangeTip.com. |
|
Is VPC support slated for the next maintenance release? Support for VPCs is definitely needed for a lot of us, and it'd be great if we didn't have to patch it ourselves. |
Conflicts: ec2/spark_ec2.py
|
The EC2 docs could also be updated to include these new switches. |
|
We at Radius would really love to see this merged in as well. |
|
@jontg I'm using this patch with your modifications (private_ip_address), but I'm getting the following error when the script tries to start the master:

```
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: ip-10-0-2-213: ip-10-0-2-213
```

10.0.2.213 is the master's IP in this case, but it looks like the script is picking up ip-10-0-2-213 as the hostname, and that name isn't resolving. Did you run into anything like this, and if so, how did you resolve it? Thanks! |
|
Just a heads up / bump. I am buying everyone a coffee ( |
|
@tylerprete That might occur if your VPC is not set up to auto-assign DNS hostnames. If you can, that is where I would suggest beginning an investigation. |
|
@jontg thanks for the help. Turned on dns and now everything is working. |
ec2/spark_ec2.py
Outdated
I guess that we need to have separate logic for the VPC / non-VPC security group rule creation, since according to Amazon's Differences Between Security Groups for EC2-Classic and EC2-VPC guide, when using EC2-VPC:
When you add a rule to a security group, you must specify a protocol, and it can be any protocol with a standard protocol number, or all protocols (see Protocol Numbers).
In the non-VPC case, I guess that the

```python
master_group.authorize(src_group=master_group)
master_group.authorize(src_group=slave_group)
```

lines are authorizing all inbound traffic from instances belonging to the master group, regardless of the protocol / ports of that traffic.
However, it looks like the VPC case here only authorizes TCP traffic. I don't think that we rely on UDP traffic anywhere, but for consistency's sake it would be good if both the VPC and non-VPC branches here created equivalent rules.
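One way to keep the two branches equivalent could be a tiny helper that builds the extra `authorize` arguments per branch. This is a hedged sketch, not the PR's code: in EC2-Classic, `authorize(src_group=...)` with no protocol opens all traffic, while the VPC API requires an explicit protocol, where the special value `"-1"` is documented by AWS to mean "all protocols". Exact boto behavior may vary by version.

```python
# Sketch: extra keyword arguments for SecurityGroup.authorize so that an
# "allow everything from this other group" rule is equivalent in both branches.

def intra_group_rule_kwargs(is_vpc):
    if is_vpc:
        # VPC rules must name a protocol; "-1" covers TCP, UDP, ICMP, etc.
        return {"ip_protocol": "-1"}
    # EC2-Classic: src_group alone already implies all protocols and ports.
    return {}

# usage sketch (hypothetical opts.vpc_id flag):
# master_group.authorize(src_group=slave_group, **intra_group_rule_kwargs(opts.vpc_id))
```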
|
Since this is an often-requested feature, we should mention this in the EC2 documentation page: https://github.com/apache/spark/blob/master/docs/ec2-scripts.md |
ec2/spark_ec2.py
Outdated
Minor grammar / style nit, but what do you think about "VPC subnet to launch instances in" as the help text?
|
Overall, this looks good to me. I left a couple of nitpicky comments, but besides that + documentation, I'd be happy to merge this. To address a question asked upthread:
This probably won't be merged into Spark 1.2.1/1.1.2 since our policy is to not add new features in maintenance releases. However, newer versions of Spark EC2 are capable of launching clusters with older Spark versions, so you'd be able to use Spark 1.3.0's scripts to launch clusters in your VPC using, say, Spark 1.2.0. |
|
Thanks, I believe I've updated the code according to your comments. Mike |
|
Even though we don't have Jenkins tests for the EC2 scripts, I'm just going to have Jenkins run this so that I can avoid an inadvertent build break. Jenkins, this is ok to test. (Edit: I guess Jenkins only picks up commands if they're the only content in a comment?) |
|
Jenkins, this is ok to test. |
|
Test build #24486 has started for PR 2872 at commit
|
|
Test build #24486 has finished for PR 2872 at commit
|
|
Test FAILed. |
|
Fixing style issues now. |
|
Test build #24488 has started for PR 2872 at commit
|
|
Test build #24488 has finished for PR 2872 at commit
|
|
Test PASSed. |
|
Thanks for fixing up the style issue. This looks good to me, so I'll merge this into |
|
@amar-analytx here's a coffee for making the gist that @mvj101 based his initial PR on. @changetip |
|
Hi @amar-analytx, @dreid93 sent you a Bitcoin tip worth a coffee (4,526 bits/$1.50), and I'm here to deliver it ➔ collect your tip. |
|
@mvj101 a coffee for you sir @changetip |
|
@jontg may I buy you a coffee for your work helping people with this issue? @changetip |
|
@JoshRosen thanks for merging this in. Here's a coffee @changetip |
|
@dreid93 I certainly wouldn't say no... ;-) |
|
@jontg a coffee for you sir @changetip |
|
@changetip does not appear to be picking up my mentions and sending the appropriate tip. :/ |
|
It looks like this PR may have broken the ability to launch spot clusters:

```
Traceback (most recent call last):
  File "./spark_ec2.py", line 1147, in <module>
    main()
  File "./spark_ec2.py", line 1139, in main
    real_main()
  File "./spark_ec2.py", line 988, in real_main
    (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name)
  File "./spark_ec2.py", line 437, in launch_cluster
    user_data=user_data_content)
TypeError: request_spot_instances() got an unexpected keyword argument 'security_group_ids'
```

It looks like the latest version of Boto supports this argument (http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection.request_spot_instances), but ours does not. We're using boto 2.4.1, which was released on May 16, 2012, while this feature was only added in August 2012: boto/boto@145a899 I might be able to fix this by just upgrading to a newer version of Boto. |
|
Oops, apologies for this breakage. I haven't worked with spot instances. Feel free to revert this pull request and I or someone else can address that corner case as time allows. Thanks, Mike |
Let's just update boto. I'll submit a PR for this shortly. |
|
Actually, I'm going to revert this for now. Looks like the |
|
Ugh, this doesn't revert cleanly due to another patch that I merged. I've got to go, so I'm just going to leave this for now. Someone else can deal with this if it's urgent, otherwise I'll do it tomorrow. |
|
Ok, I'll send a PR to revert in a few minutes. Thanks, |
|
I've opened a PR to upgrade the Boto version, which fixes this issue: #3737 |
|
Couple of quick questions about this, just to confirm: do I always need to specify the |
|
Haven't worked with this in a while and different versions of boto may alter things, but
|
Based on this gist:
https://gist.github.com/amar-analytx/0b62543621e1f246c0a2
We use security group ids instead of security group names to get around this issue:
boto/boto#350
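The workaround described above can be sketched as a small kwargs builder: when launching into a VPC subnet, boto's `run_instances` should receive `security_group_ids` (the `sg-xxxx` ids) rather than `security_groups` (the names), which is what boto/boto#350 is about. The parameter names follow boto's `EC2Connection.run_instances`; the helper itself is illustrative, not the gist's code.

```python
# Illustrative sketch: build the security-group arguments for run_instances
# depending on whether we are launching into a VPC subnet.

def launch_kwargs(group, subnet_id=None):
    if subnet_id:
        # VPC launch: identify groups by id and pin the subnet.
        return {"security_group_ids": [group.id], "subnet_id": subnet_id}
    # EC2-Classic launch: group names are accepted.
    return {"security_groups": [group.name]}
```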