Conversation

@ghost

@ghost ghost commented Oct 21, 2014

Based on this gist:
https://gist.github.com/amar-analytx/0b62543621e1f246c0a2

We use security group IDs instead of security group names to get around this issue:
boto/boto#350
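
A minimal sketch of the idea, assuming boto 2.x (the region, AMI, instance type, security group, and subnet IDs below are all placeholders):

    import boto.ec2

    # Inside a VPC, boto rejects security group *names* (boto/boto#350),
    # so the launch call passes security_group_ids instead.
    conn = boto.ec2.connect_to_region('us-east-1')
    conn.run_instances(
        'ami-12345678',
        instance_type='m3.large',
        security_group_ids=['sg-12345678'],
        subnet_id='subnet-12345678')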

@AmplabJenkins

Can one of the admins verify this patch?

@ddaws

ddaws commented Oct 21, 2014

Awesome! I am glad to see that this was a priority to someone with the time. 👍

@AmplabJenkins

Can one of the admins verify this patch?

@jontg

jontg commented Nov 19, 2014

+1

@Nypias

Nypias commented Nov 22, 2014

I would be interested in merging this patch as well :)

@jontg

jontg commented Nov 22, 2014

We had a couple of issues with this patch; in particular, the script depends on instances having a public DNS name or IP. I had to modify the script a little to get our cluster started, but of course we didn't invest in making it general. See https://github.com/relateiq/spark/commit/48ab2d1c8cccc00a5d26145b4d19a414c17f62c2. Does boto have a best practice for handling VPC? From the bugs I've read, it feels like boto+VPC is something everyone has their own workaround for...
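
(For illustration, a sketch of the kind of fallback our modification used, assuming boto 2.x instance attributes; get_address is a hypothetical helper, not the patch's code:)

    # Hypothetical helper illustrating the fallback described above:
    # prefer the instance's public DNS name and fall back to its private
    # IP when no public address exists (common inside a VPC).
    def get_address(instance):
        return instance.public_dns_name or instance.private_ip_address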

@brdw

brdw commented Dec 5, 2014

I'd love to see this as well. We have a strict VPC policy.

@ddaws

ddaws commented Dec 5, 2014

I'll buy a coffee, via @changetip, for anyone willing to take care of this merge :)

@changetip

Hi mvj101, dreid93 sent you a Bitcoin tip worth 1 lunch (21,255 bits/$8.00), and I'm here to deliver it ➔ collect your tip at ChangeTip.com.

Learn more about ChangeTip

@evanvolgas

Is VPC support slated for the next maintenance release? Support for VPCs is definitely needed for a lot of us, and it'd be great if we didn't have to patch it ourselves.

Conflicts:
	ec2/spark_ec2.py
@jeffsteinmetz
Contributor

The EC2 docs could also be updated to include these new switches.

@tylerprete

We at Radius would really love to see this merged in as well.

@tylerprete

@jontg I'm using this patch with your modifications (private_ip_address), but I'm getting the following error when the script tries to start the master:

SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: ip-10-0-2-213: ip-10-0-2-213

10.0.2.213 is the master's IP in this case, but it looks like it's picking up ip-10-0-2-213 as the hostname, and that isn't resolving. Did you run into anything like this, and if so, how'd you resolve it?

Thanks!

@ddaws

ddaws commented Dec 12, 2014

Just a heads up / bump. I am buying everyone who contributes to getting this merged a coffee (in bitcoin ;) )!

@jontg

jontg commented Dec 12, 2014

@tylerprete That might occur if your VPC is not set up to auto-assign DNS records. If you can, that is where I would suggest beginning an investigation.
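
(Not part of the patch, but for anyone hitting this: a minimal sketch of turning on the VPC DNS attributes with boto 2.x; the VPC ID is a placeholder:)

    from boto.vpc import VPCConnection

    # Turn on DNS resolution and DNS hostnames for the VPC so names like
    # ip-10-0-2-213 resolve. 'vpc-12345678' is a placeholder.
    conn = VPCConnection()
    conn.modify_vpc_attribute('vpc-12345678', enable_dns_support=True)
    conn.modify_vpc_attribute('vpc-12345678', enable_dns_hostnames=True)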

@tylerprete

@jontg thanks for the help. Turned on DNS and now everything is working.

ec2/spark_ec2.py Outdated
Contributor

I guess that we need to have separate logic for the VPC / non-VPC security group rule creation, since according to Amazon's Differences Between Security Groups for EC2-Classic and EC2-VPC guide, when using EC2-VPC:

When you add a rule to a security group, you must specify a protocol, and it can be any protocol with a standard protocol number, or all protocols (see Protocol Numbers).

In the non-VPC case, I guess that the

master_group.authorize(src_group=master_group)
master_group.authorize(src_group=slave_group)

lines are authorizing all inbound traffic from instances belonging to the master and slave groups, regardless of the protocol / ports of that traffic.

However, it looks like the VPC case here only authorizes TCP traffic. I don't think that we rely on UDP traffic anywhere, but for consistency's sake it would be good if both the VPC and non-VPC branches here created equivalent rules.
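
Something like the following would do it (a sketch only, assuming boto's SecurityGroup.authorize signature; opts, master_group, and slave_group are the script's existing names):

    if opts.vpc_id is None:
        # EC2-Classic: a bare src_group authorizes all protocols/ports
        master_group.authorize(src_group=master_group)
        master_group.authorize(src_group=slave_group)
    else:
        # EC2-VPC: a protocol is required, so cover TCP, UDP, and ICMP
        for proto, lo, hi in [('tcp', 0, 65535),
                              ('udp', 0, 65535),
                              ('icmp', -1, -1)]:
            master_group.authorize(ip_protocol=proto, from_port=lo,
                                   to_port=hi, src_group=master_group)
            master_group.authorize(ip_protocol=proto, from_port=lo,
                                   to_port=hi, src_group=slave_group)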

@JoshRosen
Contributor

Since this is an often-requested feature, we should mention this in the EC2 documentation page: https://github.com/apache/spark/blob/master/docs/ec2-scripts.md

ec2/spark_ec2.py Outdated
Contributor

Minor grammar / style nit, but what do you think about "VPC subnet to launch instances in" as the help text?

@JoshRosen
Contributor

Overall, this looks good to me. I left a couple of nitpicky comments, but besides that + documentation, I'd be happy to merge this.

To address a question asked upthread:

Is VPC support slated for the next maintenance release? Support for VPCs is definitely needed for a lot of us, and it'd be great if we didn't have to patch it ourselves.

This probably won't be merged into Spark 1.2.1/1.1.2 since our policy is to not add new features in maintenance releases. However, newer versions of Spark EC2 are capable of launching clusters with older Spark versions, so you'd be able to use Spark 1.3.0's scripts to launch clusters in your VPC using, say, Spark 1.2.0.

@ghost
Author

ghost commented Dec 16, 2014

Thanks, I believe I've updated the code according to your comments.

Mike

@JoshRosen
Contributor

Even though we don't have Jenkins tests for the EC2 scripts, I'm just going to have Jenkins run this so that I can avoid an inadvertent build break.

Jenkins, this is ok to test.

(Edit: I guess Jenkins only picks up commands if they're the only content in a comment?)

@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Dec 16, 2014

Test build #24486 has started for PR 2872 at commit 4dc6756.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 16, 2014

Test build #24486 has finished for PR 2872 at commit 4dc6756.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24486/
Test FAILed.

@ghost
Author

ghost commented Dec 16, 2014

Fixing style issues now.

@SparkQA

SparkQA commented Dec 16, 2014

Test build #24488 has started for PR 2872 at commit be9cb43.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 16, 2014

Test build #24488 has finished for PR 2872 at commit be9cb43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24488/
Test PASSed.

@JoshRosen
Contributor

Thanks for fixing up the style issue. This looks good to me, so I'll merge this into master. Thanks for your patience with the slow review!

@asfgit asfgit closed this in d12c071 Dec 16, 2014
@ddaws

ddaws commented Dec 17, 2014

@amar-analytx here's a coffee for making the gist that @mvj101 based his initial PR on. @changetip

@changetip

Hi @amar-analytx, @dreid93 sent you a Bitcoin tip worth a coffee (4,526 bits/$1.50), and I'm here to deliver it ➔ collect your tip.

Learn more about ChangeTip

@ddaws

ddaws commented Dec 17, 2014

@mvj101 a coffee for you sir @changetip

@ddaws

ddaws commented Dec 17, 2014

@jontg may I buy you a coffee for your work helping people with this issue? @changetip

@ddaws

ddaws commented Dec 17, 2014

@JoshRosen thanks for merging this in. Here's a coffee @changetip

@jontg

jontg commented Dec 17, 2014

@dreid93 I certainly wouldn't say no... ;-)

@ddaws

ddaws commented Dec 18, 2014

@jontg a coffee for you sir @changetip

@ddaws

ddaws commented Dec 18, 2014

@changetip does not appear to be picking up my mentions and sending the appropriate tip. :/

@JoshRosen
Contributor

It looks like this PR may have broken the ability to launch spot clusters:

Traceback (most recent call last):
  File "./spark_ec2.py", line 1147, in <module>
    main()
  File "./spark_ec2.py", line 1139, in main
    real_main()
  File "./spark_ec2.py", line 988, in real_main
    (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name)
  File "./spark_ec2.py", line 437, in launch_cluster
    user_data=user_data_content)
TypeError: request_spot_instances() got an unexpected keyword argument 'security_group_ids'

It looks like the latest version of Boto supports this argument (http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection.request_spot_instances), but not ours. We're using boto 2.4.1, which was released on May 16, 2012, but this feature was only added in August 2012: boto/boto@145a899

I might be able to fix this by just upgrading to a newer version of Boto.
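
(Upgrading boto is the clean fix, but for illustration only, a hedged sketch of a compatibility guard; the region, price, AMI, group, and subnet below are placeholders:)

    import inspect

    import boto.ec2
    from boto.ec2.connection import EC2Connection

    # Pass security_group_ids only when the installed boto's
    # request_spot_instances actually accepts it (the argument landed
    # in boto in August 2012).
    conn = boto.ec2.connect_to_region('us-east-1')
    kwargs = {}
    if 'security_group_ids' in inspect.getargspec(
            EC2Connection.request_spot_instances).args:
        kwargs['security_group_ids'] = ['sg-12345678']
    conn.request_spot_instances(price='0.10', image_id='ami-12345678',
                                subnet_id='subnet-12345678', **kwargs)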

@ghost
Author

ghost commented Dec 18, 2014

Oops, apologies for this breakage. I haven't worked with spot instances. Feel free to revert this pull request and I or someone else can address that corner case as time allows.

Thanks,

Mike

@JoshRosen
Contributor

I or someone else can address that corner case as time allows.

Let's just update boto. I'll submit a PR for this shortly.

@JoshRosen
Contributor

Actually, I'm going to revert this for now. Looks like the boto update will take a bit more work than I thought.

@JoshRosen
Contributor

Ugh, this doesn't revert cleanly due to another patch that I merged. I've got to go, so I'm just going to leave this for now. Someone else can deal with this if it's urgent, otherwise I'll do it tomorrow.

@ghost
Author

ghost commented Dec 18, 2014

Ok, I'll send a PR to revert in a few minutes.

Thanks,
Mike

@ghost
Author

ghost commented Dec 18, 2014

#3728

@JoshRosen
Contributor

I've opened a PR to upgrade the Boto version, which fixes this issue: #3737

@JoshRosen
Contributor

Couple of quick questions about this, just to confirm: do I always need to specify the --subnet-id option when using the --vpc-id option? Do I have to run ./spark-ec2 from a node that's inside the VPC?

@ghost
Author

ghost commented Dec 19, 2014

Haven't worked with this in a while, and different versions of boto may alter things, but:

  1. You might not need to specify a subnet if you have a default VPC configured. In general, I think boto ignores the vpc argument if you don't specify a subnet. Since VPCs/subnets can be tied to different regions, it's good practice to specify everything as a sanity check.
  2. You should be able to run ./spark-ec2 from any box connected to the internet; it doesn't need to be run within EC2 or within a VPC/subnet.
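
For reference, a launch command using the new switches might look something like this (the --vpc-id and --subnet-id flag names come from this PR; the keypair and IDs are placeholders):

    ./spark-ec2 -k my-keypair -i my-keypair.pem \
      --vpc-id=vpc-12345678 --subnet-id=subnet-12345678 \
      launch my-vpc-cluster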
