
Add support for launching in private vpc #123

Closed
wants to merge 8 commits into from

Conversation

jperezdiaz

This PR makes the following changes:

Fixes #14.

@BenFradet
Contributor

BenFradet commented May 26, 2016

You can run py.test tests/test_static.py to check style issues as detailed in the test guide.

command="""
set -e

fullname=`hostname`.ec2.internal
Owner

core.py should not have any logic that's specific to a provider like EC2. If someone added GCE support to Flintrock tomorrow, we'd want to be able to reuse all the logic in core.py mostly as-is.

@nchammas
Owner

Thanks for taking this on @jorgito1167!

This looks like a good start. I think this PR can be made simpler and better by making vpc_is_private a @property of the EC2Cluster class. That will eliminate the need for many of the little changes that have been made to helper methods in ec2.py.
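For illustration, a minimal sketch of such a property, assuming the cluster keeps a boto3 `Instance` handle for its master (the constructor and attribute names here are hypothetical, not Flintrock's actual code):

```python
import boto3


class EC2Cluster:
    def __init__(self, region: str, master_instance_id: str):
        ec2 = boto3.resource('ec2', region_name=region)
        self.master_instance = ec2.Instance(master_instance_id)

    @property
    def vpc_is_private(self) -> bool:
        # A master with no public IP is used as a simple proxy for
        # "launched into a private subnet"; callers just read the property
        # instead of threading a use_private_vpc flag through every helper.
        return self.master_instance.public_ip_address is None
```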

flintrock_client_ip = (
    urllib.request.urlopen('http://checkip.amazonaws.com/')
    .read().decode('utf-8').strip())
if use_private_vpc:
Author

If the host running Flintrock does not have a public IP, the previous method for getting the IP address won't return the private IP. The current solution assumes that the user will always launch a cluster into a private subnet from a host with no public IP. Is there a better way of handling this?

Owner

Good question. I think whether the Flintrock client has a private IP is distinct from whether the cluster is in a private VPC (though often the two will go together). So we need a way of determining what IP to use for the Flintrock client that doesn't depend on the cluster.

Perhaps we can simply query checkip first, and if that fails, fall back to gethostbyname()?

Author

The problem is that it does not fail. It simply returns the public IP as seen by the AWS checkip server.

Owner

Ah, right, and that won't be the IP that the cluster sees when Flintrock tries to connect, if both the client and the cluster are together on a private subnet.

Perhaps then we should always authorize both addresses, public and private?
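For illustration, a minimal sketch of authorizing both addresses with boto3 (the function name, group ID handling, and wide-open port range are hypothetical; Flintrock's real rules are narrower):

```python
import boto3


def authorize_client_ips(group_id: str, public_ip: str, private_ip: str) -> None:
    """Open the cluster's security group to both client addresses."""
    ec2 = boto3.client('ec2')
    for ip in {public_ip, private_ip}:  # the set dedupes if they are equal
        ec2.authorize_security_group_ingress(
            GroupId=group_id,
            IpProtocol='tcp',
            FromPort=0,
            ToPort=65535,
            CidrIp=ip + '/32',  # /32 limits each rule to one address
        )
```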

Author

That might be dangerous, since the IP returned by the checkip server is shared by all of the computers behind the NAT instance. I'm not sure whether authorizing it would allow access from those other computers.

Owner

That seems fairly innocuous to me, since those other machines would be under your account. Or are you saying machines from different AWS accounts may share the same public IP address?

Author

jperezdiaz commented May 27, 2016

I'm not sure, to be honest. I would like to think that it doesn't happen, but it would be great to come up with a solution that doesn't authorize both IPs. For the purposes of this PR we can just go with authorizing both. Later we can include an option to authorize only a specific security group or a list of IP addresses that the user provides.

Owner

I've reviewed the relevant docs.

It sounds like your VPC has an internet gateway attached, otherwise the call to checkip would fail, or perhaps would return a private address.

Even with the gateway attached, from my reading of the docs, it sounds like only instances from the same VPC can ever get the same public IP address. So I don't think it's an issue to authorize both addresses.

Author

Sounds good. I'll add a try/except to handle the case where it fails.
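For illustration, a minimal sketch of that try/except, assuming a helper name; only the checkip URL comes from the diff above:

```python
import socket
import urllib.request


def get_flintrock_client_ip() -> str:
    """Return the public IP as seen by AWS, falling back to the host's own
    address when checkip is unreachable (e.g. no route to the internet)."""
    try:
        return (
            urllib.request.urlopen('http://checkip.amazonaws.com/', timeout=5)
            .read().decode('utf-8').strip())
    except OSError:
        # urllib.error.URLError subclasses OSError, so this also catches
        # DNS failures and connection timeouts on gateway-less subnets.
        return socket.gethostbyname(socket.gethostname())
```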

@nchammas
Owner

Hey @jperezdiaz (did you recently change your username?), I took a quick look through your latest changes and things generally look good.

There are still some open items, though:

  • core.py still includes some logic that is specific to EC2. We need to factor that out, or possibly do away with it entirely. Do you have any suggestions in that regard? Is instance.private_dns_name not giving us something usable "out-of-the-box"?
  • I just tried launching a cluster off of this PR using my current config with a public VPC and it failed. Well, the launch technically succeeded but Spark couldn't come up. It looks like the master had trouble binding to an address. Does this work for you?

@jperezdiaz
Author

jperezdiaz commented May 30, 2016

Yes, I changed my username recently. Sorry for the confusion.

  • It seems like the public VPC did not have problems with the hostname. We can condition the /etc/hosts script on the subnet being private. I also like the idea of getting the instance's private DNS to replace the hardcoded .ec2.internal (sketched below).
  • How can I test if the master successfully bound to an address? I'm pretty sure I'm having a similar problem even with the private VPC.
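For illustration, how the private DNS name can be read off a boto3 `Instance` rather than hardcoded (the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')
instance = ec2.Instance('i-0123456789abcdef0')  # placeholder ID

# Avoids hardcoding the us-east-1-only .ec2.internal suffix; other regions
# return names like ip-10-0-0-5.us-west-2.compute.internal.
fullname = instance.private_dns_name
```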

@nchammas
Owner

How can I test if the master successfully bound to an address? I'm pretty sure I'm having a similar problem even with the private VPC.

When you launch a cluster and the master fails to start properly, the Spark health check should show 0 workers. Then, if you log in to the cluster and start a shell (either spark-shell or pyspark), you'll also get an error. It should be pretty obvious.

@jperezdiaz
Author

It is non-trivial for me to test in a public VPC. However, it seems to launch fine in the private VPC. The health check reports 1 worker and I'm able to start pyspark. Could the problem be due to the change in the /etc/hosts file? In that case, we can just condition the change on the subnet being private.

On an unrelated note, the problem I see is that when I start pyspark I get the following message for the Spark UI:

16/05/31 17:55:17 INFO SparkUI: Started SparkUI at http://<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>:4040

@nchammas
Owner

nchammas commented Jun 1, 2016

Could the problem be due to the change in the /etc/hosts file? In that case, we can just condition the change on the subnet being private.

I'm not sure, but I am suspicious of that change.

On an unrelated note, the problem I see is that when I start pyspark I get the following message for the Spark UI:

You're seeing this when you launch in a private VPC?

@jperezdiaz
Author

  • Is there a good way of figuring out the private DNS of the instances from inside the provision_node function? The instances attribute is specific to the EC2 cluster, so it should not be used. I could get the slave_ips list, find the index of the current IP, and then use it to index the private DNS list. Otherwise, I can create a method on the FlintrockCluster object that does this for us.
  • Yes, the Spark UI problem happens when I launch in a private VPC.

@nchammas
Owner

nchammas commented Jun 4, 2016

Is there a good way of figuring out the private DNS of the instances from inside the provision_node function?

I think provision_node() and the rest of the code in core.py should ideally just know about IP addresses, and not distinguish whether they are public or private.
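As a sketch of that boundary (parameter names are illustrative, not Flintrock's actual signature), core.py would accept a bare address and leave the public/private decision to ec2.py:

```python
def provision_node(*, host: str, identity_file: str, user: str) -> None:
    """Provision one node over SSH. `host` is whatever address the provider
    module chose to hand over; core.py never inspects whether it is a
    public or private IP."""
    ...
```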

Yes, the Spark UI problem happens when I launch in a private VPC.

Does Spark work otherwise?

I'm trying to replicate your setup so I can help you test this. Can you lay out the VPC, subnet, routing table, etc. you have and how they are configured so I can set up a parallel environment?

Ideally we should capture this setup as code and use it in an acceptance test (e.g. set up a private VPC, test launch/Spark/HDFS, tear down cluster and VPC), but that's probably a bit much for now. We can add that in later, unless you feel like having a go at it now.
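For illustration, a rough sketch of such a fixture with pytest and boto3 (region and CIDR blocks are arbitrary; a real test would also launch the cluster and exercise Spark/HDFS before teardown):

```python
import boto3
import pytest


@pytest.fixture
def private_subnet():
    ec2 = boto3.client('ec2', region_name='us-east-1')
    vpc = ec2.create_vpc(CidrBlock='10.0.0.0/16')['Vpc']
    subnet = ec2.create_subnet(
        VpcId=vpc['VpcId'], CidrBlock='10.0.0.0/24')['Subnet']
    # No internet gateway is attached, so the subnet stays private.
    try:
        yield subnet['SubnetId']
    finally:
        ec2.delete_subnet(SubnetId=subnet['SubnetId'])
        ec2.delete_vpc(VpcId=vpc['VpcId'])
```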

@jperezdiaz
Author

I like the idea of setting up the test. I'm really busy this week but I'll try to implement it once I get some time.

@nchammas
Owner

Hey @jperezdiaz, are you interested in updating this PR?

Looking through the history, it looks like we agreed on the basic approach you took here, but there were 2 unresolved issues:

  1. EC2-specific logic was added to core.py. core.py should be completely provider-agnostic.
  2. You were experiencing some issues with the launch and/or Spark UI.

If you aren't planning to update it anytime soon, we can close the PR for now and you can revisit it when you are ready.

@nchammas
Owner

nchammas commented Sep 2, 2016

Closing this PR. Feel free to open a new one if you are interested in continuing this work!

@nchammas closed this Sep 2, 2016