Add support for launching in private vpc #123
Conversation
You can run …
command=""" | ||
set -e | ||
|
||
fullname=`hostname`.ec2.internal |
`core.py` should not have any logic that's specific to a provider like EC2. If someone added GCE support to Flintrock tomorrow, we'd want to be able to reuse all the logic in `core.py` mostly as-is.
Thanks for taking this on @jorgito1167! This looks like a good start. I think this PR can be made simpler and better by making …
```python
flintrock_client_ip = (
    urllib.request.urlopen('http://checkip.amazonaws.com/')
    .read().decode('utf-8').strip())
if use_private_vpc:
```
If the host running Flintrock does not have a public IP, the previous method for getting the IP address won't return the private IP. The current solution assumes that the user will always launch a cluster into a private subnet from a host with no public IP. Is there a better way of handling this?
Good question. I think whether the Flintrock client has a private IP is distinct from whether the cluster is in a private VPC (though often the two will go together). So we need a way of determining what IP to use for the Flintrock client that doesn't depend on the cluster.
Perhaps we can simply query `checkip` first, and if that fails fall back to `gethostbyname()`?
The problem is that it does not fail. It simply returns a public IP as seen by the AWS checkip server.
Ah, right, and that won't be the IP that the cluster sees when Flintrock tries to connect, if both the client and the cluster are together on a private subnet.
Perhaps then we should always authorize both addresses, public and private?
That might be dangerous, since the IP returned by the checkip server is shared by all of the computers behind the NAT instance. I'm not sure whether authorizing it would allow access from those other computers.
That seems fairly innocuous to me, since those other machines would be under your account. Or are you saying machines from different AWS accounts may share the same public IP address?
I'm not sure, to be honest. I would like to think that it doesn't happen, but it would be great to come up with a solution that doesn't authorize both IPs. For the purposes of this PR we can just go with authorizing both. Later we can include an option to authorize only a specific security group or list of IP addresses that the user provides.
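As a sketch of what that future option could look like in a click-based CLI like Flintrock's (the option name and wiring are assumptions, not part of this PR):

```python
import click


@click.command()
@click.option('--authorize-access-from', multiple=True,
              help='CIDR block or security group ID to authorize for '
                   'client access, instead of the auto-detected client IPs.')
def launch(authorize_access_from):
    # Placeholder: a real implementation would pass these sources to the
    # security group setup instead of the detected client IPs.
    for source in authorize_access_from:
        click.echo('Would authorize access from: ' + source)
```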
I've reviewed the relevant docs here:
- http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Introduction.html
- http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat.html
It sounds like your VPC has an internet gateway attached, otherwise the call to `checkip` would fail, or perhaps would return a private address.
Even with the gateway attached, from my reading of the docs, it sounds like only instances from the same VPC can ever get the same public IP address. So I don't think it's an issue to authorize both addresses.
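A hedged sketch of what authorizing both addresses might look like with boto3 (the SSH port and parameter values are placeholders, not Flintrock's actual rules):

```python
import boto3


def authorize_client_ips(security_group_id: str, public_ip: str, private_ip: str):
    ec2 = boto3.resource('ec2')
    group = ec2.SecurityGroup(security_group_id)
    # Authorize SSH from both addresses; a set avoids a duplicate rule
    # when the two IPs happen to be the same.
    for ip in {public_ip, private_ip}:
        group.authorize_ingress(
            IpProtocol='tcp', FromPort=22, ToPort=22, CidrIp=ip + '/32')
```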
Sounds good. I'll add a `try`/`except` to handle the case when it fails.
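Something along these lines, as a sketch of the fallback just described (not the PR's actual code; the timeout value is an assumption):

```python
import socket
import urllib.request


def get_client_ip() -> str:
    try:
        # Works when the client has a route to the public internet.
        return (
            urllib.request.urlopen('http://checkip.amazonaws.com/', timeout=5)
            .read().decode('utf-8').strip())
    except OSError:
        # URLError subclasses OSError, so this also covers timeouts and
        # connection failures (e.g. a private subnet with no NAT).
        # Fall back to the locally resolved address.
        return socket.gethostbyname(socket.gethostname())
```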
Hey @jperezdiaz (did you recently change your username?), I took a quick look through your latest changes and things generally look good. There are still some open items, though:
…
Yes, I changed my username recently. Sorry for the confusion.
When you launch a cluster and the master fails to start properly, the Spark health check should show 0 workers. Then, if you log in to the cluster and start a shell (either …
It is non-trivial for me to test in a public VPC. However, it seems to launch fine in the private VPC. The health check reports 1 worker and I'm able to start pyspark. Could the problem be due to the change in the …?

On an unrelated note, the problem I see is that when I start pyspark I get the following message for the Spark UI: …
I'm not sure, but I am suspicious of that change.
You're seeing this when you launch in a private VPC?
I think …

Does Spark work otherwise? I'm trying to replicate your setup so I can help you test this. Can you lay out the VPC, subnet, routing table, etc. you have and how they are configured, so I can set up a parallel environment? Ideally we should capture this setup as code and use it in an acceptance test (e.g. set up a private VPC, test launch/Spark/HDFS, tear down the cluster and VPC), but that's probably a bit much for now. We can add that in later, unless you feel like having a go at it now.
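To make the idea concrete, here's a rough sketch of capturing that setup as code with boto3 (the CIDR blocks and overall flow are illustrative assumptions, not an actual Flintrock test):

```python
import boto3


def setup_private_vpc():
    ec2 = boto3.resource('ec2')
    vpc = ec2.create_vpc(CidrBlock='10.0.0.0/16')
    vpc.wait_until_available()
    # No internet gateway route is attached, so the subnet is private.
    subnet = vpc.create_subnet(CidrBlock='10.0.0.0/24')
    return vpc, subnet


def teardown_private_vpc(vpc, subnet):
    # Tear down in reverse order of creation.
    subnet.delete()
    vpc.delete()
```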
I like the idea of setting up the test. I'm really busy this week, but I'll try to implement it once I get some time.
Hey @jperezdiaz, are you interested in updating this PR? Looking through the history, it looks like we agreed on the basic approach you took here, but there were 2 unresolved issues: …

If you aren't planning to update it anytime soon, we can close the PR for now and you can revisit it when you are ready.
Closing this PR. Feel free to open a new one if you are interested in continuing this work!
This PR makes the following changes: …

Fixes #14.