-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Lambda] Fix missing private ip #4635
[Lambda] Fix missing private ip #4635
Conversation
fixes #4634 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bend-works for submitting the fix! Please find a quick comment below. : )
def _get_private_ip(cluster_name_on_cloud: str, | ||
instance_info: Dict[str, Any]) -> str: | ||
if len(_filter_instances(cluster_name_on_cloud, None)) == 1: | ||
return instance_info.get('private_ip', '127.0.0.1') | ||
return instance_info['private_ip'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just use a argument called single_node
for the function, so we don't have to call the cloud API multiple times for initializing the cluster_info
for better performance? We can set single_node=(len(running_instances) > 1)
def _get_private_ip(cluster_name_on_cloud: str, | |
instance_info: Dict[str, Any]) -> str: | |
if len(_filter_instances(cluster_name_on_cloud, None)) == 1: | |
return instance_info.get('private_ip', '127.0.0.1') | |
return instance_info['private_ip'] | |
def _get_private_ip(instance_info: Dict[str, Any], single_node: bool) -> str: | |
private_ip = instance_info.get('private_ip') | |
if private_ip is None: | |
if single_node: | |
# The Lambda cloud API may return an instance info without | |
# private IP. It does not align with their docs, but we still | |
# allow single-node cluster to proceed with the provisioning, by using | |
# 127.0.0.1, as private IP is not critical for single-node case. | |
return '127.0.0.1' | |
msg = f'Failed to retrieve private IP for instance {instance_info}.' | |
logger.error(msg) | |
raise RuntimeError(msg) | |
return private_ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it and it’s easier to unit test! I introduced the extra API call because I wasn’t sure about the assumption that all nodes of a cluster would be running at the same time but you know better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bend-works! This is great! Merging now.
Handle the setup failure when single-node A6000 VMs on Lambda Cloud do not return a private IP: in this case, cluster info now returns
127.0.0.1
.Tested:
sky launch -c private-ip-bug --cloud lambda --gpus A6000
is successful.