Is there a way to check which VM corresponds to which rank for a given job in the sky command-line tool? When a job runs, the logs are labeled by node rank; I need to know the corresponding IP or ssh name so I can ssh into that node and debug.
The logs sometimes show an IP and sometimes don't, and the IPs shown are private rather than public.
One usability improvement would be an easier way to map from the IP to the name I should use for ssh. Right now I have to manually look up whether a given IP is the head or one of the workers.
Difficulty of logging into node-0:
In general, node-0 may not refer to the head node, whether num_nodes is equal to or smaller than the number of cluster nodes. (cc @iojw to confirm)
Therefore the user cannot look at the output above and do ssh <myclus> to log into node-0.
Difficulty of logging into node-i where i > 0:
The output shows a private IP. From the client machine, the user doesn't know what to put in ssh <myclus>-worker<i> or ssh <node-i's public ip>.
Yup, node-0 is based on the ip_list_str array we create in add_gang_scheduling_placement_group and may not refer to the head node when num_nodes is smaller than the number of nodes in the cluster. However, when num_nodes is equal to the number of nodes in the cluster, node-0 will always be the head node. Also note that log lines that show no IP belong to the head node.
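To make the mapping concrete, here is a rough sketch of the lookup a user currently has to do by hand. It is not part of the sky CLI: map_ranks_to_ssh_names is a hypothetical helper, the IPs are made up, and it assumes workers are numbered from 1 in the order their private IPs are listed, which may not match how the cluster actually names them.

```python
from typing import Dict, List


def map_ranks_to_ssh_names(
    cluster_name: str,
    job_ips_by_rank: List[str],
    head_private_ip: str,
    worker_private_ips: List[str],
) -> Dict[int, str]:
    """Return {node rank: ssh host alias} for one job (hypothetical helper)."""
    # The head node is reachable as `ssh <cluster_name>`.
    alias_by_ip = {head_private_ip: cluster_name}
    # Assumption: worker i (1-based, in list order) is `<cluster_name>-worker<i>`.
    for i, ip in enumerate(worker_private_ips, start=1):
        alias_by_ip[ip] = f"{cluster_name}-worker{i}"
    # job_ips_by_rank[r] is the private IP shown in the logs for node-r.
    return {rank: alias_by_ip[ip] for rank, ip in enumerate(job_ips_by_rank)}


# A 3-node cluster running a 2-node job where node-0 is NOT the head node.
ranks = map_ranks_to_ssh_names(
    cluster_name="myclus",
    job_ips_by_rank=["10.0.0.3", "10.0.0.1"],
    head_private_ip="10.0.0.1",
    worker_private_ips=["10.0.0.2", "10.0.0.3"],
)
print(ranks)
```

In this made-up case node-0 maps to myclus-worker2 and node-1 maps to myclus, which is the situation where ssh <myclus> does not land on node-0.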