Debug UX: easily know SSH name or public IP from log outputs #1369

Closed
concretevitamin opened this issue Nov 3, 2022 · 1 comment · Fixed by #1380

@concretevitamin (Member) commented Nov 3, 2022

[Screenshot from Nov 2, 2022, 11:27:21 PM showing the job's log output]

A user asked:

Is there a way to check which VM corresponds to which rank for a given job in the sky command-line tool? When running a job, the logs show up by node rank; I need to know each rank's corresponding IP / ssh name if I want to ssh into it and debug.
It seems that sometimes the logs show an IP and sometimes they don't, and these IPs are the private IPs rather than the public ones.

One usability improvement would be an easier way to map from the IP to the name I should use for ssh. Right now I have to manually look up whether this is the head or one of the workers.


Difficulty of logging into node-0:

  • In general, node-0 may not refer to the head node, whether num_nodes is equal to or smaller than the number of cluster nodes. (cc @iojw to confirm)
  • Therefore the user cannot look at the output above and run ssh <myclus> to log into node-0.

Difficulty of logging into node-i where i > 0:

  • The output shows private IPs. From the client machine, the user doesn't know what to put in ssh <myclus>-worker<i> or ssh <node-i's public ip>.
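
A minimal sketch of the private-IP-to-ssh-name lookup discussed here, assuming the client can list each node's private IP, public IP, and whether it is the head. NodeInfo and build_ssh_lookup are hypothetical names, and numbering workers as <cluster>-worker1, -worker2, ... in list order is an assumption, not confirmed sky behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NodeInfo:
    """Hypothetical record for one cluster node (field names are illustrative)."""
    private_ip: str
    public_ip: str
    is_head: bool


def build_ssh_lookup(cluster: str, nodes: List[NodeInfo]) -> Dict[str, Tuple[str, str]]:
    """Map each node's private IP to (ssh alias, public IP).

    Assumes the head is reachable as `<cluster>` and the workers, numbered
    from 1 in the order given, as `<cluster>-worker<k>`.
    """
    lookup: Dict[str, Tuple[str, str]] = {}
    worker_num = 0
    for node in nodes:
        if node.is_head:
            alias = cluster
        else:
            worker_num += 1
            alias = f"{cluster}-worker{worker_num}"
        lookup[node.private_ip] = (alias, node.public_ip)
    return lookup


if __name__ == "__main__":
    nodes = [
        NodeInfo("10.0.0.2", "203.0.113.10", is_head=True),
        NodeInfo("10.0.0.3", "203.0.113.11", is_head=False),
    ]
    lookup = build_ssh_lookup("mycluster", nodes)
    # A log line showing private IP 10.0.0.3 came from this node.
    alias, public_ip = lookup["10.0.0.3"]
    print(f"ssh {alias} (public IP {public_ip})")
    # prints: ssh mycluster-worker1 (public IP 203.0.113.11)
```

Keying the table by private IP matches what the logs already print, so the user only needs one lookup per log line instead of working out head vs. worker by hand.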

@iojw (Collaborator) commented Nov 3, 2022

Yup, node-0 is based on the ip_list_str array we create in add_gang_scheduling_placement_group, and may not refer to the head node when num_nodes is smaller than the number of nodes in the cluster. However, when num_nodes equals the number of nodes in the cluster, node-0 will always be the head node. Also note that log lines that do not show an IP belong to the head node.
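
The two rules in this comment (rank i is the i-th entry of the job's IP list; untagged log lines come from the head) can be sketched as two small helpers. The "ip=<private ip>" tag format is hypothetical, as are the function names; ip_list stands in for the ordered ip_list_str array and is assumed to be available as a Python list.

```python
import re
from typing import List

# Hypothetical tag format: lines from non-head nodes carry "ip=<private ip>",
# e.g. "(train pid=123, ip=10.0.0.4) loss=0.5"; head-node lines carry no tag.
_IP_TAG = re.compile(r"\bip=(\d{1,3}(?:\.\d{1,3}){3})")


def emitting_node_ip(line: str, head_private_ip: str) -> str:
    """Return the private IP of the node that printed `line`.

    Lines that show no IP are attributed to the head node.
    """
    match = _IP_TAG.search(line)
    return match.group(1) if match else head_private_ip


def node_rank(private_ip: str, ip_list: List[str]) -> int:
    """node-<i> is the i-th entry of the job's ordered IP list.

    Raises ValueError if the node is not part of this job (possible when
    num_nodes is smaller than the cluster size).
    """
    return ip_list.index(private_ip)


if __name__ == "__main__":
    head_ip = "10.0.0.2"
    ip_list = ["10.0.0.4", "10.0.0.2"]  # head is not node-0 in this example
    for line in ["(train pid=123, ip=10.0.0.4) loss=0.5", "(train pid=99) loss=0.7"]:
        ip = emitting_node_ip(line, head_ip)
        print(f"node-{node_rank(ip, ip_list)} -> {ip}")
```

With ip_list = ["10.0.0.4", "10.0.0.2"] and head 10.0.0.2, the tagged line maps to node-0 (10.0.0.4) and the untagged line to the head, which is node-1 here: the num_nodes-smaller-than-cluster case where node-0 is not the head.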

@iojw linked a pull request on Nov 7, 2022 that will close this issue.