[Cost Report] Add some UI fixes #1788

Merged

Conversation

sumanthgenz
Collaborator

Limit each table to 5 rows (this number is up for debate).
Display total cost across all reports in cost-report (even costs that are not displayed due to the above limit).

(sky) sumanth@MacBook-Pro-5 skypilot % sky cost-report   
Clusters
NAME     LAUNCHED      DURATION  RESOURCES                          STATUS      HOURLY_PRICE  COST (est.)  
cluster  1 month ago   6m 19s    1x GCP(n1-highmem-8, {'V100': 1})  TERMINATED  $ 2.953       $0.311       
cluster  1 month ago   22m 31s   1x GCP(n1-highmem-8, {'V100': 1})  TERMINATED  $ 2.953       $1.108       
cluster  2 months ago  3m 41s    1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.029       
cluster  2 months ago  10m 13s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.081       

Managed spot controller (will be autostopped if idle for 10min)
NAME                          LAUNCHED     DURATION  RESOURCES                          STATUS      HOURLY_PRICE  COST (est.)  
sky-spot-controller-d16122f1  1 month ago  10m 25s   1x AWS(m6i.2xlarge, disk_size=50)  TERMINATED  $ 0.384       $0.067       

TOTAL COST: $189.105
NOTE: This feature is experimental. Costs for clusters with auto{stop,down} scheduled may not be accurate.
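
The two changes described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the record layout (dicts with a `cost` key), the constant `MAX_ROWS`, and the function `split_for_display` are all hypothetical names; only the limit of 5 comes from the description.

```python
from typing import Dict, List, Tuple

# Row limit taken from the PR description; the constant name is hypothetical.
MAX_ROWS = 5


def split_for_display(records: List[Dict],
                      show_all: bool = False) -> Tuple[List[Dict], float]:
    """Return the rows to print and the total cost over ALL records."""
    limit = len(records) if show_all else MAX_ROWS
    displayed = records[:limit]
    # The total deliberately includes records hidden by the row limit.
    total = sum(r['cost'] for r in records)
    return displayed, total


# Hypothetical data: eight records costing $1 each.
records = [{'name': f'c{i}', 'cost': 1.0} for i in range(8)]
rows, total = split_for_display(records)
print(len(rows), total)  # 5 8.0
```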

Collaborator

@Michaelvll Michaelvll left a comment

Thanks for submitting the PR @sumanthgenz! Left several comments.

Comment on lines 142 to 179
num_lines_to_display = _NUM_COST_REPORT_LINES
if show_all:
    num_lines_to_display = len(cluster_records)

for record in cluster_records[:num_lines_to_display]:
Collaborator

The cost records seem to be ordered by launch time, but it might be better to list the clusters that have not been terminated yet first.

@sumanthgenz
Collaborator Author

(sky) sumanth@MacBook-Pro-5 skypilot % sky cost-report                    
Clusters
NAME     LAUNCHED      DURATION  RESOURCES                          STATUS      HOURLY_PRICE  COST (est.)  
cluster  1 month ago   6m 19s    1x GCP(n1-highmem-8, {'V100': 1})  TERMINATED  $ 2.953       $0.311       
cluster  1 month ago   22m 31s   1x GCP(n1-highmem-8, {'V100': 1})  TERMINATED  $ 2.953       $1.108       
cluster  3 months ago  3m 41s    1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.029       
cluster  3 months ago  10m 13s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.081       
min      3 months ago  6m 25s    1x AWS(m6i.2xlarge)                TERMINATED  $ 0.384       $0.041       
min      3 months ago  24m 40s   1x AWS(m6i.2xlarge)                TERMINATED  $ 0.384       $0.158       
mount    3 months ago  7m 26s    1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.059       
mount2   3 months ago  12m 28s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.098       
mount    3 months ago  16m 26s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.130       

Managed spot controller (will be autostopped if idle for 10min)
NAME                          LAUNCHED     DURATION  RESOURCES                          STATUS      HOURLY_PRICE  COST (est.)  
sky-spot-controller-d16122f1  1 month ago  10m 25s   1x AWS(m6i.2xlarge, disk_size=50)  TERMINATED  $ 0.384       $0.067       

Total Cost: $219.710
NOTE: Since --all is not set, not all cost report records may be displayed above.
NOTE: This feature is experimental. Costs for clusters with auto{stop,down} scheduled may not be accurate.
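
The COST (est.) values in this output appear consistent with a simple "hours of uptime times hourly price" estimate. The check below reproduces two rows from the table; the helper name `estimate_cost` is hypothetical, and the actual computation in SkyPilot may differ in details.

```python
def estimate_cost(duration_seconds: int, hourly_price: float) -> float:
    """Estimated cost = hours of uptime x hourly price."""
    return duration_seconds / 3600 * hourly_price


# Rows taken from the report above: duration converted to seconds, HOURLY_PRICE.
print(f'${estimate_cost(6 * 60 + 19, 2.953):.3f}')   # 6m 19s at $2.953/h -> $0.311
print(f'${estimate_cost(22 * 60 + 31, 2.953):.3f}')  # 22m 31s at $2.953/h -> $1.108
```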

@concretevitamin concretevitamin added this to the 0.3 milestone Mar 22, 2023
Member

@concretevitamin concretevitamin left a comment

Thanks @sumanthgenz. This is great.

UX comments:

Clusters
NAME             LAUNCHED    DURATION           RESOURCES                             STATUS      HOURLY_PRICE  COST (est.)
...

Managed spot controller (will be autostopped if idle for 10min)
NAME                          LAUNCHED      DURATION            RESOURCES                           STATUS      HOURLY_PRICE  COST (est.)
sky-spot-controller-8a3968f2  2 months ago  4 days 12h 41m 44s  1x GCP(n1-highmem-8, disk_size=50)  TERMINATED  $ 0.473       $51.436

Total Cost: $16524.966
NOTE: Since --all is not set, not all cost report records may be displayed above.
NOTE: This feature is experimental. Costs for clusters with auto{stop,down} scheduled may not be accurate.
  1. Can we make Total Cost sum up what's being displayed, rather than everything in history? In the table above my COST (est.) column adds up to <$100, so it took me a while to realize why Total Cost displays a 5-figure number (which I'm not sure is accurate).

  2. Consider removing NOTE: since we've yellow'd the font.

  3. Consider Since --all is not set, not all cost report records may be displayed above. -> Showing the N most recent clusters. To see all clusters in history, pass the --all flag.

Comment on lines +339 to +371
if status is None:
    return -1
return 1
Member

Why are we sorting a bunch of -1's and 1's? From the output, seems like we are sorting by launch-time?

Collaborator Author

We wanted to bubble up clusters that were active and un-terminated to the top since they are actively incurring cost. In addition, we also sort by launch time.
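
One way to express that ordering is a two-part sort key. The sketch below is an assumption-laden illustration rather than the code under review: the record fields `status` and `launched_at`, the status strings, and the function `display_order_key` are hypothetical.

```python
from typing import Dict, List, Tuple


def display_order_key(record: Dict) -> Tuple[int, float]:
    """Sort key: non-terminated clusters first, then newest launch first."""
    is_terminated = record['status'] == 'TERMINATED'
    # 0 sorts before 1; negating the timestamp puts newer launches first.
    return (int(is_terminated), -record['launched_at'])


# Hypothetical records; launched_at is a timestamp (larger means more recent).
records: List[Dict] = [
    {'name': 'old-done', 'status': 'TERMINATED', 'launched_at': 100.0},
    {'name': 'stopped', 'status': 'STOPPED', 'launched_at': 50.0},
    {'name': 'new-done', 'status': 'TERMINATED', 'launched_at': 200.0},
    {'name': 'running', 'status': 'UP', 'launched_at': 150.0},
]
ordered = sorted(records, key=display_order_key)
print([r['name'] for r in ordered])
# ['running', 'stopped', 'new-done', 'old-done']
```

Returning a tuple keeps both criteria in one place, so the intent (active first, then by launch time) is visible without a separate explanatory pass over bare -1 and 1 values.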

Member

Can we add this explanation as a code comment before L172?

@sumanthgenz
Collaborator Author

(sky) sumanth@MacBook-Pro-5 skypilot % sky cost-report
Clusters
NAME     LAUNCHED      DURATION  RESOURCES                          STATUS      HOURLY_PRICE  COST (est.)  
cluster  2 months ago  6m 19s    1x GCP(n1-highmem-8, {'V100': 1})  TERMINATED  $ 2.953       $0.311       
cluster  2 months ago  22m 31s   1x GCP(n1-highmem-8, {'V100': 1})  TERMINATED  $ 2.953       $1.108       
cluster  3 months ago  3m 41s    1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.029       
cluster  3 months ago  10m 13s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.081       
min      4 months ago  6m 25s    1x AWS(m6i.2xlarge)                TERMINATED  $ 0.384       $0.041       
min      4 months ago  24m 40s   1x AWS(m6i.2xlarge)                TERMINATED  $ 0.384       $0.158       
mount    4 months ago  7m 26s    1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.059       
mount2   4 months ago  12m 28s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.098       
mount    4 months ago  16m 26s   1x GCP(n1-highmem-8)               TERMINATED  $ 0.473       $0.130       

Managed spot controller (will be autostopped if idle for 10min)
NAME                          LAUNCHED      DURATION  RESOURCES                          STATUS      HOURLY_PRICE  COST (est.)  
sky-spot-controller-d16122f1  2 months ago  10m 25s   1x AWS(m6i.2xlarge, disk_size=50)  TERMINATED  $ 0.384       $0.067       

Total Cost: $2.08
Showing the N most recent clusters. To see all clusters in history, pass the --all flag.
This feature is experimental. Costs for clusters with auto{stop,down} scheduled may not be accurate.
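
As a quick check, the total in this output matches the sum of the COST (est.) values of the rows actually displayed (nine clusters plus the spot controller):

```python
# COST (est.) values of the ten rows displayed above, in order.
displayed_costs = [0.311, 1.108, 0.029, 0.081, 0.041,
                   0.158, 0.059, 0.098, 0.130, 0.067]
total = sum(displayed_costs)
print(f'Total Cost: ${total:.2f}')  # Total Cost: $2.08
```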

@concretevitamin
Member

After switching to this branch, now seeing

» sky cost-report
...
  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/core.py", line 143, in cost_report
    cluster_reports = global_user_state.get_clusters_from_history()
  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/global_user_state.py", line 608, in get_clusters_from_history
    'duration': _get_cluster_duration(cluster_hash),
  File "/Users/zongheng/Dropbox/workspace/riselab/sky-computing/sky/global_user_state.py", line 449, in _get_cluster_duration
    start_time, end_time = int(start_time), int(end_time)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Expected?

@sumanthgenz
Collaborator Author

sumanthgenz commented Apr 6, 2023

Hi @concretevitamin, thanks for showing this. @Michaelvll was seeing this issue too, but I myself have not been encountering it. In another branch I've added a fix where it skips entries where start_time is None (which it should never be given the way the durations are tracked, so I am not sure where the bug originates from). I can add that fix here too.
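
A guard of the kind described here might look like the sketch below. It is not the code from either branch: the function name `total_duration`, the representation of usage as `(start_time, end_time)` pairs, and treating a missing `end_time` as "still running" are all assumptions made for illustration.

```python
import time
from typing import List, Optional, Tuple

Interval = Tuple[Optional[float], Optional[float]]


def total_duration(intervals: List[Interval],
                   now: Optional[float] = None) -> int:
    """Sum interval lengths in seconds, skipping intervals with no start time."""
    if now is None:
        now = time.time()
    total = 0
    for start_time, end_time in intervals:
        if start_time is None:
            # Malformed record: skip it instead of calling int(None).
            continue
        if end_time is None:
            # Assumption: an open interval means the cluster is still running.
            end_time = now
        total += int(end_time) - int(start_time)
    return total


# Hypothetical intervals: one complete, one malformed, one still open.
print(total_duration([(0, 60), (None, None), (100, None)], now=130))  # 90
```

Skipping malformed entries avoids the TypeError but also hides whatever wrote a record without a start time, so logging the skipped entries may be worth considering.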

Member

@concretevitamin concretevitamin left a comment

Thanks @sumanthgenz. It works now. A few comments.

Also:

» sky cost-report
Clusters
NAME                  LAUNCHED     DURATION                   RESOURCES                                 STATUS      HOURLY_PRICE  COST (est.)
llama-ckpts           3 weeks ago  2 weeks 2 days 12h 54m...  1x GCP(n2-standard-4, disk_size=512)      STOPPED     $ 0.194       $77.094
dbg                   1 day ago    19m 25s                    1x Lambda(gpu_1x_a100_sxm4, {'A100': 1})  TERMINATED  $ 1.100       $0.356
...

Managed spot controller (will be autostopped if idle for 10min)
NAME                          LAUNCHED      DURATION            RESOURCES                           STATUS      HOURLY_PRICE  COST (est.)
sky-spot-controller-8a3968f2  2 months ago  4 days 12h 41m 44s  1x GCP(n1-highmem-8, disk_size=50)  TERMINATED  $ 0.473       $51.436

Total Cost: $53.12
...

Two questions:

  1. Why does total cost show $53.12, while the COST (est.) add up to something bigger?
  2. For the first cluster, llama-ckpts, the duration 2 weeks 2 days 12h 54m... seems too long (I don't remember its uptime being this long). Is it a known issue for clusters that might have utilized autostop?

@sumanthgenz
Collaborator Author

@concretevitamin

  1. I also tried sorting by cluster status in get_total_cost_of_displayed_records; perhaps this will make the total match the displayed records.
  2. Yes, in another PR I nullify the cost for autostop clusters, since their durations and associated costs may not be accurate.

@concretevitamin
Member

Thanks @sumanthgenz.

  1. Now sky cost-report's total is right. Passing the -a flag, however, gives the same total; shouldn't it be higher, since there are many more terminated clusters?

  2. My status table has a stopped controller

sky-spot-controller-8a3968f2  10 hrs ago  1x GCP(n2-standard-4, disk_size=50)   STOPPED  10m       sky spot launch -dy --cloud...

but cost-report doesn't show it and it shows a terminated one. Is this a known issue?

sky-spot-controller-8a3968f2  2 months ago  4 days 12h 41m 44s  1x GCP(n1-highmem-8, disk_size=50)  TERMINATED  $ 0.473       $51.436

@sumanthgenz
Collaborator Author

@concretevitamin

  1. Fixed
  2. This will be fully fixed in [Spot] [Cost Report] Add CLI support for getting VM cost for spot jobs #1669, since that PR has logic to aggregate record entries. In the current PR, for reserved_clusters, we now display the most recent entry for each reserved_cluster.
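
Showing only the most recent entry per reserved cluster could be sketched as below. The field names `name` and `launched_at`, the timestamps, and the function `latest_entry_per_cluster` are hypothetical; the controller name is taken from the thread, and the real change in this PR may be structured differently.

```python
from typing import Dict, List


def latest_entry_per_cluster(records: List[Dict]) -> List[Dict]:
    """Keep only the most recently launched record for each cluster name."""
    latest: Dict[str, Dict] = {}
    for record in records:
        current = latest.get(record['name'])
        if current is None or record['launched_at'] > current['launched_at']:
            latest[record['name']] = record
    return list(latest.values())


# Hypothetical history: the same controller name appears twice.
records = [
    {'name': 'sky-spot-controller-8a3968f2', 'status': 'TERMINATED',
     'launched_at': 1.0},
    {'name': 'sky-spot-controller-8a3968f2', 'status': 'STOPPED',
     'launched_at': 9.0},
]
print([r['status'] for r in latest_entry_per_cluster(records)])  # ['STOPPED']
```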

Member

@concretevitamin concretevitamin left a comment

LGTM, thanks @sumanthgenz.

@sumanthgenz sumanthgenz merged commit 2380776 into skypilot-org:master Apr 13, 2023