
Add a cluster detailed status method #140

Closed
guillaumeeb wants to merge 4 commits

Conversation

guillaumeeb
Member

Closes #11.

Main concern here: is this really useful in addition to cluster.__repr__ or client.__repr__?
Personally, I like having this kind of detailed status, but find it only useful in some specific cases.

Any other feedback is quite welcome.

@guillaumeeb
Member Author

Here is how it looks:

[Screenshot of the detailed status HTML output]

@mrocklin
Member

I like the idea here. Some thoughts:

  1. Is this something that we want to coordinate with other cluster managers (dask-yarn, dask-kubernetes)? If so, how? cc @jcrist @jacobtomlinson
  2. If yes, then what kind of data would we standardize around that is common to all three? Should we also provide this data in a structured way?
  3. Would this be better as a dashboard plot or ipywidget that updates in real time?

@jhamman
Member

jhamman commented Aug 28, 2018

> Is this something that we want to coordinate with other cluster managers (dask-yarn, dask-kubernetes)? If so, how? cc @jcrist @jacobtomlinson

I'd argue that the running/pending/finished job/worker classification is going to be widely applicable (even for a LocalCluster). That said, I don't know enough about the other systems to generalize these classifications.

Also, +1 on adding something like this to the dashboard.
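
To make the generalization question concrete, here is a minimal sketch (not part of this PR) of how backend-native states might map onto a shared pending/running/finished classification. The state codes and dictionary names are illustrative assumptions, not existing dask-jobqueue or dask-kubernetes API.

```python
# Illustrative only: one way backend-native states could be bucketed into a
# shared pending/running/finished classification.  The exact state codes
# depend on the queueing system or backend and are assumptions here.

# PBS/Torque-style job states as reported by qstat -> generic bucket
JOBQUEUE_STATE_MAP = {
    "Q": "pending",   # queued
    "R": "running",   # running
    "C": "finished",  # completed (Torque); PBS Pro reports "F"
}

# Kubernetes pod phases -> generic bucket
KUBERNETES_PHASE_MAP = {
    "Pending": "pending",
    "Running": "running",
    "Succeeded": "finished",
    "Failed": "finished",
}

def classify(native_state, state_map):
    """Bucket a backend-native state, falling back to 'unknown'."""
    return state_map.get(native_state, "unknown")
```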

@jacobtomlinson
Member

These would definitely be useful in dask-kubernetes. Running and Pending would be easy to implement, but completed workers get cleaned up automatically, so a finished state doesn't really work there.

@jhamman
Member

jhamman commented Nov 3, 2018

Is this worth reviving? I still think this would be a worthwhile addition.

@guillaumeeb
Member Author

I guess I didn't know what to do with @mrocklin's comments... I feel this is applicable to other cluster managers, but I don't know how to standardize it so that it can be easily migrated upstream and then into other projects. How do we represent a job here vs. a pod in dask-kubernetes vs. only processes in a LocalCluster...

And I was probably waiting for some more discussion on the form it should take.

@mrocklin
Member

mrocklin commented Nov 3, 2018

I'm happy to not standardize things for a while. Please don't consider my comments as blockers on any progress here.

@guillaumeeb
Member Author

So what do we want to do here:

  • I'm not sure the simple HTML representation is really useful, is it?
  • Does anyone have a proposal on the format? A dashboard (I don't know how to add a dashboard page, and this may have to go upstream), a widget, simple HTML, some table like a DataFrame (see the sketch after this comment)?
  • What can we use to make the job/pod/other notion generic? Use some key if it exists, like the worker_key that adaptive uses? Anything else?

I guess I need more precise feedback 🙂.
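
For the table option mentioned above, here is a rough sketch of what a DataFrame-style status could look like, assuming a per-worker mapping from some worker key to its state. The keys, states, and column names below are made up for illustration and do not exist in dask-jobqueue.

```python
# Hypothetical per-worker status mapping: worker key (e.g. the job id that
# adaptive's worker_key would return) -> generic state.  Values are invented.
import pandas as pd

worker_status = {
    "1234.pbs-server": "running",
    "1235.pbs-server": "pending",
    "1230.pbs-server": "finished",
}

# Render the mapping as a small table, plus a per-state summary.
df = pd.DataFrame(
    [(key, state) for key, state in worker_status.items()],
    columns=["worker_key", "state"],
)
print(df)
print(df["state"].value_counts())
```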

@mrocklin
Member

mrocklin commented Nov 3, 2018

My first thought was to suggest that the cluster object have a cluster.status() method (or something similar) that returns a standardized data structure that could be placed into a table. A dashboard page could then be built that calls that function periodically.

However, I then thought about how we might separate the cluster manager from the scheduler, at which point we'll no longer have access to the dashboard (except by explicitly passing messages). It might be that this information is also expressible through something like a JupyterLab extension, similar to what @ian-r-rose is building at dask/dask-labextension#31
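
For what it's worth, a minimal sketch of the cluster.status() idea described above, assuming the cluster manager tracks jobs in three buckets. None of these names exist in dask-jobqueue today; they are assumptions for discussion only.

```python
# Sketch of a cluster manager exposing a standardized, JSON-serializable
# status snapshot that a dashboard page or JupyterLab extension could poll.
from typing import Dict, List


class JobQueueClusterSketch:
    """Hypothetical stand-in for a cluster manager; not real dask-jobqueue API."""

    def __init__(self) -> None:
        # In a real cluster manager these lists would be kept up to date
        # from the scheduler and the job queueing system.
        self.running_jobs: List[str] = []
        self.pending_jobs: List[str] = []
        self.finished_jobs: List[str] = []

    def status(self) -> Dict[str, List[str]]:
        """Return a structured snapshot of job/worker states."""
        return {
            "running": list(self.running_jobs),
            "pending": list(self.pending_jobs),
            "finished": list(self.finished_jobs),
        }


# A dashboard page or widget could call cluster.status() on a timer and
# render the result as a table.
```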

@lesteve lesteve force-pushed the master branch 2 times, most recently from 4d181fe to 26a0e70 Compare December 8, 2020 12:40
Base automatically changed from master to main February 10, 2021 07:12
@jacobtomlinson
Member

It's been a long time since this PR had any activity so I'm going to close it out.

Successfully merging this pull request may close these issues.

Implement a cluster status method, to know if workers are really running