Clarify scope of tensorflow/k8s #150

Closed
jlewi opened this issue Nov 15, 2017 · 9 comments

@jlewi
Contributor

jlewi commented Nov 15, 2017

We should update README.md to clarify the scope of tensorflow/k8s.

@jimexist
Member

jimexist commented Nov 16, 2017

A list of questions I have in mind that could be answered better if the scope were clearly defined:

  • does this repo cover (simple) support for more general data pipelining? (can be via Airflow jobs though)
  • does the UI in this project include metrics and statistics visualization across a trail of TensorFlow experiments?
  • what are the assumed compatible storage systems (if they work with k8s) with this repo, e.g. apart from GCE's PD?
  • would there be a Go (and, apparently, maybe Node/Java/Python) client for the APIs defined? for the UI APIs?

@jlewi
Contributor Author

jlewi commented Nov 16, 2017

I hope this repo will be part of a broader effort to create a loosely coupled stack of components covering the full ML lifecycle on K8s. @aronchick has suggested calling this broader effort KubeFlow.

This will be just one repo in that effort. The focus of this repo will be on TensorFlow on K8s. TensorFlow is itself a growing ecosystem of components aimed at the full life cycle of ML. So this repo will likely be focused on closing any gaps in making those components run well on K8s.

In most cases, I expect there will be more than one reasonable location to host particular code. As an example, if we wanted to add tooling to make TensorFlow Serving easy to spin up on K8s, that could live in this repo or in tensorflow/serving.

does this repo cover (simple) support for more general data pipelining? (can be via Airflow jobs though)

General data pipelining isn't in scope. However, tooling relevant to TensorFlow could live in this repo. For example, an Airflow operator to launch TfJobs could live in this repo (or Airflow contrib).

does the UI in this project include metrics and statistics visualization across a trail of TensorFlow experiments?

I suspect the UI will ultimately live in its own repo. It's not clear which component (or components) will provide the functionality to compare across experiments. One option is for TensorBoard to add this functionality (tensorflow/tensorboard#92).

what are the assumed compatible storage systems (if they work with k8s) with this repo, e.g. apart from GCE's PD?

The hope is to use K8s as an abstraction layer, so any storage system that works with K8s can work with these components. The TfJob CRD doesn't make any assumptions about the underlying storage system because it uses the K8s storage layer (volumes) to hide the details.
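
To make the volume abstraction concrete, here is a minimal sketch using plain Python dicts that mirror standard Kubernetes pod and PersistentVolumeClaim fields. It does not show the TfJob spec itself. The image tag, claim name, and storage class names are placeholders chosen for illustration, not values this repo requires.

```python
import json


def pod_template(image, claim_name, mount_path="/data"):
    """Build a pod template that mounts a PersistentVolumeClaim.

    The template names only the claim; it knows nothing about the
    storage system that backs it.
    """
    return {
        "spec": {
            "containers": [
                {
                    "name": "tensorflow",
                    "image": image,
                    "volumeMounts": [
                        {"name": "training-data", "mountPath": mount_path}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "training-data",
                    "persistentVolumeClaim": {"claimName": claim_name},
                }
            ],
            "restartPolicy": "OnFailure",
        }
    }


def pvc(claim_name, storage_class, size="10Gi"):
    """Build a PersistentVolumeClaim; the backend is picked by storage class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Placeholder image and claim name (assumptions for this sketch).
template = pod_template("tensorflow/tensorflow:1.4.0", "training-data-claim")

# Two hypothetical storage classes: only the claim changes between
# clusters, while the pod template above stays the same.
gce_claim = pvc("training-data-claim", "standard")
nfs_claim = pvc("training-data-claim", "nfs-client")

if __name__ == "__main__":
    print(json.dumps(template, indent=2))
    print(json.dumps(gce_claim, indent=2))
```

The point of the sketch is that switching storage backends touches only the claim, while the workload definition is untouched.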

Relying on K8s breaks down when K8s doesn't have an appropriate abstraction and you have to expose the details of the underlying cluster. An example is logging (#128). Right now K8s doesn't provide a logging API that fetches logs from durable storage (e.g. StackDriver). As a result, when using TfJob I don't think there's a cloud-agnostic way to fetch logs after the job finishes. This isn't just an issue for TensorFlow. Any system running batch jobs (e.g. Spark, Airflow, etc.) has this problem. My hope is that K8s will evolve APIs to solve this.

would there be a Go (and, apparently, maybe Node/Java/Python) client for the APIs defined? for the UI APIs?

I'm open to client libraries in other languages if people think they are useful and are willing to contribute. Perhaps we can reuse K8s client generation code so we can auto-generate them.

/cc @aronchick @foxish @vishh

@aronchick

@jimexist did that help? Happy to drop into mail and explain our overall efforts if you'd like

@bhack

bhack commented Nov 17, 2017

The UI part is a little bit obscure. The upstream ticket seems quite stalled and the last two comments are unanswered.

@bhack

bhack commented Nov 17, 2017

Also, I think that an overview of this Google Gradient co-financed effort could be useful to outline the perimeter of this project.

@jlewi
Contributor Author

jlewi commented Nov 20, 2017

Also, I think that an overview of this Google Gradient co-financed effort could be useful to outline the perimeter of this project.

@bhack Algorithmia isn't currently involved although it would be great to work with them if they are interested.

The UI part is a little bit obscure. The upstream ticket seems quite stalled and the last two comments are unanswered.

It's my understanding that @wbuchwalter is working on a minimal UI just to make TensorFlow on K8s more accessible to folks who feel more comfortable with a UI. I think the long-term direction for the UI is unclear. Feel free to chime in on the issue with your opinions, or on Slack at kubeflow.slack.com.

@DjangoPeng
Member

DjangoPeng commented Nov 23, 2017

@jlewi To make TensorFlow tasks run well on Kubernetes, I think we have to implement some other tools, such as a monitor (real-time processing and monitoring of the status of all TensorFlow tasks). An easy-to-use command line tool is also fairly necessary.

@jlewi
Contributor Author

jlewi commented Dec 1, 2017

@DjangoPeng I agree with you. Do you want to open up issues for those items?

@DjangoPeng
Member

Sure thing. Maybe next Monday or Tuesday, I'll open an issue to explain and clarify our proposal. At the same time, I'll give a development schedule including features and due dates.

@jlewi closed this as completed Jan 25, 2018